2004-11-24 13:00:29

by Nigel Cunningham

Subject: Suspend 2 merge

Hi everyone.

I know that I still have work to do on suspend2, but thought it was high
time I got around to properly submitting the code for review, so here
goes.

I have it split up into 51 patches, of which most are less than 20k,
although there are three 50k patches. Changes to the rest of the kernel
tree come first, then the core. The full tree can be found at

http://suspend2.bkbits.net:8080/merge-tree

Regards,

Nigel
--
Nigel Cunningham
Pastoral Worker
Christian Reformed Church of Tuggeranong
PO Box 1004, Tuggeranong, ACT 2901

You see, at just the right time, when we were still powerless, Christ
died for the ungodly. -- Romans 5:6


2004-11-24 13:01:52

by Nigel Cunningham

Subject: Suspend 2 merge: 7/51: Reboot handler hook.

Nice and simple.

We override swsusp's hook with the suspend2 one. It's not that I want to
step on Pavel's toes. Rather, people who go to the trouble of applying
suspend2 probably want to use it :>
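
The control flow this patch introduces can be sketched in userspace C
roughly as follows. This is a mock under stated assumptions, not kernel
code: suspend_disabled and the stubbed suspend_try_suspend() are
stand-ins invented for illustration, whereas the patch itself checks
test_suspend_state(SUSPEND_DISABLED).

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-ins; in the patch the check is
 * test_suspend_state(SUSPEND_DISABLED). */
int suspend_disabled;
int suspend_attempts;

void suspend_try_suspend(void)
{
    suspend_attempts++;     /* the real function starts a suspend cycle */
}

/* Mirrors the LINUX_REBOOT_CMD_SW_SUSPEND case after the patch. */
int reboot_cmd_sw_suspend(void)
{
    int ret = -EINVAL;

    if (!suspend_disabled) {
        suspend_try_suspend();
        ret = 0;
    }
    return ret;
}

/* Small self-check showing both outcomes. */
void example_reboot_hook(void)
{
    suspend_disabled = 1;
    assert(reboot_cmd_sw_suspend() == -EINVAL);
    suspend_disabled = 0;
    assert(reboot_cmd_sw_suspend() == 0);
    assert(suspend_attempts == 1);
}
```

Note that, as in the patch, the caller gets 0 whenever suspend is not
disabled; the outcome of the suspend attempt itself is not propagated.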


diff -ruN 300-reboot-handler-hook-old/kernel/sys.c 300-reboot-handler-hook-new/kernel/sys.c
--- 300-reboot-handler-hook-old/kernel/sys.c 2004-11-03 21:51:17.000000000 +1100
+++ 300-reboot-handler-hook-new/kernel/sys.c 2004-11-06 09:23:26.887002384 +1100
@@ -502,10 +502,14 @@
machine_restart(buffer);
break;

-#ifdef CONFIG_SOFTWARE_SUSPEND
+#ifdef CONFIG_SOFTWARE_SUSPEND2
case LINUX_REBOOT_CMD_SW_SUSPEND:
{
- int ret = software_suspend();
+ int ret = -EINVAL;
+ if (!(test_suspend_state(SUSPEND_DISABLED))) {
+ suspend_try_suspend();
+ ret = 0;
+ }
unlock_kernel();
return ret;
}


2004-11-24 13:04:32

by Nigel Cunningham

Subject: Suspend 2 merge: 10/51: Exports for suspend built as modules.

New exports for suspend. I've cut them down some as a result of the last
review, but could perhaps do more? Would people prefer to see a single
struct wrapping exported functions?

The sys_ functions are exported because a while ago, people suggested I
use /dev/console to output text that doesn't need to be logged, and I
also use /dev/splash for the bootsplash support. These functions were
needed in order to get access to those files when we're resuming, along
with the ioctl for setting up text output (canon). I'd prefer to use
internal routines, but I suppose this way I get the text display on the
serial console too :>

Avenrun I'd gladly drop, but apparently sendmail has some ugliness where
it won't deliver your mail if the load average gets too high. Guess what
suspending to disk does to your load average? (So we save the avenrun
values at the start of the cycle and restore them at the end - anyone
got a better idea? I'd love to do something different here).
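
The save-and-restore approach described above might look something like
this userspace sketch. The array stands in for the kernel's avenrun[];
saved_avenrun and the two helper names are hypothetical, since the code
that actually does this is not part of this patch. Locking is omitted
here, although the comment in kernel/timer.c says xtime_lock is
required to access avenrun.

```c
#include <assert.h>
#include <string.h>

/* Stand-in for the kernel's fixed-point load average array. */
unsigned long avenrun[3];

/* Hypothetical storage for the values seen at the start of the cycle. */
unsigned long saved_avenrun[3];

void save_avenrun(void)
{
    memcpy(saved_avenrun, avenrun, sizeof(avenrun));
}

void restore_avenrun(void)
{
    memcpy(avenrun, saved_avenrun, sizeof(avenrun));
}

/* Simulated cycle: whatever happens to the load average in between
 * is discarded when the saved values are put back. */
void example_cycle(void)
{
    avenrun[0] = 100;
    avenrun[1] = 200;
    avenrun[2] = 300;

    save_avenrun();
    avenrun[0] = 9000;      /* load spikes while the image is written */
    restore_avenrun();

    assert(avenrun[0] == 100);
}
```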

diff -ruN 400-exports-old/drivers/acpi/hardware/hwsleep.c 400-exports-new/drivers/acpi/hardware/hwsleep.c
--- 400-exports-old/drivers/acpi/hardware/hwsleep.c 2004-11-03 21:51:15.000000000 +1100
+++ 400-exports-new/drivers/acpi/hardware/hwsleep.c 2004-11-04 16:27:40.000000000 +1100
@@ -43,6 +43,7 @@
*/

#include <acpi/acpi.h>
+#include <linux/module.h>

#define _COMPONENT ACPI_HARDWARE
ACPI_MODULE_NAME ("hwsleep")
@@ -591,3 +592,6 @@

return_ACPI_STATUS (status);
}
+
+/* For suspend2 */
+EXPORT_SYMBOL(acpi_leave_sleep_state);
diff -ruN 400-exports-old/fs/bio.c 400-exports-new/fs/bio.c
--- 400-exports-old/fs/bio.c 2004-11-03 21:53:50.000000000 +1100
+++ 400-exports-new/fs/bio.c 2004-11-04 16:27:40.000000000 +1100
@@ -1002,3 +1002,4 @@
EXPORT_SYMBOL(bio_split_pool);
EXPORT_SYMBOL(bio_copy_user);
EXPORT_SYMBOL(bio_uncopy_user);
+EXPORT_SYMBOL(bio_set_pages_dirty);
diff -ruN 400-exports-old/fs/ioctl.c 400-exports-new/fs/ioctl.c
--- 400-exports-old/fs/ioctl.c 2004-11-03 21:51:52.000000000 +1100
+++ 400-exports-new/fs/ioctl.c 2004-11-04 16:27:40.000000000 +1100
@@ -138,8 +138,7 @@

/*
* Platforms implementing 32 bit compatibility ioctl handlers in
- * modules need this exported
+ * modules need this exported. So does Suspend2 (when made as
+ * modules), so the export_symbol is now unconditional.
*/
-#ifdef CONFIG_COMPAT
EXPORT_SYMBOL(sys_ioctl);
-#endif
diff -ruN 400-exports-old/fs/namei.c 400-exports-new/fs/namei.c
--- 400-exports-old/fs/namei.c 2004-11-03 21:53:11.000000000 +1100
+++ 400-exports-new/fs/namei.c 2004-11-04 16:27:40.000000000 +1100
@@ -1649,6 +1649,8 @@
return error;
}

+EXPORT_SYMBOL(sys_mkdir);
+
/*
* We try to drop the dentry early: we should have
* a usage count of 2 if we're the only user of this
diff -ruN 400-exports-old/fs/namespace.c 400-exports-new/fs/namespace.c
--- 400-exports-old/fs/namespace.c 2004-11-03 21:54:15.000000000 +1100
+++ 400-exports-new/fs/namespace.c 2004-11-04 16:27:40.000000000 +1100
@@ -490,6 +490,8 @@
return retval;
}

+EXPORT_SYMBOL(sys_umount);
+
#ifdef __ARCH_WANT_SYS_OLDUMOUNT

/*
@@ -1187,6 +1189,8 @@
return retval;
}

+EXPORT_SYMBOL(sys_mount);
+
/*
* Replace the fs->{rootmnt,root} with {mnt,dentry}. Put the old values.
* It can block. Requires the big lock held.
diff -ruN 400-exports-old/fs/proc/generic.c 400-exports-new/fs/proc/generic.c
--- 400-exports-old/fs/proc/generic.c 2004-11-03 21:55:03.000000000 +1100
+++ 400-exports-new/fs/proc/generic.c 2004-11-04 16:27:40.000000000 +1100
@@ -698,3 +698,5 @@
out:
return;
}
+
+EXPORT_SYMBOL(proc_match);
diff -ruN 400-exports-old/fs/read_write.c 400-exports-new/fs/read_write.c
--- 400-exports-old/fs/read_write.c 2004-11-03 21:54:14.000000000 +1100
+++ 400-exports-new/fs/read_write.c 2004-11-04 16:27:40.000000000 +1100
@@ -314,6 +314,7 @@

return ret;
}
+EXPORT_SYMBOL(sys_write);

asmlinkage ssize_t sys_pread64(unsigned int fd, char __user *buf,
size_t count, loff_t pos)
diff -ruN 400-exports-old/kernel/power/main.c 400-exports-new/kernel/power/main.c
--- 400-exports-old/kernel/power/main.c 2004-11-03 21:52:25.000000000 +1100
+++ 400-exports-new/kernel/power/main.c 2004-11-06 09:23:56.755461688 +1100
@@ -15,7 +15,7 @@
#include <linux/errno.h>
#include <linux/init.h>
#include <linux/pm.h>
-
+#include <linux/module.h>

#include "power.h"

@@ -262,3 +262,7 @@
}

core_initcall(pm_init);
+
+/* For Suspend2 ACPI support */
+EXPORT_SYMBOL(pm_ops);
+EXPORT_SYMBOL(pm_disk_mode);
diff -ruN 400-exports-old/kernel/sched.c 400-exports-new/kernel/sched.c
--- 400-exports-old/kernel/sched.c 2004-11-06 09:23:53.364977120 +1100
+++ 400-exports-new/kernel/sched.c 2004-11-06 09:23:56.627481144 +1100
@@ -3798,6 +3798,7 @@

read_unlock(&tasklist_lock);
}
+EXPORT_SYMBOL(show_state);

void __devinit init_idle(task_t *idle, int cpu)
{
diff -ruN 400-exports-old/kernel/sys.c 400-exports-new/kernel/sys.c
--- 400-exports-old/kernel/sys.c 2004-11-06 09:23:53.374975600 +1100
+++ 400-exports-new/kernel/sys.c 2004-11-05 21:36:13.000000000 +1100
@@ -523,6 +523,8 @@
return 0;
}

+EXPORT_SYMBOL(sys_reboot);
+
static void deferred_cad(void *dummy)
{
notifier_call_chain(&reboot_notifier_list, SYS_RESTART, NULL);
diff -ruN 400-exports-old/kernel/timer.c 400-exports-new/kernel/timer.c
--- 400-exports-old/kernel/timer.c 2004-11-03 21:54:44.000000000 +1100
+++ 400-exports-new/kernel/timer.c 2004-11-04 16:27:40.000000000 +1100
@@ -881,6 +881,7 @@
* Requires xtime_lock to access.
*/
unsigned long avenrun[3];
+EXPORT_SYMBOL(avenrun);

/*
* calc_load - given tick count, update the avenrun load estimates.
diff -ruN 400-exports-old/mm/page_alloc.c 400-exports-new/mm/page_alloc.c
--- 400-exports-old/mm/page_alloc.c 2004-11-03 21:51:15.000000000 +1100
+++ 400-exports-new/mm/page_alloc.c 2004-11-06 09:23:56.786456976 +1100
@@ -2069,3 +2069,7 @@

return table;
}
+
+/* Exported for Software Suspend 2 */
+EXPORT_SYMBOL(nr_free_highpages);
+EXPORT_SYMBOL(pgdat_list);
diff -ruN 400-exports-old/mm/swapfile.c 400-exports-new/mm/swapfile.c
--- 400-exports-old/mm/swapfile.c 2004-11-06 09:23:53.188004024 +1100
+++ 400-exports-new/mm/swapfile.c 2004-11-06 09:23:56.639479320 +1100
@@ -1710,3 +1710,13 @@
swap_device_unlock(swapdev);
return ret;
}
+
+/* Functions exported for Software Suspend's swapwriter */
+EXPORT_SYMBOL(swap_free);
+EXPORT_SYMBOL(swap_info);
+EXPORT_SYMBOL(sys_swapoff);
+EXPORT_SYMBOL(sys_swapon);
+EXPORT_SYMBOL(si_swapinfo);
+EXPORT_SYMBOL(map_swap_page);
+EXPORT_SYMBOL(get_swap_page);
+EXPORT_SYMBOL(get_swap_info_struct);
diff -ruN 400-exports-old/mm/vmscan.c 400-exports-new/mm/vmscan.c
--- 400-exports-old/mm/vmscan.c 2004-11-06 09:23:53.273990952 +1100
+++ 400-exports-new/mm/vmscan.c 2004-11-06 09:23:56.762460624 +1100
@@ -1221,6 +1221,9 @@
current->reclaim_state = NULL;
return ret;
}
+
+/* For Suspend2 */
+EXPORT_SYMBOL(shrink_all_memory);
#endif

#ifdef CONFIG_HOTPLUG_CPU


2004-11-24 13:05:04

by Nigel Cunningham

Subject: Suspend 2 merge: 8/51: /proc/acpi/sleep hook.

Same thing as the previous patch, but for /proc/acpi/sleep.
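
The effect of placing the suspend2 test first can be illustrated with a
small userspace sketch. The enum and handler_for_state() are invented
for this example; HANDLER_NONE simply stands for "falls through to the
rest of the function", which the patch does not show. Both config
symbols are defined here to model a kernel built with both
implementations.

```c
#include <assert.h>

/* Model a kernel with both implementations compiled in. */
#define CONFIG_SOFTWARE_SUSPEND2 1
#define CONFIG_SOFTWARE_SUSPEND 1

/* Hypothetical labels for which code path handles the request. */
enum handler { HANDLER_NONE, HANDLER_SUSPEND2, HANDLER_SWSUSP };

enum handler handler_for_state(unsigned long state)
{
#ifdef CONFIG_SOFTWARE_SUSPEND2
    if (state == 4)
        return HANDLER_SUSPEND2;    /* checked first, so it wins */
#endif
#ifdef CONFIG_SOFTWARE_SUSPEND
    if (state == 4)
        return HANDLER_SWSUSP;      /* only reached without suspend2 */
#endif
    return HANDLER_NONE;
}

/* Small self-check: state 4 goes to suspend2, other states fall through. */
void example_sleep_hook(void)
{
    assert(handler_for_state(4) == HANDLER_SUSPEND2);
    assert(handler_for_state(3) == HANDLER_NONE);
}
```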

diff -ruN 301-proc-acpi-sleep-activate-hook-old/drivers/acpi/sleep/proc.c 301-proc-acpi-sleep-activate-hook-new/drivers/acpi/sleep/proc.c
--- 301-proc-acpi-sleep-activate-hook-old/drivers/acpi/sleep/proc.c 2004-11-03 21:54:15.000000000 +1100
+++ 301-proc-acpi-sleep-activate-hook-new/drivers/acpi/sleep/proc.c 2004-11-05 21:35:42.000000000 +1100
@@ -68,6 +68,17 @@
goto Done;
}
state = simple_strtoul(str, NULL, 0);
+#ifdef CONFIG_SOFTWARE_SUSPEND2
+ /*
+ * I used to put this after the CONFIG_SOFTWARE_SUSPEND
+ * test, but people who compile in suspend2 usually want
+ * to use it instead of swsusp. --NC
+ */
+ if (state == 4) {
+ suspend_try_suspend();
+ goto Done;
+ }
+#endif
#ifdef CONFIG_SOFTWARE_SUSPEND
if (state == 4) {
software_suspend();


2004-11-24 13:10:54

by Nigel Cunningham

Subject: Suspend 2 merge: 17/51: Disable MCE checking during suspend.

Avoid a potential SMP deadlock here.
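
As a rough userspace model of the guard: the periodic work function
skips the cross-CPU call while a suspend cycle is running, but would
still re-arm itself. The state word, the bit number and the body of
test_suspend_state() below are assumptions made for this sketch; the
real definitions are not part of this patch.

```c
#include <assert.h>

/* Hypothetical bit number and state word. */
enum { SUSPEND_RUNNING = 0 };
unsigned long suspend_state;

int test_suspend_state(int bit)
{
    return (int)((suspend_state >> bit) & 1UL);
}

int checks_run;

/* Stands in for on_each_cpu(mce_checkregs, NULL, 1, 1). */
void mce_checkregs_all_cpus(void)
{
    checks_run++;
}

/* Mirrors mce_work_fn() after the patch. */
void mce_work_fn(void)
{
    if (!test_suspend_state(SUSPEND_RUNNING))
        mce_checkregs_all_cpus();
    /* schedule_delayed_work(&mce_work, MCE_RATE) would follow here. */
}

/* Small self-check: the check runs normally, and is skipped mid-suspend. */
void example_mce_guard(void)
{
    mce_work_fn();
    assert(checks_run == 1);

    suspend_state |= 1UL << SUSPEND_RUNNING;
    mce_work_fn();
    assert(checks_run == 1);
}
```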

diff -ruN 506-disable-mce-checking-during-suspend-avoid-smp-deadlock-old/arch/i386/kernel/cpu/mcheck/non-fatal.c 506-disable-mce-checking-during-suspend-avoid-smp-deadlock-new/arch/i386/kernel/cpu/mcheck/non-fatal.c
--- 506-disable-mce-checking-during-suspend-avoid-smp-deadlock-old/arch/i386/kernel/cpu/mcheck/non-fatal.c 2004-11-03 21:51:31.000000000 +1100
+++ 506-disable-mce-checking-during-suspend-avoid-smp-deadlock-new/arch/i386/kernel/cpu/mcheck/non-fatal.c 2004-11-04 16:27:40.000000000 +1100
@@ -17,6 +17,7 @@
#include <linux/interrupt.h>
#include <linux/smp.h>
#include <linux/module.h>
+#include <linux/suspend.h>

#include <asm/processor.h>
#include <asm/system.h>
@@ -57,7 +58,8 @@

static void mce_work_fn(void *data)
{
- on_each_cpu(mce_checkregs, NULL, 1, 1);
+ if (!test_suspend_state(SUSPEND_RUNNING))
+ on_each_cpu(mce_checkregs, NULL, 1, 1);
schedule_delayed_work(&mce_work, MCE_RATE);
}



2004-11-24 13:10:55

by Nigel Cunningham

Subject: Suspend 2 merge: 5/51: Workthread freezer support.

This patch adds freezer support for workthreads.

A new parameter in the create_ functions allows the thread to be marked
as PF_NOFREEZE. This should only be used for threads that may need to
run while the image is being written.

In a later patch, you'll see a new SYNCTHREAD flag, used for processes
that need to run while we're syncing data to disk, but not for writing
the image. That isn't used here because all kernel threads are frozen
after we've synced, so it's irrelevant to them.
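
One way to read the three possible flag values (0, PF_SYNCTHREAD,
PF_NOFREEZE) is sketched below in userspace C. The bit values and the
two-phase model are assumptions for illustration only: the real PF_*
constants live in include/linux/sched.h, PF_SYNCTHREAD comes from a
later patch in this series, and may_run() is a hypothetical helper, not
something suspend2 provides. apply_freezer_flags() mirrors the flag
reset this patch adds to kthread().

```c
#include <assert.h>

/* Illustrative bit values only, not the kernel's. */
#define PF_NOFREEZE   0x1UL
#define PF_SYNCTHREAD 0x2UL

enum phase { PHASE_SYNCING, PHASE_WRITING_IMAGE };

/* Returns 1 if a thread created with these freezer flags may still run
 * in the given phase, under the model assumed here. */
int may_run(unsigned long freezer_flags, enum phase p)
{
    if (freezer_flags & PF_NOFREEZE)
        return 1;                   /* also needed for writing the image */
    if (freezer_flags & PF_SYNCTHREAD)
        return p == PHASE_SYNCING;  /* needed only while syncing */
    return 0;                       /* assumed frozen before either phase */
}

/* Mirrors the patch's kthread() change: clear both freezer bits,
 * then apply whatever the creator asked for. */
unsigned long apply_freezer_flags(unsigned long task_flags,
                                  unsigned long freezer_flags)
{
    task_flags &= ~(PF_SYNCTHREAD | PF_NOFREEZE);
    task_flags |= freezer_flags;
    return task_flags;
}

/* Small self-check covering the three cases. */
void example_freezer_flags(void)
{
    assert(may_run(PF_NOFREEZE, PHASE_WRITING_IMAGE));
    assert(may_run(PF_SYNCTHREAD, PHASE_SYNCING));
    assert(!may_run(PF_SYNCTHREAD, PHASE_WRITING_IMAGE));
    assert(!may_run(0, PHASE_SYNCING));
}
```

Passing the flags at creation time, rather than having each thread set
current->flags itself, is what lets the patch drop the unconditional
PF_NOFREEZE assignments from worker_thread() and ksoftirqd().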


diff -ruN 210-workthreads-old/drivers/acpi/osl.c 210-workthreads-new/drivers/acpi/osl.c
--- 210-workthreads-old/drivers/acpi/osl.c 2004-11-03 21:54:44.000000000 +1100
+++ 210-workthreads-new/drivers/acpi/osl.c 2004-11-04 16:27:39.000000000 +1100
@@ -90,7 +90,7 @@
return AE_NULL_ENTRY;
}
#endif
- kacpid_wq = create_singlethread_workqueue("kacpid");
+ kacpid_wq = create_singlethread_workqueue("kacpid", 0);
BUG_ON(!kacpid_wq);

return AE_OK;
diff -ruN 210-workthreads-old/drivers/block/ll_rw_blk.c 210-workthreads-new/drivers/block/ll_rw_blk.c
--- 210-workthreads-old/drivers/block/ll_rw_blk.c 2004-11-03 21:51:17.000000000 +1100
+++ 210-workthreads-new/drivers/block/ll_rw_blk.c 2004-11-24 15:47:12.160377296 +1100
@@ -3021,7 +3021,7 @@

int __init blk_dev_init(void)
{
- kblockd_workqueue = create_workqueue("kblockd");
+ kblockd_workqueue = create_workqueue("kblockd", PF_NOFREEZE);
if (!kblockd_workqueue)
panic("Failed to create kblockd\n");

diff -ruN 210-workthreads-old/drivers/char/hvc_console.c 210-workthreads-new/drivers/char/hvc_console.c
--- 210-workthreads-old/drivers/char/hvc_console.c 2004-11-03 21:54:45.000000000 +1100
+++ 210-workthreads-new/drivers/char/hvc_console.c 2004-11-04 16:27:39.000000000 +1100
@@ -757,7 +757,7 @@

/* Always start the kthread because there can be hotplug vty adapters
* added later. */
- hvc_task = kthread_run(khvcd, NULL, "khvcd");
+ hvc_task = kthread_run(khvcd, NULL, PF_NOFREEZE, "khvcd");
if (IS_ERR(hvc_task)) {
panic("Couldn't create kthread for console.\n");
put_tty_driver(hvc_driver);
diff -ruN 210-workthreads-old/drivers/char/hvcs.c 210-workthreads-new/drivers/char/hvcs.c
--- 210-workthreads-old/drivers/char/hvcs.c 2004-11-03 21:55:05.000000000 +1100
+++ 210-workthreads-new/drivers/char/hvcs.c 2004-11-04 16:27:39.000000000 +1100
@@ -1420,7 +1420,7 @@
return -ENOMEM;
}

- hvcs_task = kthread_run(khvcsd, NULL, "khvcsd");
+ hvcs_task = kthread_run(khvcsd, NULL, PF_NOFREEZE, "khvcsd");
if (IS_ERR(hvcs_task)) {
printk(KERN_ERR "HVCS: khvcsd creation failed. Driver not loaded.\n");
kfree(hvcs_pi_buff);
diff -ruN 210-workthreads-old/drivers/macintosh/therm_adt746x.c 210-workthreads-new/drivers/macintosh/therm_adt746x.c
--- 210-workthreads-old/drivers/macintosh/therm_adt746x.c 2004-11-03 21:54:41.000000000 +1100
+++ 210-workthreads-new/drivers/macintosh/therm_adt746x.c 2004-11-04 16:27:39.000000000 +1100
@@ -394,7 +394,7 @@
write_both_fan_speed(th, -1);
}

- thread_therm = kthread_run(monitor_task, th, "kfand");
+ thread_therm = kthread_run(monitor_task, th, 0, "kfand");

if (thread_therm == ERR_PTR(-ENOMEM)) {
printk(KERN_INFO "adt746x: Kthread creation failed\n");
diff -ruN 210-workthreads-old/drivers/md/dm-crypt.c 210-workthreads-new/drivers/md/dm-crypt.c
--- 210-workthreads-old/drivers/md/dm-crypt.c 2004-11-03 21:51:10.000000000 +1100
+++ 210-workthreads-new/drivers/md/dm-crypt.c 2004-11-04 16:27:39.000000000 +1100
@@ -758,7 +758,7 @@
if (!_crypt_io_pool)
return -ENOMEM;

- _kcryptd_workqueue = create_workqueue("kcryptd");
+ _kcryptd_workqueue = create_workqueue("kcryptd", PF_NOFREEZE);
if (!_kcryptd_workqueue) {
r = -ENOMEM;
DMERR(PFX "couldn't create kcryptd");
diff -ruN 210-workthreads-old/drivers/md/dm-raid1.c 210-workthreads-new/drivers/md/dm-raid1.c
--- 210-workthreads-old/drivers/md/dm-raid1.c 2004-11-03 21:55:03.000000000 +1100
+++ 210-workthreads-new/drivers/md/dm-raid1.c 2004-11-04 16:27:39.000000000 +1100
@@ -1233,7 +1233,7 @@
if (r)
return r;

- _kmirrord_wq = create_workqueue("kmirrord");
+ _kmirrord_wq = create_workqueue("kmirrord", PF_SYNCTHREAD);
if (!_kmirrord_wq) {
DMERR("couldn't start kmirrord");
dm_dirty_log_exit();
diff -ruN 210-workthreads-old/drivers/md/kcopyd.c 210-workthreads-new/drivers/md/kcopyd.c
--- 210-workthreads-old/drivers/md/kcopyd.c 2004-11-03 21:51:14.000000000 +1100
+++ 210-workthreads-new/drivers/md/kcopyd.c 2004-11-04 16:27:39.000000000 +1100
@@ -609,7 +609,7 @@
return r;
}

- _kcopyd_wq = create_singlethread_workqueue("kcopyd");
+ _kcopyd_wq = create_singlethread_workqueue("kcopyd", PF_SYNCTHREAD);
if (!_kcopyd_wq) {
jobs_exit();
up(&kcopyd_init_lock);
diff -ruN 210-workthreads-old/drivers/message/i2o/driver.c 210-workthreads-new/drivers/message/i2o/driver.c
--- 210-workthreads-old/drivers/message/i2o/driver.c 2004-11-03 21:55:03.000000000 +1100
+++ 210-workthreads-new/drivers/message/i2o/driver.c 2004-11-04 16:27:39.000000000 +1100
@@ -80,7 +80,7 @@
pr_debug("Register driver %s\n", drv->name);

if (drv->event) {
- drv->event_queue = create_workqueue(drv->name);
+ drv->event_queue = create_workqueue(drv->name, 0);
if (!drv->event_queue) {
printk(KERN_ERR "i2o: Could not initialize event queue "
"for driver %s\n", drv->name);
diff -ruN 210-workthreads-old/drivers/net/wan/sdlamain.c 210-workthreads-new/drivers/net/wan/sdlamain.c
--- 210-workthreads-old/drivers/net/wan/sdlamain.c 2004-11-03 21:54:35.000000000 +1100
+++ 210-workthreads-new/drivers/net/wan/sdlamain.c 2004-11-04 16:27:39.000000000 +1100
@@ -240,7 +240,7 @@
printk(KERN_INFO "%s v%u.%u %s\n",
fullname, DRV_VERSION, DRV_RELEASE, copyright);

- wanpipe_wq = create_workqueue("wanpipe_wq");
+ wanpipe_wq = create_workqueue("wanpipe_wq", 0);
if (!wanpipe_wq)
return -ENOMEM;

diff -ruN 210-workthreads-old/drivers/s390/cio/device.c 210-workthreads-new/drivers/s390/cio/device.c
--- 210-workthreads-old/drivers/s390/cio/device.c 2004-11-03 21:54:36.000000000 +1100
+++ 210-workthreads-new/drivers/s390/cio/device.c 2004-11-04 16:27:39.000000000 +1100
@@ -151,15 +151,16 @@
init_waitqueue_head(&ccw_device_init_wq);
atomic_set(&ccw_device_init_count, 0);

- ccw_device_work = create_singlethread_workqueue("cio");
+ ccw_device_work = create_singlethread_workqueue("cio", 0);
if (!ccw_device_work)
return -ENOMEM; /* FIXME: better errno ? */
- ccw_device_notify_work = create_singlethread_workqueue("cio_notify");
+ ccw_device_notify_work = create_singlethread_workqueue("cio_notify",
+ 0);
if (!ccw_device_notify_work) {
ret = -ENOMEM; /* FIXME: better errno ? */
goto out_err;
}
- slow_path_wq = create_singlethread_workqueue("kslowcrw");
+ slow_path_wq = create_singlethread_workqueue("kslowcrw", 0);
if (!slow_path_wq) {
ret = -ENOMEM; /* FIXME: better errno ? */
goto out_err;
diff -ruN 210-workthreads-old/drivers/scsi/libata-core.c 210-workthreads-new/drivers/scsi/libata-core.c
--- 210-workthreads-old/drivers/scsi/libata-core.c 2004-11-03 21:51:09.000000000 +1100
+++ 210-workthreads-new/drivers/scsi/libata-core.c 2004-11-24 15:47:12.086388544 +1100
@@ -3591,7 +3591,7 @@

static int __init ata_init(void)
{
- ata_wq = create_workqueue("ata");
+ ata_wq = create_workqueue("ata", 0);
if (!ata_wq)
return -ENOMEM;

diff -ruN 210-workthreads-old/fs/aio.c 210-workthreads-new/fs/aio.c
--- 210-workthreads-old/fs/aio.c 2004-11-03 21:53:42.000000000 +1100
+++ 210-workthreads-new/fs/aio.c 2004-11-10 12:16:01.000000000 +1100
@@ -72,7 +72,7 @@
kioctx_cachep = kmem_cache_create("kioctx", sizeof(struct kioctx),
0, SLAB_HWCACHE_ALIGN|SLAB_PANIC, NULL, NULL);

- aio_wq = create_workqueue("aio");
+ aio_wq = create_workqueue("aio", 0);

pr_debug("aio_setup: sizeof(struct page) = %d\n", (int)sizeof(struct page));

diff -ruN 210-workthreads-old/fs/reiserfs/journal.c 210-workthreads-new/fs/reiserfs/journal.c
--- 210-workthreads-old/fs/reiserfs/journal.c 2004-11-03 21:53:50.000000000 +1100
+++ 210-workthreads-new/fs/reiserfs/journal.c 2004-11-24 15:47:12.119383528 +1100
@@ -2483,7 +2483,7 @@

reiserfs_mounted_fs_count++ ;
if (reiserfs_mounted_fs_count <= 1)
- commit_wq = create_workqueue("reiserfs");
+ commit_wq = create_workqueue("reiserfs", PF_SYNCTHREAD);

INIT_WORK(&journal->j_work, flush_async_commits, p_s_sb);
return 0 ;
diff -ruN 210-workthreads-old/fs/xfs/linux-2.6/xfs_buf.c 210-workthreads-new/fs/xfs/linux-2.6/xfs_buf.c
--- 210-workthreads-old/fs/xfs/linux-2.6/xfs_buf.c 2004-11-03 21:51:13.000000000 +1100
+++ 210-workthreads-new/fs/xfs/linux-2.6/xfs_buf.c 2004-11-24 15:47:12.121383224 +1100
@@ -1784,11 +1784,11 @@
{
int rval;

- pagebuf_logio_workqueue = create_workqueue("xfslogd");
+ pagebuf_logio_workqueue = create_workqueue("xfslogd", PF_SYNCTHREAD);
if (!pagebuf_logio_workqueue)
return -ENOMEM;

- pagebuf_dataio_workqueue = create_workqueue("xfsdatad");
+ pagebuf_dataio_workqueue = create_workqueue("xfsdatad", PF_SYNCTHREAD);
if (!pagebuf_dataio_workqueue) {
destroy_workqueue(pagebuf_logio_workqueue);
return -ENOMEM;
diff -ruN 210-workthreads-old/include/linux/kthread.h 210-workthreads-new/include/linux/kthread.h
--- 210-workthreads-old/include/linux/kthread.h 2004-11-03 21:51:12.000000000 +1100
+++ 210-workthreads-new/include/linux/kthread.h 2004-11-04 16:27:40.000000000 +1100
@@ -25,20 +25,26 @@
*/
struct task_struct *kthread_create(int (*threadfn)(void *data),
void *data,
+ unsigned long freezer_flags,
const char namefmt[], ...);

/**
* kthread_run: create and wake a thread.
* @threadfn: the function to run until signal_pending(current).
* @data: data ptr for @threadfn.
+ * @freezer_flags: process flags that should be used for freezing.
+ * PF_SYNCTHREAD if needed for syncing data to disk.
+ * PF_NOFREEZE if also needed for writing the image.
+ * 0 otherwise.
* @namefmt: printf-style name for the thread.
*
* Description: Convenient wrapper for kthread_create() followed by
* wake_up_process(). Returns the kthread, or ERR_PTR(-ENOMEM). */
-#define kthread_run(threadfn, data, namefmt, ...) \
+#define kthread_run(threadfn, data, freezer_flags, namefmt, ...) \
({ \
struct task_struct *__k \
- = kthread_create(threadfn, data, namefmt, ## __VA_ARGS__); \
+ = kthread_create(threadfn, data, freezer_flags, \
+ namefmt, ## __VA_ARGS__); \
if (!IS_ERR(__k)) \
wake_up_process(__k); \
__k; \
diff -ruN 210-workthreads-old/include/linux/workqueue.h 210-workthreads-new/include/linux/workqueue.h
--- 210-workthreads-old/include/linux/workqueue.h 2004-11-03 21:54:39.000000000 +1100
+++ 210-workthreads-new/include/linux/workqueue.h 2004-11-04 16:27:40.000000000 +1100
@@ -51,9 +51,10 @@
} while (0)

extern struct workqueue_struct *__create_workqueue(const char *name,
- int singlethread);
-#define create_workqueue(name) __create_workqueue((name), 0)
-#define create_singlethread_workqueue(name) __create_workqueue((name), 1)
+ int singlethread,
+ unsigned long freezer_flag);
+#define create_workqueue(name, flags) __create_workqueue((name), 0, flags)
+#define create_singlethread_workqueue(name, flags) __create_workqueue((name), 1, flags)

extern void destroy_workqueue(struct workqueue_struct *wq);

diff -ruN 210-workthreads-old/kernel/kmod.c 210-workthreads-new/kernel/kmod.c
--- 210-workthreads-old/kernel/kmod.c 2004-11-03 21:52:56.000000000 +1100
+++ 210-workthreads-new/kernel/kmod.c 2004-11-04 16:27:40.000000000 +1100
@@ -274,6 +274,6 @@

void __init usermodehelper_init(void)
{
- khelper_wq = create_singlethread_workqueue("khelper");
+ khelper_wq = create_singlethread_workqueue("khelper", 0);
BUG_ON(!khelper_wq);
}
diff -ruN 210-workthreads-old/kernel/kthread.c 210-workthreads-new/kernel/kthread.c
--- 210-workthreads-old/kernel/kthread.c 2004-11-03 21:55:00.000000000 +1100
+++ 210-workthreads-new/kernel/kthread.c 2004-11-09 12:26:59.000000000 +1100
@@ -19,6 +19,7 @@
/* Information passed to kthread() from keventd. */
int (*threadfn)(void *data);
void *data;
+ unsigned long freezer_flags;
struct completion started;

/* Result passed back to kthread_create() from keventd. */
@@ -80,6 +81,10 @@
/* By default we can run anywhere, unlike keventd. */
set_cpus_allowed(current, CPU_MASK_ALL);

+ /* Set our freezer flags */
+ current->flags &= ~(PF_SYNCTHREAD | PF_NOFREEZE);
+ current->flags |= create->freezer_flags;
+
/* OK, tell user we're spawned, wait for stop or wakeup */
__set_current_state(TASK_INTERRUPTIBLE);
complete(&create->started);
@@ -115,6 +120,7 @@

struct task_struct *kthread_create(int (*threadfn)(void *data),
void *data,
+ unsigned long freezer_flags,
const char namefmt[],
...)
{
@@ -123,6 +129,7 @@

create.threadfn = threadfn;
create.data = data;
+ create.freezer_flags = freezer_flags;
init_completion(&create.started);
init_completion(&create.done);

diff -ruN 210-workthreads-old/kernel/sched.c 210-workthreads-new/kernel/sched.c
--- 210-workthreads-old/kernel/sched.c 2004-11-24 15:47:06.402252664 +1100
+++ 210-workthreads-new/kernel/sched.c 2004-11-24 15:47:12.250363616 +1100
@@ -4146,10 +4146,10 @@

switch (action) {
case CPU_UP_PREPARE:
- p = kthread_create(migration_thread, hcpu, "migration/%d",cpu);
+ p = kthread_create(migration_thread, hcpu, 0,
+ "migration/%d",cpu);
if (IS_ERR(p))
return NOTIFY_BAD;
- p->flags |= PF_NOFREEZE;
kthread_bind(p, cpu);
/* Must be high prio: stop_machine expects to yield to it. */
rq = task_rq_lock(p, &flags);
diff -ruN 210-workthreads-old/kernel/softirq.c 210-workthreads-new/kernel/softirq.c
--- 210-workthreads-old/kernel/softirq.c 2004-11-03 21:52:39.000000000 +1100
+++ 210-workthreads-new/kernel/softirq.c 2004-11-16 14:48:20.000000000 +1100
@@ -328,7 +328,6 @@
static int ksoftirqd(void * __bind_cpu)
{
set_user_nice(current, 19);
- current->flags |= PF_NOFREEZE;

set_current_state(TASK_INTERRUPTIBLE);

@@ -338,6 +337,8 @@

__set_current_state(TASK_RUNNING);

+ try_to_freeze(PF_FREEZE);
+
while (local_softirq_pending()) {
/* Preempt disable stops cpu going offline.
If already offline, we'll be on wrong CPU:
@@ -430,7 +431,7 @@
case CPU_UP_PREPARE:
BUG_ON(per_cpu(tasklet_vec, hotcpu).list);
BUG_ON(per_cpu(tasklet_hi_vec, hotcpu).list);
- p = kthread_create(ksoftirqd, hcpu, "ksoftirqd/%d", hotcpu);
+ p = kthread_create(ksoftirqd, hcpu, PF_NOFREEZE, "ksoftirqd/%d", hotcpu);
if (IS_ERR(p)) {
printk("ksoftirqd for %i failed\n", hotcpu);
return NOTIFY_BAD;
diff -ruN 210-workthreads-old/kernel/stop_machine.c 210-workthreads-new/kernel/stop_machine.c
--- 210-workthreads-old/kernel/stop_machine.c 2004-11-03 21:51:14.000000000 +1100
+++ 210-workthreads-new/kernel/stop_machine.c 2004-11-04 16:27:40.000000000 +1100
@@ -174,7 +174,7 @@
if (cpu == NR_CPUS)
cpu = smp_processor_id();

- p = kthread_create(do_stop, &smdata, "kstopmachine");
+ p = kthread_create(do_stop, &smdata, 0, "kstopmachine");
if (!IS_ERR(p)) {
kthread_bind(p, cpu);
wake_up_process(p);
diff -ruN 210-workthreads-old/kernel/workqueue.c 210-workthreads-new/kernel/workqueue.c
--- 210-workthreads-old/kernel/workqueue.c 2004-11-03 21:55:02.000000000 +1100
+++ 210-workthreads-new/kernel/workqueue.c 2004-11-10 09:46:21.000000000 +1100
@@ -186,8 +186,6 @@
struct k_sigaction sa;
sigset_t blocked;

- current->flags |= PF_NOFREEZE;
-
set_user_nice(current, -10);

/* Block and flush all signals */
@@ -208,6 +206,7 @@
schedule();
else
__set_current_state(TASK_RUNNING);
+ try_to_freeze(PF_FREEZE);
remove_wait_queue(&cwq->more_work, &wait);

if (!list_empty(&cwq->worklist))
@@ -277,7 +276,8 @@
}

static struct task_struct *create_workqueue_thread(struct workqueue_struct *wq,
- int cpu)
+ int cpu,
+ unsigned long freezer_flags)
{
struct cpu_workqueue_struct *cwq = wq->cpu_wq + cpu;
struct task_struct *p;
@@ -292,9 +292,11 @@
init_waitqueue_head(&cwq->work_done);

if (is_single_threaded(wq))
- p = kthread_create(worker_thread, cwq, "%s", wq->name);
+ p = kthread_create(worker_thread, cwq, freezer_flags,
+ "%s", wq->name);
else
- p = kthread_create(worker_thread, cwq, "%s/%d", wq->name, cpu);
+ p = kthread_create(worker_thread, cwq, freezer_flags,
+ "%s/%d", wq->name, cpu);
if (IS_ERR(p))
return NULL;
cwq->thread = p;
@@ -302,7 +304,8 @@
}

struct workqueue_struct *__create_workqueue(const char *name,
- int singlethread)
+ int singlethread,
+ unsigned long freezer_flags)
{
int cpu, destroy = 0;
struct workqueue_struct *wq;
@@ -320,7 +323,7 @@
lock_cpu_hotplug();
if (singlethread) {
INIT_LIST_HEAD(&wq->list);
- p = create_workqueue_thread(wq, 0);
+ p = create_workqueue_thread(wq, 0, freezer_flags);
if (!p)
destroy = 1;
else
@@ -330,7 +333,7 @@
list_add(&wq->list, &workqueues);
spin_unlock(&workqueue_lock);
for_each_online_cpu(cpu) {
- p = create_workqueue_thread(wq, cpu);
+ p = create_workqueue_thread(wq, cpu, freezer_flags);
if (p) {
kthread_bind(p, cpu);
wake_up_process(p);
@@ -513,7 +516,7 @@
void init_workqueues(void)
{
hotcpu_notifier(workqueue_cpu_callback, 0);
- keventd_wq = create_workqueue("events");
+ keventd_wq = create_workqueue("events", PF_NOFREEZE);
BUG_ON(!keventd_wq);
}

diff -ruN 210-workthreads-old/mm/pdflush.c 210-workthreads-new/mm/pdflush.c
--- 210-workthreads-old/mm/pdflush.c 2004-11-03 21:55:04.000000000 +1100
+++ 210-workthreads-new/mm/pdflush.c 2004-11-24 15:47:12.139380488 +1100
@@ -215,7 +215,7 @@

static void start_one_pdflush_thread(void)
{
- kthread_run(pdflush, NULL, "pdflush");
+ kthread_run(pdflush, NULL, 0, "pdflush");
}

static int __init pdflush_init(void)


2004-11-24 13:16:25

by Nigel Cunningham

Subject: Suspend 2 merge: 34/51: Includes

These are the include file changes for suspend2.

I seek to keep this swsusp compatible, but it might be a little out of sync with Pavel's changes.

diff -ruN 822-includes-old/include/asm-i386/suspend.h 822-includes-new/include/asm-i386/suspend.h
--- 822-includes-old/include/asm-i386/suspend.h 2004-11-24 09:53:09.000000000 +1100
+++ 822-includes-new/include/asm-i386/suspend.h 2004-11-24 18:51:50.270377720 +1100
@@ -3,6 +3,7 @@
* Based on code
* Copyright 2001 Patrick Mochel <[email protected]>
*/
+#include <linux/errno.h>
#include <asm/desc.h>
#include <asm/i387.h>

diff -ruN 822-includes-old/include/linux/suspend.h 822-includes-new/include/linux/suspend.h
--- 822-includes-old/include/linux/suspend.h 2004-11-03 21:52:41.000000000 +1100
+++ 822-includes-new/include/linux/suspend.h 2004-11-24 18:51:50.298373464 +1100
@@ -4,58 +4,125 @@
#ifdef CONFIG_X86
#include <asm/suspend.h>
#endif
-#include <linux/swap.h>
-#include <linux/notifier.h>
-#include <linux/config.h>
-#include <linux/init.h>
-#include <linux/pm.h>
-
-#ifdef CONFIG_PM
-/* page backup entry */
-typedef struct pbe {
- unsigned long address; /* address of the copy */
- unsigned long orig_address; /* original address of page */
- swp_entry_t swap_address;
- swp_entry_t dummy; /* we need scratch space at
- * end of page (see link, diskpage)
- */
-} suspend_pagedir_t;
-
-#define SWAP_FILENAME_MAXLENGTH 32
-
-
-#define SUSPEND_PD_PAGES(x) (((x)*sizeof(struct pbe))/PAGE_SIZE+1)
-
-/* mm/vmscan.c */
-extern int shrink_mem(void);
-
-/* mm/page_alloc.c */
-extern void drain_local_pages(void);

-/* kernel/power/swsusp.c */
-extern int software_suspend(void);
+#include <linux/kernel.h>
+extern char __nosave_begin, __nosave_end;

-#else /* CONFIG_SOFTWARE_SUSPEND */
-static inline int software_suspend(void)
-{
- printk("Warning: fake suspend called\n");
- return -EPERM;
-}
-#endif /* CONFIG_SOFTWARE_SUSPEND */
+#ifdef CONFIG_PM

+#include <linux/init.h>

-#ifdef CONFIG_PM
-extern void refrigerator(unsigned long);
-extern int freeze_processes(void);
-extern void thaw_processes(void);
+/* For swsusp */
+#include <linux/swap.h>

-extern int pm_prepare_console(void);
-extern void pm_restore_console(void);
+#define SUSPEND_CORE_VERSION "2.1.5.7"
+#ifndef KERNEL_POWER_SWSUSP_C
+#define name_suspend "Software Suspend " SUSPEND_CORE_VERSION ": "
+#endif

+extern unsigned long suspend_action;
+extern unsigned long suspend_result;
+extern unsigned long suspend_debug_state;
+
+#define TEST_RESULT_STATE(bit) (test_bit(bit, &suspend_result))
+#define SET_RESULT_STATE(bit) (test_and_set_bit(bit, &suspend_result))
+#define CLEAR_RESULT_STATE(bit) (test_and_clear_bit(bit, &suspend_result))
+
+#define TEST_ACTION_STATE(bit) (test_bit(bit, &suspend_action))
+#define SET_ACTION_STATE(bit) (test_and_set_bit(bit, &suspend_action))
+#define CLEAR_ACTION_STATE(bit) (test_and_clear_bit(bit, &suspend_action))
+
+#ifdef CONFIG_SOFTWARE_SUSPEND_DEBUG
+#define TEST_DEBUG_STATE(bit) (test_bit(bit, &suspend_debug_state))
+#define SET_DEBUG_STATE(bit) (test_and_set_bit(bit, &suspend_debug_state))
+#define CLEAR_DEBUG_STATE(bit) (test_and_clear_bit(bit, &suspend_debug_state))
#else
-static inline void refrigerator(unsigned long flag) {}
-#endif /* CONFIG_PM */
+#define TEST_DEBUG_STATE(bit) (0)
+#define SET_DEBUG_STATE(bit) (0)
+#define CLEAR_DEBUG_STATE(bit) (0)
+#endif

+/* first status register - this is suspend's return code. */
+#define SUSPEND_ABORTED 0
+#define SUSPEND_ABORT_REQUESTED 1
+#define SUSPEND_NOSTORAGE_AVAILABLE 2
+#define SUSPEND_INSUFFICIENT_STORAGE 3
+#define SUSPEND_FREEZING_FAILED 4
+#define SUSPEND_UNEXPECTED_ALLOC 5
+#define SUSPEND_KEPT_IMAGE 6
+#define SUSPEND_WOULD_EAT_MEMORY 7
+#define SUSPEND_UNABLE_TO_FREE_ENOUGH_MEMORY 8
+
+/* second status register */
+#define SUSPEND_REBOOT 0
+#define SUSPEND_PAUSE 2
+#define SUSPEND_SLOW 3
+#define SUSPEND_NOPAGESET2 7
+#define SUSPEND_LOGALL 8
+/* Set to disable compression when compiled in */
+#define SUSPEND_NO_COMPRESSION 9
+//#define SUSPEND_ENABLE_KDB 10
+#define SUSPEND_CAN_CANCEL 11
+#define SUSPEND_KEEP_IMAGE 13
+#define SUSPEND_FREEZER_TEST 14
+#define SUSPEND_FREEZER_TEST_SHOWALL 15
+#define SUSPEND_SINGLESTEP 16
+#define SUSPEND_PAUSE_NEAR_PAGESET_END 17
+#define SUSPEND_USE_ACPI_S4 18
+#define SUSPEND_KEEP_METADATA 19
+#define SUSPEND_TEST_FILTER_SPEED 20
+#define SUSPEND_FREEZE_TIMERS 21
+#define SUSPEND_DISABLE_SYSDEV_SUPPORT 22
+
+/* debug sections - if debugging compiled in */
+#define SUSPEND_ANY_SECTION 0
+#define SUSPEND_FREEZER 1
+#define SUSPEND_EAT_MEMORY 2
+#define SUSPEND_PAGESETS 3
+#define SUSPEND_IO 4
+#define SUSPEND_BMAP 5
+#define SUSPEND_HEADER 6
+#define SUSPEND_WRITER 9
+#define SUSPEND_MEMORY 10
+#define SUSPEND_RANGES 11
+#define SUSPEND_SPINLOCKS 12
+#define SUSPEND_MEM_POOL 13
+#define SUSPEND_RANGE_PARANOIA 14
+#define SUSPEND_NOSAVE 15
+#define SUSPEND_INTEGRITY 16
+/* debugging levels. */
+#define SUSPEND_STATUS 0
+#define SUSPEND_ERROR 2
+#define SUSPEND_LOW 3
+#define SUSPEND_MEDIUM 4
+#define SUSPEND_HIGH 5
+#define SUSPEND_VERBOSE 6
+
+extern void __suspend_message(unsigned long section, unsigned long level, int log_normally,
+ const char *fmt, ...);
+
+#ifdef CONFIG_SOFTWARE_SUSPEND_DEBUG
+extern int suspend_memory_pool_level(int only_lowmem);
+extern int real_nr_free_pages(void);
+#define suspend_message(sn, lev, log, fmt, a...) \
+do { \
+ if (TEST_DEBUG_STATE(sn)) \
+ suspend2_core_ops->suspend_message(sn, lev, log, fmt, ##a); \
+} while(0)
+#define PRINTFREEMEM(desn) \
+ suspend_message(SUSPEND_MEMORY, SUSPEND_HIGH, 1, \
+ "Free memory %s: %d+%d.\n", desn, \
+ real_nr_free_pages() + suspend_amount_grabbed, \
+ suspend_memory_pool_level(0));
+#else /* CONFIG_SOFTWARE_SUSPEND_DEBUG */
+#define PRINTFREEMEM(desn) do { } while(0)
+#define suspend_message(sn, lev, log, fmt, a...) \
+do { \
+ if (lev == 0) \
+ suspend2_core_ops->suspend_message(sn, lev, log, fmt, ##a); \
+} while(0)
+#endif /* CONFIG_SOFTWARE_SUSPEND_DEBUG */
+
#ifdef CONFIG_SMP
extern void disable_nonboot_cpus(void);
extern void enable_nonboot_cpus(void);
@@ -64,10 +131,140 @@
static inline void enable_nonboot_cpus(void) {}
#endif

-void save_processor_state(void);
-void restore_processor_state(void);
-struct saved_context;
-void __save_processor_state(struct saved_context *ctxt);
-void __restore_processor_state(struct saved_context *ctxt);
+extern int software_suspend(void);
+
+/* Suspend 2 */
+
+#define SUSPEND_DISABLED 0
+#define SUSPEND_RUNNING 1
+#define SUSPEND_RESUME_DEVICE_OK 2
+#define SUSPEND_NORESUME_SPECIFIED 3
+#define SUSPEND_COMMANDLINE_ERROR 4
+#define SUSPEND_IGNORE_IMAGE 5
+#define SUSPEND_SANITY_CHECK_PROMPT 6
+#define SUSPEND_FREEZER_ON 7
+#define SUSPEND_DISABLE_SYNCING 8
+#define SUSPEND_BLOCK_PAGE_ALLOCATIONS 9
+#define SUSPEND_USE_MEMORY_POOL 10
+#define SUSPEND_STAGE2_CONTINUE 11
+#define SUSPEND_FREEZE_SMP 12
+#define SUSPEND_PAGESET2_NOT_LOADED 13
+#define SUSPEND_CONTINUE_REQ 14
+#define SUSPEND_RESUMED_BEFORE 15
+#define SUSPEND_RUNNING_INITRD 16
+#define SUSPEND_RESUME_NOT_DONE 17
+#define SUSPEND_BOOT_TIME 18
+#define SUSPEND_NOW_RESUMING 19
+#define SUSPEND_SLAB_ALLOC_FALLBACK 20
+#define SUSPEND_IGNORE_LOGLEVEL 21
+#define SUSPEND_TIMER_FREEZER_ON 22
+
+extern unsigned long software_suspend_state;
+#define test_suspend_state(bit) \
+ (test_bit(bit, &software_suspend_state))
+
+#define clear_suspend_state(bit) \
+ (clear_bit(bit, &software_suspend_state))
+
+#define set_suspend_state(bit) \
+ (set_bit(bit, &software_suspend_state))
+
+#define get_suspend_state() (software_suspend_state)
+#define restore_suspend_state(saved_state) \
+ do { software_suspend_state = saved_state; } while(0)
+
+/* kernel/suspend.c */
+extern void suspend_try_suspend(void);
+extern unsigned int suspend_task;
+
+/* Kernel threads are type 3 */
+#define FREEZER_ALL_THREADS 0
+#define FREEZER_KERNEL_THREADS 3
+
+extern int freeze_processes(int no_progress);
+extern void thaw_processes(int which_threads);
+
+extern int pm_prepare_console(void);
+extern void pm_restore_console(void);
+
+#define SUSPEND_KEY_KEYBOARD 1
+#define SUSPEND_KEY_SERIAL 2
+
+struct page;
+
+struct suspend2_core_ops {
+ /* Entry points for suspending & resuming */
+ void (* do_suspend) (void);
+ int (* do_resume) (void);
+
+ /* Pre and post lowlevel routines */
+ void (* suspend1) (void);
+ void (* suspend2) (void);
+ void (* resume1) (void);
+ void (* resume2) (void);
+
+ void (* free_pool_pages) (struct page *page, unsigned int order);
+ struct page * (* get_pool_pages) (unsigned int gfp_mask, unsigned int order);
+
+ unsigned long (* get_grabbed_pages) (int order);
+ void (* cleanup_finished_io) (void);
+
+ void (* suspend_message) (unsigned long, unsigned long, int, const char *, ...);
+ unsigned long (* update_status) (unsigned long value, unsigned long maximum,
+ const char *fmt, ...);
+ void (*prepare_status) (int printalways, int clearbar, const char *fmt, ...);
+ void (* schedule_message) (int message_number);
+ void (* early_boot_plugins) (void);
+ int (* keypress) (unsigned int keycode);
+
+ void (* verify_checksums) (void);
+};
+extern volatile struct suspend2_core_ops * suspend2_core_ops;
+#ifdef CONFIG_SOFTWARE_SUSPEND2
+extern void software_suspend_try_resume(void);
+extern void suspend_handle_keypress(unsigned int keycode, int source);
+#else
+#define software_suspend_try_resume() do { } while(0)
+#define suspend_handle_keypress(a, b) do { } while(0)
+#endif
+
+#define suspend2_free_pool_pages(page, order) suspend2_core_ops->free_pool_pages(page, order)
+#define suspend2_get_pool_pages(mask, order) suspend2_core_ops->get_pool_pages(mask, order)
+#define suspend2_get_grabbed_pages(order) suspend2_core_ops->get_grabbed_pages(order)
+#define suspend2_cleanup_finished_io() suspend2_core_ops->cleanup_finished_io()
+#define suspend2_verify_checksums() suspend2_core_ops->verify_checksums()
+
+#else /* CONFIG_PM off */
+
+#define suspend_try_suspend() do { } while(0)
+#define suspend_task (0)
+#define software_suspend_state (0)
+#define test_suspend_state(bit) (0)
+#define clear_suspend_state(bit) do { } while (0)
+#define set_suspend_state(bit) do { } while(0)
+#define get_suspend_state() (0)
+#define restore_suspend_state(saved_state) do { } while(0)
+#define software_suspend_try_resume() do { } while(0)
+
+static inline int suspend_bug(void)
+{
+ BUG();
+ return 0;
+}
+
+#define suspend2_free_pool_pages(page, order) suspend_bug()
+#define suspend2_get_pool_pages(mask, order) (struct page *) suspend_bug()
+#define suspend2_get_grabbed_pages(order) (unsigned long) suspend_bug()
+#define suspend2_cleanup_finished_io() do { BUG(); } while(0)
+#define suspend2_verify_checksums() do { BUG(); } while(0)
+
+static inline int software_suspend(void)
+{
+ printk("Warning: fake suspend called\n");
+ return -EPERM;
+}
+#define software_resume() do { } while(0)
+#define suspend_handle_keypress(a, b) do { } while(0)
+#endif

#endif /* _LINUX_SWSUSP_H */
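
The state macros above all follow one pattern: a bit number indexes into a single unsigned long, and callers test, set or clear that bit. A minimal user-space sketch of the pattern follows, assuming plain non-atomic bit operations in place of the kernel's test_bit()/set_bit()/clear_bit(). Every sketch_* name is hypothetical and not part of the patch; sketch_reboot_hook() only mirrors the return-code logic of the reboot handler hook from patch 7/51.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>

/* Bit number copied from the patch. */
#define SUSPEND_DISABLED 0

/* Stand-in for software_suspend_state (assumption: no atomicity needed). */
static unsigned long sketch_state;

static inline int sketch_test_state(int bit)
{
	assert(bit >= 0 && bit < (int) (CHAR_BIT * sizeof(unsigned long)));
	return (int) ((sketch_state >> bit) & 1UL);
}

static inline void sketch_set_state(int bit)
{
	sketch_state |= 1UL << bit;
}

static inline void sketch_clear_state(int bit)
{
	sketch_state &= ~(1UL << bit);
}

/* Counterparts of get_suspend_state() and restore_suspend_state(). */
static inline unsigned long sketch_get_state(void)
{
	return sketch_state;
}

static inline void sketch_restore_state(unsigned long saved)
{
	sketch_state = saved;
}

/* Refuse with -EINVAL while suspend is disabled, otherwise report success. */
static inline int sketch_reboot_hook(void)
{
	if (sketch_test_state(SUSPEND_DISABLED))
		return -EINVAL;
	/* The real hook would call suspend_try_suspend() here. */
	return 0;
}
```

Saving the whole word with sketch_get_state() and writing it back later is what lets a caller change several bits temporarily and then undo all of them in one assignment.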
diff -ruN 822-includes-old/kernel/power/block_io.h 822-includes-new/kernel/power/block_io.h
--- 822-includes-old/kernel/power/block_io.h 1970-01-01 10:00:00.000000000 +1000
+++ 822-includes-new/kernel/power/block_io.h 2004-11-24 18:51:50.301373008 +1100
@@ -0,0 +1,52 @@
+/*
+ * block_io.h
+ *
+ * Copyright 2004 Nigel Cunningham <[email protected]>
+ *
+ * Distributed under GPLv2.
+ *
+ * This file contains declarations for functions exported from
+ * block_io.c, which contains low level io functions.
+ */
+
+/* 8192 4k pages = 32MB */
+#define MAX_READAHEAD (int) (8192)
+
+/* Forward Declarations */
+
+struct submit_params {
+ swp_entry_t swap_address;
+ struct page * page;
+ struct block_device * dev;
+ long blocks[PAGE_SIZE/512];
+ int blocks_used;
+ int readahead_index;
+ struct submit_params * next;
+};
+
+
+extern int max_async_ios;
+#define REAL_MAX_ASYNC ((max_async_ios ? max_async_ios : 128))
+
+/*
+ * Our exported interface so the swapwriter and NFS writer don't
+ * need these functions built in.
+ */
+struct suspend_bio_ops {
+ int (*set_block_size) (struct block_device * bdev, int size);
+ int (*get_block_size) (struct block_device * bdev);
+ int (*submit_io) (int rw,
+ struct submit_params * submit_info, int syncio);
+ int (*bdev_page_io) (int rw, struct block_device * bdev, long pos,
+ struct page * page);
+ void (*wait_on_readahead) (int readahead_index);
+ void (*check_io_stats) (void);
+ void (*reset_io_stats) (void);
+ void (*finish_all_io) (void);
+ int (*prepare_readahead) (int index);
+ void (*cleanup_readahead) (int index);
+ struct page ** readahead_pages;
+ int (*readahead_ready) (int readahead_index);
+};
+
+extern struct suspend_bio_ops suspend_bio_ops;
diff -ruN 822-includes-old/kernel/power/pageflags.h 822-includes-new/kernel/power/pageflags.h
--- 822-includes-old/kernel/power/pageflags.h 1970-01-01 10:00:00.000000000 +1000
+++ 822-includes-new/kernel/power/pageflags.h 2004-11-24 18:51:50.304372552 +1100
@@ -0,0 +1,80 @@
+/*
+ * kernel/power/pageflags.h
+ *
+ * Copyright (C) 2004 Nigel Cunningham <[email protected]>
+ *
+ * This file is released under the GPLv2.
+ *
+ * Suspend2 needs a few pageflags while working that aren't otherwise
+ * used. To save the struct page pageflags, we dynamically allocate
+ * a bitmap and use that. These are the only non order-0 allocations
+ * we do.
+ */
+extern unsigned long * in_use_map;
+extern unsigned long * pageset2_map;
+extern unsigned long * checksum_map;
+#ifdef CONFIG_DEBUG_PAGEALLOC
+extern unsigned long * unmap_map;
+#endif
+
+#define PAGENUMBER(page) (page-mem_map)
+#define PAGEINDEX(page) ((PAGENUMBER(page))/(8*sizeof(unsigned long)))
+#define PAGEBIT(page) ((int) ((PAGENUMBER(page))%(8 * sizeof(unsigned long))))
+
+/*
+ * freepagesmap is used in two ways:
+ * - During suspend, to tag pages which are not used (to speed up
+ * count_data_pages);
+ * - During resume, to tag pages which are in pagedir1. This does not tag
+ * pagedir2 pages, so it is not the same as the first use.
+ */
+#define PageInUse(page) \
+ test_bit(PAGEBIT(page), &in_use_map[PAGEINDEX(page)])
+#define SetPageInUse(page) \
+ set_bit(PAGEBIT(page), &in_use_map[PAGEINDEX(page)])
+#define ClearPageInUse(page) \
+ clear_bit(PAGEBIT(page), &in_use_map[PAGEINDEX(page)])
+
+#define PagePageset2(page) \
+ (pageset2_map ? \
+ test_bit(PAGEBIT(page), &pageset2_map[PAGEINDEX(page)]) : \
+ 0)
+
+#define SetPagePageset2(page) \
+ set_bit(PAGEBIT(page), &pageset2_map[PAGEINDEX(page)])
+#define TestAndSetPagePageset2(page) \
+ test_and_set_bit(PAGEBIT(page), &pageset2_map[PAGEINDEX(page)])
+#define TestAndClearPagePageset2(page) \
+ test_and_clear_bit(PAGEBIT(page), &pageset2_map[PAGEINDEX(page)])
+#define ClearPagePageset2(page) \
+do { \
+ if (pageset2_map) \
+ clear_bit(PAGEBIT(page), &pageset2_map[PAGEINDEX(page)]); \
+} while(0)
+
+#define PageChecksumIgnore(page) \
+ (checksum_map ? \
+ test_bit(PAGEBIT(page), &checksum_map[PAGEINDEX(page)]) : \
+ 0)
+
+#define SetPageChecksumIgnore(page) \
+do { \
+ if (checksum_map) \
+ set_bit(PAGEBIT(page), &checksum_map[PAGEINDEX(page)]); \
+} while(0)
+
+#define ClearPageChecksumIgnore(page) \
+do { \
+ if (checksum_map) \
+ clear_bit(PAGEBIT(page), &checksum_map[PAGEINDEX(page)]); \
+} while(0)
+
+
+#define SetPageUnmap(page) \
+ set_bit(PAGEBIT(page), &unmap_map[PAGEINDEX(page)])
+#define PageUnmap(page) \
+ test_bit(PAGEBIT(page), &unmap_map[PAGEINDEX(page)])
+
+extern int allocate_local_pageflags(unsigned long ** pagemap, int setnosave);
+extern int free_local_pageflags(unsigned long ** pagemap);
+extern void clear_map(unsigned long * pagemap);
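
The pageflags macros above map a page number to a word index and a bit within that word, then operate on a separately allocated bitmap. The sketch below shows the same arithmetic in user space, assuming page numbers are passed in directly (the patch derives them as page - mem_map) and using non-atomic operations. The sketch_* names are hypothetical, and calloc() stands in for whatever allocate_local_pageflags() does.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

#define SKETCH_BITS_PER_LONG (CHAR_BIT * sizeof(unsigned long))

/* Same arithmetic as PAGEINDEX() and PAGEBIT(), on a plain page number. */
static inline size_t sketch_index(unsigned long pagenr)
{
	return pagenr / SKETCH_BITS_PER_LONG;
}

static inline unsigned int sketch_bit(unsigned long pagenr)
{
	return (unsigned int) (pagenr % SKETCH_BITS_PER_LONG);
}

/* Allocate a zeroed bitmap with one bit per page; returns NULL on failure. */
static inline unsigned long *sketch_alloc_map(unsigned long nr_pages)
{
	size_t words = (nr_pages + SKETCH_BITS_PER_LONG - 1) / SKETCH_BITS_PER_LONG;

	return calloc(words ? words : 1, sizeof(unsigned long));
}

/* Like PagePageset2(): a missing map reads as "flag clear". */
static inline int sketch_test(const unsigned long *map, unsigned long pagenr)
{
	if (!map)
		return 0;
	return (int) ((map[sketch_index(pagenr)] >> sketch_bit(pagenr)) & 1UL);
}

/* Like SetPagePageset2(): the caller must have allocated the map. */
static inline void sketch_set(unsigned long *map, unsigned long pagenr)
{
	assert(map);
	map[sketch_index(pagenr)] |= 1UL << sketch_bit(pagenr);
}

/* Like ClearPagePageset2(): silently does nothing without a map. */
static inline void sketch_clear(unsigned long *map, unsigned long pagenr)
{
	if (map)
		map[sketch_index(pagenr)] &= ~(1UL << sketch_bit(pagenr));
}
```

One bit per page keeps the cost at roughly 1/32768 of RAM per map with 4k pages, which is presumably why a side bitmap is preferred here over claiming more bits in struct page.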
diff -ruN 822-includes-old/kernel/power/plugins.h 822-includes-new/kernel/power/plugins.h
--- 822-includes-old/kernel/power/plugins.h 1970-01-01 10:00:00.000000000 +1000
+++ 822-includes-new/kernel/power/plugins.h 2004-11-24 18:51:50.305372400 +1100
@@ -0,0 +1,205 @@
+/*
+ * kernel/power/plugins.h
+ *
+ * Copyright (C) 2004 Nigel Cunningham <[email protected]>
+ *
+ * This file is released under the GPLv2.
+ *
+ * It contains declarations for plugins. Plugins are additions to
+ * suspend2 that provide facilities such as image compression or
+ * encryption, backends for storage of the image and user interfaces.
+ *
+ */
+
+/* This is the maximum size we store in the image header for a plugin name */
+#define SUSPEND_MAX_PLUGIN_NAME_LENGTH 30
+
+struct plugin_header {
+ char name[SUSPEND_MAX_PLUGIN_NAME_LENGTH];
+ int disabled;
+ int type;
+ int index;
+ int data_length;
+ unsigned long magic;
+};
+
+extern unsigned long memory_for_plugins(void);
+extern int num_plugins;
+
+#define FILTER_PLUGIN 1
+#define WRITER_PLUGIN 2
+#define UI_PLUGIN 3
+#define MISC_PLUGIN 4 // Block writer, eg.
+#define CHECKSUM_PLUGIN 5
+
+#define SUSPEND_ASYNC 0
+#define SUSPEND_SYNC 1
+
+#define SUSPEND_COMMON_IO_OPS \
+ /* Writing the image proper */ \
+ int (*write_init) (int stream_number); \
+ int (*write_chunk) (struct page * buffer_page); \
+ int (*write_cleanup) (void); \
+\
+ /* Reading the image proper */ \
+ int (*read_init) (int stream_number); \
+ int (*read_chunk) (struct page * buffer_page, int sync); \
+ int (*read_cleanup) (void); \
+\
+ /* Reset plugin if image exists but reading aborted */ \
+ void (*noresume_reset) (void);
+
+struct suspend_filter_ops {
+ SUSPEND_COMMON_IO_OPS
+ int (*expected_compression) (void);
+ struct list_head filter_list;
+};
+
+struct suspend_writer_ops {
+
+ SUSPEND_COMMON_IO_OPS
+
+ /* Calls for allocating storage */
+
+ long (*storage_available) (void); // Maximum size of image we can save
+ // (incl. space already allocated).
+
+ unsigned long (*storage_allocated) (void);
+ // Amount of storage already allocated
+ int (*release_storage) (void);
+
+ /*
+ * Header space is allocated separately. Note that allocation
+ * of space for the header might result in allocated space
+ * being stolen from the main pool if there is no unallocated
+ * space. We have to be able to allocate enough space for
+ * the header. We can eat memory to ensure there is enough
+ * for the main pool.
+ */
+ long (*allocate_header_space) (unsigned long space_requested);
+ int (*allocate_storage) (unsigned long space_requested);
+
+ /* Read and write the metadata */
+ int (*write_header_init) (void);
+ int (*write_header_chunk) (char * buffer_start, int buffer_size);
+ int (*write_header_cleanup) (void);
+
+ int (*read_header_init) (void);
+ int (*read_header_chunk) (char * buffer_start, int buffer_size);
+ int (*read_header_cleanup) (void);
+
+ /* Prepare metadata to be saved (relativise/absolutise ranges) */
+ int (*prepare_save_ranges) (void);
+ int (*post_load_ranges) (void);
+
+ /* Attempt to parse an image location */
+ int (*parse_image_location) (char * buffer, int only_writer);
+
+ /* Determine whether image exists that we can restore */
+ int (*image_exists) (void);
+
+ /* Mark the image as having tried to resume */
+ void (*mark_resume_attempted) (void);
+
+ /* Destroy image if one exists */
+ int (*invalidate_image) (void);
+
+ /* Wait on I/O */
+ int (*wait_on_io) (int flush_all);
+
+ struct list_head writer_list;
+};
+
+struct suspend_ui_ops {
+ void (*early_boot_message_prep) (void);
+ void (*prepare) (void);
+ void (*message) (
+ unsigned long type, unsigned long level,
+ int normally_logged,
+ const char *format, va_list args);
+ void (*log_level_change) (void);
+ unsigned long (*update_progress) (
+ unsigned long value, unsigned long maximum,
+ const char *fmt, va_list args);
+ void (*cleanup) (void);
+ int (*keypress) (unsigned int key);
+ void (*post_kernel_restore_redraw) (void);
+
+ struct list_head ui_list;
+};
+
+struct suspend_checksum_ops {
+ void (*calculate_checksums) (void);
+ void (*check_checksums) (void);
+ void (*print_differences) (void);
+ int (*allocate_pages) (void);
+};
+
+struct suspend_plugin_ops {
+ /* Functions common to all plugins */
+ int type;
+ char * name;
+ int disabled;
+ struct list_head plugin_list;
+ unsigned long (*memory_needed) (void);
+ unsigned long (*storage_needed) (void);
+ int (*print_debug_info) (char * buffer, int size);
+ int (*save_config_info) (char * buffer);
+ void (*load_config_info) (char * buffer, int len);
+
+ /* Initialise & cleanup - general routines called
+ * at the start and end of a cycle. */
+ int (*initialise) (void);
+ void (*cleanup) (void);
+
+ /* Set list of devices not to be suspended/resumed */
+ void (*dpm_set_devices) (void);
+
+ union {
+ struct suspend_filter_ops filter;
+ struct suspend_writer_ops writer;
+ struct suspend_ui_ops ui;
+ struct suspend_checksum_ops checksum;
+ } ops;
+};
+
+extern struct suspend_plugin_ops * active_writer;
+extern struct list_head suspend_filters, suspend_writers, suspend_plugins, suspend_ui;
+extern struct suspend_plugin_ops * checksum_plugin;
+extern void prepare_console_plugins(void);
+extern void cleanup_console_plugins(void);
+extern struct suspend_plugin_ops * find_plugin_given_name(char * name);
+extern struct suspend_plugin_ops * get_next_filter(struct suspend_plugin_ops *);
+extern int suspend_register_plugin(struct suspend_plugin_ops * plugin);
+extern void suspend_move_plugin_tail(struct suspend_plugin_ops * plugin);
+
+extern int initialise_suspend_plugins(void);
+extern void cleanup_suspend_plugins(void);
+extern unsigned long header_storage_for_plugins(void);
+extern unsigned long memory_for_plugins(void);
+extern int print_plugin_debug_info(char * buffer, int buffer_size);
+extern int suspend_register_plugin(struct suspend_plugin_ops * plugin);
+extern void suspend_unregister_plugin(struct suspend_plugin_ops * plugin);
+extern int initialise_suspend_plugins(void);
+extern void cleanup_suspend_plugins(void);
+extern void suspend_post_restore_redraw(void);
+
+static inline void suspend_checksum_calculate_checksums(void)
+{
+ if (checksum_plugin)
+ checksum_plugin->ops.checksum.calculate_checksums();
+}
+
+static inline void suspend_checksum_print_differences(void)
+{
+ if (checksum_plugin)
+ checksum_plugin->ops.checksum.print_differences();
+}
+
+static inline int suspend_allocate_checksum_pages(void)
+{
+ if (checksum_plugin)
+ return checksum_plugin->ops.checksum.allocate_pages();
+ else
+ return 0;
+}
diff -ruN 822-includes-old/kernel/power/power.h 822-includes-new/kernel/power/power.h
--- 822-includes-old/kernel/power/power.h 2004-11-24 18:52:16.759350784 +1100
+++ 822-includes-new/kernel/power/power.h 2004-11-24 18:51:50.306372248 +1100
@@ -1,6 +1,8 @@
#include <linux/suspend.h>
#include <linux/utsname.h>

+#include "suspend.h"
+
/* With SUSPEND_CONSOLE defined, it suspend looks *really* cool, but
we probably do not take enough locks for switching consoles, etc,
so bad things might happen.
diff -ruN 822-includes-old/kernel/power/proc.h 822-includes-new/kernel/power/proc.h
--- 822-includes-old/kernel/power/proc.h 1970-01-01 10:00:00.000000000 +1000
+++ 822-includes-new/kernel/power/proc.h 2004-11-24 18:51:50.307372096 +1100
@@ -0,0 +1,64 @@
+/*
+ * kernel/power/proc.h
+ *
+ * Copyright (C) 2004 Nigel Cunningham <[email protected]>
+ *
+ * This file is released under the GPLv2.
+ *
+ * It provides declarations for suspend to use in managing
+ * /proc/software_suspend. When we switch to kobjects,
+ * this will become redundant.
+ *
+ */
+
+struct suspend_proc_data {
+ char * filename;
+ int permissions;
+ int type;
+ union {
+ struct {
+ unsigned long * bit_vector;
+ int bit;
+ } bit;
+ struct {
+ int * variable;
+ int minimum;
+ int maximum;
+ } integer;
+ struct {
+ unsigned long * variable;
+ unsigned long minimum;
+ unsigned long maximum;
+ } ul;
+ struct {
+ char * variable;
+ int max_length;
+ } string;
+ struct {
+ void * read_proc;
+ void * write_proc;
+ void * data;
+ } special;
+ } data;
+
+ /* Side effects routines. Used, eg, for reparsing the
+ * resume2 entry when it changes */
+ int (* read_proc) (void);
+ int (* write_proc) (void);
+ struct list_head proc_data_list;
+};
+
+#define SUSPEND_PROC_DATA_CUSTOM 0
+#define SUSPEND_PROC_DATA_BIT 1
+#define SUSPEND_PROC_DATA_INTEGER 2
+#define SUSPEND_PROC_DATA_UL 3
+#define SUSPEND_PROC_DATA_STRING 4
+
+#define PROC_WRITEONLY 0200
+#define PROC_READONLY 0400
+#define PROC_RW 0600
+
+struct proc_dir_entry * suspend_register_procfile(
+ struct suspend_proc_data * suspend_proc_data);
+void suspend_unregister_procfile(struct suspend_proc_data * suspend_proc_data);
+
diff -ruN 822-includes-old/kernel/power/range.h 822-includes-new/kernel/power/range.h
--- 822-includes-old/kernel/power/range.h 1970-01-01 10:00:00.000000000 +1000
+++ 822-includes-new/kernel/power/range.h 2004-11-24 18:51:50.308371944 +1100
@@ -0,0 +1,105 @@
+/*
+ * kernel/power/range.h
+ *
+ * Copyright (C) 2004 Nigel Cunningham <[email protected]>
+ *
+ * This file is released under the GPLv2.
+ *
+ * It contains declarations related to ranges. Ranges (otherwise
+ * known as extents, I'm told) are suspend's method of storing
+ * all of the metadata for the image. See range.c for more info.
+ *
+ */
+
+struct rangechain {
+ struct range * first;
+ struct range * last;
+ int size; /* size of the range ie sum (max-min+1) */
+ int allocs;
+ int frees;
+ int debug;
+ int timesusedoptimisation;
+ char * name;
+ struct range * lastaccessed, *prevtolastaccessed, *prevtoprev;
+};
+
+/*
+ * We rely on ranges not fitting evenly into a page. The last
+ * sizeof(unsigned long) bytes (see RANGEPAGELINK) store the number
+ * of the page, to make saving & reloading pages simpler.
+ */
+struct range {
+ unsigned long minimum;
+ unsigned long maximum;
+ struct range * next;
+};
+
+
+#define RANGES_PER_PAGE (PAGE_SIZE / (sizeof(struct range)))
+#define RANGEPAGELINK(x) ((unsigned long *) \
+ ((((unsigned long) x) & PAGE_MASK) + PAGE_SIZE - \
+ sizeof(unsigned long)))
+
+#define range_for_each(rangechain, rangepointer, value) \
+if ((rangechain)->first) \
+ for ((rangepointer) = (rangechain)->first, (value) = \
+ (rangepointer)->minimum; \
+ ((rangepointer) && ((rangepointer)->next || (value) <= \
+ (rangepointer)->maximum)); \
+ (((value) == (rangepointer)->maximum) ? \
+ ((rangepointer) = (rangepointer)->next, (value) = \
+ ((rangepointer) ? (rangepointer)->minimum : 0)) : \
+ (value)++))
+
+/*
+ * When using compression and expected_compression > 0,
+ * we allocate fewer swap entries, so GET_RANGE_NEXT can
+ * validly run out of data to return.
+ */
+#define GET_RANGE_NEXT(currentrange, currentval) \
+{ \
+ if (currentrange) { \
+ if ((currentval) == (currentrange)->maximum) { \
+ if ((currentrange)->next) { \
+ (currentrange) = (currentrange)->next; \
+ (currentval) = (currentrange)->minimum; \
+ } else { \
+ (currentrange) = NULL; \
+ (currentval) = 0; \
+ } \
+ } else \
+ currentval++; \
+ } \
+}
+
+extern int max_ranges_used;
+extern int num_range_pages;
+int add_to_range_chain(struct rangechain * chain, unsigned long value);
+void put_range_chain(struct rangechain * chain);
+void print_chain(int debuglevel, struct rangechain * chain, int printasswap);
+int free_ranges(void);
+int append_to_range_chain(int chain, unsigned long min, unsigned long max);
+void relativise_ranges(void);
+void relativise_chain(struct rangechain * chain);
+void absolutise_ranges(void);
+void absolutise_chain(struct rangechain * chain);
+int get_rangepages_list(void);
+void put_rangepages_list(void);
+unsigned long * get_rangepages_list_entry(int index);
+int relocate_rangepages(void);
+
+extern struct range * first_range_page, * last_range_page;
+
+#define RANGE_RELATIVE(x) (struct range *) ((((unsigned long) x) & \
+ (PAGE_SIZE - 1)) | \
+ ((*RANGEPAGELINK(x) & (PAGE_SIZE - 1)) << PAGE_SHIFT))
+#define RANGE_ABSOLUTE(entry) (struct range *) \
+ ((((unsigned long) (entry)) & (PAGE_SIZE - 1)) | \
+ (unsigned long) get_rangepages_list_entry(((unsigned long) (entry)) >> PAGE_SHIFT))
+
+/* swap_entry_to_range_val & range_val_to_swap_entry:
+ * We are putting offset in the low bits so consecutive swap entries
+ * make consecutive range values */
+#define swap_entry_to_range_val(swp_entry) (swp_entry.val)
+#define range_val_to_swap_entry(val) (swp_entry_t) { (val) }
+
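
The range macros above walk a linked list of [minimum, maximum] extents one value at a time. As a rough illustration, here is a function form of GET_RANGE_NEXT together with a walker and the size calculation described in the rangechain comment (sum of max - min + 1). It is a user-space sketch under the assumption that ranges are ordinary heap or stack objects; struct sketch_range and the sketch_* functions are hypothetical names, not code from range.c.

```c
#include <assert.h>
#include <stddef.h>

struct sketch_range {
	unsigned long minimum;
	unsigned long maximum;
	struct sketch_range *next;
};

/* Number of values covered by the chain: sum of (max - min + 1). */
static inline unsigned long sketch_chain_size(const struct sketch_range *r)
{
	unsigned long total = 0;

	for (; r; r = r->next)
		total += r->maximum - r->minimum + 1;
	return total;
}

/*
 * Function form of GET_RANGE_NEXT: step (range, value) forward once.
 * At the end of the chain, *range becomes NULL and *val becomes 0.
 */
static inline void sketch_range_next(const struct sketch_range **range,
				     unsigned long *val)
{
	if (!*range)
		return;
	if (*val == (*range)->maximum) {
		*range = (*range)->next;
		*val = *range ? (*range)->minimum : 0;
	} else {
		(*val)++;
	}
}

/* Visit every value in the chain; returns the count and stores the sum. */
static inline unsigned long sketch_walk(const struct sketch_range *first,
					unsigned long *sum)
{
	const struct sketch_range *r = first;
	unsigned long val = first ? first->minimum : 0;
	unsigned long visited = 0;

	*sum = 0;
	while (r) {
		*sum += val;
		visited++;
		sketch_range_next(&r, &val);
	}
	return visited;
}
```

For a chain holding 3..5 followed by 10..11, the walker visits five values, which matches what sketch_chain_size() reports for the same chain.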
diff -ruN 822-includes-old/kernel/power/smp.c 822-includes-new/kernel/power/smp.c
--- 822-includes-old/kernel/power/smp.c 2004-11-03 21:55:01.000000000 +1100
+++ 822-includes-new/kernel/power/smp.c 2004-11-24 18:51:50.315370880 +1100
@@ -15,6 +15,7 @@
#include <linux/module.h>
#include <asm/atomic.h>
#include <asm/tlbflush.h>
+#include "suspend.h"

static atomic_t cpu_counter, freeze;

diff -ruN 822-includes-old/kernel/power/suspend.h 822-includes-new/kernel/power/suspend.h
--- 822-includes-old/kernel/power/suspend.h 1970-01-01 10:00:00.000000000 +1000
+++ 822-includes-new/kernel/power/suspend.h 2004-11-24 18:51:50.316370728 +1100
@@ -0,0 +1,298 @@
+/*
+ * kernel/power/suspend.h
+ *
+ * Copyright (C) 2004 Nigel Cunningham <[email protected]>
+ *
+ * This file is released under the GPLv2.
+ *
+ * It contains declarations used throughout swsusp and suspend2.
+ *
+ */
+#ifndef KERNEL_POWER_SUSPEND_H
+#define KERNEL_POWER_SUSPEND_H
+
+#include <linux/delay.h>
+#include "range.h"
+
+/* ---------------------------- swsusp only ----------------------------- */
+
+typedef struct pbe {
+ unsigned long address; /* address of the copy */
+ unsigned long orig_address; /* original address of page */
+ swp_entry_t swap_address;
+ swp_entry_t dummy; /* we need scratch space at
+ * end of page (see link, diskpage)
+ */
+} suspend_pagedir_t;
+
+#define SUSPEND_PD_PAGES(x) (((x)*sizeof(struct pbe))/PAGE_SIZE+1)
+
+/* mm/page_alloc.c */
+extern void drain_local_pages(void);
+
+void save_processor_state(void);
+void restore_processor_state(void);
+struct saved_context;
+void __save_processor_state(struct saved_context *ctxt);
+void __restore_processor_state(struct saved_context *ctxt);
+
+/* ---------------------------- Suspend2 -------------------------------- */
+
+/* Page Backup Entry.
+ *
+ * This is an abstraction which contains the data for one
+ * page of the image. (The data is really stored in ranges).
+ */
+
+struct pbe2 {
+ struct page * origaddress; /* Original address of page */
+ struct page * address; /* Address of copy of page */
+ struct range * currentorigrange;
+ struct range * currentdestrange;
+
+ struct pagedir * pagedir;
+};
+
+/* Pagedir
+ *
+ * Contains the metadata for a set of pages saved in the image.
+ */
+struct pagedir {
+ int pagedir_num;
+ int pageset_size;
+ int lastpageset_size;
+ struct rangechain origranges;
+ struct rangechain destranges;
+ struct rangechain allocdranges;
+};
+
+/* Function for setting the chain names for a pagedir (used
+ * for debugging) */
+void set_chain_names(struct pagedir * p);
+
+#define pageset1_size (pagedir1.pageset_size)
+#define pageset2_size (pagedir2.pageset_size)
+
+/* Pagedir_nosave is pagedir1, loaded back in at the beginning
+ * of resuming and relocated so we can do our atomic restoration
+ * of the original kernel.
+ * Pagedir1 is the metadata for pageset 1 pages. Ditto for pageset 2.
+ */
+extern suspend_pagedir_t *pagedir_nosave __nosavedata;
+extern struct pagedir pagedir1, pagedir2;
+
+/* Non-plugin data saved in our image header */
+struct suspend_header {
+ u32 version_code;
+ unsigned long num_physpages;
+ char machine[65];
+ char version[65];
+ int num_cpus;
+ int page_size;
+ unsigned long orig_mem_free;
+ int num_range_pages;
+ struct range * unused_ranges;
+ int pageset_2_size;
+ int param0;
+ int param1;
+ int param2;
+ int param3;
+ int param4;
+ int progress0;
+ int progress1;
+ int progress2;
+ int progress3;
+ int io_time[2][2];
+
+ /* Implementation specific variables */
+#ifdef KERNEL_POWER_SWSUSP_C
+ suspend_pagedir_t *suspend_pagedir;
+ unsigned int num_pbes;
+#else
+ struct pagedir pagedir;
+#endif
+};
+
+/* Suspend memory pool functions */
+struct page * get_suspend_pool_pages(unsigned int gfp_mask, unsigned int order);
+void free_suspend_pool_pages(struct page *page, unsigned int order);
+
+extern void schedule_suspend_message(int message_number);
+extern int suspend_min_free;
+
+extern void suspend_restore_avenrun(void);
+extern void suspend_save_avenrun(void);
+
+extern unsigned long get_highstart_pfn(void);
+
+#define SWAP_FILENAME_MAXLENGTH 32
+
+extern int suspend_default_console_level;
+extern int max_async_ios;
+extern int image_size_limit;
+
+struct pageset_sizes_result {
+ int size1; /* Can't be unsigned - breaks MAX function */
+ int size1low;
+ int size2;
+ int size2low;
+ int needmorespace;
+};
+
+#define MB(x) ((x) >> (20 - PAGE_SHIFT))
+
+extern int suspend_amount_grabbed;
+
+/*
+ * XXX: We try to keep some more pages free so that I/O operations succeed
+ * without paging. Might this be more?
+ */
+#ifdef CONFIG_HIGHMEM
+#define MIN_FREE_RAM (get_highstart_pfn() >> 7)
+#else
+#define MIN_FREE_RAM (max_mapnr >> 7)
+#endif
+
+extern void prepare_status(int printalways, int clearbar, const char *fmt, ...);
+extern void abort_suspend(const char *fmt, ...);
+
+extern int suspend_snprintf(char * buffer, int buffer_size,
+ const char *fmt, ...);
+
+/* ------ prepare_image.c ------ */
+extern unsigned long get_grabbed_pages(int order);
+
+/* ------ io.c ------ */
+int suspend_early_boot_message(int can_erase_image, char *reason, ...);
+
+/* ------ console.c ------ */
+void check_shift_keys(int pause, char * message);
+unsigned long update_status(unsigned long value, unsigned long maximum,
+ const char *fmt, ...);
+
+extern int expected_compression_ratio(void);
+
+#define MAIN_STORAGE_NEEDED(USE_ECR) \
+ ((pageset1_size + pageset2_size) * \
+ (USE_ECR ? expected_compression_ratio() : 100) / 100)
+
+#define HEADER_BYTES_NEEDED \
+ ((num_range_pages << PAGE_SHIFT) + \
+ sizeof(struct suspend_header) + \
+ sizeof(struct plugin_header) + \
+ (int) header_storage_for_plugins() + \
+ num_plugins * \
+ (sizeof(struct plugin_header) + sizeof(int)))
+
+#define HEADER_STORAGE_NEEDED ((HEADER_BYTES_NEEDED + (int) PAGE_SIZE - 1) >> PAGE_SHIFT)
+
+#define STORAGE_NEEDED(USE_ECR) \
+ (MAIN_STORAGE_NEEDED(USE_ECR) + HEADER_STORAGE_NEEDED)
+
+#define RAM_TO_SUSPEND (1 + max((pageset1_size - pageset2_sizelow), 0) + \
+ MIN_FREE_RAM + memory_for_plugins())
+
+#ifndef KERNEL_POWER_SWSUSP_C
+#ifdef CONFIG_SOFTWARE_SUSPEND_DEBUG
+#define cond_show_pcp_lists() \
+do { \
+ if (TEST_DEBUG_STATE(SUSPEND_FREEZER)) \
+ show_pcp_lists(); \
+} while(0)
+
+#define MDELAY(a) do { if (TEST_ACTION_STATE(SUSPEND_SLOW)) mdelay(a); } \
+ while (0)
+#define MAX_FREEMEM_SLOTS 25
+enum {
+ SUSPEND_FREE_BASE,
+ SUSPEND_FREE_CONSOLE_ALLOC,
+ SUSPEND_FREE_DRAIN_PCP,
+ SUSPEND_FREE_IN_USE_MAP,
+ SUSPEND_FREE_PS2_MAP,
+ SUSPEND_FREE_CHECKSUM_MAP,
+ SUSPEND_FREE_UNMAP_MAP,
+ SUSPEND_FREE_RELOAD_PAGES,
+ SUSPEND_FREE_INIT_PLUGINS,
+ SUSPEND_FREE_MEM_POOL,
+ SUSPEND_FREE_FREEZER,
+ SUSPEND_FREE_EAT_MEMORY,
+ SUSPEND_FREE_SYNC,
+ SUSPEND_FREE_GRABBED_MEMORY,
+ SUSPEND_FREE_RANGE_PAGES,
+ SUSPEND_FREE_EXTRA_PD1,
+ SUSPEND_FREE_WRITER_STORAGE,
+ SUSPEND_FREE_HEADER_STORAGE,
+ SUSPEND_FREE_CHECKSUM_PAGES,
+ SUSPEND_FREE_KSTAT,
+ SUSPEND_FREE_DEBUG_INFO,
+ SUSPEND_FREE_INVALIDATE_IMAGE,
+ SUSPEND_FREE_IO,
+ SUSPEND_FREE_IO_INFO,
+ SUSPEND_FREE_START_ONE
+};
+extern void suspend_store_free_mem(int slot, int side);
+extern int suspend_free_mem_values[MAX_FREEMEM_SLOTS][2];
+#else
+#define suspend_store_free_mem(a, b) do { } while(0)
+#define MDELAY(a) do { } while (0)
+#define cond_show_pcp_lists() do { } while(0)
+#endif
+#endif /* Not swsusp */
+
+extern int expected_compression_ratio(void);
+int print_module_list_to_buffer(char * buffer, int size);
+
+extern unsigned int nr_suspends;
+extern char resume2_file[];
+
+extern int suspend_wait_for_keypress(void);
+
+#ifdef CONFIG_SMP
+extern void smp_suspend(void);
+extern void smp_continue(void);
+#else
+#define smp_suspend() do { } while(0)
+#define smp_continue() do { } while(0)
+#endif
+
+/* For user interface */
+#include <linux/syscalls.h>
+extern asmlinkage ssize_t sys_write(unsigned int fd, const char __user * buf,
+ size_t count);
+
+#ifdef CONFIG_BOOTSPLASH
+#include <linux/console.h>
+#include <linux/fb.h>
+#include "../../drivers/video/console/fbcon.h"
+static inline struct splash_data * get_splash_data(int consolenr)
+{
+ BUG_ON(consolenr >= MAX_NR_CONSOLES);
+
+ if (vc_cons[consolenr].d)
+ return vc_cons[consolenr].d->vc_splash_data;
+
+ return NULL;
+}
+#endif
+
+extern asmlinkage ssize_t sys_write(unsigned int fd, const char __user * buf, size_t count);
+
+extern struct pm_ops * pm_ops;
+extern dev_t name_to_dev_t(char *line);
+extern char _text[], _etext[], _edata[], __bss_start[], _end[];
+extern void signal_wake_up(struct task_struct *t, int resume);
+
+extern struct partial_device_tree * suspend_device_tree;
+
+/* Returns whether it was already in the requested state */
+extern int suspend_map_kernel_page(struct page * page, int enable);
+
+#ifdef CONFIG_DEBUG_PAGEALLOC
+extern void suspend_map_atomic_copy_pages(void);
+extern void suspend_unmap_atomic_copy_pages(void);
+#else
+#define suspend_map_atomic_copy_pages() do { } while(0)
+#define suspend_unmap_atomic_copy_pages() do { } while(0)
+#endif
+
+#endif
diff -ruN 822-includes-old/kernel/power/swsusp.c 822-includes-new/kernel/power/swsusp.c
--- 822-includes-old/kernel/power/swsusp.c 2004-11-24 18:52:16.550382552 +1100
+++ 822-includes-new/kernel/power/swsusp.c 2004-11-24 18:51:50.318370424 +1100
@@ -36,6 +36,8 @@
* For TODOs,FIXMEs also look in Documentation/power/swsusp.txt
*/

+#define KERNEL_POWER_SWSUSP_C
+
#include <linux/module.h>
#include <linux/mm.h>
#include <linux/suspend.h>
@@ -51,9 +53,7 @@
#include <linux/keyboard.h>
#include <linux/spinlock.h>
#include <linux/genhd.h>
-#include <linux/kernel.h>
#include <linux/major.h>
-#include <linux/swap.h>
#include <linux/pm.h>
#include <linux/device.h>
#include <linux/buffer_head.h>
@@ -70,9 +70,7 @@
#include <asm/io.h>

#include "power.h"
-
-/* References to section boundaries */
-extern char __nosave_begin, __nosave_end;
+#include "suspend.h"

extern int is_head_of_free_region(struct page *);



2004-11-24 13:22:12

by Nigel Cunningham

[permalink] [raw]
Subject: Suspend 2 merge: 23/51: PPC support.

From Steve.

Not updated for a while, so I'm not sure if it still works. If not, it
shouldn't take much to get it going again.

diff -ruN 701-mac-old/arch/ppc/Kconfig 701-mac-new/arch/ppc/Kconfig
--- 701-mac-old/arch/ppc/Kconfig 2004-11-03 21:55:01.000000000 +1100
+++ 701-mac-new/arch/ppc/Kconfig 2004-11-04 16:27:40.000000000 +1100
@@ -225,6 +225,8 @@

If in doubt, say Y here.

+source kernel/power/Kconfig
+
source arch/ppc/platforms/4xx/Kconfig
source arch/ppc/platforms/85xx/Kconfig

diff -ruN 701-mac-old/arch/ppc/kernel/signal.c 701-mac-new/arch/ppc/kernel/signal.c
--- 701-mac-old/arch/ppc/kernel/signal.c 2004-11-03 21:55:01.000000000 +1100
+++ 701-mac-new/arch/ppc/kernel/signal.c 2004-11-04 16:27:40.000000000 +1100
@@ -604,6 +604,15 @@
unsigned long frame, newsp;
int signr, ret;

+ if (current->flags & PF_FREEZE) {
+ refrigerator(PF_FREEZE);
+ signr = 0;
+ ret = regs->gpr[3];
+ recalc_sigpending();
+ if (!signal_pending(current))
+ goto no_signal;
+ }
+
if (!oldset)
oldset = &current->blocked;

@@ -626,6 +635,7 @@
regs->gpr[3] = EINTR;
/* note that the cr0.SO bit is already set */
} else {
+no_signal:
regs->nip -= 4; /* Back up & retry system call */
regs->result = 0;
regs->trap = 0;
diff -ruN 701-mac-old/arch/ppc/kernel/vmlinux.lds.S 701-mac-new/arch/ppc/kernel/vmlinux.lds.S
--- 701-mac-old/arch/ppc/kernel/vmlinux.lds.S 2004-11-03 21:55:04.000000000 +1100
+++ 701-mac-new/arch/ppc/kernel/vmlinux.lds.S 2004-11-04 16:27:40.000000000 +1100
@@ -74,6 +74,12 @@
CONSTRUCTORS
}

+ . = ALIGN(4096);
+ __nosave_begin = .;
+ .data_nosave : { *(.data.nosave) }
+ . = ALIGN(4096);
+ __nosave_end = .;
+
. = ALIGN(32);
.data.cacheline_aligned : { *(.data.cacheline_aligned) }

diff -ruN 701-mac-old/arch/ppc/Makefile 701-mac-new/arch/ppc/Makefile
--- 701-mac-old/arch/ppc/Makefile 2004-11-03 21:51:14.000000000 +1100
+++ 701-mac-new/arch/ppc/Makefile 2004-11-04 16:27:40.000000000 +1100
@@ -61,6 +61,7 @@
drivers-$(CONFIG_8xx) += arch/ppc/8xx_io/
drivers-$(CONFIG_4xx) += arch/ppc/4xx_io/
drivers-$(CONFIG_CPM2) += arch/ppc/8260_io/
+drivers-$(CONFIG_PM) += arch/ppc/power/

drivers-$(CONFIG_OPROFILE) += arch/ppc/oprofile/

diff -ruN 701-mac-old/arch/ppc/mm/init.c 701-mac-new/arch/ppc/mm/init.c
--- 701-mac-old/arch/ppc/mm/init.c 2004-11-03 21:51:56.000000000 +1100
+++ 701-mac-new/arch/ppc/mm/init.c 2004-11-04 16:27:40.000000000 +1100
@@ -31,6 +31,7 @@
#include <linux/bootmem.h>
#include <linux/highmem.h>
#include <linux/initrd.h>
+#include <linux/suspend.h>

#include <asm/pgalloc.h>
#include <asm/prom.h>
@@ -149,6 +150,7 @@

while (start < end) {
ClearPageReserved(virt_to_page(start));
+ ClearPageNosave(virt_to_page(start));
set_page_count(virt_to_page(start), 1);
free_page(start);
cnt++;
@@ -188,6 +190,7 @@

for (; start < end; start += PAGE_SIZE) {
ClearPageReserved(virt_to_page(start));
+ ClearPageNosave(virt_to_page(start));
set_page_count(virt_to_page(start), 1);
free_page(start);
totalram_pages++;
@@ -424,8 +427,10 @@
/* if we are booted from BootX with an initial ramdisk,
make sure the ramdisk pages aren't reserved. */
if (initrd_start) {
- for (addr = initrd_start; addr < initrd_end; addr += PAGE_SIZE)
+ for (addr = initrd_start; addr < initrd_end; addr += PAGE_SIZE) {
ClearPageReserved(virt_to_page(addr));
+ ClearPageNosave(virt_to_page(addr));
+ }
}
#endif /* CONFIG_BLK_DEV_INITRD */

@@ -451,6 +456,12 @@
addr += PAGE_SIZE) {
if (!PageReserved(virt_to_page(addr)))
continue;
+ /*
+ * Mark nosave pages
+ */
+ if (addr >= (void *)&__nosave_begin && addr < (void *)&__nosave_end)
+ SetPageNosave(virt_to_page(addr));
+
if (addr < (ulong) etext)
codepages++;
else if (addr >= (unsigned long)&__init_begin
@@ -468,6 +479,7 @@
struct page *page = mem_map + pfn;

ClearPageReserved(page);
+ ClearPageNosave(page);
set_bit(PG_highmem, &page->flags);
set_page_count(page, 1);
__free_page(page);
@@ -501,7 +513,6 @@
pg->index = addr;
}
}
-
mem_init_done = 1;
}

diff -ruN 701-mac-old/arch/ppc/platforms/pmac_feature.c 701-mac-new/arch/ppc/platforms/pmac_feature.c
--- 701-mac-old/arch/ppc/platforms/pmac_feature.c 2004-11-03 21:55:00.000000000 +1100
+++ 701-mac-new/arch/ppc/platforms/pmac_feature.c 2004-11-04 16:27:40.000000000 +1100
@@ -2146,7 +2146,10 @@
},
{ "PowerBook6,1", "PowerBook G4 12\"",
PMAC_TYPE_UNKNOWN_INTREPID, intrepid_features,
- PMAC_MB_HAS_FW_POWER | PMAC_MB_MOBILE,
+ PMAC_MB_HAS_FW_POWER | PMAC_MB_MOBILE
+#ifdef CONFIG_SOFTWARE_REPLACE_SLEEP
+ | PMAC_MB_CAN_SLEEP,
+#endif
},
{ "PowerBook6,2", "PowerBook G4",
PMAC_TYPE_UNKNOWN_INTREPID, intrepid_features,
diff -ruN 701-mac-old/arch/ppc/power/cpu.c 701-mac-new/arch/ppc/power/cpu.c
--- 701-mac-old/arch/ppc/power/cpu.c 1970-01-01 10:00:00.000000000 +1000
+++ 701-mac-new/arch/ppc/power/cpu.c 2004-11-04 16:27:40.000000000 +1100
@@ -0,0 +1,61 @@
+#include <linux/config.h>
+#include <linux/init.h>
+#include <linux/sched.h>
+#include <linux/kernel.h>
+#include <linux/mm.h>
+#include <linux/tty.h>
+#include <linux/string.h>
+#include <linux/adb.h>
+#include <linux/cuda.h>
+#include <linux/pmu.h>
+#include <linux/suspend.h>
+#include <linux/delay.h>
+#include <linux/module.h>
+
+#include <asm/mmu_context.h>
+
+extern void enable_kernel_altivec(void);
+
+static inline void do_pmu_resume(void)
+{
+ struct adb_request req;
+
+ printk("resume pmu");
+ /* Tell PMU we are ready */
+ pmu_request(&req, NULL, 2, PMU_SYSTEM_READY, 2);
+ pmu_wait_complete(&req);
+
+ /* Resume PMU event interrupts */
+ pmu_resume();
+ printk(".\n");
+}
+
+void save_processor_state(void)
+{
+ printk("suspend pmu");
+ pmu_suspend();
+ printk(".\n");
+ printk("current is 0x%p\n", current);
+}
+
+void restore_processor_state(void)
+{
+ printk("seting context, 0x%p", current);
+ local_irq_disable();
+ /* Restore userland MMU context */
+ set_context(current->active_mm->context, current->active_mm->pgd);
+ printk(".\n");
+
+#ifdef CONFIG_ALTIVEC
+ if (cur_cpu_spec[0]->cpu_features & CPU_FTR_ALTIVEC)
+ enable_kernel_altivec();
+#endif
+ printk("enable kernel fp");
+ enable_kernel_fp();
+ printk(".\n");
+ do_pmu_resume();
+ local_irq_enable();
+}
+
+EXPORT_SYMBOL(save_processor_state);
+EXPORT_SYMBOL(restore_processor_state);
diff -ruN 701-mac-old/arch/ppc/power/cpu_reg.S 701-mac-new/arch/ppc/power/cpu_reg.S
--- 701-mac-old/arch/ppc/power/cpu_reg.S 1970-01-01 10:00:00.000000000 +1000
+++ 701-mac-new/arch/ppc/power/cpu_reg.S 2004-11-04 16:27:40.000000000 +1100
@@ -0,0 +1,325 @@
+/*
+ * This code is based on pmdisk.S by Benjamin Herrenschmidt <[email protected]>
+ *
+ * changed for swsusp2 by Hu Gang <[email protected]>
+ */
+#include <linux/config.h>
+#include <linux/threads.h>
+#include <asm/processor.h>
+#include <asm/page.h>
+#include <asm/cputable.h>
+#include <asm/thread_info.h>
+#include <asm/ppc_asm.h>
+#include <asm/offsets.h>
+
+/*
+ * Structure for storing CPU registers on the save area.
+ */
+#define SL_SP 0
+#define SL_PC 4
+#define SL_MSR 8
+#define SL_SDR1 0xc
+#define SL_SPRG0 0x10 /* 4 sprg's */
+#define SL_DBAT0 0x20
+#define SL_IBAT0 0x28
+#define SL_DBAT1 0x30
+#define SL_IBAT1 0x38
+#define SL_DBAT2 0x40
+#define SL_IBAT2 0x48
+#define SL_DBAT3 0x50
+#define SL_IBAT3 0x58
+#define SL_TB 0x60
+#define SL_R2 0x68
+#define SL_CR 0x6c
+#define SL_LR 0x70
+#define SL_R12 0x74 /* r12 to r31 */
+#define SL_SIZE (SL_R12 + 80)
+
+#define CPU_REG_MEM_DEFINE \
+ .section .data ; \
+ .align 5 ; \
+\
+_GLOBAL(cpu_reg_save_area) ; \
+ .space SL_SIZE
+
+#define CPU_REG_MEM_SAVE \
+ lis r11,cpu_reg_save_area@h;\
+ ori r11,r11,cpu_reg_save_area@l;\
+;\
+ mflr r0 ; \
+ stw r0,SL_LR(r11);\
+ mfcr r0;\
+ stw r0,SL_CR(r11);\
+ stw r1,SL_SP(r11);\
+ stw r2,SL_R2(r11);\
+ stmw r12,SL_R12(r11);\
+;\
+ /* Save MSR & SDR1 */;\
+ mfmsr r4;\
+ stw r4,SL_MSR(r11);\
+ mfsdr1 r4;\
+ stw r4,SL_SDR1(r11);\
+;\
+ /* Get a stable timebase and save it */;\
+1: mftbu r4;\
+ stw r4,SL_TB(r11);\
+ mftb r5;\
+ stw r5,SL_TB+4(r11);\
+ mftbu r3;\
+ cmpw r3,r4;\
+ bne 1b;\
+;\
+ /* Save SPRGs */;\
+ mfsprg r4,0;\
+ stw r4,SL_SPRG0(r11);\
+ mfsprg r4,1;\
+ stw r4,SL_SPRG0+4(r11);\
+ mfsprg r4,2;\
+ stw r4,SL_SPRG0+8(r11);\
+ mfsprg r4,3;\
+ stw r4,SL_SPRG0+12(r11);\
+;\
+ /* Save BATs */;\
+ mfdbatu r4,0;\
+ stw r4,SL_DBAT0(r11);\
+ mfdbatl r4,0;\
+ stw r4,SL_DBAT0+4(r11);\
+ mfdbatu r4,1;\
+ stw r4,SL_DBAT1(r11);\
+ mfdbatl r4,1;\
+ stw r4,SL_DBAT1+4(r11);\
+ mfdbatu r4,2;\
+ stw r4,SL_DBAT2(r11);\
+ mfdbatl r4,2;\
+ stw r4,SL_DBAT2+4(r11);\
+ mfdbatu r4,3;\
+ stw r4,SL_DBAT3(r11);\
+ mfdbatl r4,3;\
+ stw r4,SL_DBAT3+4(r11);\
+ mfibatu r4,0;\
+ stw r4,SL_IBAT0(r11);\
+ mfibatl r4,0;\
+ stw r4,SL_IBAT0+4(r11);\
+ mfibatu r4,1;\
+ stw r4,SL_IBAT1(r11);\
+ mfibatl r4,1;\
+ stw r4,SL_IBAT1+4(r11);\
+ mfibatu r4,2;\
+ stw r4,SL_IBAT2(r11);\
+ mfibatl r4,2;\
+ stw r4,SL_IBAT2+4(r11);\
+ mfibatu r4,3;\
+ stw r4,SL_IBAT3(r11);\
+ mfibatl r4,3;\
+ stw r4,SL_IBAT3+4(r11);\
+ /* Backup various CPU config stuffs */;\
+ /* bl __save_cpu_setup; */
+
+#define CPU_REG_MEM_DISABLE_MMU \
+ /* Disable MSR:DR to make sure we don't take a TLB or ;\
+ * hash miss during the copy, as our hash table will ;\
+ * for a while be unuseable. For .text, we assume we are;\
+ * covered by a BAT. This works only for non-G5 at this ;\
+ * point. G5 will need a better approach, possibly using;\
+ * a small temporary hash table filled with large mappings,;\
+ * disabling the MMU completely isn't a good option for ;\
+ * performance reasons. ;\
+ * (Note that 750's may have the same performance issue as;\
+ * the G5 in this case, we should investigate using moving;\
+ * BATs for these CPUs);\
+ */;\
+ mfmsr r0 ;\
+ sync ;\
+ rlwinm r0,r0,0,28,26 /* clear MSR_DR */ ;\
+ mtmsr r0 ;\
+ sync ;\
+ isync
+
+#define CPU_REG_MEM_FLUSH_CACHE \
+ /* Do a very simple cache flush/inval of the L1 to ensure \
+ * coherency of the icache \
+ */ \
+ lis r3,0x0002 ;\
+ mtctr r3 ;\
+ li r3, 0 ;\
+1: ;\
+ lwz r0,0(r3) ;\
+ addi r3,r3,0x0020 ;\
+ bdnz 1b ;\
+ isync ;\
+ sync ;\
+;\
+ /* Now flush those cache lines */ ;\
+ lis r3,0x0002 ;\
+ mtctr r3 ;\
+ li r3, 0 ;\
+1:;\
+ dcbf 0,r3 ;\
+ addi r3,r3,0x0020 ;\
+ bdnz 1b
+
+#define CPU_REG_MEM_RESTORE \
+/* Ok, we are now running with the kernel data of the old;\
+ * kernel fully restored. We can get to the save area;\
+ * easily now. As for the rest of the code, it assumes the;\
+ * loader kernel and the booted one are exactly identical;\
+ */;\
+ lis r11,cpu_reg_save_area@h;\
+ ori r11,r11,cpu_reg_save_area@l;\
+ tophys(r11,r11);\
+ /* Restore various CPU config stuffs */;\
+ /* bl __restore_cpu_setup; */\
+ /* Restore the BATs, and SDR1. Then we can turn on the MMU. ;\
+ * This is a bit hairy as we are running out of those BATs,;\
+ * but first, our code is probably in the icache, and we are;\
+ * writing the same value to the BAT, so that should be fine,;\
+ * though a better solution will have to be found long-term;\
+ */;\
+ lwz r4,SL_SDR1(r11);\
+ mtsdr1 r4;\
+ lwz r4,SL_SPRG0(r11);\
+ mtsprg 0,r4;\
+ lwz r4,SL_SPRG0+4(r11);\
+ mtsprg 1,r4;\
+ lwz r4,SL_SPRG0+8(r11);\
+ mtsprg 2,r4;\
+ lwz r4,SL_SPRG0+12(r11);\
+ mtsprg 3,r4;\
+;\
+/* lwz r4,SL_DBAT0(r11);\
+ mtdbatu 0,r4;\
+ lwz r4,SL_DBAT0+4(r11);\
+ mtdbatl 0,r4;\
+ lwz r4,SL_DBAT1(r11);\
+ mtdbatu 1,r4;\
+ lwz r4,SL_DBAT1+4(r11);\
+ mtdbatl 1,r4;\
+ lwz r4,SL_DBAT2(r11);\
+ mtdbatu 2,r4;\
+ lwz r4,SL_DBAT2+4(r11);\
+ mtdbatl 2,r4;\
+ lwz r4,SL_DBAT3(r11);\
+ mtdbatu 3,r4;\
+ lwz r4,SL_DBAT3+4(r11);\
+ mtdbatl 3,r4;\
+ lwz r4,SL_IBAT0(r11);\
+ mtibatu 0,r4;\
+ lwz r4,SL_IBAT0+4(r11);\
+ mtibatl 0,r4;\
+ lwz r4,SL_IBAT1(r11);\
+ mtibatu 1,r4;\
+ lwz r4,SL_IBAT1+4(r11);\
+ mtibatl 1,r4;\
+ lwz r4,SL_IBAT2(r11);\
+ mtibatu 2,r4;\
+ lwz r4,SL_IBAT2+4(r11);\
+ mtibatl 2,r4;\
+ lwz r4,SL_IBAT3(r11);\
+ mtibatu 3,r4;\
+ lwz r4,SL_IBAT3+4(r11);\
+ mtibatl 3,r4;\
+; */ \
+BEGIN_FTR_SECTION;\
+ li r4,0;\
+ mtspr SPRN_DBAT4U,r4;\
+ mtspr SPRN_DBAT4L,r4;\
+ mtspr SPRN_DBAT5U,r4;\
+ mtspr SPRN_DBAT5L,r4;\
+ mtspr SPRN_DBAT6U,r4;\
+ mtspr SPRN_DBAT6L,r4;\
+ mtspr SPRN_DBAT7U,r4;\
+ mtspr SPRN_DBAT7L,r4;\
+ mtspr SPRN_IBAT4U,r4;\
+ mtspr SPRN_IBAT4L,r4;\
+ mtspr SPRN_IBAT5U,r4;\
+ mtspr SPRN_IBAT5L,r4;\
+ mtspr SPRN_IBAT6U,r4;\
+ mtspr SPRN_IBAT6L,r4;\
+ mtspr SPRN_IBAT7U,r4;\
+ mtspr SPRN_IBAT7L,r4;\
+END_FTR_SECTION_IFSET(CPU_FTR_HAS_HIGH_BATS);\
+;\
+ /* Flush all TLBs */;\
+ lis r4,0x1000;\
+1: addic. r4,r4,-0x1000;\
+ tlbie r4;\
+ blt 1b;\
+ sync;\
+;\
+ /* restore the MSR and turn on the MMU */;\
+ lwz r3,SL_MSR(r11);\
+ bl turn_on_mmu;\
+ tovirt(r11,r11);\
+;\
+ /* Restore TB */;\
+ li r3,0;\
+ mttbl r3;\
+ lwz r3,SL_TB(r11);\
+ lwz r4,SL_TB+4(r11);\
+ mttbu r3;\
+ mttbl r4;\
+; \
+ lwz r0,SL_CR(r11);\
+ mtcr r0;\
+ lwz r2,SL_R2(r11);\
+ lmw r12,SL_R12(r11);\
+ lwz r1,SL_SP(r11);\
+ lwz r4,SL_LR(r11)
+
+#define CPU_REG_MEM_RESTORE_END \
+ /* Restore LR from the save area */ ; \
+ lis r11,cpu_reg_save_area@h ; \
+ ori r11,r11,cpu_reg_save_area@l ; \
+ lwz r0,SL_CR(r11) ; \
+ mtcr r0 ; \
+ lwz r2,SL_R2(r11) ; \
+ lmw r12,SL_R12(r11) ; \
+ lwz r1,SL_SP(r11)
+
+#define CPU_REG_TURN_ON_MMU \
+/* FIXME:This construct is actually not useful since we don't shut ; \
+ * down the instruction MMU, we could just flip back MSR-DR on. ; \
+ */ ; \
+turn_on_mmu: ; \
+ mflr r4 ; \
+ mtsrr0 r4 ; \
+ mtsrr1 r3 ; \
+ sync ; \
+ isync ; \
+ rfi
+
+#define CPU_REG_STACK_SAVE \
+ mflr r0 ; \
+ stw r0,4(r1) ; \
+ stwu r1,-SL_SIZE(r1) ; \
+ mfcr r0 ; \
+ stw r0,SL_CR(r1) ; \
+ stw r2,SL_R2(r1) ; \
+ stmw r12,SL_R12(r1) ; \
+ /* Save SPRGs */ ; \
+ mfsprg r4,0 ; \
+ stw r4,SL_SPRG0(r1) ; \
+ mfsprg r4,1 ; \
+ stw r4,SL_SPRG0+4(r1) ; \
+ mfsprg r4,2 ; \
+ stw r4,SL_SPRG0+8(r1) ; \
+ mfsprg r4,3 ; \
+ stw r4,SL_SPRG0+12(r1)
+
+#define CPU_REG_STACK_RESTORE \
+ lwz r4,SL_SPRG0(r1) ; \
+ mtsprg 0,r4 ; \
+ lwz r4,SL_SPRG0+4(r1) ; \
+ mtsprg 1,r4 ; \
+ lwz r4,SL_SPRG0+8(r1) ; \
+ mtsprg 2,r4 ; \
+ lwz r4,SL_SPRG0+12(r1) ; \
+ mtsprg 3,r4 ; \
+ lwz r0,SL_CR(r1) ; \
+ mtcr r0 ; \
+ lwz r2,SL_R2(r1) ; \
+ lmw r12,SL_R12(r1) ; \
+ addi r1,r1,SL_SIZE ; \
+ lwz r0,4(r1) ; \
+ mtlr r0 ; \
+ blr
diff -ruN 701-mac-old/arch/ppc/power/Makefile 701-mac-new/arch/ppc/power/Makefile
--- 701-mac-old/arch/ppc/power/Makefile 1970-01-01 10:00:00.000000000 +1000
+++ 701-mac-new/arch/ppc/power/Makefile 2004-11-04 16:27:40.000000000 +1100
@@ -0,0 +1,2 @@
+obj-$(CONFIG_PM) += cpu.o
+obj-$(CONFIG_SOFTWARE_SUSPEND2) += swsusp2-asm.o
diff -ruN 701-mac-old/arch/ppc/power/swsusp2-asm.S 701-mac-new/arch/ppc/power/swsusp2-asm.S
--- 701-mac-old/arch/ppc/power/swsusp2-asm.S 1970-01-01 10:00:00.000000000 +1000
+++ 701-mac-new/arch/ppc/power/swsusp2-asm.S 2004-11-04 16:27:40.000000000 +1100
@@ -0,0 +1,53 @@
+/*
+ * This code is based on pmdisk.S by Benjamin Herrenschmidt <[email protected]>
+ *
+ * changed for swsusp2 by Hu Gang <[email protected]>
+ */
+#include <linux/config.h>
+#include <asm/processor.h>
+#include <asm/page.h>
+#include <asm/ppc_asm.h>
+#include <asm/cputable.h>
+#include <asm/cache.h>
+#include <asm/thread_info.h>
+#include <asm/offsets.h>
+#include "cpu_reg.S"
+
+ CPU_REG_MEM_DEFINE
+
+ .section .text
+ .align 5
+_GLOBAL(do_suspend2_lowlevel)
+ CPU_REG_STACK_SAVE
+ cmpwi 0,r3,0
+ bne do_resume
+ bl save_processor_state
+ bl do_suspend2_suspend_1
+ CPU_REG_MEM_SAVE
+ bl do_suspend2_suspend_2
+ CPU_REG_MEM_RESTORE_END
+ CPU_REG_STACK_RESTORE
+
+do_resume:
+ bl save_processor_state
+ bl do_suspend2_resume_1
+
+ /* Stop pending AltiVec streams and memory accesses */
+BEGIN_FTR_SECTION
+ DSSALL
+END_FTR_SECTION_IFSET(CPU_FTR_ALTIVEC)
+ sync
+
+ CPU_REG_MEM_DISABLE_MMU
+#include "swsusp2-copyback.S"
+ CPU_REG_MEM_FLUSH_CACHE
+
+ CPU_REG_MEM_RESTORE
+ bl do_suspend2_resume_2
+ bl restore_processor_state
+ CPU_REG_MEM_RESTORE_END
+ CPU_REG_STACK_RESTORE
+
+ CPU_REG_TURN_ON_MMU
+
+ .section .text
diff -ruN 701-mac-old/arch/ppc/power/swsusp2.c 701-mac-new/arch/ppc/power/swsusp2.c
--- 701-mac-old/arch/ppc/power/swsusp2.c 1970-01-01 10:00:00.000000000 +1000
+++ 701-mac-new/arch/ppc/power/swsusp2.c 2004-11-04 16:27:40.000000000 +1100
@@ -0,0 +1,170 @@
+ /*
+ * Copyright 2003 Nigel Cunningham.
+ *
+ * This is the code that the code in swsusp2-asm.S for
+ * copying back the original kernel is based upon. It
+ * was based upon code that is...
+ * Copyright 2001-2002 Pavel Machek <[email protected]>
+ * Based on code
+ * Copyright 2001 Patrick Mochel <[email protected]>
+ * Copyright 2004 Hu Gang <[email protected]>
+ * port to PowerPC
+ */
+#include <linux/config.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/init.h>
+#include <linux/types.h>
+#include <linux/spinlock.h>
+#include <linux/poll.h>
+#include <linux/delay.h>
+#include <linux/sysrq.h>
+#include <linux/proc_fs.h>
+#include <linux/irq.h>
+#include <linux/pm.h>
+#include <linux/device.h>
+#include <linux/suspend.h>
+#include <linux/suspend-debug.h>
+#include <linux/suspend-common.h>
+#include <asm/uaccess.h>
+#if 0
+/* Local variables for do_swsusp2_suspend */
+volatile static int state1 __nosavedata = 0;
+volatile static int state2 __nosavedata = 0;
+volatile static int state3 __nosavedata = 0;
+volatile static int loop __nosavedata = 0;
+volatile static struct range *origrange __nosavedata;
+volatile static struct range *copyrange __nosavedata;
+volatile static int origoffset __nosavedata;
+volatile static int copyoffset __nosavedata;
+volatile static unsigned long * origpage __nosavedata;
+volatile static unsigned long * copypage __nosavedata;
+#endif
+
+//volatile static int orig_min_free __nosavedata;
+#ifndef CONFIG_SMP
+//static unsigned long c_loops_per_jiffy_ref __nosavedata = 0;
+//static unsigned long cpu_khz_ref __nosavedata = 0;
+#endif
+
+extern void do_swsusp2_suspend_1(void);
+extern void do_swsusp2_suspend_2(void);
+extern void do_swsusp2_resume_1(void);
+extern void do_swsusp2_resume_2(void);
+extern struct pagedir __nosavedata pagedir_resume;
+
+/*
+ * FIXME: This function should really be written in assembly. Actually
+ * requirement is that it does not touch stack, because %esp will be
+ * wrong during resume before restore_processor_context(). Check
+ * assembly if you modify this.
+ */
+#if 0
+static inline void pre_copyback(void)
+{
+#ifdef CONFIG_PREEMPT
+ /*
+ * Preempt disabled in kernel we're about to restore.
+ * Make sure we match state now.
+ */
+ preempt_disable();
+ PRINTPREEMPTCOUNT("Prior to copying old kernel back.");
+#endif
+
+ state1 = swsusp_action;
+ state2 = swsusp_debug_state;
+ state3 = console_loglevel;
+
+#ifndef CONFIG_SMP
+ //c_loops_per_jiffy_ref = cpu_data->loops_per_jiffy;
+ //cpu_khz_ref = cpu_khz;
+#endif
+}
+static inline void post_copyback(void)
+{
+#ifndef CONFIG_SMP
+ //cpu_data->loops_per_jiffy = c_loops_per_jiffy_ref;
+ //loops_per_jiffy = c_loops_per_jiffy_ref;
+ //cpu_khz = cpu_khz_ref;
+#endif
+ swsusp_action = state1;
+ swsusp_debug_state = state2;
+ console_loglevel = state3;
+ //swsusp_min_free = orig_min_free;
+
+}
+#endif
+static inline void do_swsusp2_copyback(void)
+{
+ /* PowerPC has lots of registers, so use register locals where possible */
+ register int origoffset, copyoffset;
+ register unsigned long * origpage, * copypage;
+ register struct range *origrange, *copyrange;
+// register int pagesize;
+
+// pre_copyback();
+
+ origrange = pagedir_resume.origranges.first;
+// pagesize = pagedir_resume.pageset_size;
+// printk("%d\n", pagesize);
+ origoffset = origrange->minimum;
+ origpage = (unsigned long *) (page_address(mem_map + origoffset));
+
+ copyrange = pagedir_resume.destranges.first;
+ copyoffset = copyrange->minimum;
+ copypage = (unsigned long *) (page_address(mem_map + copyoffset));
+ //orig_min_free = swsusp_min_free;
+
+ while (origrange) {
+ register int loop;
+ for (loop = 0; loop < (PAGE_SIZE / sizeof(unsigned long)); loop++)
+ *(origpage + loop) = *(copypage + loop);
+
+ if (origoffset < origrange->maximum) {
+ origoffset++;
+ origpage += (PAGE_SIZE / sizeof(unsigned long));
+ } else {
+ origrange = origrange->next;
+ if (origrange) {
+ origoffset = origrange->minimum;
+ origpage = (unsigned long *) (page_address(mem_map + origoffset));
+ }
+ }
+
+ if (copyoffset < copyrange->maximum) {
+ copyoffset++;
+ copypage += (PAGE_SIZE / sizeof(unsigned long));
+ } else {
+ copyrange = copyrange->next;
+ if (copyrange) {
+ copyoffset = copyrange->minimum;
+ copypage = (unsigned long *) (page_address(mem_map + copyoffset));
+ }
+ }
+ }
+
+/* Ahah, we now run with our old stack, and with registers copied from
+ suspend time */
+
+// post_copyback();
+}
+
+void do_swsusp_lowlevel(int resume)
+{
+ if (!resume) {
+ do_swsusp2_suspend_1();
+ save_processor_state();
+ /* saving stack */
+
+ do_swsusp2_suspend_2();
+ return;
+ }
+
+ /* setup swapper_pg_dir in x86 */
+
+ do_swsusp2_resume_1();
+ do_swsusp2_copyback();
+ /* setup segment register */
+ restore_processor_state();
+ do_swsusp2_resume_2();
+}
diff -ruN 701-mac-old/arch/ppc/power/swsusp2-copyback.S 701-mac-new/arch/ppc/power/swsusp2-copyback.S
--- 701-mac-old/arch/ppc/power/swsusp2-copyback.S 1970-01-01 10:00:00.000000000 +1000
+++ 701-mac-new/arch/ppc/power/swsusp2-copyback.S 2004-11-04 16:27:40.000000000 +1100
@@ -0,0 +1,73 @@
+#define PAGE_TO_POINTER(in, out, p) \
+ lwz out,0(in) ; \
+ slwi r9,out,2 ; \
+ add r9,r9,out ; \
+ slwi r9,r9,3 ; \
+ mullw r9,r9,r4 ; \
+ slwi r9,r9,9 ; \
+ addis p,r9,0xc000 ; \
+ tophys(p,p)
+
+ .section ".text"
+swsusp2_copyback:
+ lis r20,pagedir_resume@ha /* not sure this is right FIXME */
+ addi r20,r20,pagedir_resume@l
+ tophys(r20,r20)
+#if 0
+ lwz r4,4(r20)
+ twi 31,r0,0 /* trigger trap */
+#endif
+ lis r4,0xcccc /* FIXME */
+ ori r4,r4,52429
+
+ lwz r6,12(r20) /* r6 is origranges.first */
+ cmpwi r6,0
+ beq- swsusp2_end_copyback
+
+ tophys(r6,r6)
+ PAGE_TO_POINTER(r6,r8,r10)
+
+ lwz r5,56(r20) /* r5 is copyranges.first */
+ tophys(r5,r5)
+ PAGE_TO_POINTER(r5,r7,r11)
+
+swsusp2_copy_one_page:
+ li r0,1024 /* r9 is loop */
+ mtctr r0 /* prepare for branch */
+ li r9,0
+swsusp2_copy_data:
+ lwzx r0,r9,r11
+ stwx r0,r9,r10
+ addi r9,r9,4
+
+ bdnz swsusp2_copy_data
+
+ lwz r0,4(r6) /* r0 is maximum */
+ cmplw r8,r0
+ bge- next_orig
+ addi r8,r8,1
+ addi r10,r10,4096
+ b end_orig
+next_orig:
+ lwz r6,8(r6) /* r6 origrange */
+ cmpwi r6,0
+ beq- end_orig
+ tophys(r6,r6)
+ PAGE_TO_POINTER(r6,r8,r10)
+end_orig:
+ lwz r0,4(r5) /* r0 is maximum */
+ cmplw r7,r0
+ bge- next_copy
+ addi r7,r7,1
+ addi r11,r11,4096
+ b end_copy
+next_copy:
+ lwz r5,8(r5) /* r5 is copypage */
+ cmpwi r5,0
+ beq- end_copy
+ tophys(r5,r5)
+ PAGE_TO_POINTER(r5,r7,r11)
+end_copy:
+ cmpwi 0,r6,0
+ bc r4,r2,swsusp2_copy_one_page
+swsusp2_end_copyback:
diff -ruN 701-mac-old/drivers/macintosh/Kconfig 701-mac-new/drivers/macintosh/Kconfig
--- 701-mac-old/drivers/macintosh/Kconfig 2004-11-03 21:53:37.000000000 +1100
+++ 701-mac-new/drivers/macintosh/Kconfig 2004-11-04 16:27:40.000000000 +1100
@@ -187,4 +187,8 @@
tristate "Support for ANS LCD display"
depends on ADB_CUDA && PPC_PMAC

+config SOFTWARE_REPLACE_SLEEP
+ bool "Using Software suspend replace broken sleep function"
+ depends on SOFTWARE_SUSPEND2
+
endmenu
diff -ruN 701-mac-old/drivers/macintosh/via-pmu.c 701-mac-new/drivers/macintosh/via-pmu.c
--- 701-mac-old/drivers/macintosh/via-pmu.c 2004-11-03 21:55:00.000000000 +1100
+++ 701-mac-new/drivers/macintosh/via-pmu.c 2004-11-04 16:27:40.000000000 +1100
@@ -2891,6 +2891,13 @@
return -EACCES;
if (sleep_in_progress)
return -EBUSY;
+#ifdef CONFIG_SOFTWARE_REPLACE_SLEEP
+ {
+ extern void software_suspend_pending(void);
+ software_suspend_pending();
+ return (0);
+ }
+#endif
sleep_in_progress = 1;
switch (pmu_kind) {
case PMU_OHARE_BASED:
diff -ruN 701-mac-old/include/asm-ppc/suspend.h 701-mac-new/include/asm-ppc/suspend.h
--- 701-mac-old/include/asm-ppc/suspend.h 1970-01-01 10:00:00.000000000 +1000
+++ 701-mac-new/include/asm-ppc/suspend.h 2004-11-04 16:27:40.000000000 +1100
@@ -0,0 +1,14 @@
+#ifndef _PPC_SUSPEND_H
+#define _PPC_SUSPEND_H
+
+static inline void flush_tlb_all(void)
+{
+ /* Flush all TLBs */
+ __asm__ __volatile__("lis 4, 0x1000");
+ __asm__ __volatile__("1: addic. 4,4,-0x1000");
+ __asm__ __volatile__("tlbie 4");
+ __asm__ __volatile__("blt 1b");
+ __asm__ __volatile__("sync");
+}
+
+#endif


2004-11-24 13:22:11

by Nigel Cunningham

[permalink] [raw]
Subject: Suspend 2 merge: 24/51: Keyboard and serial console hooks.

Here we add simple hooks so that the user can interact with suspend
while it is running. (Hmm. The serial console condition could be
simplified :>). The hooks allow you to do such things as:

- cancel suspending
- change the amount of detail of debugging info shown
- change what debugging info is shown
- pause the process
- single step
- toggle rebooting instead of powering down
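
As a rough illustration of what a handler behind these hooks might look
like, here is a minimal userspace sketch. It is not the suspend2
implementation: the struct, the function name and every key binding
below are assumptions made up for the example. Note also that the hooks
in this patch swallow every keypress while suspend is running, whereas
the sketch only reports whether it recognised the key.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical state; none of these names come from the patch. */
struct sketch_state {
	bool cancelled;    /* user asked to cancel suspending */
	bool paused;       /* pause the process */
	bool single_step;  /* wait for a key after every step */
	bool reboot;       /* reboot instead of powering down */
	int loglevel;      /* amount of debugging detail, 0..7 */
};

/*
 * Map one keypress to an action. The key bindings are illustrative
 * assumptions only. Returns true if the key was recognised.
 */
static bool sketch_handle_keypress(struct sketch_state *s, int key)
{
	assert(s);

	switch (key) {
	case 27: /* ESC */
		s->cancelled = true;
		return true;
	case 'p':
		s->paused = !s->paused;
		return true;
	case 's':
		s->single_step = !s->single_step;
		return true;
	case 'r':
		s->reboot = !s->reboot;
		return true;
	default:
		if (key >= '0' && key <= '7') {
			s->loglevel = key - '0';
			return true;
		}
		return false;
	}
}
```

Both the keyboard and the serial hook could feed the same function,
which is presumably why the real suspend_handle_keypress() takes a
source argument.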

diff -ruN 702-keyboard-and-8250-hooks-old/drivers/char/keyboard.c 702-keyboard-and-8250-hooks-new/drivers/char/keyboard.c
--- 702-keyboard-and-8250-hooks-old/drivers/char/keyboard.c 2004-11-24 18:50:00.959995424 +1100
+++ 702-keyboard-and-8250-hooks-new/drivers/char/keyboard.c 2004-11-24 18:03:32.040404608 +1100
@@ -33,6 +33,7 @@
#include <linux/string.h>
#include <linux/random.h>
#include <linux/init.h>
+#include <linux/suspend.h>
#include <linux/slab.h>

#include <linux/kbd_kern.h>
@@ -1091,6 +1092,10 @@
return;
}
#endif
+ if (down && (test_suspend_state(SUSPEND_RUNNING))) {
+ suspend_handle_keypress(keycode, SUSPEND_KEY_KEYBOARD);
+ return;
+ }
#if defined(CONFIG_SPARC32) || defined(CONFIG_SPARC64)
if (keycode == KEY_A && sparc_l1_a_state) {
sparc_l1_a_state = 0;
diff -ruN 702-keyboard-and-8250-hooks-old/drivers/serial/8250.c 702-keyboard-and-8250-hooks-new/drivers/serial/8250.c
--- 702-keyboard-and-8250-hooks-old/drivers/serial/8250.c 2004-11-24 18:50:00.962994968 +1100
+++ 702-keyboard-and-8250-hooks-new/drivers/serial/8250.c 2004-11-24 18:49:53.882071432 +1100
@@ -39,6 +39,7 @@
#include <linux/serial_core.h>
#include <linux/serial.h>
#include <linux/serial_8250.h>
+#include <linux/suspend.h>

#include <asm/io.h>
#include <asm/irq.h>
@@ -1068,6 +1069,15 @@
}
if (uart_handle_sysrq_char(&up->port, ch, regs))
goto ignore_char;
+
+#if defined(CONFIG_SERIAL_CORE_CONSOLE) && defined(CONFIG_SOFTWARE_SUSPEND2)
+ if (test_suspend_state(SUSPEND_SANITY_CHECK_PROMPT) ||
+ test_suspend_state(SUSPEND_RUNNING)) {
+ suspend_handle_keypress(ch, SUSPEND_KEY_SERIAL);
+ goto ignore_char;
+ }
+#endif
+
if ((lsr & up->port.ignore_status_mask) == 0) {
tty_insert_flip_char(tty, ch, flag);
}


2004-11-24 13:27:55

by Nigel Cunningham

[permalink] [raw]
Subject: Suspend 2 merge: 19/51: Remove MTRR sysdev support.

This patch removes sysdev support for MTRRs (it is a potential SMP hang,
and shouldn't be done with interrupts disabled anyway). Instead, we save
and restore MTRRs when entering and exiting the processor freezers (i.e.
when saving the registers & context for each CPU via an SMP call).
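
The shape of the change can be sketched in plain userspace C: take one
snapshot, let each CPU restore its own registers without any cross-CPU
call, and free the snapshot only once everybody is done. The arrays and
sketch_* names below are stand-ins invented for the example; they are
not the kernel's data structures.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_NUM_RANGES 8
#define SKETCH_NUM_CPUS   2

struct sketch_range {
	unsigned long base, size;
	unsigned int type;
};

/* Stand-in for the per-CPU hardware registers. */
static struct sketch_range sketch_hw[SKETCH_NUM_CPUS][SKETCH_NUM_RANGES];
static struct sketch_range *sketch_saved;

/* Take one snapshot of the ranges as seen by the given CPU. */
static int sketch_save(int cpu)
{
	sketch_saved = malloc(sizeof(sketch_hw[0]));
	if (!sketch_saved)
		return -1;
	memcpy(sketch_saved, sketch_hw[cpu], sizeof(sketch_hw[0]));
	return 0;
}

/* Each CPU restores only its own registers; nothing is freed here. */
static void sketch_restore_one_cpu(int cpu)
{
	assert(sketch_saved);

	for (int i = 0; i < SKETCH_NUM_RANGES; i++)
		if (sketch_saved[i].size)
			sketch_hw[cpu][i] = sketch_saved[i];
}

/* Free the snapshot only after every CPU has restored. */
static void sketch_restore_finish(void)
{
	free(sketch_saved);
	sketch_saved = NULL;
}
```

Splitting the free out of the restore step is what makes it safe to
call the restore once per CPU, which matches the new
mtrr_restore_one_cpu() / mtrr_restore_finish() pair in the diff below.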

diff -ruN 510-mtrr-remove-sysdev-old/arch/i386/kernel/cpu/mtrr/main.c 510-mtrr-remove-sysdev-new/arch/i386/kernel/cpu/mtrr/main.c
--- 510-mtrr-remove-sysdev-old/arch/i386/kernel/cpu/mtrr/main.c 2004-11-03 21:51:13.000000000 +1100
+++ 510-mtrr-remove-sysdev-new/arch/i386/kernel/cpu/mtrr/main.c 2004-11-04 16:27:40.000000000 +1100
@@ -167,7 +167,6 @@
atomic_dec(&data->count);
local_irq_restore(flags);
}
-
#endif

/**
@@ -564,7 +563,7 @@

static struct mtrr_value * mtrr_state;

-static int mtrr_save(struct sys_device * sysdev, u32 state)
+int mtrr_save(void)
{
int i;
int size = num_var_ranges * sizeof(struct mtrr_value);
@@ -584,28 +583,27 @@
return 0;
}

-static int mtrr_restore(struct sys_device * sysdev)
+/* Restore mtrrs on this CPU only.
+ * Done with interrupts disabled via __smp_lowlevel_suspend
+ */
+int mtrr_restore_one_cpu(void)
{
int i;

for (i = 0; i < num_var_ranges; i++) {
if (mtrr_state[i].lsize)
- set_mtrr(i,
+ mtrr_if->set(i,
mtrr_state[i].lbase,
mtrr_state[i].lsize,
mtrr_state[i].ltype);
}
- kfree(mtrr_state);
return 0;
}

-
-
-static struct sysdev_driver mtrr_sysdev_driver = {
- .suspend = mtrr_save,
- .resume = mtrr_restore,
-};
-
+void mtrr_restore_finish(void)
+{
+ kfree(mtrr_state);
+}

/**
* mtrr_init - initialize mtrrs on the boot CPU
@@ -692,8 +690,7 @@
init_table();
init_other_cpus();

- return sysdev_driver_register(&cpu_sysdev_class,
- &mtrr_sysdev_driver);
+ return 0;
}
return -ENXIO;
}


2004-11-24 13:27:57

by Nigel Cunningham

[permalink] [raw]
Subject: Suspend 2 merge: 30/51: Enable slab alloc fallback to suspend memory pool

When we are preparing the image and have eaten all available memory, but
before page allocations have been switched over to the memory pool, we
sometimes need to allocate memory from slab for the image metadata (swap
header information). This code allows the slab allocator to fall back to
the memory pool in such circumstances. There is some extra debugging
code there at the moment while I seek to diagnose intermittent slab
corruption (not sure if it's suspend related).
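
Stripped of the debugging output, the fallback logic amounts to
something like the following userspace sketch. The pool, the flags and
all sketch_* names are assumptions for illustration; only the condition
itself (allocation failed, caller is the suspend task, fallback flag
set) is taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE  4096
#define SKETCH_POOL_PAGES 4

/* Stand-in for the suspend memory pool. */
static unsigned char sketch_pool[SKETCH_POOL_PAGES][SKETCH_PAGE_SIZE];
static size_t sketch_pool_used;

static bool sketch_fallback_enabled;  /* role of SUSPEND_SLAB_ALLOC_FALLBACK */
static int sketch_suspend_task = -1;  /* id of the task driving the suspend */
static bool sketch_memory_exhausted;  /* simulates "all memory eaten" */

/* Stand-in for the normal page allocator. */
static void *sketch_normal_alloc(void)
{
	static unsigned char page[SKETCH_PAGE_SIZE];

	return sketch_memory_exhausted ? NULL : page;
}

/* Hand out the next unused pool page, or NULL when the pool is empty. */
static void *sketch_pool_alloc(void)
{
	assert(sketch_pool_used <= SKETCH_POOL_PAGES);

	if (sketch_pool_used == SKETCH_POOL_PAGES)
		return NULL;
	return sketch_pool[sketch_pool_used++];
}

/* Try the normal path first; fall back only for the suspend task. */
static void *sketch_alloc_page(int task)
{
	void *addr = sketch_normal_alloc();

	if (!addr && task == sketch_suspend_task && sketch_fallback_enabled)
		addr = sketch_pool_alloc();
	return addr;
}
```

Restricting the fallback to the suspend task keeps other tasks from
draining the pool while the image metadata is being built.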

diff -ruN 817-enable-slab-alloc-fallback-to-suspend-memory-pool-old/mm/slab.c 817-enable-slab-alloc-fallback-to-suspend-memory-pool-new/mm/slab.c
--- 817-enable-slab-alloc-fallback-to-suspend-memory-pool-old/mm/slab.c 2004-11-24 15:48:55.066733152 +1100
+++ 817-enable-slab-alloc-fallback-to-suspend-memory-pool-new/mm/slab.c 2004-11-23 07:11:42.000000000 +1100
@@ -874,14 +874,30 @@
flags |= cachep->gfpflags;
if (likely(nodeid == -1)) {
addr = (void*)__get_free_pages(flags, cachep->gfporder);
+ if (unlikely((!addr) && (current->pid == suspend_task) &&
+ test_suspend_state(SUSPEND_SLAB_ALLOC_FALLBACK))) {
+ addr = (void *) suspend2_get_grabbed_pages(0);
+ printk("!! Slab addition satisfied via fallback code.\n");
+ }
if (!addr)
return NULL;
+ if (unlikely(test_suspend_state(SUSPEND_RUNNING)))
+ printk("Order %d allocation %p added to slab %p.\n",
+ cachep->gfporder, addr, cachep);
page = virt_to_page(addr);
} else {
page = alloc_pages_node(nodeid, flags, cachep->gfporder);
+ if (unlikely((!page) && (current->pid == suspend_task) &&
+ test_suspend_state(SUSPEND_SLAB_ALLOC_FALLBACK))) {
+ page = virt_to_page(suspend2_get_grabbed_pages(0));
+ printk("!! Slab addition satisfied via fallback code.\n");
+ }
if (!page)
return NULL;
addr = page_address(page);
+ if (unlikely(test_suspend_state(SUSPEND_RUNNING)))
+ printk("Order %d allocation %p added to slab %p.\n",
+ cachep->gfporder, addr, cachep);
}

i = (1 << cachep->gfporder);


2004-11-24 13:33:44

by Nigel Cunningham

[permalink] [raw]
Subject: Suspend 2 merge: 35/51: Code always built in to the kernel.

These are the files containing code that is always built in to the
kernel when suspend support is compiled in.

- /proc/software_suspend simplified interface. This is to save me
repeating the same code everywhere. Perhaps there are similar routines
that others have written and I've missed. If so, feel free to point me
to them (haven't looked much).
- basic support for complaining if the core isn't loaded
- support for loading the core and interfacing with it
- __init routines
- some routines to save exporting variables
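
To illustrate what the simplified interface is meant to save, here is a rough userspace sketch of a typed entry with one shared write handler. The names (struct entry, entry_write, ENTRY_INTEGER, ENTRY_STRING) are made up for this example and are not the structures in the patch; it also assumes that a string entry's buffer holds max_length + 1 bytes. It only shows the idea: each entry describes its variable and limits, and the generic routine parses, clamps and stores.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical entry description; not the suspend_proc_data layout. */
enum entry_type { ENTRY_INTEGER, ENTRY_STRING };

struct entry {
	enum entry_type type;
	union {
		struct { int *variable; int minimum, maximum; } integer;
		struct { char *variable; size_t max_length; } string;
	} data;
};

/* Parse text and store it in the variable behind e. Returns 0, or -1 for
 * an unknown type. A string buffer is assumed to hold max_length + 1 bytes. */
static int entry_write(const struct entry *e, const char *text)
{
	assert(e != NULL && text != NULL);

	switch (e->type) {
	case ENTRY_INTEGER: {
		long value = strtol(text, NULL, 0);

		/* Clamp to the range the entry declares. */
		if (value < e->data.integer.minimum)
			value = e->data.integer.minimum;
		if (value > e->data.integer.maximum)
			value = e->data.integer.maximum;
		*e->data.integer.variable = (int) value;
		return 0;
	}
	case ENTRY_STRING: {
		size_t len = strlen(text);

		/* Drop one trailing newline, then truncate to fit. */
		if (len && text[len - 1] == '\n')
			len--;
		if (len > e->data.string.max_length)
			len = e->data.string.max_length;
		memcpy(e->data.string.variable, text, len);
		e->data.string.variable[len] = '\0';
		return 0;
	}
	}
	return -1;
}
```

A caller declares one struct entry per tunable and passes whatever was written to entry_write(); the per-file code then shrinks to a table of descriptions, which is the point of the generic handlers.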

diff -ruN 824-builtin-old/kernel/power/proc.c 824-builtin-new/kernel/power/proc.c
--- 824-builtin-old/kernel/power/proc.c 1970-01-01 10:00:00.000000000 +1000
+++ 824-builtin-new/kernel/power/proc.c 2004-11-05 21:31:49.000000000 +1100
@@ -0,0 +1,359 @@
+/*
+ * /kernel/power/proc.c
+ *
+ * Copyright (C) 2002-2003 Nigel Cunningham <[email protected]>
+ *
+ * This file is released under the GPLv2.
+ *
+ * This file contains support for proc entries for tuning Software Suspend.
+ *
+ * We have a generic handler that deals with the most common cases, and
+ * hooks for special handlers to use.
+ *
+ * Versions:
+ * 1: /proc/sys/kernel/suspend the only tuning interface
+ * 2: Initial version of this file
+ * 3: Removed kernel debugger parameter.
+ * Added checkpage parameter (for checking checksum of a page over time).
+ * 4: Added entry for maximum granularity in splash screen progress bar.
+ * (Progress bar is slow, but the right setting will vary with disk &
+ * processor speed and the user's tastes).
+ * 5: Added enable_escape to control the ability to abort by pressing the
+ * ESC key.
+ * 6: Removed checksumming and checkpage parameter. Made all debugging proc
+ * entries dependent upon debugging being compiled in.
+ * Meaning of some flags also changed in this version.
+ * 7: Added header_locations entry to make getting the resume= parameter for
+ * swapfiles easy, and swapfile entry for automatically doing swapon/off from
+ * swapfiles as well as partitions.
+ * 8: Added option for marking process pages as pageset 2 (processes_pageset2).
+ * 9: Added option for keep image mode.
+ * Enumeration patch from Michael Frank applied.
+ * 10: Various corrections to when options are disabled/enabled;
+ * Added option for specifying expected compression.
+ * 11: Added option for freezer testing. Debug only.
+ * 12: Removed test entries no_async_[read|write], processes_pageset2 and
+ * NoPageset2.
+ * 13: Make default_console_level available when debugging disabled, but limited
+ * to 0 or 1.
+ * 14: Rewrite to allow for dynamic registration of proc entries and smooth the
+ * transition to kobjects in 2.6.
+ * 15: Add setting resume2 parameter without rebooting (still need to run lilo
+ * though!). Add support for generic string handling and switch resume2 to use
+ * it.
+ */
+
+#define SUSPEND_PROC_C
+
+static int suspend_proc_version = 15;
+static int suspend_proc_initialised = 0;
+
+#include <linux/suspend.h>
+#include <linux/module.h>
+#include <linux/proc_fs.h>
+#include <asm/uaccess.h>
+
+#include "suspend.h"
+#include "proc.h"
+
+static struct list_head suspend_proc_entries;
+static struct proc_dir_entry *suspend_dir;
+
+extern char resume2_file[256]; /* For resume= kernel option */
+
+/*
+ * proc_try_suspend
+ *
+ * This routine initiates a suspend cycle when /proc/software_suspend/do_suspend is
+ * written to. The value written is ignored.
+ */
+
+static int proc_try_suspend(struct file *file, const char *buffer,
+ unsigned long count, void *data)
+{
+ suspend_try_suspend();
+ return count;
+}
+
+/*
+ * proc_try_resume
+ *
+ * This routine initiates a resume attempt when /proc/software_suspend/do_resume is
+ * written to. The value written is ignored.
+ */
+
+static int proc_try_resume(struct file *file, const char *buffer,
+ unsigned long count, void *data)
+{
+ software_suspend_try_resume();
+ return count;
+}
+
+/*
+ * generic_read_proc
+ *
+ * Generic handling for reading the contents of bits, integers,
+ * unsigned longs and strings.
+ */
+static int generic_read_proc(char * page, char ** start, off_t off, int count,
+ int *eof, void *data)
+{
+ int len = 0;
+ struct suspend_proc_data * proc_data = (struct suspend_proc_data *) data;
+
+ switch (proc_data->type) {
+ case SUSPEND_PROC_DATA_CUSTOM:
+ printk("Error! /proc/software_suspend/%s marked as having custom"
+ " routines, but the generic read routine has"
+ " been invoked.\n",
+ proc_data->filename);
+ break;
+ case SUSPEND_PROC_DATA_BIT:
+ len = sprintf(page, "%d\n",
+ !!test_bit(proc_data->data.bit.bit,
+ proc_data->data.bit.bit_vector));
+ break;
+ case SUSPEND_PROC_DATA_INTEGER:
+ {
+ int * variable = proc_data->data.integer.variable;
+ len = sprintf(page, "%d\n", *variable);
+ break;
+ }
+ case SUSPEND_PROC_DATA_UL:
+ {
+ unsigned long * variable = proc_data->data.ul.variable;
+ len = sprintf(page, "%lu\n", *variable);
+ break;
+ }
+ case SUSPEND_PROC_DATA_STRING:
+ {
+ char * variable = proc_data->data.string.variable;
+ len = sprintf(page, "%s\n", variable);
+ break;
+ }
+ }
+ /* Side effect routine? */
+ if (proc_data->read_proc)
+ proc_data->read_proc();
+ *eof = 1;
+ return len;
+}
+/*
+ * generic_write_proc
+ *
+ * Generic routine for handling writing to files representing
+ * bits, integers and unsigned longs.
+ */
+
+static int generic_write_proc(struct file *file, const char * buffer,
+ unsigned long count, void * data)
+{
+ struct suspend_proc_data * proc_data = (struct suspend_proc_data *) data;
+ char * my_buf = (char *) get_zeroed_page(GFP_ATOMIC);
+ int result = count;
+
+ if (!my_buf)
+ return -ENOMEM;
+
+ if (count > PAGE_SIZE - 1)
+ count = PAGE_SIZE - 1;
+ if (copy_from_user(my_buf, buffer, count)) {
+ free_pages((unsigned long) my_buf, 0);
+ return -EFAULT;
+ }
+ my_buf[count] = 0;
+
+ switch (proc_data->type) {
+ case SUSPEND_PROC_DATA_CUSTOM:
+ printk("Error! /proc/software_suspend/%s marked as having custom"
+ " routines, but the generic write routine has"
+ " been invoked.\n",
+ proc_data->filename);
+ break;
+ case SUSPEND_PROC_DATA_BIT:
+ {
+ int value = simple_strtoul(my_buf, NULL, 0);
+ if (value)
+ set_bit(proc_data->data.bit.bit,
+ (proc_data->data.bit.bit_vector));
+ else
+ clear_bit(proc_data->data.bit.bit,
+ (proc_data->data.bit.bit_vector));
+ }
+ break;
+ case SUSPEND_PROC_DATA_INTEGER:
+ {
+ int * variable = proc_data->data.integer.variable;
+ int minimum = proc_data->data.integer.minimum;
+ int maximum = proc_data->data.integer.maximum;
+ *variable = simple_strtol(my_buf, NULL, 0);
+ if (((*variable) < minimum))
+ *variable = minimum;
+
+ if (((*variable) > maximum))
+ *variable = maximum;
+ break;
+ }
+ case SUSPEND_PROC_DATA_UL:
+ {
+ unsigned long * variable = proc_data->data.ul.variable;
+ unsigned long minimum = proc_data->data.ul.minimum;
+ unsigned long maximum = proc_data->data.ul.maximum;
+ *variable = simple_strtoul(my_buf, NULL, 0);
+
+ if (minimum && ((*variable) < minimum))
+ *variable = minimum;
+
+ if (maximum && ((*variable) > maximum))
+ *variable = maximum;
+ break;
+ }
+ break;
+ case SUSPEND_PROC_DATA_STRING:
+ {
+ int copy_len =
+ (count >
+ proc_data->data.string.max_length) ?
+ proc_data->data.string.max_length :
+ count;
+ char * variable =
+ proc_data->data.string.variable;
+ strncpy(variable, my_buf, copy_len);
+ if ((copy_len) &&
+ (my_buf[copy_len - 1] == '\n'))
+ variable[copy_len - 1] = 0;
+ variable[copy_len] = 0;
+ }
+ break;
+ }
+ free_pages((unsigned long) my_buf, 0);
+ /* Side effect routine? */
+ if (proc_data->write_proc) {
+ int routine_result = proc_data->write_proc();
+ if (routine_result < 0)
+ result = routine_result;
+ }
+ return result;
+}
+
+/*
+ * Non-plugin proc entries.
+ *
+ * This array contains entries that are automatically registered at
+ * boot. Plugins and the console code register their own entries separately.
+ */
+
+static struct suspend_proc_data proc_params[] = {
+ { .filename = "do_suspend",
+ .permissions = PROC_WRITEONLY,
+ .type = SUSPEND_PROC_DATA_CUSTOM,
+ .data = {
+ .special = {
+ .write_proc = proc_try_suspend
+ }
+ }
+ },
+
+ { .filename = "do_resume",
+ .permissions = PROC_WRITEONLY,
+ .type = SUSPEND_PROC_DATA_CUSTOM,
+ .data = {
+ .special = {
+ .write_proc = proc_try_resume
+ }
+ }
+ },
+
+
+ { .filename = "interface_version",
+ .permissions = PROC_READONLY,
+ .type = SUSPEND_PROC_DATA_INTEGER,
+ .data = {
+ .integer = {
+ .variable = &suspend_proc_version,
+ }
+ }
+ },
+};
+
+/*
+ * suspend_initialise_proc
+ *
+ * Initialise the /proc/software_suspend tree.
+ *
+ */
+
+static void suspend_initialise_proc(void)
+{
+ int i;
+ int numfiles = sizeof(proc_params) / sizeof(struct suspend_proc_data);
+
+ if (suspend_proc_initialised)
+ return;
+
+ suspend_dir = proc_mkdir("software_suspend", NULL);
+
+ BUG_ON(!suspend_dir);
+
+ INIT_LIST_HEAD(&suspend_proc_entries);
+
+ suspend_proc_initialised = 1;
+
+ for (i=0; i< numfiles; i++)
+ suspend_register_procfile(&proc_params[i]);
+}
+
+/*
+ * suspend_register_procfile
+ *
+ * Helper for registering a new /proc/software_suspend entry.
+ */
+
+struct proc_dir_entry * suspend_register_procfile(
+ struct suspend_proc_data * suspend_proc_data)
+{
+ struct proc_dir_entry * new_entry;
+
+ if (!suspend_proc_initialised)
+ suspend_initialise_proc();
+
+ new_entry = create_proc_entry(
+ suspend_proc_data->filename,
+ suspend_proc_data->permissions,
+ suspend_dir);
+ if (new_entry) {
+ list_add_tail(&suspend_proc_data->proc_data_list, &suspend_proc_entries);
+ if (suspend_proc_data->type) {
+ new_entry->read_proc = generic_read_proc;
+ new_entry->write_proc = generic_write_proc;
+ } else {
+ new_entry->read_proc = suspend_proc_data->data.special.read_proc;
+ new_entry->write_proc = suspend_proc_data->data.special.write_proc;
+ }
+ new_entry->data = suspend_proc_data;
+ } else {
+ printk("Error! create_proc_entry returned NULL.\n");
+ INIT_LIST_HEAD(&suspend_proc_data->proc_data_list);
+ }
+ return new_entry;
+}
+
+/*
+ * suspend_unregister_procfile
+ *
+ * Helper for removing unwanted /proc/software_suspend entries.
+ *
+ */
+void suspend_unregister_procfile(struct suspend_proc_data * suspend_proc_data)
+{
+ if (list_empty(&suspend_proc_data->proc_data_list))
+ return;
+
+ remove_proc_entry(
+ suspend_proc_data->filename,
+ suspend_dir);
+ list_del(&suspend_proc_data->proc_data_list);
+}
+
+EXPORT_SYMBOL(suspend_register_procfile);
+EXPORT_SYMBOL(suspend_unregister_procfile);
diff -ruN 824-builtin-old/kernel/power/suspend_builtin.c 824-builtin-new/kernel/power/suspend_builtin.c
--- 824-builtin-old/kernel/power/suspend_builtin.c 1970-01-01 10:00:00.000000000 +1000
+++ 824-builtin-new/kernel/power/suspend_builtin.c 2004-11-18 08:21:41.000000000 +1100
@@ -0,0 +1,541 @@
+/*
+ * kernel/power/suspend_builtin.c
+ *
+ * Copyright (C) 2004 Nigel Cunningham <[email protected]>
+ *
+ * This file is released under the GPLv2.
+ *
+ * It contains the functions for suspend2 that are built into the kernel even if
+ * suspend is configured as modules.
+ */
+
+#include <linux/suspend.h>
+#include <linux/module.h>
+#include <linux/reboot.h>
+#include <asm/highmem.h>
+#include <asm/uaccess.h>
+
+#include "suspend.h"
+/*
+ *--------------------- Variables ---------------------------
+ *
+ * The following are used by the arch specific low level routines
+ * and only needed if suspend2 is compiled in. Other variables,
+ * used by the freezer even if suspend2 is not compiled in are
+ * found in process.c
+ */
+volatile int suspend_io_time[2][2];
+struct pagedir __nosavedata pagedir_resume;
+struct pagedir pagedir1 = { 1, 0, 0}, pagedir2 = {2, 0, 0};
+static unsigned long avenrun_save[3];
+
+char suspend_print_buf[1024];
+EXPORT_SYMBOL(suspend_print_buf);
+
+/* Suspend2 variables used by built-in routines. */
+unsigned int nr_suspends = 0;
+int suspend_act_used = 0;
+int suspend_lvl_used = 0;
+int suspend_dbg_used = 0;
+int suspend_default_console_level = 0;
+
+/*
+ * For resume2= kernel option. It's pointless to compile
+ * suspend2 without any writers, but compilation shouldn't
+ * fail if you do.
+ */
+#ifdef CONFIG_SOFTWARE_SUSPEND_DEFAULT_RESUME2
+char resume2_file[256] = CONFIG_SOFTWARE_SUSPEND_DEFAULT_RESUME2;
+#else
+char resume2_file[256];
+#endif
+
+void (* exclusive_handler) (int) = NULL;
+
+/* --------------- Basic user interface functions ---------------
+ *
+ * These need to be available even if none of the core is
+ * loaded.
+ */
+
+DECLARE_WAIT_QUEUE_HEAD(suspend_wait_for_key);
+static int last_key;
+
+int suspend_wait_for_keypress(void)
+{
+ interruptible_sleep_on(&suspend_wait_for_key);
+ return last_key;
+}
+
+/*
+ * Basic keypress handler for suspend. This is extensible
+ * via the user interface modules.
+ */
+
+/* For simplicity, we convert keyboard key codes to ascii,
+ * except in the case of function keys, which are mapped
+ * to 1-12. We can then use the same case statement for
+ * serial keyboards (and from a serial keyboard, you can
+ * press Control-A..L to toggle sections).
+ */
+static unsigned int kbd_keytable[] = {
+ 0, 27, 49, 50, 51, 52, 53, 54, 55, 56,
+ 57, 48, 0, 0, 0, 0, 0, 0, 0, 114,
+ 116, 0, 0, 0, 0, 112, 0, 0, 0, 0,
+ 0, 115, 0, 0, 0, 0, 0, 0, 108, 0,
+ 0, 122, 0, 0, 0, 0, 99, 0, 0, 0,
+ 0, 0, 0, 0, 0, 0, 0, 32, 0, 1,
+ 2, 3, 4, 5, 6, 7, 8, 9, 10, 0,
+ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
+ 0, 0, 0, 0, 0, 0, 0, 11, 12, 0,
+};
+
+/*
+ * keycode_to_action
+ *
+ * Convert a keycode (serial or keyboard) into our
+ * internal code (ascii, except for function keys).
+ */
+static unsigned int keycode_to_action(unsigned int keycode, int source)
+{
+ if (source == SUSPEND_KEY_SERIAL) {
+ if (keycode > 64)
+ return (keycode | 32);
+ else
+ return keycode;
+ }
+
+ /* Local keyboard - use table above */
+ if (keycode >= ARRAY_SIZE(kbd_keytable))
+ return 0;
+
+ return kbd_keytable[keycode];
+}
+
+/* get_keyboard_exclusive
+ *
+ * Give a plugin exclusive access to the keyboard
+ * if it's not already claimed. Used for an encryption
+ * plugin to get the passphrase, for example.
+ */
+
+int suspend_get_keyboard_exclusive(void (* handler) (int))
+{
+ if (exclusive_handler)
+ return -EBUSY;
+
+ exclusive_handler = handler;
+
+ return 0;
+}
+
+/* release_keyboard_exclusive
+ *
+ * Release the exclusive access to the keyboard.
+ */
+
+void suspend_release_keyboard_exclusive(void)
+{
+ BUG_ON(!exclusive_handler);
+
+ exclusive_handler = NULL;
+}
+
+/*
+ * suspend_handle_keypress
+ *
+ * This is the basic routine for handling keystrokes.
+ * If it doesn't know what to do with a keypress, it
+ * passes it on to the plugins.
+ */
+void suspend_handle_keypress(unsigned int keycode, int source)
+{
+ keycode = keycode_to_action(keycode, source);
+
+ if (test_suspend_state(SUSPEND_SANITY_CHECK_PROMPT)) {
+ if (keycode == 32)
+ wake_up_interruptible(&suspend_wait_for_key);
+ else if (keycode == 99) {
+ set_suspend_state(SUSPEND_CONTINUE_REQ);
+ wake_up_interruptible(&suspend_wait_for_key);
+ }
+ return;
+ }
+
+ /* Do we have a plugin grabbing all the keys? */
+ if (exclusive_handler) {
+ exclusive_handler(keycode);
+ return;
+ }
+
+ last_key = keycode;
+
+ /*
+ * If the message was handled or is the space bar, we
+ * wake our completion handler.
+ */
+ if ((suspend2_core_ops && suspend2_core_ops->keypress(keycode)) ||
+ (keycode == 32))
+ wake_up_interruptible(&suspend_wait_for_key);
+}
+
+/* suspend_early_boot_message()
+ * Description: Handle errors early in the process of booting.
+ * The user may press C to continue booting, perhaps
+ * invalidating the image, or space to reboot.
+ * This works from either the serial console or normally
+ * attached keyboard.
+ *
+ * Note that we come in here from init, while the kernel is
+ * locked. If we want to get events from the serial console,
+ * we need to temporarily unlock the kernel.
+ *
+ * Arguments: Char *. Pointer to a string explaining why we're moaning.
+ */
+
+#define say(message, a...) printk(KERN_EMERG message, ##a)
+
+int suspend_early_boot_message(int can_erase_image, char *warning_reason, ...)
+{
+ unsigned long orig_state = get_suspend_state(), continue_req;
+ int orig_loglevel = console_loglevel;
+ va_list args;
+ int printed_len;
+
+ set_suspend_state(SUSPEND_RUNNING);
+
+#ifdef CONFIG_BOOTSPLASH
+/*
+ * So we can make any error visible if necessary. The core might
+ * not be loaded and the bootsplash module might not be loaded
+ * or even have been compiled.
+ */
+ {
+ extern int splash_verbose(void);
+ splash_verbose();
+ }
+#endif
+
+#if defined(CONFIG_VT) || defined(CONFIG_SERIAL_CONSOLE)
+ console_loglevel = 7;
+
+ if (suspend2_core_ops)
+ suspend2_core_ops->early_boot_plugins();
+
+ say("=== Software Suspend ===\n\n");
+ if (warning_reason) {
+ va_start(args, warning_reason);
+ printed_len = vsnprintf(suspend_print_buf,
+ sizeof(suspend_print_buf),
+ warning_reason,
+ args);
+ va_end(args);
+ say("BIG FAT WARNING!! %s\n\n", suspend_print_buf);
+ if (can_erase_image) {
+ say("If you want to use the current suspend image, reboot and try\n");
+ say("again with the same kernel that you suspended from. If you want\n");
+ say("to forget that image, continue and the image will be erased.\n");
+ } else {
+ say("If you continue booting, note that any image WILL NOT BE REMOVED.\n");
+ say("Suspend is unable to do so because the appropriate modules aren't\n");
+ say("loaded. You should manually remove the image to avoid any\n");
+ say("possibility of corrupting your filesystem(s) later.\n");
+ }
+ say("Press SPACE to reboot or C to continue booting with this kernel\n");
+ } else {
+ say("BIG FAT WARNING!!\n\n");
+ say("You have tried to resume from this image before.\n");
+ say("If it failed once, it will probably fail again.\n");
+ say("Would you like to remove the image and boot normally?\n");
+ say("This will be equivalent to entering noresume2 on the\n");
+ say("kernel command line.\n\n");
+ say("Press SPACE to remove the image or C to continue resuming.\n");
+ }
+
+ set_suspend_state(SUSPEND_SANITY_CHECK_PROMPT);
+
+ interruptible_sleep_on(&suspend_wait_for_key);
+
+ continue_req = test_suspend_state(SUSPEND_CONTINUE_REQ);
+
+ if ((warning_reason) && (!continue_req))
+ machine_restart(NULL);
+
+ restore_suspend_state(orig_state);
+ if (continue_req)
+ set_suspend_state(SUSPEND_CONTINUE_REQ);
+
+ console_loglevel = orig_loglevel;
+#endif /* CONFIG_VT or CONFIG_SERIAL_CONSOLE */
+ return -EPERM;
+}
+#undef say
+
+/* --------------- Registration of the core code -----------------------------
+ *
+ * We don't need to do anything more. The writers determine
+ * whether suspending is disabled and they're not loaded yet.
+ */
+int suspend2_register_core(struct suspend2_core_ops * ops_pointer)
+{
+ if (suspend2_core_ops)
+ return -EBUSY;
+
+ suspend2_core_ops = ops_pointer;
+ return 0;
+}
+
+void suspend2_unregister_core(void)
+{
+ suspend2_core_ops = NULL;
+}
+
+/* ------------ Functions for kickstarting a suspend or resume ----------- */
+
+static int can_suspend(void)
+{
+ if (test_suspend_state(SUSPEND_RUNNING)) {
+ printk(name_suspend "Software suspend is already running.\n");
+ return 0;
+ }
+
+ if (!suspend2_core_ops) {
+ printk(name_suspend "Software suspend is disabled.\n"
+ "You do not appear to have inserted the core module.\n");
+ SET_RESULT_STATE(SUSPEND_ABORTED);
+ return 0;
+ }
+
+ if (test_suspend_state(SUSPEND_DISABLED)) {
+ printk(name_suspend "Software suspend is disabled.\n"
+ "This may be because you haven't put something along the "
+ "lines of\n\nresume2=swap:/dev/hda1\n\n"
+ "in lilo.conf or equivalent. (Where /dev/hda1 is your "
+ "swap partition).\n");
+ SET_RESULT_STATE(SUSPEND_ABORTED);
+ return 0;
+ }
+
+ return 1;
+}
+
+/*
+ * Check if we have an image and if so try to resume.
+ */
+
+void software_suspend_try_resume(void)
+{
+ mm_segment_t oldfs;
+ oldfs = get_fs(); set_fs(KERNEL_DS);
+
+ clear_suspend_state(SUSPEND_RESUME_NOT_DONE);
+
+ if (!suspend2_core_ops) {
+ /*
+ * We can only get here at boot time. It is really dangerous
+ * to suspend, boot without resuming and then boot with
+ * resuming. Since the core (and writers) isn't loaded, we
+ * can't fix this. We can however print a big fat warning
+ * and give the user the option of rebooting.
+ *
+ * We don't do this if no resume2= parameter was specified.
+ */
+
+ if (resume2_file[0])
+ suspend_early_boot_message(0,
+ "Can't check whether to resume. Suspend's core module isn't loaded.");
+ goto out;
+ }
+ suspend2_core_ops->do_resume();
+out:
+ set_fs(oldfs);
+ clear_suspend_state(SUSPEND_IGNORE_LOGLEVEL);
+ return;
+}
+
+/*
+ * suspend_try_suspend
+ * Functionality : First level of code for software suspend invocations.
+ * Performs the basic checking as to whether suspend is
+ * enabled before invoking the high level routine.
+ * Called From :
+ */
+void suspend_try_suspend(void)
+{
+ mm_segment_t oldfs;
+
+ suspend_result = 0;
+
+ if (!can_suspend())
+ return;
+
+ oldfs = get_fs(); set_fs(KERNEL_DS);
+ suspend2_core_ops->do_suspend();
+ set_fs(oldfs);
+}
+
+/* ------------------- Commandline Parameter Handling -----------------
+ *
+ * Resume setup: obtain the storage device.
+ */
+
+static int __init resume_setup(char *str)
+{
+ if (str == NULL)
+ return 1;
+
+ strncpy(resume2_file, str, 255);
+ return 0;
+}
+
+/*
+ * Allow the user to set the action parameter from lilo, prior to resuming.
+ */
+static int __init suspend_act_setup(char *str)
+{
+ if(str)
+ suspend_action=simple_strtol(str,NULL,0);
+ suspend_act_used = 1;
+ return 0;
+}
+
+/*
+ * Allow the user to set the debug parameter from lilo, prior to resuming.
+ */
+#ifdef CONFIG_SOFTWARE_SUSPEND_DEBUG
+static int __init suspend_dbg_setup(char *str)
+{
+ if(str)
+ suspend_debug_state=simple_strtol(str,NULL,0);
+ suspend_dbg_used = 1;
+ return 0;
+}
+
+/*
+ * Allow the user to set the debug level parameter from lilo, prior to
+ * resuming.
+ */
+static int __init suspend_lvl_setup(char *str)
+{
+ if(str)
+ console_loglevel =
+ suspend_default_console_level =
+ simple_strtol(str,NULL,0);
+ suspend_lvl_used = 1;
+ clear_suspend_state(SUSPEND_IGNORE_LOGLEVEL);
+ return 0;
+}
+#endif
+
+/*
+ * Allow the user to specify that we should ignore any image found and
+ * invalidate the image if necessary. This is equivalent to running
+ * the task queue and a sync and then turning off the power. The same
+ * precautions should be taken: fsck if you're not journalled.
+ */
+static int __init noresume_setup(char *str)
+{
+ set_suspend_state(SUSPEND_NORESUME_SPECIFIED);
+ /* Message printed later */
+ return 0;
+}
+
+/* In lieu of exporting variables, some get_ functions for suspend2 */
+unsigned long get_highstart_pfn(void)
+{
+ return highstart_pfn;
+}
+
+/*
+ * Running suspend makes for a very high load average. I'm told that
+ * sendmail and crond check the load average, so in order for them
+ * not to be unnecessarily affected by the operation of suspend, we
+ * store the avenrun values prior to suspending and restore them
+ * at the end of the resume cycle. Thus, the operation of suspend
+ * should be invisible to them. Thanks to Marcus Gaugusch and Bernard
+ * Blackham for noticing the problem and suggesting the solution.
+ */
+void suspend_save_avenrun(void)
+{
+ int i;
+
+ for (i = 0; i < 3; i++)
+ avenrun_save[i] = avenrun[i];
+}
+
+void suspend_restore_avenrun(void)
+{
+ int i;
+
+ for (i = 0; i < 3; i++)
+ avenrun[i] = avenrun_save[i];
+}
+
+static int num_pcp_pages(void)
+{
+ struct zone *zone;
+ int result = 0, i = 0;
+
+ /* PCP lists */
+ for_each_zone(zone) {
+ struct per_cpu_pageset *pset;
+ int cpu;
+
+ if (!zone->present_pages)
+ continue;
+
+ for (cpu = 0; cpu < NR_CPUS; cpu++) {
+ if (!cpu_possible(cpu))
+ continue;
+
+ pset = &zone->pageset[cpu];
+
+ for (i = 0; i < ARRAY_SIZE(pset->pcp); i++) {
+ struct per_cpu_pages *pcp;
+
+ pcp = &pset->pcp[i];
+ result += pcp->count;
+ }
+ }
+ }
+ return result;
+}
+
+int real_nr_free_pages(void)
+{
+ return nr_free_pages() + num_pcp_pages();
+}
+EXPORT_SYMBOL(real_nr_free_pages);
+
+__setup("resume2=", resume_setup);
+__setup("suspend_act=", suspend_act_setup);
+#ifdef CONFIG_SOFTWARE_SUSPEND_DEBUG
+__setup("suspend_dbg=", suspend_dbg_setup);
+__setup("suspend_lvl=", suspend_lvl_setup);
+#endif
+__setup("noresume2", noresume_setup);
+
+EXPORT_SYMBOL(get_highstart_pfn);
+EXPORT_SYMBOL(suspend_save_avenrun);
+EXPORT_SYMBOL(suspend_restore_avenrun);
+
+EXPORT_SYMBOL(nr_suspends);
+EXPORT_SYMBOL(pagedir1);
+EXPORT_SYMBOL(pagedir2);
+EXPORT_SYMBOL(suspend2_register_core);
+EXPORT_SYMBOL(suspend2_unregister_core);
+EXPORT_SYMBOL(resume2_file);
+EXPORT_SYMBOL(suspend_act_used);
+EXPORT_SYMBOL(suspend_lvl_used);
+EXPORT_SYMBOL(suspend_dbg_used);
+EXPORT_SYMBOL(suspend_try_suspend);
+EXPORT_SYMBOL(suspend_debug_state);
+EXPORT_SYMBOL(suspend_result);
+
+/* Symbols exported for Suspend plugins */
+EXPORT_SYMBOL(suspend_default_console_level);
+EXPORT_SYMBOL(pagedir_resume);
+EXPORT_SYMBOL(suspend_io_time);
+EXPORT_SYMBOL(suspend2_core_ops);
+EXPORT_SYMBOL(suspend_early_boot_message);
+EXPORT_SYMBOL(suspend_wait_for_keypress);
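
As a side note on the keycode translation in suspend_builtin.c, the following standalone sketch shows the same two-path idea (serial input folded to lower case, local keyboard looked up in a table) outside the kernel. The table here is deliberately shortened, and ARRAY_LEN, enum key_source and keytable are names invented for this example; the one design choice it makes explicit is checking the index against the number of table elements rather than the table's size in bytes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: number of elements in a true array. */
#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

enum key_source { KEY_LOCAL, KEY_SERIAL };

/* Shortened, illustrative table: index is a raw keycode, value the action. */
static const unsigned int keytable[] = {
	0, 27, 49, 50, 51, 52, 53, 54, 55, 56,
	57, 48,
};

/* Map a raw keycode to an action code (ASCII for ordinary keys). */
static unsigned int keycode_to_action(unsigned int keycode,
		enum key_source source)
{
	assert(source == KEY_LOCAL || source == KEY_SERIAL);

	/* Serial input is already ASCII; fold letters to lower case. */
	if (source == KEY_SERIAL)
		return keycode > 64 ? (keycode | 32) : keycode;

	/* Local keyboard: refuse anything past the last table element. */
	if (keycode >= ARRAY_LEN(keytable))
		return 0;

	return keytable[keycode];
}
```

Because both paths end in the same action codes, a single switch on the result can serve the serial console and the attached keyboard alike.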


2004-11-24 13:36:33

by Nigel Cunningham

Subject: Suspend 2 merge: 47/51: GZIP support.

The original compressor. Slow. I've tried to drop it, but for reasons I
simply don't understand, some users still want it.

diff -ruN 853-gzip-old/kernel/power/suspend_gzip.c 853-gzip-new/kernel/power/suspend_gzip.c
--- 853-gzip-old/kernel/power/suspend_gzip.c 1970-01-01 10:00:00.000000000 +1000
+++ 853-gzip-new/kernel/power/suspend_gzip.c 2004-11-11 08:46:25.000000000 +1100
@@ -0,0 +1,560 @@
+/*
+ * kernel/power/suspend_gzip.c
+ *
+ * Copyright (C) 2003,2004 Nigel Cunningham <[email protected]>
+ *
+ * This file is released under the GPLv2.
+ *
+ * This file contains data compression routines for suspend.
+ * Compression is implemented using the zlib library.
+ *
+ */
+
+#include <linux/suspend.h>
+#include <linux/module.h>
+#include <linux/highmem.h>
+#include <linux/zlib.h>
+
+#include "plugins.h"
+#include "proc.h"
+#include "suspend.h"
+
+/* Forward declaration for the ops structure we export */
+struct suspend_plugin_ops gzip_compression_ops;
+
+/* The next driver in the pipeline */
+static struct suspend_plugin_ops * next_driver;
+
+/* Zlib routines we use to compress/decompress the data */
+extern int zlib_compress(unsigned char *data_in, unsigned char *cpage_out,
+ u32 *sourcelen, u32 *dstlen);
+extern void zlib_decompress(unsigned char *data_in, unsigned char *cpage_out,
+ u32 srclen, u32 destlen);
+
+/* Buffers */
+static void *compression_workspace = NULL;
+static char *local_buffer = NULL;
+
+/* Configuration data we pass to zlib */
+static z_stream strm;
+
+/* Stats we save */
+static __nosavedata unsigned long bytes_in = 0, bytes_out = 0;
+
+/* Expected compression is used to reduce the amount of storage allocated */
+static int expected_gzip_compression = 0;
+
+/* ---- Zlib memory management ---- */
+
+/* allocate_zlib_compression_space
+ *
+ * Description: Allocate space for zlib to use in compressing our data.
+ * Each call must have a matching call to free_zlib_memory.
+ * Returns: Int: Zero if successful, -ENOMEM otherwise.
+ */
+static inline int allocate_zlib_compression_space(void)
+{
+ BUG_ON(compression_workspace);
+
+ compression_workspace = vmalloc_32(zlib_deflate_workspacesize());
+ if (!compression_workspace) {
+ printk(KERN_WARNING
+ "Failed to allocate %d bytes for deflate workspace\n",
+ zlib_deflate_workspacesize());
+ return -ENOMEM;
+ }
+
+ return 0;
+}
+
+/* allocate_zlib_decompression_space
+ *
+ * Description: Allocate space for zlib to use in decompressing our data.
+ * Each call must have a matching call to free_zlib_memory.
+ * Returns: Int: Zero if successful, -ENOMEM otherwise.
+ */
+static inline int allocate_zlib_decompression_space(void)
+{
+ BUG_ON(compression_workspace);
+
+ compression_workspace = vmalloc_32(zlib_inflate_workspacesize());
+ if (!compression_workspace) {
+ printk(KERN_WARNING
+ "Failed to allocate %d bytes for inflate workspace\n",
+ zlib_inflate_workspacesize());
+ return -ENOMEM;
+ }
+
+ return 0;
+}
+
+/* free_zlib_memory
+ *
+ * Description: Frees memory allocated by either allocation routine (above).
+ */
+static inline void free_zlib_memory(void)
+{
+ if (!compression_workspace)
+ return;
+
+ vfree(compression_workspace);
+ compression_workspace = NULL;
+}
+
+/* ---- Local buffer management ---- */
+
+/* allocate_local_buffer
+ *
+ * Description: Allocates a page of memory for buffering output.
+ * Returns: Int: Zero if successful, -ENOMEM otherwise.
+ */
+static int allocate_local_buffer(void)
+{
+ if (local_buffer)
+ return 0;
+
+ local_buffer = (char *) get_zeroed_page(GFP_ATOMIC);
+
+ if (!local_buffer) {
+ printk(KERN_ERR
+ "Failed to allocate a page for compression "
+ "driver's buffer.\n");
+ return -ENOMEM;
+ }
+
+ return 0;
+}
+
+/* free_local_buffer
+ *
+ * Description: Frees memory allocated for buffering output.
+ */
+static inline void free_local_buffer(void)
+{
+ if (local_buffer)
+ free_pages((unsigned long) local_buffer, 0);
+ local_buffer = NULL;
+}
+
+/* ---- Functions exported via operations struct ---- */
+
+/* gzip_write_init
+ *
+ * Description: Allocate buffers and prepares zlib for deflating a new stream
+ * of data.
+ * Arguments: Stream_number: Statistics are reset when this is 2.
+ * Returns: Int. Zero if successful, otherwise an appropriate error number.
+ */
+
+static int gzip_write_init(int stream_number)
+{
+ int result;
+
+ next_driver = get_next_filter(&gzip_compression_ops);
+
+ if (!next_driver) {
+ printk("GZip Compression Driver: Argh! No one wants my output!\n");
+ return -ECHILD;
+ }
+
+ if ((result = allocate_zlib_compression_space()))
+ return result;
+
+ if ((result = allocate_local_buffer()))
+ return result;
+
+ strm.total_in = 0;
+ strm.total_out = 0;
+ strm.workspace = compression_workspace;
+ strm.next_out = (char *) local_buffer;
+ strm.avail_out = PAGE_SIZE;
+ result = zlib_deflateInit(&strm, Z_BEST_SPEED);
+
+ if (Z_OK != result) {
+ printk(KERN_ERR name_suspend "Failed to initialise zlib.\n");
+ return -EPERM;
+ }
+
+ /*
+ * Reset the statistics if we are about to write the first part of
+ * the image
+ */
+ if (stream_number == 2)
+ bytes_in = bytes_out = 0;
+
+ return 0;
+}
+
+/* gzip_write_chunk()
+ *
+ * Description: Compress a page of data, buffering output and passing on
+ * filled pages to the next plugin in the pipeline.
+ * Arguments: Buffer_page: The page (PAGE_SIZE bytes) containing
+ * the data to be compressed.
+ * Returns: 0 on success. Otherwise the error is that returned by later
+ * plugins, -ECHILD if we have a broken pipeline or -EPERM if
+ * zlib errs.
+ */
+
+static int gzip_write_chunk(struct page * buffer_page)
+{
+ int ret;
+ char * buffer_start = kmap(buffer_page);
+
+ /* Work to do */
+ strm.next_in = buffer_start;
+ strm.avail_in = PAGE_SIZE;
+ while (strm.avail_in) {
+ ret = zlib_deflate(&strm, Z_PARTIAL_FLUSH);
+ if (ret != Z_OK) {
+ printk("Zlib failed to compress our data. "
+ "Result code was %d.\n", ret);
+ kunmap(buffer_page);
+ return -EPERM;
+ }
+
+ if (!strm.avail_out) {
+
+ if ((ret = next_driver->ops.filter.write_chunk(
+ virt_to_page(local_buffer)))) {
+ kunmap(buffer_page);
+ return ret;
+ }
+ strm.next_out = local_buffer;
+ strm.avail_out = PAGE_SIZE;
+ }
+ }
+ kunmap(buffer_page);
+
+ return 0;
+}
+
+/* gzip_write_cleanup()
+ *
+ * Description: Flush remaining data, update statistics and free allocated
+ * space.
+ * Returns: Zero. Never fails. Okay. Zlib might fail... but it shouldn't.
+ */
+
+static int gzip_write_cleanup(void)
+{
+ int ret = 0, finished = 0;
+
+ while (!finished) {
+ if (strm.avail_out) {
+ ret = zlib_deflate(&strm, Z_FINISH);
+
+ if (ret == Z_STREAM_END) {
+ ret = zlib_deflateEnd(&strm);
+ finished = 1;
+ }
+
+ if ((ret != Z_OK) && (ret != Z_STREAM_END)) {
+ zlib_deflateEnd(&strm);
+ printk("Failed to finish compressing data. "
+ "Result %d received.\n", ret);
+ return -EPERM;
+ }
+ }
+
+ if ((!strm.avail_out) || (finished)) {
+ if ((ret = next_driver->ops.filter.write_chunk(
+ virt_to_page(local_buffer))))
+ return ret;
+ strm.next_out = local_buffer;
+ strm.avail_out = PAGE_SIZE;
+ }
+ }
+
+ bytes_in+= strm.total_in;
+ bytes_out+= strm.total_out;
+
+ free_zlib_memory();
+ free_local_buffer();
+
+ return 0;
+}
+
+/* gzip_read_init
+ *
+ * Description: Prepare to read a new stream of data.
+ * Arguments: Stream_number: Not used.
+ * Returns: Int. Zero if successful, otherwise an appropriate error number.
+ */
+
+static int gzip_read_init(int stream_number)
+{
+ int result;
+
+ next_driver = get_next_filter(&gzip_compression_ops);
+
+ if (!next_driver) {
+ printk("GZip Compression Driver: Argh! "
+ "No one wants to feed me data!\n");
+ return -ECHILD;
+ }
+
+ if ((result = allocate_zlib_decompression_space()))
+ return result;
+
+ if ((result = allocate_local_buffer()))
+ return result;
+
+ strm.total_in = 0;
+ strm.total_out = 0;
+ strm.workspace = compression_workspace;
+ strm.avail_in = 0;
+ if ((result = zlib_inflateInit(&strm)) != Z_OK) {
+ printk(KERN_ERR name_suspend "Failed to initialise zlib.\n");
+ return -EPERM;
+ }
+
+ return 0;
+}
+
+/* gzip_read_chunk()
+ *
+ * Description: Retrieve data from later plugins and decompress it until the
+ * input buffer is filled.
+ * Arguments: buffer_page: Page (PAGE_SIZE bytes) to fill with decompressed data.
+ * Sync: Whether the previous plugin (or core) wants its
+ * data synchronously.
+ * Returns: Zero if successful. Error condition from me or from downstream
+ * on failure.
+ */
+
+static int gzip_read_chunk(struct page * buffer_page, int sync)
+{
+ int ret;
+ char * buffer_start = kmap(buffer_page);
+
+ /*
+ * All our reads must be synchronous - we can't decompress
+ * data that hasn't been read yet.
+ */
+
+ /* Work to do */
+ strm.next_out = buffer_start;
+ strm.avail_out = PAGE_SIZE;
+ while (strm.avail_out) {
+ if (!strm.avail_in) {
+ if ((ret = next_driver->ops.filter.read_chunk(
+ virt_to_page(local_buffer),
+ SUSPEND_SYNC)) < 0) {
+ kunmap(buffer_page);
+ return ret;
+ }
+ strm.next_in = local_buffer;
+ strm.avail_in = PAGE_SIZE;
+ }
+
+ ret = zlib_inflate(&strm, Z_PARTIAL_FLUSH);
+
+ if ((ret == Z_BUF_ERROR) && (!strm.avail_in)) {
+ continue;
+ }
+
+ if ((ret != Z_OK) && (ret != Z_STREAM_END)) {
+ printk("Zlib failed to decompress our data. "
+ "Result code was %d.\n", ret);
+ kunmap(buffer_page);
+ return -EPERM;
+ }
+ }
+ kunmap(buffer_page);
+
+ return 0;
+}
+
+/* gzip_read_cleanup()
+ *
+ * Description: Clean up after reading part or all of a stream of data.
+ * Returns: int: Always zero. Never fails.
+ */
+
+static int gzip_read_cleanup(void)
+{
+ zlib_inflateEnd(&strm);
+
+ free_zlib_memory();
+ free_local_buffer();
+ return 0;
+}
+
+/* gzip_print_debug_stats
+ *
+ * Description: Print information to be recorded for debugging purposes into a
+ * buffer.
+ * Arguments: buffer: Pointer to a buffer into which the debug info will be
+ * printed.
+ * size: Size of the buffer.
+ * Returns: Number of characters written to the buffer.
+ */
+
+static int gzip_print_debug_stats(char * buffer, int size)
+{
+ int pages_in = bytes_in >> PAGE_SHIFT;
+ int pages_out = bytes_out >> PAGE_SHIFT;
+ int len;
+
+ //Output the compression ratio achieved.
+ len = suspend_snprintf(buffer, size, "- GZIP compressor enabled.\n");
+ if (pages_in)
+ len+= suspend_snprintf(buffer+len, size - len,
+ " Compressed %ld bytes into %ld.\n "
+ "Image compressed by %d percent.\n",
+ bytes_in, bytes_out, (pages_in - pages_out) * 100 / pages_in);
+ return len;
+}
+
+/* gzip_memory_needed
+ *
+ * Description: Tell the caller how much memory we need to operate during
+ * suspend/resume.
+ * Returns: Unsigned long. Maximum number of bytes of memory required for
+ * operation.
+ */
+
+static unsigned long gzip_memory_needed(void)
+{
+ return PAGE_SIZE + max( zlib_deflate_workspacesize(),
+ zlib_inflate_workspacesize());
+}
+
+static unsigned long gzip_storage_needed(void)
+{
+ return 2 * sizeof(unsigned long) + sizeof(int);
+}
+
+/* gzip_save_config_info
+ *
+ * Description: Save information needed when reloading the image at resume time.
+ * Arguments: Buffer: Pointer to a buffer of size PAGE_SIZE.
+ * Returns: Number of bytes used for saving our data.
+ */
+
+static int gzip_save_config_info(char * buffer)
+{
+ *((unsigned long *) buffer) = bytes_in;
+ *((unsigned long *) (buffer + sizeof(unsigned long))) = bytes_out;
+ *((int *) (buffer + 2 * sizeof(unsigned long))) = expected_gzip_compression;
+ return 2 * sizeof(unsigned long) + sizeof(int);
+}
+
+/* gzip_load_config_info
+ *
+ * Description: Reload information needed for decompressing the image at
+ * resume time.
+ * Arguments: Buffer: Pointer to the start of the data.
+ * Size: Number of bytes that were saved.
+ */
+
+static void gzip_load_config_info(char * buffer, int size)
+{
+ if(size == 2 * sizeof(unsigned long) + sizeof(int)) {
+ bytes_in = *((unsigned long *) buffer);
+ bytes_out = *((unsigned long *) (buffer + sizeof(unsigned long)));
+ expected_gzip_compression = *((int *) (buffer + 2 * sizeof(unsigned long)));
+ } else
+ printk("Suspend GZIP config info size mismatch: settings ignored.\n");
+ return;
+}
+
+/* gzip_get_expected_compression
+ *
+ * Description: Returns the expected ratio between data passed into this plugin
+ * and the amount of data output when writing.
+ * Returns: The value set by the user via our proc entry.
+ */
+
+static int gzip_get_expected_compression(void)
+{
+ return 100 - expected_gzip_compression;
+}
+
+/*
+ * data for our proc entries.
+ */
+
+struct suspend_proc_data expected_compression_proc_data = {
+ .filename = "expected_gzip_compression",
+ .permissions = PROC_RW,
+ .type = SUSPEND_PROC_DATA_INTEGER,
+ .data = {
+ .integer = {
+ .variable = &expected_gzip_compression,
+ .minimum = 0,
+ .maximum = 99,
+ }
+ }
+};
+
+struct suspend_proc_data disable_compression_proc_data = {
+ .filename = "disable_gzip_compression",
+ .permissions = PROC_RW,
+ .type = SUSPEND_PROC_DATA_INTEGER,
+ .data = {
+ .integer = {
+ .variable = &gzip_compression_ops.disabled,
+ .minimum = 0,
+ .maximum = 1,
+ }
+ }
+};
+
+/*
+ * Ops structure.
+ */
+
+struct suspend_plugin_ops gzip_compression_ops = {
+ .type = FILTER_PLUGIN,
+ .name = "Zlib Page Compressor",
+ .memory_needed = gzip_memory_needed,
+ .print_debug_info = gzip_print_debug_stats,
+ .save_config_info = gzip_save_config_info,
+ .load_config_info = gzip_load_config_info,
+ .storage_needed = gzip_storage_needed,
+ .ops = {
+ .filter = {
+ .write_init = gzip_write_init,
+ .write_chunk = gzip_write_chunk,
+ .write_cleanup = gzip_write_cleanup,
+ .read_init = gzip_read_init,
+ .read_chunk = gzip_read_chunk,
+ .read_cleanup = gzip_read_cleanup,
+ .expected_compression = gzip_get_expected_compression,
+ }
+ }
+};
+
+/* ---- Registration ---- */
+
+static __init int gzip_load(void)
+{
+ int result;
+
+ if (!(result = suspend_register_plugin(&gzip_compression_ops))) {
+ printk("Software Suspend Gzip Compression Driver registered.\n");
+ suspend_register_procfile(&expected_compression_proc_data);
+ suspend_register_procfile(&disable_compression_proc_data);
+ }
+ return result;
+}
+
+#ifdef MODULE
+static __exit void gzip_unload(void)
+{
+ printk("Software Suspend Gzip Compression Driver unloading.\n");
+ suspend_unregister_procfile(&expected_compression_proc_data);
+ suspend_unregister_procfile(&disable_compression_proc_data);
+ suspend_unregister_plugin(&gzip_compression_ops);
+}
+
+module_init(gzip_load);
+module_exit(gzip_unload);
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Nigel Cunningham");
+MODULE_DESCRIPTION("Gzip Compression support for Suspend2");
+#else
+late_initcall(gzip_load);
+#endif
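
A note on the config-info helpers in the patch above: gzip_save_config_info() and gzip_load_config_info() only work together if the number of bytes written, the number of bytes the loader expects, and the amount reserved through storage_needed all agree. The following is a minimal userspace sketch of that contract, not the patch's code. The names struct gzip_config, gzip_config_size(), save_config() and load_config() are hypothetical, and memcpy() is used instead of pointer casts so the sketch does not depend on buffer alignment.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the three values the plugin saves. */
struct gzip_config {
	unsigned long bytes_in;
	unsigned long bytes_out;
	int expected_compression;
};

/* Single source of truth for the serialised size. */
static size_t gzip_config_size(void)
{
	return 2 * sizeof(unsigned long) + sizeof(int);
}

/* Pack the fields back to back; returns the number of bytes written. */
static size_t save_config(const struct gzip_config *cfg, char *buffer)
{
	size_t off = 0;

	memcpy(buffer + off, &cfg->bytes_in, sizeof(cfg->bytes_in));
	off += sizeof(cfg->bytes_in);
	memcpy(buffer + off, &cfg->bytes_out, sizeof(cfg->bytes_out));
	off += sizeof(cfg->bytes_out);
	memcpy(buffer + off, &cfg->expected_compression,
	       sizeof(cfg->expected_compression));
	off += sizeof(cfg->expected_compression);

	/* The writer must never produce more than it asked to reserve. */
	assert(off == gzip_config_size());
	return off;
}

/* Returns 0 on success, -1 (leaving *cfg untouched) on a size mismatch. */
static int load_config(struct gzip_config *cfg, const char *buffer,
		       size_t size)
{
	size_t off = 0;

	if (size != gzip_config_size())
		return -1;

	memcpy(&cfg->bytes_in, buffer + off, sizeof(cfg->bytes_in));
	off += sizeof(cfg->bytes_in);
	memcpy(&cfg->bytes_out, buffer + off, sizeof(cfg->bytes_out));
	off += sizeof(cfg->bytes_out);
	memcpy(&cfg->expected_compression, buffer + off,
	       sizeof(cfg->expected_compression));
	return 0;
}
```

Routing all three places through one size helper means a field added later cannot silently grow the saved data past the reserved storage.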


2004-11-24 13:40:03

by Christoph Hellwig

Subject: Re: Suspend 2 merge: 10/51: Exports for suspend built as modules.

> /*
> * Platforms implementing 32 bit compatibility ioctl handlers in
> - * modules need this exported
> + * modules need this exported. So does Suspend2 (when made as
> + * modules), so the export_symbol is now unconditional.
> */
> -#ifdef CONFIG_COMPAT
> EXPORT_SYMBOL(sys_ioctl);
> -#endif

This is definitely the wrong interface for whatever you want to do.

> diff -ruN 400-exports-old/fs/namei.c 400-exports-new/fs/namei.c
> --- 400-exports-old/fs/namei.c 2004-11-03 21:53:11.000000000 +1100
> +++ 400-exports-new/fs/namei.c 2004-11-04 16:27:40.000000000 +1100
> @@ -1649,6 +1649,8 @@
> return error;
> }
>
> +EXPORT_SYMBOL(sys_mkdir);

Ditto

> * We try to drop the dentry early: we should have
> * a usage count of 2 if we're the only user of this
> diff -ruN 400-exports-old/fs/namespace.c 400-exports-new/fs/namespace.c
> --- 400-exports-old/fs/namespace.c 2004-11-03 21:54:15.000000000 +1100
> +++ 400-exports-new/fs/namespace.c 2004-11-04 16:27:40.000000000 +1100
> @@ -490,6 +490,8 @@
> return retval;
> }
>
> +EXPORT_SYMBOL(sys_umount);

Ditto

> +EXPORT_SYMBOL(sys_mount);

Ditto.

> +EXPORT_SYMBOL(proc_match);

Also not something anything outside of procfs internals should use.

> +EXPORT_SYMBOL(sys_write);

Wrong as well.

> +EXPORT_SYMBOL(sys_reboot);

Ditto

> unsigned long avenrun[3];
> +EXPORT_SYMBOL(avenrun);

Nothing you should poke into.

> +/* Exported for Software Suspend 2 */
> +EXPORT_SYMBOL(nr_free_highpages);
> +EXPORT_SYMBOL(pgdat_list);

Ditto.

> +EXPORT_SYMBOL(swap_free);
> +EXPORT_SYMBOL(swap_info);
> +EXPORT_SYMBOL(sys_swapoff);
> +EXPORT_SYMBOL(sys_swapon);
> +EXPORT_SYMBOL(si_swapinfo);
> +EXPORT_SYMBOL(map_swap_page);
> +EXPORT_SYMBOL(get_swap_page);
> +EXPORT_SYMBOL(get_swap_info_struct);

Ditto. Low-level swap device access isn't something modules should poke
into.

Nigel, why do I have this strange feeling that exactly the same patch
was rejected already but you resubmitted it again?

If you want anything merged, drop the modular swsusp bits; I doubt they'll
ever be merged.

2004-11-24 13:40:04

by Nigel Cunningham

Subject: Suspend 2 merge: 45/51: Bootsplash support.

Support for bootsplash.

I might switch to fbsplash soon. It is better supported.

diff -ruN 851-suspend-bootsplash-old/kernel/power/suspend_bootsplash.c 851-suspend-bootsplash-new/kernel/power/suspend_bootsplash.c
--- 851-suspend-bootsplash-old/kernel/power/suspend_bootsplash.c 1970-01-01 10:00:00.000000000 +1000
+++ 851-suspend-bootsplash-new/kernel/power/suspend_bootsplash.c 2004-11-11 07:30:21.000000000 +1100
@@ -0,0 +1,302 @@
+/*
+ * kernel/power/suspend_bootsplash.c
+ *
+ * Copyright (C) 2004 Nigel Cunningham <[email protected]>
+ *
+ * This file is released under the GPLv2.
+ *
+ * This file implements bootsplash support for suspend2.
+ */
+#define SUSPEND_CONSOLE_C
+
+#define __KERNEL_SYSCALLS__
+
+#include <linux/suspend.h>
+#include <linux/console.h>
+#include <linux/proc_fs.h>
+#include <linux/hardirq.h>
+#include <asm/hardirq.h>
+
+#include "plugins.h"
+#include "proc.h"
+#include "suspend.h"
+
+static int barwidth = 100, barposn = -1, newbarposn = 0;
+static int lastloglevel = -1;
+
+/* Your bootsplash progress bar may have a width of (eg) 1024 pixels. That
+ * doesn't necessarily mean you want the bar updated 1024 times when writing
+ * the image */
+static int bar_granularity_limit = 0;
+
+/* ------------------ Splash screen defines -------------------------- */
+
+extern struct display fb_display[MAX_NR_CONSOLES];
+
+/* splash_is_on
+ *
+ * Description: Determine whether a VT has a splash screen on.
+ * Arguments: int consolenr. The VT number of a console to check.
+ * Returns: Boolean indicating whether the splash screen for
+ * that console is on right now.
+ */
+static int splash_is_on(int consolenr)
+{
+ struct splash_data *info = get_splash_data(consolenr);
+
+ if (info)
+ return ((info->splash_state & 1) == 1);
+ return 0;
+}
+
+/* splash_write_proc.
+ *
+ * Write to Bootsplash's proc entry. We need this to work when /proc
+ * hasn't been mounted yet and / can't be mounted. In addition, we
+ * want it to work despite the fact that bootsplash (under 2.4 at least)
+ * removes its proc entry when it shouldn't. We therefore use
+ * our proc.c find_proc_dir_entry routine to get the location of the
+ * write routine once (boot time & at start of each resume), and keep it.
+ */
+
+extern struct proc_dir_entry * find_proc_dir_entry(const char *name,
+ struct proc_dir_entry *parent);
+
+static void splash_write_proc(const char *buffer, unsigned long count)
+{
+ static write_proc_t * write_routine;
+ struct proc_dir_entry * proc_entry;
+
+ if (in_interrupt())
+ return;
+
+ if (unlikely(!write_routine)) {
+ proc_entry = find_proc_dir_entry("splash", &proc_root);
+ if (proc_entry)
+ write_routine = proc_entry->write_proc;
+ }
+
+ if (write_routine)
+ write_routine(NULL, buffer, count, NULL);
+}
+
+/* fb_splash_set_progress
+ *
+ * Description: Set the progress bar position for a splash screen.
+ * Arguments: int consolenr. The VT number of a console to use.
+ * unsigned long value, unsigned long maximum:
+ * The proportion (value/maximum) of the bar to fill.
+ */
+
+static int fb_splash_set_progress(int consolenr, unsigned long value,
+ unsigned long maximum)
+{
+ char procstring[15];
+ int length, bitshift = generic_fls(maximum) - 16;
+ static unsigned long lastvalue = 0;
+ unsigned long thisvalue;
+
+ BUG_ON(consolenr >= MAX_NR_CONSOLES);
+
+ if (in_interrupt())
+ return 0;
+
+ if (value > maximum)
+ value = maximum;
+
+ /* Avoid math problems - we can't do 64 bit math here
+ * (and don't need it - anyone got screen resolution
+ * of 65536 pixels or more?) */
+ if (bitshift > 0) {
+ maximum = maximum >> bitshift;
+ value = value >> bitshift;
+ }
+
+ thisvalue = value * 65534 / maximum;
+
+ length = sprintf(procstring, "show %lu", thisvalue);
+
+ splash_write_proc(procstring, length);
+
+ lastvalue = thisvalue;
+
+ return 0;
+}
+
+/* bootsplash_loglevel_change
+ *
+ * Description: Update the display when the user changes the log level.
+ */
+
+static void bootsplash_loglevel_change(void)
+{
+ /* Calculate progress bar width. Note that whether the
+ * splash screen is on might have changed (this might be
+ * the first call in a new cycle), so we can't take it
+ * for granted that the width should be the same as
+ * last time we came in here */
+ if (!splash_is_on(fg_console))
+ return;
+
+ /* proc interface ensures bar_granularity_limit >= 0 */
+ if (bar_granularity_limit)
+ barwidth = bar_granularity_limit;
+ else
+ barwidth = 100;
+
+ /* Only reset the display if we're switching between nice display
+ * and displaying debugging output */
+ if (console_loglevel > 1) {
+ if (lastloglevel < 2)
+ splash_write_proc("verbose\n", 9);
+ } else if (lastloglevel > 1)
+ splash_write_proc("silent\n", 8);
+
+ lastloglevel = console_loglevel;
+}
+
+static void bootsplash_prepare(void)
+{
+ if (!splash_is_on(fg_console))
+ return;
+
+ if (console_loglevel < 2)
+ splash_write_proc("silent\n", 8);
+ else
+ splash_write_proc("verbose\n", 9);
+}
+
+/* bootsplash_update_progress
+ *
+ * Description: Update the progress bar and (if on) in-bar message.
+ * Arguments: UL value, maximum: Current progress percentage (value/max).
+ * const char *fmt, ...: Message to be displayed in the middle
+ * of the progress bar.
+ * Note that a NULL message does not mean that any previous
+ * message is erased! For that, you need message with
+ * clearbar on.
+ * Returns: Unsigned long: The next value where status needs to be updated.
+ * This is to reduce unnecessary calls to update_progress.
+ *
+ * Note that for Bootsplash, we ignore the in-bar message
+ */
+static unsigned long bootsplash_update_progress(
+ unsigned long value, unsigned long maximum,
+ const char *fmt, va_list args)
+{
+ unsigned long next_update = 0;
+ int bitshift = generic_fls(maximum) - 16;
+
+ if ((!maximum) || (!barwidth))
+ return maximum;
+
+ if (value < 0)
+ value = 0;
+
+ if (value > maximum)
+ value = maximum;
+
+ /* Try to avoid math problems - we can't do 64 bit math here
+ * (and shouldn't need it - anyone got screen resolution
+ * of 65536 pixels or more?) */
+ if (bitshift > 0) {
+ unsigned long temp_maximum = maximum >> bitshift;
+ unsigned long temp_value = value >> bitshift;
+ newbarposn = (int) (temp_value * barwidth / temp_maximum);
+ } else
+ newbarposn = (int) (value * barwidth / maximum);
+
+ if (newbarposn < barposn)
+ barposn = 0;
+
+ next_update = ((newbarposn + 1) * maximum / barwidth) + 1;
+
+ if ((splash_is_on(fg_console)) &&
+ (newbarposn != barposn)) {
+ fb_splash_set_progress(fg_console, value, maximum);
+ barposn = newbarposn;
+ }
+ return next_update;
+}
+
+/*
+ * User interface specific /proc/suspend entries.
+ */
+static struct suspend_plugin_ops bootsplash_ops;
+
+static struct suspend_proc_data proc_params[] = {
+ { .filename = "bootsplash_granularity_limit",
+ .permissions = PROC_RW,
+ .type = SUSPEND_PROC_DATA_INTEGER,
+ .data = {
+ .integer = {
+ .variable = &bar_granularity_limit,
+ .minimum = 1,
+ .maximum = 2000,
+ }
+ }
+ },
+
+ { .filename = "disable_bootsplash_support",
+ .permissions = PROC_RW,
+ .type = SUSPEND_PROC_DATA_INTEGER,
+ .data = {
+ .integer = {
+ .variable = &bootsplash_ops.disabled,
+ .minimum = 0,
+ .maximum = 1,
+ }
+ }
+ }
+};
+
+static struct suspend_plugin_ops bootsplash_ops = {
+ .type = UI_PLUGIN,
+ .name = "Bootsplash Support",
+ .ops = {
+ .ui = {
+ .prepare = bootsplash_prepare,
+ .log_level_change = bootsplash_loglevel_change,
+ .update_progress = bootsplash_update_progress,
+ .post_kernel_restore_redraw =
+ bootsplash_prepare,
+ }
+ }
+};
+
+/* ---- Registration ---- */
+
+static __init int bootsplash_load(void)
+{
+ int i, numfiles = sizeof(proc_params) / sizeof(struct suspend_proc_data);
+ int result;
+
+ if (!(result = suspend_register_plugin(&bootsplash_ops))) {
+ printk("Software Suspend Bootsplash Support loaded.\n");
+ for (i=0; i< numfiles; i++)
+ suspend_register_procfile(&proc_params[i]);
+ }
+ return result;
+}
+
+#ifdef MODULE
+static __exit void bootsplash_unload(void)
+{
+ int i, numfiles = sizeof(proc_params) / sizeof(struct suspend_proc_data);
+
+ printk("Software Suspend Bootsplash support unloading.\n");
+
+ for (i=0; i< numfiles; i++)
+ suspend_unregister_procfile(&proc_params[i]);
+
+ suspend_unregister_plugin(&bootsplash_ops);
+}
+
+module_init(bootsplash_load);
+module_exit(bootsplash_unload);
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Nigel Cunningham");
+MODULE_DESCRIPTION("Suspend2 Bootsplash support");
+#else
+late_initcall(bootsplash_load);
+#endif
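
A note on the progress arithmetic in the patch above: both fb_splash_set_progress() and bootsplash_update_progress() shift value and maximum right until maximum fits in 16 bits, so that the later multiplication stays within 32 bits. The following userspace sketch shows that scaling in isolation. It is an illustration under assumptions, not the patch's code: fls_ul() is a stand-in for the kernel's generic_fls(), and bar_position() is a hypothetical helper name.

```c
#include <assert.h>

/* Userspace stand-in for generic_fls(): 1-based index of the highest
 * set bit, or 0 when x is 0. */
static int fls_ul(unsigned long x)
{
	int bits = 0;

	while (x) {
		bits++;
		x >>= 1;
	}
	return bits;
}

/* Map value/maximum onto a bar that is barwidth steps wide. */
static unsigned long bar_position(unsigned long value,
				  unsigned long maximum,
				  unsigned long barwidth)
{
	int shift = fls_ul(maximum) - 16;

	/* With a 16-bit value, this keeps the product below 2^32. */
	assert(barwidth <= 65536);

	if (!maximum)
		return 0;
	if (value > maximum)
		value = maximum;

	/* Drop low-order bits so that maximum has at most 16 bits. */
	if (shift > 0) {
		maximum >>= shift;
		value >>= shift;
	}

	return value * barwidth / maximum;
}
```

For example, bar_position(1UL << 30, 1UL << 31, 100) shifts both operands right by 16 bits and returns 50. The precision lost to the shift is far below one bar step.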


2004-11-24 13:30:48

by Nigel Cunningham

Subject: Suspend 2 merge: 18/51: Debug page_alloc support.

This patch provides support for making suspend work when DEBUG_PAGEALLOC
is enabled.

diff -ruN 510-debug-pagealloc-support-old/arch/i386/mm/pageattr.c 510-debug-pagealloc-support-new/arch/i386/mm/pageattr.c
--- 510-debug-pagealloc-support-old/arch/i386/mm/pageattr.c 2004-11-03 21:53:39.000000000 +1100
+++ 510-debug-pagealloc-support-new/arch/i386/mm/pageattr.c 2004-11-04 16:27:40.000000000 +1100
@@ -211,5 +213,49 @@
EXPORT_SYMBOL(kernel_map_pages);
#endif

+#ifdef CONFIG_SOFTWARE_SUSPEND2
+#ifdef CONFIG_DEBUG_PAGEALLOC
+static int page_is_kernel_mapped(struct page * page)
+{
+ pte_t *kpte;
+ unsigned long address;
+
+#ifdef CONFIG_HIGHMEM
+ if (page >= highmem_start_page)
+ return 0;
+#endif
+
+ address = (unsigned long)page_address(page);
+
+ kpte = lookup_address(address);
+ if (!kpte)
+ return 0;
+
+ if (pte_same(*kpte, mk_pte(page, PAGE_KERNEL)))
+ return 1;
+
+ return 0;
+}
+
+int suspend_map_kernel_page(struct page * page, int enable)
+{
+ int is_already_mapped = page_is_kernel_mapped(page);
+
+ if (enable == is_already_mapped)
+ return 1;
+
+ kernel_map_pages(page, 1, enable);
+
+ return 0;
+}
+#else
+int suspend_map_kernel_page(struct page * page, int enable)
+{
+ return (enable == 1);
+}
+#endif
+EXPORT_SYMBOL(suspend_map_kernel_page);
+#endif
+
EXPORT_SYMBOL(change_page_attr);
EXPORT_SYMBOL(global_flush_tlb);
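
A note on the return convention in the patch above: suspend_map_kernel_page() returns 1 when the page was already in the requested state and 0 when it had to change the mapping, which lets a caller restore the previous state afterwards. How the suspend2 core actually calls it is not shown in this message, so the pattern below is an assumption. It is a userspace model only: struct fake_page, model_map_kernel_page() and copy_page_contents() are hypothetical names.

```c
#include <assert.h>
#include <string.h>

#define MODEL_PAGE_SIZE 64

/* Hypothetical model of a page with a "kernel mapped" flag. */
struct fake_page {
	int mapped;
	char data[MODEL_PAGE_SIZE];
};

/* Same return convention as suspend_map_kernel_page():
 * 1 if the page was already in the requested state, 0 if it changed. */
static int model_map_kernel_page(struct fake_page *page, int enable)
{
	if (enable == page->mapped)
		return 1;

	page->mapped = enable;
	return 0;
}

/* Map the page if needed, copy it, then put the mapping back. */
static void copy_page_contents(char *dest, struct fake_page *page)
{
	int was_mapped = model_map_kernel_page(page, 1);

	assert(page->mapped);
	memcpy(dest, page->data, MODEL_PAGE_SIZE);

	/* Only unmap if this function did the mapping. */
	if (!was_mapped)
		model_map_kernel_page(page, 0);
}
```

A page that starts unmapped is unmapped again after the copy, and a page that starts mapped is left mapped.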


2004-11-24 13:27:56

by Nigel Cunningham

Subject: Suspend 2 merge: 31/51: Export tlb flushing

This patch makes do_flush_tlb_all non-static and adds a
local_flush_tlb_all() macro, so that suspend can do the SMP-appropriate
thing after the image is restored.

diff -ruN 818-tlb-flushing-functions-old/arch/i386/kernel/smp.c 818-tlb-flushing-functions-new/arch/i386/kernel/smp.c
--- 818-tlb-flushing-functions-old/arch/i386/kernel/smp.c 2004-11-06 09:27:19.225681536 +1100
+++ 818-tlb-flushing-functions-new/arch/i386/kernel/smp.c 2004-11-04 16:27:41.000000000 +1100
@@ -476,7 +476,7 @@
preempt_enable();
}

-static void do_flush_tlb_all(void* info)
+void do_flush_tlb_all(void* info)
{
unsigned long cpu = smp_processor_id();

diff -ruN 818-tlb-flushing-functions-old/include/asm-i386/tlbflush.h 818-tlb-flushing-functions-new/include/asm-i386/tlbflush.h
--- 818-tlb-flushing-functions-old/include/asm-i386/tlbflush.h 2004-11-03 21:55:01.000000000 +1100
+++ 818-tlb-flushing-functions-new/include/asm-i386/tlbflush.h 2004-11-04 16:27:41.000000000 +1100
@@ -82,6 +82,7 @@
#define flush_tlb() __flush_tlb()
#define flush_tlb_all() __flush_tlb_all()
#define local_flush_tlb() __flush_tlb()
+#define local_flush_tlb_all() __flush_tlb_all()

static inline void flush_tlb_mm(struct mm_struct *mm)
{
@@ -114,6 +115,10 @@
extern void flush_tlb_current_task(void);
extern void flush_tlb_mm(struct mm_struct *);
extern void flush_tlb_page(struct vm_area_struct *, unsigned long);
+extern void do_flush_tlb_all(void * info);
+
+#define local_flush_tlb_all() \
+ do_flush_tlb_all(NULL)

#define flush_tlb() flush_tlb_current_task()



2004-11-24 13:52:05

by Nigel Cunningham

Subject: Suspend 2 merge: 33/51: More documentation.

More documentation for suspend. The internals.txt file is still to be
completed (and a bit out of date, too!).

diff -ruN 821-docs-old/Documentation/power/internals.txt 821-docs-new/Documentation/power/internals.txt
--- 821-docs-old/Documentation/power/internals.txt 1970-01-01 10:00:00.000000000 +1000
+++ 821-docs-new/Documentation/power/internals.txt 2004-11-04 16:27:41.000000000 +1100
@@ -0,0 +1,364 @@
+ Software Suspend 2.0 Internal Documentation.
+ Version 1
+
+1. Introduction.
+
+ Software Suspend 2.0 is an addition to the Linux Kernel, designed to
+ allow the user to quickly shut down and quickly boot a computer, without
+ needing to close documents or programs. It is equivalent to the
+ hibernate facility in some laptops. This implementation, however,
+ requires no special BIOS or hardware support.
+
+ The code in these files is based upon the original implementation
+ prepared by Gabor Kuti and additional work by Pavel Machek and a
+ host of others. This code has been substantially reworked by Nigel
+ Cunningham, again with the help and testing of many others, not the
+ least of whom is Michael Frank. At its heart, however, the operation is
+ essentially the same as Gabor's version.
+
+2. Overview of operation.
+
+ The basic sequence of operations is as follows:
+
+ a. Quiesce all other activity.
+ b. Ensure enough memory and storage space are available, and attempt
+ to free memory/storage if necessary.
+ c. Allocate the required memory and storage space.
+ d. Write the image.
+ e. Power down.
+
+ There are a number of complicating factors which mean that things are
+ not as simple as the above would imply, however...
+
+ o The activity of each process must be stopped at a point where it will
+ not be holding locks necessary for saving the image, or unexpectedly
+ restart operations due to something like a timeout and thereby make
+ our image inconsistent.
+
+ o It is desirable that we sync outstanding I/O to disk before calculating
+ image statistics. This reduces corruption if one should suspend but
+ then not resume, and also makes later parts of the operation safer (see
+ below).
+
+ o We need to get as close as we can to an atomic copy of the data.
+ Inconsistencies in the image will result in inconsistent memory contents at
+ resume time, and thus in instability of the system and/or file system
+ corruption. This would appear to imply a maximum image size of one half of
+ the amount of RAM, but we have a solution... (again, below).
+
+ o In 2.6, we must play nicely with the other suspend-to-disk
+ implementations.
+
+3. Detailed description of internals.
+
+ a. Quiescing activity.
+
+ Safely quiescing the system is achieved in a number of steps. First, we
+ wait for existing activity to complete, while holding new activity until
+ post-resume. Second, we sync unwritten buffers. Third, we send a
+ 'pseudo-signal' to all processes that have not yet entered the
+ 'refrigerator' but should be frozen, causing them to be refrigerated.
+
+ Waiting for existing activity to complete is achieved by using hooks at
+ the beginning and end of critical paths in the kernel code. When a process
+ enters a section where it cannot be safely refrigerated, the process flag
+ PF_FRIDGE_WAIT is set from the SWSUSP_ACTIVITY_STARTING macro. In the same
+ routine, at completion of the critical region, a SWSUSP_ACTIVITY_ENDING macro
+ resets the flag. The _STARTING and _ENDING macros also atomically adjust
+ the global counter swsusp_num_active. While the counter is non-zero,
+ Software Suspend's freezer will wait.
+
+ These macros serve two additional purposes. Local variables are used
+ to ensure that processes can safely pass through multiple _STARTING and
+ _ENDING macros, and checks are made to ensure that the freezer is not
+ waiting for activity to finish. If a process wants to start on a critical
+ path when Suspend is waiting for activity to finish, it will be held at the
+ start of the critical path and refrigerated earlier than would normally be
+ the case. It will be allowed to continue operation after the Suspend cycle
+ is finished or aborted.
+
+ A process in a critical path may also have a section where it releases
+ locks and can be safely stopped until post-resume. For these cases, the
+ SWSUSP_ACTIVITY_PAUSING and _RESTARTING macros may be used. They function
+ in a similar manner to the _STARTING and _ENDING macros.
+
+ Finally, we remember that some threads may be necessary for syncing data to
+ storage. These threads have PF_SYNCTHREAD set, and may use the special macro
+ SWSUSP_ACTIVITY_SYNCTHREAD_PAUSING to indicate that Suspend can safely
+ continue, while not themselves entering the refrigerator.
+
+ Once activity is stopped, Suspend will initiate an fsync of all devices.
+ This aims to increase the integrity of the disk state, just in case
+ something should go wrong.
+
+ During the initial stage, Suspend indicates its desire that processes be
+ stopped by setting the FREEZE_NEW_ACTIVITY bit of swsusp_state. Once the
+ sync is complete, SYNCTHREAD processes no longer need to run. The
+ FREEZE_UNREFRIGERATED bit is now set, causing them to be refrigerated as
+ well, should they attempt to start new activity. (There should be nothing
+ for them to do, but just in case).
+
+ Suspend can now put remaining processes in the refrigerator without fear
+ of deadlocking or leaving dirty data unsynced. The refrigerator is a
+ procedure where processes wait until the cycle is complete. While in there,
+ we can be sure that they will not perform activity that will make our
+ image inconsistent. Processes enter the refrigerator either by being
+ caught at one of the previously mentioned hooks, or by receiving a 'pseudo-
+ signal' from Suspend at this stage. I call it a pseudo signal because
+ signal_wake_up is called for the process when it actually hasn't been
+ signalled. A special hook in the signal handler then calls the refrigerator.
+ The refrigerator, in turn, recalculates the signal pending status to
+ ensure no ill effects result.
+
+ Not all processes are refrigerated. The Suspend thread itself, of course,
+ is one such thread. Others are flagged by setting PF_NOFREEZE, usually
+ because they are needed during suspend.
+
+ In 2.4, the dosexec thread (Win4Lin) is treated specially. It does not
+ handle us even pretending to send it a signal. This is worked around by
+ us adjusting the can_schedule() macro in schedule.c to stop the task from
+ being scheduled during suspend. Ugly, but it works. The 2.6 version of
+ Win4Lin has been made compatible.
+
+ b. Ensure enough memory & storage are available.
+ c. Allocate the required memory and storage space.
+
+ These steps are merged together in the prepare_image function, found in
+ prepare_image.c. The functions are merged because of the cyclical nature
+ of the problem of calculating how much memory and storage is needed. Since
+ the data structures containing the information about the image must
+ themselves take memory and use storage, the amount of memory and storage
+ required changes as we prepare the image. Since the changes are not large,
+ only one or two iterations will be required to achieve a solution.
+
+ d. Write the image.
+
+ We previously mentioned the need to create an atomic copy of the data, and
+ the half-of-memory limitation that is implied in this. This limitation is
+ circumvented by dividing the memory to be saved into two parts, called
+ pagesets.
+
+ Pageset2 contains the page cache - the pages on the active and inactive
+ lists. These pages are saved first and reloaded last. While saving these
+ pages, the swapwriter plugin carefully ensures that the work of writing
+ the pages doesn't make the image inconsistent. Pages added to the LRU
+ lists are immediately shot down, and careful accounting for available
+ memory aids debugging. No atomic copy of these pages needs to be made.
+
+ Writing the image requires memory, of course, and at this point we have
+ also not yet suspended the drivers. To avoid the possibility of remaining
+ activity corrupting the image, we allocate a special memory pool. Calls
+ to __alloc_pages and __free_pages_ok are then diverted to use our memory
+ pool. Pages in the memory pool are saved as part of pageset1 regardless of
+ whether or not they are used.
+
+ Once pageset2 has been saved, we suspend the drivers and save the CPU
+ context before making an atomic copy of pageset1, resuming the drivers
+ and saving the atomic copy. After saving the two pagesets, we just need to
+ save our metadata before powering down.
+
+ Having saved pageset2 pages, we can safely overwrite their contents with
+ the atomic copy of pageset1. This is how we manage to overcome the half of
+ memory limitation. Pageset2 is normally far larger than pageset1, and
+ pageset1 is normally much smaller than half of the memory, with the result
+ that pageset2 pages can be safely overwritten with the atomic copy of
+ pageset1. This is where we need to be careful about syncing, however.
+ Pageset2 will probably contain filesystem meta data. If this is overwritten
+ with pageset1 and then a sync occurs, the filesystem will be corrupted -
+ at least until resume time and another sync of the restored data. Since
+ there is a possibility that the user might not resume or (may it never be!)
+ that suspend might oops, we do our utmost to avoid syncing filesystems after
+ copying pageset1.
+
+ e. Power down.
+
+ Powering down uses standard kernel routines. Prior to this, however, we
+ suspend drivers again, ensuring that write caches are flushed.
+
+4. The method of writing the image.
+
+ Software Suspend 2.0rc3 and later contain an internal API which is
+ designed to simplify the implementation of new methods of transforming
+ the image to be written and writing the image itself. Prior to rc3,
+ compression support was inlined in the image writing code, and the data
+ structures and code for managing swap were intertwined with the rest of
+ the code. A number of people had expressed interest in implementing
+ image encryption, and alternative methods of storing the image. This
+ internal API makes that possible by implementing 'plugins'.
+
+ A plugin is a single file which encapsulates the functionality needed
+ to transform a pageset of data (encryption or compression, for example),
+ or to write the pageset to a device. The former type of plugin is called
+ a 'page-transformer', the latter a 'writer'.
+
+ Plugins are linked together in pipeline fashion. There may be zero or more
+ page transformers in a pipeline, and there is always exactly one writer.
+ The pipeline follows this pattern:
+
+ ---------------------------------
+ | Software Suspend Core |
+ ---------------------------------
+ |
+ |
+ ---------------------------------
+ | Page transformer 1 |
+ ---------------------------------
+ |
+ |
+ ---------------------------------
+ | Page transformer 2 |
+ ---------------------------------
+ |
+ |
+ ---------------------------------
+ | Writer |
+ ---------------------------------
+
+ During the writing of an image, the core code feeds pages one at a time
+ to the first plugin. This plugin performs whatever transformations it
+ implements on the incoming data, completely consuming the incoming data and
+ feeding output in a similar manner to the next plugin. A plugin may buffer
+ its output.
+
+ During reading, the pipeline works in the reverse direction. The core code
+ calls the first plugin with the address of a buffer which should be filled.
+ (Note that the buffer size is always PAGE_SIZE at this time). This plugin
+ will in turn request data from the next plugin and so on down until the
+ writer is made to read from the stored image.
+
+ Part of the definition of the structure of a plugin thus looks like this:
+
+ /* Writing the image proper */
+ int (*write_init) (int stream_number);
+ int (*write_chunk) (char * buffer_start);
+ int (*write_cleanup) (void);
+
+ /* Reading the image proper */
+ int (*read_init) (int stream_number);
+ int (*read_chunk) (char * buffer_start, int sync);
+ int (*read_cleanup) (void);
+
+ It should be noted that the _cleanup routines may be called before the
+ full stream of data has been read or written. While writing the image,
+ the user may (depending upon settings) choose to abort suspending, and
+ if we are in the midst of writing the last portion of the image, a portion
+ of the second pageset may be reread.
+
+ In addition to the above routines for writing the data, all plugins have a
+ number of other routines:
+
+ TYPE indicates whether the plugin is a page transformer or a writer.
+ #define TRANSFORMER_PLUGIN 1
+ #define WRITER_PLUGIN 2
+
+ NAME is the name of the plugin, used in generic messages.
+
+ PLUGIN_LIST is used to link the plugin into the list of all plugins.
+
+ MEMORY_NEEDED returns the number of pages of memory required by the plugin
+ to do its work.
+
+ STORAGE_NEEDED returns the number of pages in the suspend header required
+ to store the plugin's configuration data.
+
+ PRINT_DEBUG_INFO fills a buffer with information to be displayed about the
+ operation or settings of the plugin.
+
+ SAVE_CONFIG_INFO returns a buffer of PAGE_SIZE or smaller (the size is the
+ return code), containing the plugin's configuration info. This information
+ will be written in the image header and restored at resume time. Since this
+ buffer is allocated after the atomic copy of the kernel is made, you don't
+ need to worry about the buffer being freed.
+
+ LOAD_CONFIG_INFO gives the plugin a pointer to the configuration info
+ which was saved during suspending. Once again, the plugin doesn't need to
+ worry about freeing the buffer. The kernel will be overwritten with the
+ original kernel, so no memory leak will occur.
+
+ OPS contains the operations specific to transformers and writers. These are
+ described below.
+
+ The complete definition of struct swsusp_plugin_ops is:
+
+ struct swsusp_plugin_ops {
+ /* Functions common to transformers and writers */
+ int type;
+ char * name;
+ struct list_head plugin_list;
+ unsigned long (*memory_needed) (void);
+ unsigned long (*storage_needed) (void);
+ int (*print_debug_info) (char * buffer, int size);
+ int (*save_config_info) (char * buffer);
+ void (*load_config_info) (char * buffer, int len);
+
+ /* Writing the image proper */
+ int (*write_init) (int stream_number);
+ int (*write_chunk) (char * buffer_start);
+ int (*write_cleanup) (void);
+
+ /* Reading the image proper */
+ int (*read_init) (int stream_number);
+ int (*read_chunk) (char * buffer_start, int sync);
+ int (*read_cleanup) (void);
+
+ union {
+ struct swsusp_transformer_ops transformer;
+ struct swsusp_writer_ops writer;
+ } ops;
+ };
+
+
+ The operations specific to transformers are few in number:
+
+ struct swsusp_transformer_ops {
+ int (*expected_compression) (void);
+ struct list_head transformer_list;
+ };
+
+ Expected compression returns the expected ratio between the amount of
+ data sent to this plugin and the amount of data it passes to the next
+ plugin. The value is used by the core code to calculate the amount of
+ space required to write the image. If the ratio is not achieved, the
+ writer will complain when it runs out of space with data still to
+ write, and the core code will abort the suspend.
+
+ transformer_list links together page transformers, in the order in
+ which they register, which is in turn determined by order in the
+ Makefile.
+
+ There are many more operations specific to a writer:
+
+ struct swsusp_writer_ops {
+
+ long (*storage_available) (void);
+
+ unsigned long (*storage_allocated) (void);
+
+ int (*release_storage) (void);
+
+ long (*allocate_header_space) (unsigned long space_requested);
+ int (*allocate_storage) (unsigned long space_requested);
+
+ int (*write_header_init) (void);
+ int (*write_header_chunk) (char * buffer_start, int buffer_size);
+ int (*write_header_cleanup) (void);
+
+ int (*read_header_init) (void);
+ int (*read_header_chunk) (char * buffer_start, int buffer_size);
+ int (*read_header_cleanup) (void);
+
+ int (*prepare_save) (void);
+ int (*post_load) (void);
+
+ int (*parse_image_location) (char * buffer);
+
+ int (*image_exists) (void);
+
+ int (*invalidate_image) (void);
+
+ int (*wait_on_io) (int flush_all);
+
+ struct list_head writer_list;
+ };
+
+ STORAGE_AVAILABLE is
diff -ruN 821-docs-old/Documentation/power/todo.txt 821-docs-new/Documentation/power/todo.txt
--- 821-docs-old/Documentation/power/todo.txt 1970-01-01 10:00:00.000000000 +1000
+++ 821-docs-new/Documentation/power/todo.txt 2004-11-04 16:27:41.000000000 +1100
@@ -0,0 +1,19 @@
+Suspend2 todo list
+
+20041021
+ 2.1 known issues:
+ ----------------
+- NFS support missing
+- Encryption support missing
+- DRI support for 2.4 & 2.6
+- USB support under 2.4 and 2.6
+- Incomplete support in other drivers
+- No support for discontig memory
+- Currently requires PSE extension (/proc/cpuinfo)
+- Highmem >4GB not supported
+
+20040107
+- Further cleaning up.
+
+20031216
+- Include progress-bar-granularity in all_settings.

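The plugin pipeline described in the documentation patch above can be
sketched in user space. This is only an illustration under stated
assumptions, not the suspend2 implementation: struct sketch_plugin, the
XOR "transformer", the in-memory "writer" and the extra self argument to
write_chunk are all hypothetical. The real write_chunk takes only the
buffer address, and a real transformer would write into its own buffer
rather than modify the incoming page in place.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096
#define SKETCH_SLOTS 4

/* Hypothetical, cut-down stand-in for struct swsusp_plugin_ops. */
struct sketch_plugin {
	const char *name;
	struct sketch_plugin *next;	/* next stage in the pipeline */
	int (*write_chunk)(struct sketch_plugin *self, char *buffer_start);
	unsigned long pages_seen;	/* pages this stage has handled */
};

/* Toy page transformer: XOR every byte, then feed the next plugin. */
static int xor_write_chunk(struct sketch_plugin *self, char *buffer_start)
{
	size_t i;

	assert(self->next != NULL);	/* a transformer is never last */
	self->pages_seen++;
	for (i = 0; i < SKETCH_PAGE_SIZE; i++)
		buffer_start[i] ^= 0x5a;
	return self->next->write_chunk(self->next, buffer_start);
}

/* Toy writer: the end of the pipeline, storing pages in an array. */
static char sketch_storage[SKETCH_SLOTS][SKETCH_PAGE_SIZE];

static int mem_write_chunk(struct sketch_plugin *self, char *buffer_start)
{
	if (self->pages_seen >= SKETCH_SLOTS)
		return -1;	/* out of space: the core would abort */
	memcpy(sketch_storage[self->pages_seen], buffer_start,
	       SKETCH_PAGE_SIZE);
	self->pages_seen++;
	return 0;
}

static struct sketch_plugin sketch_writer = {
	.name = "memwriter",
	.next = NULL,
	.write_chunk = mem_write_chunk,
	.pages_seen = 0,
};

static struct sketch_plugin sketch_xor = {
	.name = "xor",
	.next = &sketch_writer,
	.write_chunk = xor_write_chunk,
	.pages_seen = 0,
};

/* What the core does while writing: hand one page to the first plugin. */
static int sketch_core_write_page(char *page)
{
	return sketch_xor.write_chunk(&sketch_xor, page);
}
```

Calling sketch_core_write_page() once per page pushes each page through
the transformer and into the writer. A non-zero return from the writer
propagates back up to the caller, which mirrors the "writer runs out of
space, core aborts" case the documentation mentions.
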

2004-11-24 13:54:49

by Nigel Cunningham

[permalink] [raw]
Subject: Suspend 2 merge: 22/51: Suspend2 lowlevel code.

Lowlevel code for i386. This is the code responsible for saving and
restoring CPU state, and for restoring the original kernel (except LRU
pages, which are loaded afterwards).
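
The copy-back loop in the patch walks two chains of ranges in step: one
naming the original page locations, one naming where the copies were
loaded. A user-space sketch of that walk follows. It is an assumption-laden
illustration only: the sketch_ names are hypothetical, array elements stand
in for whole pages, and the sketch freely uses the stack and function
calls, which the real routine must not do before
restore_processor_context().

```c
#include <assert.h>
#include <stddef.h>

/* Same shape as the struct range used by the patch. */
struct sketch_range {
	unsigned long minimum;
	unsigned long maximum;
	struct sketch_range *next;
};

/* A position inside a chain of ranges (hypothetical helper). */
struct sketch_cursor {
	const struct sketch_range *range;
	unsigned long offset;
};

static void sketch_cursor_init(struct sketch_cursor *c,
			       const struct sketch_range *first)
{
	c->range = first;
	c->offset = first ? first->minimum : 0;
}

/* Step to the next value; c->range becomes NULL past the last value. */
static void sketch_cursor_next(struct sketch_cursor *c)
{
	assert(c->range != NULL);
	if (c->offset < c->range->maximum) {
		c->offset++;
	} else {
		c->range = c->range->next;
		if (c->range)
			c->offset = c->range->minimum;
	}
}

/*
 * Copy mem[copy position] to mem[original position] for each pair of
 * positions, stopping when either chain is exhausted.
 * Returns the number of elements copied.
 */
static unsigned long sketch_copy_back(unsigned long *mem,
				      const struct sketch_range *orig,
				      const struct sketch_range *copy)
{
	struct sketch_cursor o, c;
	unsigned long copied = 0;

	sketch_cursor_init(&o, orig);
	sketch_cursor_init(&c, copy);
	while (o.range && c.range) {
		mem[o.offset] = mem[c.offset];
		sketch_cursor_next(&o);
		sketch_cursor_next(&c);
		copied++;
	}
	return copied;
}
```

The two chains need not be split at the same places; only the total
number of values they cover has to match.
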

diff -ruN 700-suspend2-lowlevel-old/arch/i386/power/Makefile 700-suspend2-lowlevel-new/arch/i386/power/Makefile
--- 700-suspend2-lowlevel-old/arch/i386/power/Makefile 2004-11-03 21:52:57.000000000 +1100
+++ 700-suspend2-lowlevel-new/arch/i386/power/Makefile 2004-11-04 16:27:40.000000000 +1100
@@ -1,2 +1,4 @@
-obj-$(CONFIG_PM) += cpu.o
+CFLAGS_suspend2.o = -O0
+
+obj-$(CONFIG_PM) += cpu.o suspend2.o
obj-$(CONFIG_SOFTWARE_SUSPEND) += swsusp.o
diff -ruN 700-suspend2-lowlevel-old/arch/i386/power/suspend2.c 700-suspend2-lowlevel-new/arch/i386/power/suspend2.c
--- 700-suspend2-lowlevel-old/arch/i386/power/suspend2.c 1970-01-01 10:00:00.000000000 +1000
+++ 700-suspend2-lowlevel-new/arch/i386/power/suspend2.c 2004-11-17 19:43:08.000000000 +1100
@@ -0,0 +1,639 @@
+ /*
+ * Copyright 2003-2004 Nigel Cunningham <[email protected]>
+ * Based on code
+ * Copyright 2001-2002 Pavel Machek <[email protected]>
+ * Based on code
+ * Copyright 2001 Patrick Mochel <[email protected]>
+ */
+#include <linux/init.h>
+#include <linux/version.h>
+#include <linux/suspend.h>
+#include <linux/highmem.h>
+#include <linux/module.h>
+#include <linux/irq.h>
+#include <asm/desc.h>
+#include <asm/i387.h>
+#include <asm/apic.h>
+#include <asm/tlbflush.h>
+#include "../../../kernel/power/suspend.h"
+
+extern volatile struct suspend2_core_ops * suspend2_core_ops;
+extern struct pagedir pagedir_resume;
+extern volatile int suspend_io_time[2][2];
+extern char __nosavedata swsusp_pg_dir[PAGE_SIZE]
+ __attribute__ ((aligned (PAGE_SIZE)));
+#include <asm/processor.h>
+#undef inline
+#define inline __inline__ __attribute__((always_inline))
+
+#ifdef CONFIG_MTRR
+/* MTRR functions */
+extern int mtrr_save(void);
+extern int mtrr_restore_one_cpu(void);
+extern void mtrr_restore_finish(void);
+#else
+#define mtrr_save() do { } while(0)
+#define mtrr_restore_one_cpu() do { } while(0)
+#define mtrr_restore_finish() do { } while(0)
+#endif
+
+/* image of the saved processor states */
+struct suspend2_saved_context {
+ u32 eax, ebx, ecx, edx;
+ u32 esp, ebp, esi, edi;
+ u16 es, fs, gs, ss;
+ u32 cr0, cr2, cr3, cr4;
+ u16 gdt_pad;
+ u16 gdt_limit;
+ u32 gdt_base;
+ u16 idt_pad;
+ u16 idt_limit;
+ u32 idt_base;
+ u16 ldt;
+ u16 tss;
+ u32 tr;
+ u32 safety;
+ u32 return_address;
+ u32 eflags;
+} __attribute__((packed));
+
+#ifdef CONFIG_SMP
+static struct suspend2_saved_context suspend2_saved_contexts[NR_CPUS];
+#else
+#undef cpu_clear
+#define cpu_clear(a, b) do { } while(0)
+#endif
+static struct suspend2_saved_context suspend2_saved_context; /* temporary storage */
+
+#define loaddebug(thread,register) \
+ __asm__("movl %0,%%db" #register \
+ : /* no output */ \
+ :"r" ((thread)->debugreg[register]))
+
+
+/*
+ * save_processor_context
+ *
+ * Save the state of the processor before we go to sleep.
+ *
+ * return_stack is the value of the stack pointer (%esp) as the caller sees it.
+ * A good way could not be found to obtain it from here (don't want to make _too_
+ * many assumptions about the layout of the stack this far down.) Also, the
+ * handy little __builtin_frame_pointer(level) where level > 0, is blatantly
+ * buggy - it returns the value of the stack at the proper location, not the
+ * location, like it should (as of gcc 2.91.66)
+ *
+ * Note that the context and timing of this function is pretty critical.
+ * With a minimal amount of things going on in the caller and in here, gcc
+ * does a good job of being just a dumb compiler. Watch the assembly output
+ * if anything changes, though, and make sure everything is going in the right
+ * place.
+ */
+static inline void save_processor_context(void)
+{
+ kernel_fpu_begin();
+
+ /*
+ * descriptor tables
+ */
+ asm volatile ("sgdt (%0)" : "=m" (suspend2_saved_context.gdt_limit));
+ asm volatile ("sidt (%0)" : "=m" (suspend2_saved_context.idt_limit));
+ asm volatile ("sldt (%0)" : "=m" (suspend2_saved_context.ldt));
+ asm volatile ("str (%0)" : "=m" (suspend2_saved_context.tr));
+
+ /*
+ * save the general registers.
+ * note that gcc has constructs to specify output of certain registers,
+ * but they're not used here, because it assumes that you want to modify
+ * those registers, so it tries to be smart and save them beforehand.
+ * It's really not necessary, and kinda fishy (check the assembly output),
+ * so it's avoided.
+ */
+ asm volatile ("movl %%esp, (%0)" : "=m" (suspend2_saved_context.esp));
+ asm volatile ("movl %%eax, (%0)" : "=m" (suspend2_saved_context.eax));
+ asm volatile ("movl %%ebx, (%0)" : "=m" (suspend2_saved_context.ebx));
+ asm volatile ("movl %%ecx, (%0)" : "=m" (suspend2_saved_context.ecx));
+ asm volatile ("movl %%edx, (%0)" : "=m" (suspend2_saved_context.edx));
+ asm volatile ("movl %%ebp, (%0)" : "=m" (suspend2_saved_context.ebp));
+ asm volatile ("movl %%esi, (%0)" : "=m" (suspend2_saved_context.esi));
+ asm volatile ("movl %%edi, (%0)" : "=m" (suspend2_saved_context.edi));
+
+ /*
+ * segment registers
+ */
+ asm volatile ("movw %%es, %0" : "=r" (suspend2_saved_context.es));
+ asm volatile ("movw %%fs, %0" : "=r" (suspend2_saved_context.fs));
+ asm volatile ("movw %%gs, %0" : "=r" (suspend2_saved_context.gs));
+ asm volatile ("movw %%ss, %0" : "=r" (suspend2_saved_context.ss));
+
+ /*
+ * control registers
+ */
+ asm volatile ("movl %%cr0, %0" : "=r" (suspend2_saved_context.cr0));
+ asm volatile ("movl %%cr2, %0" : "=r" (suspend2_saved_context.cr2));
+ asm volatile ("movl %%cr3, %0" : "=r" (suspend2_saved_context.cr3));
+ asm volatile ("movl %%cr4, %0" : "=r" (suspend2_saved_context.cr4));
+
+ /*
+ * eflags
+ */
+ asm volatile ("pushfl ; popl (%0)" : "=m" (suspend2_saved_context.eflags));
+}
+
+static void fix_processor_context(void)
+{
+ int nr = smp_processor_id();
+ struct tss_struct * t = &per_cpu(init_tss,nr);
+
+ set_tss_desc(nr,t); /* This just modifies memory; should not be necessary. But... This is necessary, because 386 hardware has concept of busy TSS or some similar stupidity. */
+ per_cpu(cpu_gdt_table,nr)[GDT_ENTRY_TSS].b &= 0xfffffdff;
+
+ load_TR_desc();
+
+ load_LDT(&current->active_mm->context); /* This does lldt */
+
+ /*
+ * Now maybe reload the debug registers
+ */
+ if (current->thread.debugreg[7]){
+ loaddebug(&current->thread, 0);
+ loaddebug(&current->thread, 1);
+ loaddebug(&current->thread, 2);
+ loaddebug(&current->thread, 3);
+ /* no 4 and 5 */
+ loaddebug(&current->thread, 6);
+ loaddebug(&current->thread, 7);
+ }
+
+}
+
+static void do_fpu_end(void)
+{
+ /* restore FPU regs if necessary */
+ /* Do it out of line so that gcc does not move cr0 load to some stupid place */
+ kernel_fpu_end();
+}
+
+/*
+ * restore_processor_context
+ *
+ * Restore the processor context as it was before we went to sleep
+ * - descriptor tables
+ * - control registers
+ * - segment registers
+ * - flags
+ *
+ * Note that it is critical that this function is declared inline.
+ * It was separated out from restore_state to make that function
+ * a little clearer, but it needs to be inlined because we won't have a
+ * stack when we get here (so we can't push a return address).
+ */
+static inline void restore_processor_context(void)
+{
+ /*
+ * first restore %ds, so we can access our data properly
+ */
+ asm volatile (".align 4");
+ asm volatile ("movw %0, %%ds" :: "r" ((u16)__KERNEL_DS));
+
+
+ /*
+ * control registers
+ */
+ asm volatile ("movl %0, %%cr4" :: "r" (suspend2_saved_context.cr4));
+ asm volatile ("movl %0, %%cr3" :: "r" (suspend2_saved_context.cr3));
+ asm volatile ("movl %0, %%cr2" :: "r" (suspend2_saved_context.cr2));
+ asm volatile ("movl %0, %%cr0" :: "r" (suspend2_saved_context.cr0));
+
+ /*
+ * segment registers
+ */
+ asm volatile ("movw %0, %%es" :: "r" (suspend2_saved_context.es));
+ asm volatile ("movw %0, %%fs" :: "r" (suspend2_saved_context.fs));
+ asm volatile ("movw %0, %%gs" :: "r" (suspend2_saved_context.gs));
+ asm volatile ("movw %0, %%ss" :: "r" (suspend2_saved_context.ss));
+
+ /*
+ * the other general registers
+ *
+ * note that even though gcc has constructs to specify memory
+ * input into certain registers, it will try to be too smart
+ * and save them at the beginning of the function. This is esp.
+ * bad since we don't have a stack set up when we enter, and we
+ * want to preserve the values on exit. So, we set them manually.
+ */
+ asm volatile ("movl %0, %%esp" :: "m" (suspend2_saved_context.esp));
+ asm volatile ("movl %0, %%ebp" :: "m" (suspend2_saved_context.ebp));
+ asm volatile ("movl %0, %%eax" :: "m" (suspend2_saved_context.eax));
+ asm volatile ("movl %0, %%ebx" :: "m" (suspend2_saved_context.ebx));
+ asm volatile ("movl %0, %%ecx" :: "m" (suspend2_saved_context.ecx));
+ asm volatile ("movl %0, %%edx" :: "m" (suspend2_saved_context.edx));
+ asm volatile ("movl %0, %%esi" :: "m" (suspend2_saved_context.esi));
+ asm volatile ("movl %0, %%edi" :: "m" (suspend2_saved_context.edi));
+
+ /*
+ * now restore the descriptor tables to their proper values
+ * ltr is done in fix_processor_context().
+ */
+
+ asm volatile ("lgdt (%0)" :: "m" (suspend2_saved_context.gdt_limit));
+ asm volatile ("lidt (%0)" :: "m" (suspend2_saved_context.idt_limit));
+ asm volatile ("lldt (%0)" :: "m" (suspend2_saved_context.ldt));
+
+ fix_processor_context();
+
+ /*
+ * the flags
+ */
+ asm volatile ("pushl %0 ; popfl" :: "m" (suspend2_saved_context.eflags));
+
+ do_fpu_end();
+}
+
+#if defined(CONFIG_SOFTWARE_SUSPEND2) || defined(CONFIG_SMP)
+volatile static int loop __nosavedata = 0;
+extern atomic_t suspend_cpu_counter __nosavedata;
+volatile unsigned char * my_saved_context __nosavedata;
+volatile static unsigned long c_loops_per_jiffy_ref[NR_CPUS] __nosavedata;
+#endif
+
+#ifdef CONFIG_SOFTWARE_SUSPEND2
+/* Local variables for do_suspend2_lowlevel */
+volatile static int state1 __nosavedata = 0;
+volatile static int state2 __nosavedata = 0;
+volatile static int state3 __nosavedata = 0;
+volatile static struct range *origrange __nosavedata;
+volatile static struct range *copyrange __nosavedata;
+volatile static int origoffset __nosavedata;
+volatile static int copyoffset __nosavedata;
+volatile static unsigned long * origpage __nosavedata;
+volatile static unsigned long * copypage __nosavedata;
+volatile static int io_speed_save[2][2] __nosavedata;
+#ifndef CONFIG_SMP
+volatile static unsigned long cpu_khz_ref __nosavedata = 0;
+#endif
+
+/*
+ * APIC support: These routines save the APIC
+ * configuration for the CPU on which they are
+ * being executed
+ */
+extern void suspend_apic_save_state(void);
+extern void suspend_apic_reload_state(void);
+
+#ifdef CONFIG_SMP
+/* ------------------------------------------------
+ * BEGIN Irq affinity code, based on code from LKCD.
+ *
+ * IRQ affinity support:
+ * Save and restore IRQ affinities, and set them
+ * all to CPU 0.
+ *
+ * Section between dashes taken from LKCD code.
+ * Perhaps we should be working toward a shared library
+ * of such routines for kexec, lkcd, software suspend
+ * and whatever other similar projects there are?
+ */
+
+extern irq_desc_t irq_desc[];
+extern cpumask_t irq_affinity[];
+cpumask_t saved_affinity[NR_IRQS];
+
+/*
+ * Routine to save the old irq affinities and change affinities of all irqs to
+ * the dumping cpu.
+ */
+static void save_and_set_irq_affinity(void)
+{
+ int i;
+ int cpu = smp_processor_id();
+
+ memcpy(saved_affinity, irq_affinity, NR_IRQS * sizeof(cpumask_t));
+ for (i = 0; i < NR_IRQS; i++) {
+ if (irq_desc[i].handler == NULL)
+ continue;
+ irq_affinity[i] = cpumask_of_cpu(cpu);
+ if (irq_desc[i].handler->set_affinity != NULL)
+ irq_desc[i].handler->set_affinity(i, irq_affinity[i]);
+ }
+}
+
+/*
+ * Restore old irq affinities.
+ */
+static void reset_irq_affinity(void)
+{
+ int i;
+
+ memcpy(irq_affinity, saved_affinity, NR_IRQS * sizeof(cpumask_t));
+ for (i = 0; i < NR_IRQS; i++) {
+ if (irq_desc[i].handler == NULL)
+ continue;
+ if (irq_desc[i].handler->set_affinity != NULL)
+ irq_desc[i].handler->set_affinity(i, irq_affinity[i]);
+ }
+}
+
+/*
+ * END of IRQ affinity code, based on LKCD code.
+ * -----------------------------------------------------------------
+ */
+#else
+#define save_and_set_irq_affinity() do { } while(0)
+#define reset_irq_affinity() do { } while(0)
+#endif
+
+/*
+ * FIXME: This function should really be written in assembly. Actually
+ * requirement is that it does not touch stack, because %esp will be
+ * wrong during resume before restore_processor_context(). Check
+ * assembly if you modify this.
+ *
+ * SMP support:
+ * All SMP processors enter this routine during suspend. The one through
+ * which the suspend is initiated (which, for simplicity, is always CPU 0)
+ * sends the others here using an IPI during do_suspend2_suspend_1. They
+ * remain here until after the atomic copy of the kernel is made, to ensure
+ * that they don't mess with memory in the meantime (even just idling will
+ * do that). Once the atomic copy is made, they are free to carry on idling.
+ * Note that we must let them go, because if we're using compression, the
+ * vfree calls in the compressors will result in IPIs being called and hanging
+ * because the CPUs are still here.
+ *
+ * At resume time, we do a similar thing. CPU 0 sends the others in here using
+ * an IPI. It then copies the original kernel back, restores its own processor
+ * context and flushes local tlbs before freeing the others to do the same.
+ * They can then go back to idling while CPU 0 reloads pageset 2, cleans up
+ * and unfreezes the processes.
+ *
+ * (Remember that freezing and thawing processes also uses IPIs, as may
+ * decompressing the data. Again, therefore, we cannot leave the other processors
+ * in here).
+ *
+ * At the moment, we do nothing about APICs, even though the code is there.
+ */
+void do_suspend2_lowlevel(int resume)
+{
+ int processor_id = smp_processor_id();
+
+ if (!resume) {
+ /*
+ * Save the irq affinities before we freeze the
+ * other processors!
+ */
+ save_and_set_irq_affinity();
+ mtrr_save();
+
+ suspend2_core_ops->suspend1();
+ save_processor_context(); /* We need to capture registers and memory at "same time" */
+ suspend2_core_ops->suspend2(); /* If everything goes okay, this function does not return */
+ return;
+ }
+
+ state1 = suspend_action;
+ state2 = suspend_debug_state;
+ state3 = console_loglevel;
+ for (loop = 0; loop < 4; loop++)
+ io_speed_save[loop/2][loop%2] =
+ suspend_io_time[loop/2][loop%2];
+
+ /* Send all IRQs to CPU 0. We will replace the saved affinities
+ * with the suspend-time ones when we copy the original kernel
+ * back in place
+ */
+ save_and_set_irq_affinity();
+
+ c_loops_per_jiffy_ref[processor_id] = current_cpu_data.loops_per_jiffy;
+#ifndef CONFIG_SMP
+ cpu_khz_ref = cpu_khz;
+#endif
+
+ /* We want to run from swsusp_pg_dir, since swsusp_pg_dir is stored in constant
+ * place in memory
+ */
+
+ __asm__( "movl %%ecx,%%cr3\n" ::"c"(__pa(swsusp_pg_dir)));
+
+/*
+ * Final function for resuming: after copying the pages to their original
+ * position, it restores the register state.
+ *
+ * What about page tables? Writing data pages may toggle
+ * accessed/dirty bits in our page tables. That should be no problem
+ * with 4MB page tables. That's why we require have_pse.
+ *
+ * This loop destroys the stack from under itself, so it had better
+ * not use any stack space itself. When this function is entered at
+ * resume time, we move the stack to the _old_ place. This means that
+ * this function must use no stack and no local variables in registers
+ * until calling restore_processor_context().
+ *
+ * Critical section here: no one should touch saved memory after
+ * do_suspend2_resume_1; copying works, because nr_copy_pages,
+ * pagedir_resume, loop and loop2 are nosavedata.
+ *
+ * If we're running with DEBUG_PAGEALLOC, the boot and resume kernels both have
+ * all the pages we need mapped into kernel space, so we don't need to change
+ * page protections while doing the copy-back.
+ */
+
+ suspend2_core_ops->resume1();
+
+ origrange = pagedir_resume.origranges.first;
+ copyrange = pagedir_resume.destranges.first;
+ origoffset = origrange->minimum;
+ copyoffset = copyrange->minimum;
+ origpage = (unsigned long *) (lowmem_page_address(mem_map + origoffset));
+ copypage = (unsigned long *) (lowmem_page_address(mem_map + copyoffset));
+
+ BUG_ON(!irqs_disabled());
+
+ /* As of 2.0.0.51, pageset1 can include highmem pages. If
+ * !CONFIG_HIGHMEM, highstart_pfn == 0, hence the #ifdef.
+ */
+#ifdef CONFIG_HIGHMEM
+ while ((origrange) && (origoffset < highstart_pfn)) {
+#else
+ while (origrange) {
+#endif
+ for (loop=0; loop < (PAGE_SIZE / sizeof(unsigned long)); loop++) {
+ *(origpage + loop) = *(copypage + loop);
+ *(copypage + loop) = 0xb000b000;
+ }
+
+ if (origoffset < origrange->maximum) {
+ origoffset++;
+ origpage += (PAGE_SIZE / sizeof(unsigned long));
+ } else {
+ origrange = origrange->next;
+ if (origrange) {
+ origoffset = origrange->minimum;
+ origpage = (unsigned long *) (lowmem_page_address(mem_map + origoffset));
+ }
+ }
+
+ if (copyoffset < copyrange->maximum) {
+ copyoffset++;
+ copypage += (PAGE_SIZE / sizeof(unsigned long));
+ } else {
+ copyrange = copyrange->next;
+ if (copyrange) {
+ copyoffset = copyrange->minimum;
+ copypage = (unsigned long *) (lowmem_page_address(mem_map + copyoffset));
+ }
+ }
+ }
+
+ restore_processor_context();
+ cpu_clear(processor_id, per_cpu(cpu_tlbstate, processor_id).active_mm->cpu_vm_mask);
+ wbinvd();
+ __flush_tlb_all();
+
+ BUG_ON(!irqs_disabled());
+
+ /* Now we are running with our old stack, and with registers copied
+ * from suspend time. Let's copy back those remaining Highmem pages. */
+
+#ifdef CONFIG_HIGHMEM
+ while (origrange) {
+ unsigned long * origpage = (unsigned long *) kmap_atomic(mem_map + origoffset, KM_USER1);
+ for (loop=0; loop < (PAGE_SIZE / sizeof(unsigned long)); loop++) {
+ *(origpage + loop) = *(copypage + loop);
+ *(copypage + loop) = 0xb000b000;
+ }
+ kunmap_atomic(origpage, KM_USER1);
+
+ if (origoffset < origrange->maximum)
+ origoffset++;
+ else {
+ origrange = origrange->next;
+ if (origrange)
+ origoffset = origrange->minimum;
+ }
+
+ if (copyoffset < copyrange->maximum) {
+ copyoffset++;
+ copypage += (PAGE_SIZE / sizeof(unsigned long));
+ } else {
+ copyrange = copyrange->next;
+ if (copyrange) {
+ copyoffset = copyrange->minimum;
+ copypage = (unsigned long *) (page_address(mem_map + copyoffset));
+ }
+ }
+ }
+#endif
+
+ suspend2_verify_checksums();
+
+ BUG_ON(!irqs_disabled());
+
+ cpu_clear(processor_id, per_cpu(cpu_tlbstate, processor_id).active_mm->cpu_vm_mask);
+ wbinvd();
+ __flush_tlb_all();
+ mtrr_restore_one_cpu();
+
+ /* Get other CPUs to restore their contexts and flush their tlbs. */
+ clear_suspend_state(SUSPEND_FREEZE_SMP);
+
+ do {
+ cpu_relax();
+ barrier();
+ } while (atomic_read(&suspend_cpu_counter));
+
+ mtrr_restore_finish();
+
+ BUG_ON(!irqs_disabled());
+
+ /* put the irq affinity tables back */
+ reset_irq_affinity();
+
+ current_cpu_data.loops_per_jiffy = c_loops_per_jiffy_ref[processor_id];
+#ifndef CONFIG_SMP
+ loops_per_jiffy = c_loops_per_jiffy_ref[processor_id];
+ cpu_khz = cpu_khz_ref;
+#endif
+ suspend_action = state1;
+ suspend_debug_state = state2;
+ console_loglevel = state3;
+
+ for (loop = 0; loop < 4; loop++)
+ suspend_io_time[loop/2][loop%2] =
+ io_speed_save[loop/2][loop%2];
+
+ suspend2_core_ops->resume2();
+}
+EXPORT_SYMBOL(do_suspend2_lowlevel);
+#endif
+
+#ifdef CONFIG_SMP
+/*
+ * Save and restore processor state for secondary processors.
+ * IRQs (and therefore preemption) are already disabled
+ * when we enter here (IPI).
+ */
+
+void __smp_suspend_lowlevel(void * info)
+{
+ __asm__( "movl %%ecx,%%cr3\n" ::"c"(__pa(swsusp_pg_dir)));
+
+ if (test_suspend_state(SUSPEND_NOW_RESUMING)) {
+ BUG_ON(!irqs_disabled());
+ kernel_fpu_begin();
+ c_loops_per_jiffy_ref[smp_processor_id()] = current_cpu_data.loops_per_jiffy;
+ atomic_inc(&suspend_cpu_counter);
+
+ /* Only image copied back while we spin in this loop. Our
+ * task info should not be looked at while this is happening
+ * (which smp_processor_id() will do). */
+ while (test_suspend_state(SUSPEND_FREEZE_SMP)) {
+ cpu_relax();
+ barrier();
+ }
+
+ while (atomic_read(&suspend_cpu_counter) != smp_processor_id()) {
+ cpu_relax();
+ barrier();
+ }
+ my_saved_context = (unsigned char *) (suspend2_saved_contexts + smp_processor_id());
+ for (loop = sizeof(struct suspend2_saved_context); loop; loop--)
+ *(((unsigned char *) &suspend2_saved_context) + loop - 1) = *(my_saved_context + loop - 1);
+ restore_processor_context();
+ cpu_clear(smp_processor_id(), per_cpu(cpu_tlbstate, smp_processor_id()).active_mm->cpu_vm_mask);
+ load_cr3(swapper_pg_dir);
+ wbinvd();
+ __flush_tlb_all();
+ current_cpu_data.loops_per_jiffy = c_loops_per_jiffy_ref[smp_processor_id()];
+ mtrr_restore_one_cpu();
+ atomic_dec(&suspend_cpu_counter);
+ } else { /* suspending */
+ BUG_ON(!irqs_disabled());
+ /*
+ * Save context and go back to idling.
+ * Note that we cannot leave the processor
+ * here. It must be able to receive IPIs if
+ * the LZF compression driver (eg) does a
+ * vfree after compressing the kernel etc
+ */
+ while (test_suspend_state(SUSPEND_FREEZE_SMP) &&
+ (atomic_read(&suspend_cpu_counter) != (smp_processor_id() - 1))) {
+ cpu_relax();
+ barrier();
+ }
+ save_processor_context();
+ my_saved_context = (unsigned char *) (suspend2_saved_contexts + smp_processor_id());
+ for (loop = sizeof(struct suspend2_saved_context); loop; loop--)
+ *(my_saved_context + loop - 1) = *(((unsigned char *) &suspend2_saved_context) + loop - 1);
+ atomic_inc(&suspend_cpu_counter);
+ /* Now spin until the atomic copy of the kernel is made. */
+ while (test_suspend_state(SUSPEND_FREEZE_SMP)) {
+ cpu_relax();
+ barrier();
+ }
+ atomic_dec(&suspend_cpu_counter);
+ kernel_fpu_end();
+ }
+}
+
+EXPORT_SYMBOL(__smp_suspend_lowlevel);
+#endif /* SMP */


2004-11-24 14:03:46

by Nigel Cunningham

[permalink] [raw]
Subject: Suspend 2 merge: 14/51: Disable page alloc failure message when suspending

While eating memory, we will potentially trigger this a lot. We
therefore disable the message when suspending.

diff -ruN 503-disable-page-alloc-warnings-while-suspending-old/mm/page_alloc.c 503-disable-page-alloc-warnings-while-suspending-new/mm/page_alloc.c
--- 503-disable-page-alloc-warnings-while-suspending-old/mm/page_alloc.c 2004-11-06 09:24:37.231308424 +1100
+++ 503-disable-page-alloc-warnings-while-suspending-new/mm/page_alloc.c 2004-11-06 09:24:40.844759096 +1100
@@ -725,7 +725,10 @@
}

nopage:
- if (!(gfp_mask & __GFP_NOWARN) && printk_ratelimit()) {
+ if ((!(gfp_mask & __GFP_NOWARN)) &&
+ (!test_suspend_state(SUSPEND_RUNNING)) &&
+ printk_ratelimit()) {
+
printk(KERN_WARNING "%s: page allocation failure."
" order:%d, mode:0x%x\n",
p->comm, order, gfp_mask);


2004-11-24 14:06:35

by Nigel Cunningham

[permalink] [raw]
Subject: Suspend 2 merge: 41/51: Ranges (extents).

This is the heart of our metadata storage.

The comments at the top of range.c say it all. Ranges are used to store
which pages are in which pageset. The swapwriter uses them to store
which swap addresses are allocated and what block numbers on the swap
devices those addresses map to (we can't swapon at resume time to find
out).
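
To make the space saving concrete, here is a small user-space sketch of
building a chain of ranges from ascending values. It is not the code from
range.c: the ext_ names, the use of malloc() and the append-only interface
are assumptions made for illustration (the patch allocates ranges from
whole pages and keeps an unused_ranges pool).

```c
#include <assert.h>
#include <stdlib.h>

/* Same shape as the struct range used by the patch. */
struct ext_range {
	unsigned long minimum;
	unsigned long maximum;
	struct ext_range *next;
};

/* Hypothetical bookkeeping for one chain of ranges. */
struct ext_chain {
	struct ext_range *first;
	struct ext_range *last;
	unsigned long nr_ranges;	/* structs in the chain */
	unsigned long nr_values;	/* individual values covered */
};

/*
 * Append a value that is greater than everything already stored.
 * A value contiguous with the last range extends that range; anything
 * else starts a new one. Returns 0 on success, -1 if allocation fails.
 */
static int ext_append(struct ext_chain *chain, unsigned long value)
{
	struct ext_range *r;

	assert(!chain->last || value > chain->last->maximum);

	if (chain->last && value == chain->last->maximum + 1) {
		chain->last->maximum = value;
		chain->nr_values++;
		return 0;
	}

	r = malloc(sizeof(*r));
	if (!r)
		return -1;
	r->minimum = value;
	r->maximum = value;
	r->next = NULL;
	if (chain->last)
		chain->last->next = r;
	else
		chain->first = r;
	chain->last = r;
	chain->nr_ranges++;
	chain->nr_values++;
	return 0;
}

/* Release every range in the chain and reset the bookkeeping. */
static void ext_free(struct ext_chain *chain)
{
	struct ext_range *r = chain->first;

	while (r) {
		struct ext_range *next = r->next;
		free(r);
		r = next;
	}
	chain->first = chain->last = NULL;
	chain->nr_ranges = chain->nr_values = 0;
}
```

Appending blocks 1-32768 and then 49152-49848 leaves two ranges covering
33465 values. With 4-byte longs and pointers, as on i386, that is
2 * 12 = 24 bytes against 33465 * 4 = 133,860 bytes for a flat array,
which matches the figures in the range.c header comment. On a 64-bit
build each struct is 24 bytes, so the absolute numbers differ but the
ratio is similar.
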

diff -ruN 831-range-old/kernel/power/range.c 831-range-new/kernel/power/range.c
--- 831-range-old/kernel/power/range.c 1970-01-01 10:00:00.000000000 +1000
+++ 831-range-new/kernel/power/range.c 2004-11-04 16:27:41.000000000 +1100
@@ -0,0 +1,784 @@
+/* Suspend2 routines for manipulating ranges.
+ *
+ * (C) 2003-2004 Nigel Cunningham <[email protected]>
+ *
+ * Distributed under GPLv2.
+ *
+ * These encapsulate the manipulation of ranges. I learnt after writing this
+ * code that ranges are more commonly called extents. They work like this:
+ *
+ * A lot of the data that suspend saves involves contiguous ranges of memory
+ * or storage. Let's say that we're storing data on disk in blocks 1-32768 and
+ * 49152-49848 of a swap partition. Rather than recording 1, 2, 3... in arrays
+ * pointing to the locations, we simply use:
+ *
+ * struct range {
+ * unsigned long min;
+ * unsigned long max;
+ * struct range * next;
+ * }
+ *
+ * We can then store 1-32768 and 49152-49848 in 2 struct ranges, using 24 bytes
+ * instead of something like 133,860. This is of course inefficient where a range
+ * covers only one or two values, but the benefits gained by the much larger
+ * ranges more than outweigh these instances.
+ *
+ * Whole pages are allocated to store ranges, with unused structs being chained
+ * together and linked into an unused_ranges list:
+ *
+ * struct range * unused_ranges; (just below).
+ *
+ * We can fit 341 ranges in a 4096 byte page (rangepage), with 4 bytes left over.
+ * These four bytes, referred to as the RangePageLink, are used to link the pages
+ * together. The RangePageLink is a pointer to the next page, or'd with the index
+ * number of the page.
+ *
+ * RangePages are stored in the header of the suspend image. For portability
+ * between suspend time and resume time, we 'relativise' the contents of each page
+ * before writing them to disk. That is, each .next and each RangePageLink is
+ * changed to point not to an absolute location, but to the relative location in
+ * the list of pages. This makes all the information valid and usable (after it
+ * has been absolutised again, of course) regardless of where it is reloaded to
+ * at resume time.
+ */
+
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/suspend.h>
+#include <linux/mm.h>
+
+#include "pageflags.h"
+#include "suspend.h"
+
+struct range * unused_ranges = NULL;
+int nr_unused_ranges = 0;
+int max_ranges_used = 0;
+int num_range_pages = 0;
+static unsigned long ranges_allocated = 0;
+struct range * first_range_page = NULL, * last_range_page = NULL;
+
+/* Add_range_pages
+ *
+ * Allocates and initialises new pages for storing ranges.
+ * Returns 1 on failure to get a page.
+ * Otherwise adds the new pages to the unused_ranges pool and returns 0.
+ * During resuming, it ensures the page added doesn't collide with memory that
+ * will be overwritten when copying the original kernel back.
+ */
+
+static int add_range_pages(int number_requested)
+{
+ int i, j;
+ struct range * ranges;
+ void **eaten_memory = NULL, **this;
+
+ for (j = 0; j < number_requested; j++) {
+ if (test_suspend_state(SUSPEND_NOW_RESUMING)) {
+ struct page * pageaddr;
+ /* Make sure page doesn't collide when we're resuming */
+ while ((this = (void **) get_zeroed_page(GFP_ATOMIC))) {
+ pageaddr = virt_to_page(this);
+ if (!PageInUse(pageaddr))
+ break;
+ *this = eaten_memory;
+ eaten_memory = this;
+ }
+ // Free unwanted memory
+ while(eaten_memory) {
+ this = eaten_memory;
+ eaten_memory = *eaten_memory;
+ free_page((unsigned long) this);
+ }
+ } else
+ this = (void *) get_grabbed_pages(0);
+
+ if (!this)
+ return 1;
+
+ num_range_pages++;
+ if (!first_range_page)
+ first_range_page = (struct range *) this;
+ if (last_range_page)
+ *RANGEPAGELINK(last_range_page) |= (unsigned long) this;
+ *RANGEPAGELINK(this) = num_range_pages;
+ last_range_page = (struct range *) this;
+ ranges = (struct range *) this;
+ for (i = 0; i < RANGES_PER_PAGE; i++)
+ (ranges+i)->next = (ranges+i+1);
+ (ranges + i - 1)->next = unused_ranges;
+ unused_ranges = ranges;
+ nr_unused_ranges += i;
+ }
+ return 0;
+}
+
+
+/*
+ * Free ranges.
+ *
+ * Frees pages allocated by add_range_pages()
+ *
+ * Checks that all ranges allocated have been freed and emits a warning if this
+ * is not true.
+ */
+
+int free_ranges(void)
+{
+ int i;
+ struct range * this_range_page = first_range_page,
+ * next_range_page = NULL;
+
+ if (ranges_allocated)
+ printk(" *** Warning: %ld ranges still allocated when "
+ "free_ranges() called.\n", ranges_allocated);
+
+ for (i = 0; i < num_range_pages; i++) {
+ next_range_page = (struct range *)
+ (((unsigned long)
+ (*RANGEPAGELINK(this_range_page))) & PAGE_MASK);
+ free_pages((unsigned long) this_range_page, 0);
+ this_range_page = next_range_page;
+ }
+
+ nr_unused_ranges = num_range_pages = ranges_allocated = 0;
+ unused_ranges = last_range_page = first_range_page = NULL;
+
+ return 0;
+}
+
+/* get_range
+ *
+ * Returns a free range, having removed it from the unused list and having
+ * incremented the usage count. May imply allocating a new page and may
+ * therefore fail, returning NULL instead.
+ *
+ * No locking. This is because we are only called from suspend, which is single
+ * threaded.
+ */
+
+static struct range * get_range(void)
+{
+ struct range * result;
+
+ if ((!unused_ranges) && (add_range_pages(1)))
+ return NULL;
+
+ result = unused_ranges;
+ unused_ranges = unused_ranges->next;
+ nr_unused_ranges--;
+ ranges_allocated++;
+ if (ranges_allocated > max_ranges_used)
+ max_ranges_used++;
+ result->minimum = result->maximum = 0;
+ result->next = NULL;
+ return result;
+}
+
+/*
+ * put_range.
+ *
+ * Returns a range to the pool of unused ranges and decrements the usage count.
+ *
+ * Assumes unlinking is done by the caller.
+ */
+void put_range(struct range * range)
+{
+ if (!range) {
+ printk("Error! put_range called with NULL range.\n");
+ return;
+ }
+ range->minimum = range->maximum = 0;
+ range->next = unused_ranges;
+ unused_ranges = range;
+ ranges_allocated--;
+ nr_unused_ranges++;
+}
+
+/*
+ * put_range_chain.
+ *
+ * Returns a whole chain of ranges to the unused pool.
+ */
+void put_range_chain(struct rangechain * chain)
+{
+ int count = 0;
+ struct range * this;
+
+ if (chain->first) {
+ this = chain->first;
+ while (this) {
+ this->minimum = this->maximum = 0;
+ this=this->next;
+ }
+ chain->last->next = unused_ranges;
+ unused_ranges = chain->first;
+ count = chain->allocs - chain->frees;
+ ranges_allocated -= count;
+ nr_unused_ranges += count;
+
+ chain->first = NULL;
+ chain->last = NULL;
+ chain->size = 0;
+ chain->allocs = 0;
+ chain->frees = 0;
+ chain->timesusedoptimisation = 0;
+ chain->lastaccessed = NULL; /* Invalidate optimisation info */
+ chain->prevtolastaccessed = NULL;
+ chain->prevtoprev = NULL;
+ }
+}
+
+/* print_chain.
+ *
+ * Displays the contents of a chain.
+ *
+ * printmethod:
+ * 0: hex
+ * 1: decimal
+ * 2: page number
+ */
+void print_chain(int debuglevel, struct rangechain * chain, int printmethod)
+{
+ struct range * this = chain->first;
+ int count = 0, size = 0;
+
+ if ((console_loglevel < debuglevel) || (!this) ||
+ (!TEST_DEBUG_STATE(SUSPEND_RANGES)))
+ return;
+
+ if (!chain->name)
+ suspend_message(SUSPEND_RANGES, debuglevel, 1, "Chain %p\n", chain);
+ else
+ suspend_message(SUSPEND_RANGES, debuglevel, 1, "%s\n", chain->name);
+
+ while (this) {
+ /*
+ * 'This' is printed separately so it is displayed if an oops
+ * results.
+ */
+ switch (printmethod) {
+ case 0:
+ suspend_message(SUSPEND_RANGES, debuglevel, 1, "(%p) ",
+ this);
+ suspend_message(SUSPEND_RANGES, debuglevel, 1, "%lx-%lx; ",
+ this->minimum, this->maximum);
+ break;
+ case 1:
+ suspend_message(SUSPEND_RANGES, debuglevel, 1, "(%p)",
+ this);
+ suspend_message(SUSPEND_RANGES, debuglevel, 1, "%lu-%lu; ",
+ this->minimum, this->maximum);
+ break;
+ case 2:
+ suspend_message(SUSPEND_RANGES, debuglevel, 1, "(%p)",
+ this);
+ suspend_message(SUSPEND_RANGES, debuglevel, 1, "%p-%p; ",
+ page_address(mem_map+this->minimum),
+ page_address(mem_map+this->maximum) +
+ PAGE_SIZE - 1);
+ break;
+ }
+ size+= this->maximum - this->minimum + 1;
+ this = this->next;
+ count++;
+ if (!(count%4))
+ suspend_message(SUSPEND_RANGES, debuglevel, 1, "\n");
+ }
+
+ if ((count%4))
+ suspend_message(SUSPEND_RANGES, debuglevel, 1, "\n");
+
+ suspend_message(SUSPEND_RANGES, debuglevel, 1,"%d entries/%ld allocated. "
+ "Allocated %d and freed %d. Size %d.",
+ count,
+ ranges_allocated,
+ chain->allocs,
+ chain->frees,
+ size);
+ if (count != (chain->allocs - chain->frees)) {
+ chain->debug = 1;
+ check_shift_keys(1, "Discrepancy in chain.");
+ }
+ suspend_message(SUSPEND_RANGES, debuglevel, 1, "\n");
+}
+
+/*
+ * add_to_range_chain.
+ *
+ * Takes a value to be stored and a pointer to a chain and adds the value to
+ * the range chain, merging with an existing range or adding a new entry as
+ * necessary. Ranges are stored in increasing order.
+ *
+ * Values should be consecutive, and so may need to be transformed first. (eg
+ * for pages, would want to call with page-mem_map).
+ *
+ * Important optimisation:
+ * We store in the chain info the location of the last range accessed or added
+ * (and its previous). If the next value is outside this range by one, we start
+ * from the previous entry instead of the start of the chain. In cases of heavy
+ * fragmentation, this saves a lot of time searching.
+ *
+ * Returns:
+ * 0 if successful
+ * 1 if the value is already included.
+ * 2 if unable to allocate memory.
+ * 3 if fall out bottom (shouldn't happen).
+ */
+
+int add_to_range_chain(struct rangechain * chain, unsigned long value)
+{
+ struct range * this, * prev = NULL, * prevtoprev = NULL;
+ int usedoptimisation = 0;
+
+ if (!chain->first) { /* Empty */
+ chain->last = chain->first = get_range();
+ if (!chain->first) {
+ printk("Error unable to allocate the first range for "
+ "the chain.\n");
+ return 2;
+ }
+ chain->allocs++;
+ chain->first->maximum = value;
+ chain->first->minimum = value;
+ chain->size++;
+ return 0;
+ }
+
+ this = chain->first;
+
+ if (chain->lastaccessed && chain->prevtolastaccessed &&
+ chain->prevtoprev) {
+ if ((value + 1) == chain->lastaccessed->minimum) {
+ prev = chain->prevtoprev;
+ this = chain->prevtolastaccessed;
+ usedoptimisation = 1;
+ } else if (((value - 1) == chain->lastaccessed->maximum)) {
+ prev = chain->prevtolastaccessed;
+ this = chain->lastaccessed;
+ usedoptimisation = 1;
+ }
+ }
+
+ while (this) {
+ /* Need new entry prior to this? */
+ if ((value + 1) < this->minimum) {
+ struct range * new = get_range();
+ if (!new)
+ return 2;
+ chain->allocs++;
+ new->minimum = value;
+ new->maximum = value;
+ new->next = this;
+ /* Prior to start of chain? */
+ if (!prev)
+ chain->first = new;
+ else
+ prev->next = new;
+ if (!usedoptimisation) {
+ chain->prevtoprev = prevtoprev;
+ chain->prevtolastaccessed = prev;
+ chain->lastaccessed = new;
+ }
+ chain->size++;
+ return 0;
+ }
+
+ if ((this->minimum <= value) && (this->maximum >= value)) {
+ if (chain->name)
+ printk("%s:", chain->name);
+ else
+ printk("%p:", chain);
+ printk("Trying to add a value (%ld/0x%lx) already "
+ "included in chain.\n",
+ value, value);
+ print_chain(SUSPEND_ERROR, chain, 0);
+ check_shift_keys(1, NULL);
+ return 1;
+ }
+ if ((value + 1) == this->minimum) {
+ this->minimum = value;
+ if (!usedoptimisation) {
+ chain->prevtoprev = prevtoprev;
+ chain->prevtolastaccessed = prev;
+ chain->lastaccessed = this;
+ }
+ chain->size++;
+ return 0;
+ }
+ if ((value - 1) == this->maximum) {
+ if ((this->next) &&
+ (this->next->minimum == value + 1)) {
+ struct range * oldnext = this->next;
+ this->maximum = this->next->maximum;
+ this->next = this->next->next;
+ if ((chain->last) == oldnext)
+ chain->last = this;
+ put_range(oldnext);
+ /* Invalidate optimisation info */
+ chain->lastaccessed = NULL;
+ chain->frees++;
+ if (!usedoptimisation) {
+ chain->prevtoprev = prevtoprev;
+ chain->prevtolastaccessed = prev;
+ chain->lastaccessed = this;
+ }
+ chain->size++;
+ return 0;
+ }
+ this->maximum = value;
+ if (!usedoptimisation) {
+ chain->prevtoprev = prevtoprev;
+ chain->prevtolastaccessed = prev;
+ chain->lastaccessed = this;
+ }
+ chain->size++;
+ return 0;
+ }
+ if (!this->next) {
+ struct range * new = get_range();
+ if (!new) {
+ printk("Error unable to append a new range to "
+ "the chain.\n");
+ return 2;
+ }
+ chain->allocs++;
+ new->minimum = value;
+ new->maximum = value;
+ new->next = NULL;
+ this->next = new;
+ chain->last = new;
+ if (!usedoptimisation) {
+ chain->prevtoprev = prev;
+ chain->prevtolastaccessed = this;
+ chain->lastaccessed = new;
+ }
+ chain->size++;
+ return 0;
+ }
+ prevtoprev = prev;
+ prev = this;
+ this = this->next;
+ }
+ printk("\nFell out the bottom of add_to_range_chain. This shouldn't "
+ "happen!\n");
+ SET_RESULT_STATE(SUSPEND_ABORTED);
+ return 3;
+}
+
+/* append_range_to_range_chain
+ * Used where we know a range is to be added to the end of the list
+ * and does not need merging with the current last range.
+ * (count_data_pages only at the moment)
+ */
+
+int append_range_to_range_chain(struct rangechain * chain,
+ unsigned long minimum, unsigned long maximum)
+{
+ struct range * newrange = NULL;
+
+ newrange = get_range();
+ if (!newrange) {
+ printk("Error unable to append a new range to the chain.\n");
+ return 2;
+ }
+
+ chain->allocs++;
+ chain->size+= (maximum - minimum + 1);
+ newrange->minimum = minimum;
+ newrange->maximum = maximum;
+ newrange->next = NULL;
+
+ if (chain->last) {
+ chain->last->next = newrange;
+ chain->last = newrange;
+ } else
+ chain->last = chain->first = newrange;
+
+ /* No need to reset optimisation info since added to end */
+ return 0;
+}
+
+int append_to_range_chain(int chain, unsigned long min, unsigned long max)
+{
+ int result = 0;
+
+ switch (chain) {
+ case 0:
+ return 0;
+ case 1:
+ result = append_range_to_range_chain(
+ &pagedir1.origranges, min, max);
+ break;
+ case 2:
+ result = append_range_to_range_chain(
+ &pagedir2.origranges, min, max);
+ if (!result)
+ result = append_range_to_range_chain(
+ &pagedir1.destranges, min, max);
+ }
+ return result;
+}
+
+/* -------------- Routines for relativising and absoluting ranges -------------
+ *
+ * Prepare rangesets for save by translating addresses to relative indices.
+ */
+void relativise_ranges(void)
+{
+ struct range * this_range_page = first_range_page;
+ int i;
+
+ while (this_range_page) {
+ struct range * this_range = this_range_page;
+ for (i = 0; i < RANGES_PER_PAGE; i++) {
+ if (this_range->next) {
+ struct range * orig = this_range->next;
+ this_range->next =
+ RANGE_RELATIVE(this_range->next);
+ suspend_message(SUSPEND_RANGES, SUSPEND_VERBOSE, 1,
+ "Relativised range %d on this page is %p. Absolutised range is %p.\n",
+ i, this_range->next, orig);
+ }
+ this_range++;
+ }
+ this_range_page = (struct range *)
+ ((*RANGEPAGELINK(this_range_page)) & PAGE_MASK);
+ }
+}
+
+/* Convert ->next pointers for ranges back to absolute values.
+ * The issue is finding out what page the absolute value is now at.
+ * If we use an array of values, we gain speed, but then we need to
+ * be able to allocate contiguous pages. Fortunately, this is done
+ * prior to loading pagesets, so we can just allocate the pages
+ * needed, set up our array and use it and then discard the data
+ * before we exit.
+ */
+
+void absolutise_ranges(void)
+{
+ struct range * this_range_page = first_range_page;
+ int i;
+
+ while (this_range_page) {
+ struct range * this_range = this_range_page;
+ for (i = 0; i < RANGES_PER_PAGE; i++) {
+ if (this_range->next) {
+ struct range * orig = this_range->next;
+ this_range->next =
+ RANGE_ABSOLUTE(this_range->next);
+ suspend_message(SUSPEND_RANGES, SUSPEND_VERBOSE, 1,
+ "Relativised range %d on this page is %p. Absolutised range is %p.\n",
+ i, orig, this_range->next);
+ }
+ this_range++;
+ }
+ this_range_page = (struct range *)
+ ((*RANGEPAGELINK(this_range_page)) & PAGE_MASK);
+ }
+}
+
+void absolutise_chain(struct rangechain * chain)
+{
+ if (chain->first)
+ chain->first = RANGE_ABSOLUTE(chain->first);
+ if (chain->last)
+ chain->last = RANGE_ABSOLUTE(chain->last);
+ if (chain->lastaccessed)
+ chain->lastaccessed = RANGE_ABSOLUTE(chain->lastaccessed);
+ if (chain->prevtolastaccessed)
+ chain->prevtolastaccessed =
+ RANGE_ABSOLUTE(chain->prevtolastaccessed);
+ if (chain->prevtoprev)
+ chain->prevtoprev =
+ RANGE_ABSOLUTE(chain->prevtoprev);
+}
+
+void relativise_chain(struct rangechain * chain)
+{
+ if (chain->first)
+ chain->first = RANGE_RELATIVE(chain->first);
+ if (chain->last)
+ chain->last = RANGE_RELATIVE(chain->last);
+ if (chain->lastaccessed)
+ chain->lastaccessed = RANGE_RELATIVE(chain->lastaccessed);
+ if (chain->prevtolastaccessed)
+ chain->prevtolastaccessed =
+ RANGE_RELATIVE(chain->prevtolastaccessed);
+ if (chain->prevtoprev)
+ chain->prevtoprev = RANGE_RELATIVE(chain->prevtoprev);
+}
+
+/*
+ * Each page in the rangepages lists starts with a pointer to the next page
+ * containing the list. This lets us only use order zero allocations.
+ */
+#define POINTERS_PER_PAGE ((PAGE_SIZE / sizeof(void *)) - 1)
+static unsigned long * range_pagelist = NULL;
+
+unsigned long * get_rangepages_list_entry(int index)
+{
+ int pagenum, offset, i;
+ unsigned long * current_list_page = range_pagelist;
+
+ BUG_ON(index > num_range_pages);
+
+ pagenum = index / POINTERS_PER_PAGE;
+ offset = index - (pagenum * POINTERS_PER_PAGE);
+
+ for (i = 0; i < pagenum; i++)
+ current_list_page = *((unsigned long **) current_list_page);
+
+ return (unsigned long *) current_list_page[offset];
+}
+
+int get_rangepages_list(void)
+{
+ struct range * this_range_page = first_range_page;
+ int i, j, pages_needed, num_in_this_page;
+ unsigned long * current_list_page = range_pagelist;
+ unsigned long * prev_list_page = NULL;
+
+ pages_needed =
+ ((num_range_pages + POINTERS_PER_PAGE - 1) / POINTERS_PER_PAGE);
+
+ for (i = 0; i < pages_needed; i++) {
+ int page_start = i * POINTERS_PER_PAGE;
+
+ if (!current_list_page) {
+ current_list_page =
+ (unsigned long *) get_grabbed_pages(0);
+ if (!current_list_page)
+ current_list_page = (unsigned long *) get_zeroed_page(GFP_ATOMIC);
+ if (!current_list_page) {
+ abort_suspend("Unable to allocate memory for a range pages list.");
+ printk("Number of range pages is %d.\n", num_range_pages);
+ return -ENOMEM;
+ }
+
+ current_list_page[0] = 0;
+ if (!prev_list_page)
+ range_pagelist = current_list_page;
+ else
+ *prev_list_page = (unsigned long) current_list_page;
+ }
+ /* Remember this page so the next one can be linked after it. */
+ prev_list_page = current_list_page;
+
+ num_in_this_page = num_range_pages - page_start;
+ if (num_in_this_page > POINTERS_PER_PAGE)
+ num_in_this_page = POINTERS_PER_PAGE;
+
+ for (j = 1; j <= num_in_this_page; j++) {
+ current_list_page[j] = (unsigned long) this_range_page;
+
+ this_range_page = (struct range *) (((unsigned long)
+ (*RANGEPAGELINK(this_range_page))) & PAGE_MASK);
+ }
+
+ for (j = (num_in_this_page + 1); j <= POINTERS_PER_PAGE; j++)
+ current_list_page[j] = 0;
+
+ if ((num_range_pages - page_start) > POINTERS_PER_PAGE)
+ current_list_page = (unsigned long *) current_list_page[0];
+ }
+
+ return 0;
+}
+
+void put_rangepages_list(void)
+{
+ unsigned long * last;
+
+ while (range_pagelist) {
+ last = range_pagelist;
+ range_pagelist = *((unsigned long **) range_pagelist);
+ free_pages((unsigned long) last, 0);
+ }
+}
+
+int PageRangePage(char * seeking)
+{
+ int i;
+
+ for (i = 1; i <= num_range_pages; i++)
+ if (get_rangepages_list_entry(i) ==
+ (unsigned long *) seeking)
+ return 1;
+
+ return 0;
+}
+/* relocate_rangepages
+ *
+ * Called at the start of resuming. As well as absolutising pages, we need
+ * to ensure they won't be overwritten by the kernel we're restoring.
+ */
+int relocate_rangepages(void)
+{
+ void **eaten_memory = NULL;
+ void **c = eaten_memory, *m = NULL, *f;
+ int oom = 0, i, numeaten = 0;
+ unsigned long * prev_page = NULL;
+
+ for (i = 1; i <= num_range_pages; i++) {
+ int this_collides = 0;
+ unsigned long * this_page = get_rangepages_list_entry(i);
+
+ this_collides = PageInUse(virt_to_page(this_page));
+
+ if (!this_collides) {
+ prev_page = this_page;
+ continue;
+ }
+
+ while ((m = (void *) get_zeroed_page(GFP_ATOMIC))) {
+ memset(m, 0, PAGE_SIZE);
+ if (!PageInUse(virt_to_page(m))) {
+ copy_page(m, (void *) this_page);
+ free_page((unsigned long) this_page);
+ if (i == 1)
+ first_range_page = m;
+ else
+ *RANGEPAGELINK(prev_page) =
+ (i | (unsigned long) m);
+ prev_page = m;
+ break;
+ }
+ numeaten++;
+ eaten_memory = m;
+ *eaten_memory = c;
+ c = eaten_memory;
+ }
+
+ if (!m) {
+ printk("\nRan out of memory trying to relocate "
+ "rangepages (tried %d pages).\n", numeaten);
+ oom = 1;
+ break;
+ }
+ }
+
+ c = eaten_memory;
+ while(c) {
+ f = c;
+ c = *c;
+ if (f)
+ free_pages((unsigned long) f, 0);
+ }
+ eaten_memory = NULL;
+
+ if (oom)
+ return -ENOMEM;
+ else
+ return 0;
+}
+
+EXPORT_SYMBOL(put_rangepages_list);
+EXPORT_SYMBOL(get_rangepages_list);
+EXPORT_SYMBOL(get_rangepages_list_entry);
+EXPORT_SYMBOL(absolutise_chain);
+EXPORT_SYMBOL(relativise_chain);
+EXPORT_SYMBOL(put_range);
+EXPORT_SYMBOL(put_range_chain);
+EXPORT_SYMBOL(add_to_range_chain);
+EXPORT_SYMBOL(PageRangePage);
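
For anyone who wants to play with the range idea outside the kernel, here
is a minimal userspace sketch. It is not the code from the patch: struct
chain, new_range() and chain_add() are hypothetical names, malloc() stands
in for the rangepage allocator, and the lastaccessed optimisation is left
out. It only illustrates how single values collapse into sorted
[minimum, maximum] ranges, using the return codes documented for
add_to_range_chain().

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for the patch's struct range. */
struct range {
	unsigned long minimum;
	unsigned long maximum;
	struct range *next;
};

/* Hypothetical, trimmed-down chain: just the head and a value count. */
struct chain {
	struct range *first;
	unsigned long size;
};

static struct range *new_range(unsigned long value, struct range *next)
{
	struct range *r = malloc(sizeof(*r));

	if (r) {
		r->minimum = r->maximum = value;
		r->next = next;
	}
	return r;
}

/*
 * Add one value, keeping ranges sorted and merged.
 * Returns 0 on success, 1 if the value is already included and
 * 2 if memory could not be allocated.
 */
static int chain_add(struct chain *c, unsigned long value)
{
	struct range **link = &c->first;
	struct range *r, *n;

	/* Skip ranges that end more than one below value. */
	while ((r = *link) && r->maximum < value && value - r->maximum > 1)
		link = &r->next;

	if (r && r->minimum <= value && value <= r->maximum)
		return 1;

	if (r && r->maximum < value) {
		/* value == r->maximum + 1: extend upwards. */
		r->maximum = value;
		n = r->next;
		if (n && n->minimum == value + 1) {
			/* The new value bridges two ranges: merge them. */
			r->maximum = n->maximum;
			r->next = n->next;
			free(n);
		}
	} else if (r && r->minimum == value + 1) {
		/* Extend downwards. */
		r->minimum = value;
	} else {
		/* Insert a new single-value range before r (or at the end). */
		n = new_range(value, r);
		if (!n)
			return 2;
		*link = n;
	}
	c->size++;
	return 0;
}
```

Starting from an empty chain and adding 1, 2, 3, 10, 5 and 4 in that order
leaves two ranges, 1-5 and 10-10, because the final 4 bridges 1-3 and 5-5.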

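The relativise/absolutise step described in the range.c header comment can
be sketched in userspace as well. RANGE_RELATIVE and RANGE_ABSOLUTE are not
part of this patch, so the encoding below (1-based page index times page
size, plus the offset within the page) is purely an assumption made for
illustration, as are the names relativise(), absolutise() and
SKETCH_PAGE_SIZE. The point is only that a pointer stored this way stays
meaningful when the pages are reloaded at different addresses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumed page size, as in the comment */

/*
 * Turn an absolute pointer into "1-based page index * page size + offset".
 * pages[] holds the current base address of every range page.
 * Returns 0 for NULL or for a pointer that lies outside all pages, so a
 * relativised NULL is still 0.
 */
static uintptr_t relativise(const void *p, void *const pages[], size_t npages)
{
	uintptr_t addr = (uintptr_t)p;
	size_t i;

	for (i = 0; i < npages; i++) {
		uintptr_t base = (uintptr_t)pages[i];

		if (p && addr >= base && addr - base < SKETCH_PAGE_SIZE)
			return (i + 1) * SKETCH_PAGE_SIZE + (addr - base);
	}
	return 0;
}

/*
 * Reverse of relativise(): look the page up by index in the (possibly
 * different) set of pages in use at resume time and add the offset back.
 */
static void *absolutise(uintptr_t rel, void *const pages[], size_t npages)
{
	size_t index = rel / SKETCH_PAGE_SIZE;	/* 1-based */
	size_t offset = rel % SKETCH_PAGE_SIZE;

	if (index == 0 || index > npages)
		return NULL;
	return (char *)pages[index - 1] + offset;
}
```

A pointer 24 bytes into the second page relativises to 2 * 4096 + 24, and
absolutising that value against a new set of pages yields the address 24
bytes into the new second page.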

2004-11-24 14:03:47

by Nigel Cunningham

Subject: Suspend 2 merge: 21/51: Refrigerator upgrade.

Here's the suspend2 version of the process refrigerator. We do things in three steps:

1. Freeze userspace threads (p->mm != NULL) that don't have
PF_SYNCTHREAD. This should stop new I/O being submitted.
2. Let data get synced to disk, and run our own sys_sync in case no one
else was syncing. PF_SYNCTHREAD is given to processes when they begin a
sys_sync (or sibling) and removed when they exit the call, so no sync
operations are left hung by step 1. After this we set the DISABLE_SYNCING
flag to stop further syncs.
3. Freeze kernel threads that don't have PF_NOFREEZE.

Included in this patch is a new try_to_freeze() macro Andrew M suggested
a while back. The refrigerator declarations are put in sched.h to save
extra includes of suspend.h.
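
To make the calling convention concrete, here is a rough userspace model of
how the call sites below use try_to_freeze(). The real definition lives in
the sched.h part of the patch; this sketch writes it as an inline function
rather than a macro, and the flag value, struct task, sketch_task and the
stub refrigerator() are all assumptions for illustration only. The one
property it relies on is the one visible in the hunks: a nonzero return
means the task went through the refrigerator, so the caller can "continue"
or "goto no_signal".

```c
#include <assert.h>

#define PF_FREEZE 0x1UL	/* hypothetical flag value for this sketch */

struct task {
	unsigned long flags;
};

static struct task sketch_task;		/* stands in for the kernel's current */
#define current (&sketch_task)

static int refrigerator_calls;

/*
 * Stub refrigerator: the real one sleeps until the task is thawed. Here we
 * only count the call and clear the freeze request.
 */
static void refrigerator(unsigned long flag)
{
	(void)flag;
	refrigerator_calls++;
	current->flags &= ~PF_FREEZE;
}

/*
 * Returns 1 if the task had been asked to freeze (and has now been through
 * the refrigerator), 0 otherwise.
 */
static inline int try_to_freeze(unsigned long refrigerator_flags)
{
	if (current->flags & PF_FREEZE) {
		refrigerator(refrigerator_flags);
		return 1;
	}
	return 0;
}
```

With this shape, the old three-line "if (current->flags & PF_FREEZE) {
refrigerator(...); continue; }" pattern collapses into a single
"if (try_to_freeze(...)) continue;" at each call site.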

Changes to keep swsusp working are included.

Note that you can also thaw just the kernel threads; this allows syncing
while eating memory.

diff -ruN 582-refrigerator-old/arch/arm/kernel/signal.c 582-refrigerator-new/arch/arm/kernel/signal.c
--- 582-refrigerator-old/arch/arm/kernel/signal.c 2004-11-24 09:52:51.000000000 +1100
+++ 582-refrigerator-new/arch/arm/kernel/signal.c 2004-11-24 17:56:06.836085952 +1100
@@ -12,7 +12,6 @@
#include <linux/signal.h>
#include <linux/ptrace.h>
#include <linux/personality.h>
-#include <linux/suspend.h>

#include <asm/cacheflush.h>
#include <asm/ucontext.h>
@@ -689,10 +688,8 @@
if (!user_mode(regs))
return 0;

- if (current->flags & PF_FREEZE) {
- refrigerator(0);
+ if (try_to_freeze(0))
goto no_signal;
- }

if (current->ptrace & PT_SINGLESTEP)
ptrace_cancel_bpt(current);
diff -ruN 582-refrigerator-old/arch/i386/kernel/io_apic.c 582-refrigerator-new/arch/i386/kernel/io_apic.c
--- 582-refrigerator-old/arch/i386/kernel/io_apic.c 2004-11-24 18:03:13.053291088 +1100
+++ 582-refrigerator-new/arch/i386/kernel/io_apic.c 2004-11-24 17:56:06.839085496 +1100
@@ -575,6 +575,7 @@
for ( ; ; ) {
set_current_state(TASK_INTERRUPTIBLE);
time_remaining = schedule_timeout(time_remaining);
+ try_to_freeze(PF_FREEZE);
if (time_after(jiffies,
prev_balance_time+balanced_irq_interval)) {
do_irq_balance();
diff -ruN 582-refrigerator-old/arch/i386/kernel/signal.c 582-refrigerator-new/arch/i386/kernel/signal.c
--- 582-refrigerator-old/arch/i386/kernel/signal.c 2004-11-24 18:03:13.109282576 +1100
+++ 582-refrigerator-new/arch/i386/kernel/signal.c 2004-11-24 17:56:06.840085344 +1100
@@ -18,7 +18,6 @@
#include <linux/unistd.h>
#include <linux/stddef.h>
#include <linux/personality.h>
-#include <linux/suspend.h>
#include <linux/ptrace.h>
#include <linux/elf.h>
#include <asm/processor.h>
@@ -596,10 +595,8 @@
return 1;
#endif

- if (current->flags & PF_FREEZE) {
- refrigerator(0);
+ if (try_to_freeze(PF_FREEZE) && !signal_pending(current))
goto no_signal;
- }

if (!oldset)
oldset = &current->blocked;
diff -ruN 582-refrigerator-old/arch/mips/kernel/irixsig.c 582-refrigerator-new/arch/mips/kernel/irixsig.c
--- 582-refrigerator-old/arch/mips/kernel/irixsig.c 2004-11-03 21:54:45.000000000 +1100
+++ 582-refrigerator-new/arch/mips/kernel/irixsig.c 2004-11-24 17:56:06.855083064 +1100
@@ -13,7 +13,6 @@
#include <linux/smp_lock.h>
#include <linux/time.h>
#include <linux/ptrace.h>
-#include <linux/suspend.h>

#include <asm/ptrace.h>
#include <asm/uaccess.h>
@@ -182,10 +181,8 @@
if (!user_mode(regs))
return 1;

- if (current->flags & PF_FREEZE) {
- refrigerator(0);
+ if (try_to_freeze(0))
goto no_signal;
- }

if (!oldset)
oldset = &current->blocked;
diff -ruN 582-refrigerator-old/arch/mips/kernel/signal32.c 582-refrigerator-new/arch/mips/kernel/signal32.c
--- 582-refrigerator-old/arch/mips/kernel/signal32.c 2004-11-24 09:52:53.000000000 +1100
+++ 582-refrigerator-new/arch/mips/kernel/signal32.c 2004-11-24 17:56:30.046557424 +1100
@@ -18,7 +18,6 @@
#include <linux/wait.h>
#include <linux/ptrace.h>
#include <linux/compat.h>
-#include <linux/suspend.h>
#include <linux/bitops.h>

#include <asm/asm.h>
@@ -700,10 +699,8 @@
if (!user_mode(regs))
return 1;

- if (current->flags & PF_FREEZE) {
- refrigerator(0);
+ if (try_to_freeze(0))
goto no_signal;
- }

if (!oldset)
oldset = &current->blocked;
diff -ruN 582-refrigerator-old/arch/mips/kernel/signal.c 582-refrigerator-new/arch/mips/kernel/signal.c
--- 582-refrigerator-old/arch/mips/kernel/signal.c 2004-11-24 09:52:53.000000000 +1100
+++ 582-refrigerator-new/arch/mips/kernel/signal.c 2004-11-24 17:56:06.893077288 +1100
@@ -18,7 +18,6 @@
#include <linux/errno.h>
#include <linux/wait.h>
#include <linux/ptrace.h>
-#include <linux/suspend.h>
#include <linux/unistd.h>
#include <linux/bitops.h>

@@ -551,10 +550,8 @@
if (!user_mode(regs))
return 1;

- if (current->flags & PF_FREEZE) {
- refrigerator(0);
+ if (try_to_freeze(0))
goto no_signal;
- }

if (!oldset)
oldset = &current->blocked;
diff -ruN 582-refrigerator-old/arch/sh/kernel/signal.c 582-refrigerator-new/arch/sh/kernel/signal.c
--- 582-refrigerator-old/arch/sh/kernel/signal.c 2004-11-24 09:52:54.000000000 +1100
+++ 582-refrigerator-new/arch/sh/kernel/signal.c 2004-11-24 17:56:06.899076376 +1100
@@ -24,7 +24,6 @@
#include <linux/tty.h>
#include <linux/personality.h>
#include <linux/binfmts.h>
-#include <linux/suspend.h>

#include <asm/ucontext.h>
#include <asm/uaccess.h>
@@ -579,10 +578,8 @@
if (!user_mode(regs))
return 1;

- if (current->flags & PF_FREEZE) {
- refrigerator(0);
+ if (try_to_freeze(0))
goto no_signal;
- }

if (!oldset)
oldset = &current->blocked;
diff -ruN 582-refrigerator-old/arch/sh64/kernel/signal.c 582-refrigerator-new/arch/sh64/kernel/signal.c
--- 582-refrigerator-old/arch/sh64/kernel/signal.c 2004-11-03 21:53:44.000000000 +1100
+++ 582-refrigerator-new/arch/sh64/kernel/signal.c 2004-11-24 17:56:06.914074096 +1100
@@ -701,10 +701,8 @@
if (!user_mode(regs))
return 1;

- if (current->flags & PF_FREEZE) {
- refrigerator(0);
+ if (try_to_freeze(0))
goto no_signal;
- }

if (!oldset)
oldset = &current->blocked;
diff -ruN 582-refrigerator-old/arch/x86_64/kernel/signal.c 582-refrigerator-new/arch/x86_64/kernel/signal.c
--- 582-refrigerator-old/arch/x86_64/kernel/signal.c 2004-11-03 21:54:16.000000000 +1100
+++ 582-refrigerator-new/arch/x86_64/kernel/signal.c 2004-11-24 17:56:06.939070296 +1100
@@ -24,7 +24,6 @@
#include <linux/stddef.h>
#include <linux/personality.h>
#include <linux/compiler.h>
-#include <linux/suspend.h>
#include <asm/ucontext.h>
#include <asm/uaccess.h>
#include <asm/i387.h>
@@ -417,10 +416,8 @@
return 1;
}

- if (current->flags & PF_FREEZE) {
- refrigerator(0);
+ if (try_to_freeze(0))
goto no_signal;
- }

if (!oldset)
oldset = &current->blocked;
diff -ruN 582-refrigerator-old/drivers/ieee1394/ieee1394_core.c 582-refrigerator-new/drivers/ieee1394/ieee1394_core.c
--- 582-refrigerator-old/drivers/ieee1394/ieee1394_core.c 2004-11-03 21:52:22.000000000 +1100
+++ 582-refrigerator-new/drivers/ieee1394/ieee1394_core.c 2004-11-24 17:56:06.947069080 +1100
@@ -32,7 +32,6 @@
#include <linux/bitops.h>
#include <linux/kdev_t.h>
#include <linux/skbuff.h>
-#include <linux/suspend.h>

#include <asm/byteorder.h>
#include <asm/semaphore.h>
@@ -1030,15 +1029,17 @@

daemonize("khpsbpkt");

- while (!down_interruptible(&khpsbpkt_sig)) {
- if (khpsbpkt_kill)
+ while (1) {
+ if (down_interruptible(&khpsbpkt_sig)) {
+ if (try_to_freeze(0))
+ continue;
+ printk("khpsbpkt: received unexpected signal?!\n" );
break;
-
- if (current->flags & PF_FREEZE) {
- refrigerator(0);
- continue;
}

+ if (khpsbpkt_kill)
+ break;
+
while ((skb = skb_dequeue(&hpsbpkt_queue)) != NULL) {
packet = (struct hpsb_packet *)skb->data;

diff -ruN 582-refrigerator-old/drivers/ieee1394/nodemgr.c 582-refrigerator-new/drivers/ieee1394/nodemgr.c
--- 582-refrigerator-old/drivers/ieee1394/nodemgr.c 2004-11-24 09:52:58.000000000 +1100
+++ 582-refrigerator-new/drivers/ieee1394/nodemgr.c 2004-11-24 17:57:00.519924768 +1100
@@ -19,7 +19,6 @@
#include <linux/delay.h>
#include <linux/pci.h>
#include <linux/moduleparam.h>
-#include <linux/suspend.h>
#include <asm/atomic.h>

#include "ieee1394_types.h"
@@ -1480,10 +1479,8 @@

if (down_interruptible(&hi->reset_sem) ||
down_interruptible(&nodemgr_serialize)) {
- if (current->flags & PF_FREEZE) {
- refrigerator(0);
+ if (try_to_freeze(PF_FREEZE))
continue;
- }
printk("NodeMgr: received unexpected signal?!\n" );
break;
}
@@ -1498,6 +1495,8 @@
for (i = 0; i < 4 ; i++) {
set_current_state(TASK_INTERRUPTIBLE);
if (msleep_interruptible(63)) {
+ if (try_to_freeze(PF_FREEZE))
+ continue;
up(&nodemgr_serialize);
goto caught_signal;
}
diff -ruN 582-refrigerator-old/drivers/input/serio/serio.c 582-refrigerator-new/drivers/input/serio/serio.c
--- 582-refrigerator-old/drivers/input/serio/serio.c 2004-11-24 09:52:58.000000000 +1100
+++ 582-refrigerator-new/drivers/input/serio/serio.c 2004-11-24 17:56:06.968065888 +1100
@@ -34,7 +34,6 @@
#include <linux/completion.h>
#include <linux/sched.h>
#include <linux/smp_lock.h>
-#include <linux/suspend.h>
#include <linux/slab.h>

MODULE_AUTHOR("Vojtech Pavlik <[email protected]>");
@@ -225,8 +224,7 @@
do {
serio_handle_events();
wait_event_interruptible(serio_wait, !list_empty(&serio_event_list));
- if (current->flags & PF_FREEZE)
- refrigerator(PF_FREEZE);
+ try_to_freeze(PF_FREEZE);
} while (!signal_pending(current));

printk(KERN_DEBUG "serio: kseriod exiting\n");
diff -ruN 582-refrigerator-old/drivers/md/md.c 582-refrigerator-new/drivers/md/md.c
--- 582-refrigerator-old/drivers/md/md.c 2004-11-24 09:52:58.000000000 +1100
+++ 582-refrigerator-new/drivers/md/md.c 2004-11-24 17:56:06.977064520 +1100
@@ -36,7 +36,6 @@
#include <linux/sysctl.h>
#include <linux/devfs_fs_kernel.h>
#include <linux/buffer_head.h> /* for invalidate_bdev */
-#include <linux/suspend.h>

#include <linux/init.h>

@@ -2761,6 +2760,7 @@
*/

daemonize(thread->name, mdname(thread->mddev));
+ current->flags |= PF_NOFREEZE;

current->exit_signal = SIGCHLD;
allow_signal(SIGKILL);
@@ -2785,8 +2785,6 @@

wait_event_interruptible(thread->wqueue,
test_bit(THREAD_WAKEUP, &thread->flags));
- if (current->flags & PF_FREEZE)
- refrigerator(PF_FREEZE);

clear_bit(THREAD_WAKEUP, &thread->flags);

diff -ruN 582-refrigerator-old/drivers/media/video/msp3400.c 582-refrigerator-new/drivers/media/video/msp3400.c
--- 582-refrigerator-old/drivers/media/video/msp3400.c 2004-11-24 09:52:59.000000000 +1100
+++ 582-refrigerator-new/drivers/media/video/msp3400.c 2004-11-24 17:58:02.541496056 +1100
@@ -741,6 +741,7 @@
{
DECLARE_WAITQUEUE(wait, current);

+again:
add_wait_queue(&msp->wq, &wait);
if (!kthread_should_stop()) {
if (timeout < 0) {
@@ -756,9 +757,11 @@
#endif
}
}
- if (current->flags & PF_FREEZE)
- refrigerator(PF_FREEZE);
remove_wait_queue(&msp->wq, &wait);
+
+ if (try_to_freeze(PF_FREEZE))
+ goto again;
+
return msp->restart;
}

diff -ruN 582-refrigerator-old/drivers/media/video/tvaudio.c 582-refrigerator-new/drivers/media/video/tvaudio.c
--- 582-refrigerator-old/drivers/media/video/tvaudio.c 2004-11-24 09:52:59.000000000 +1100
+++ 582-refrigerator-new/drivers/media/video/tvaudio.c 2004-11-24 17:56:06.982063760 +1100
@@ -285,6 +285,7 @@
schedule();
}
remove_wait_queue(&chip->wq, &wait);
+ try_to_freeze(PF_FREEZE);
if (chip->done || signal_pending(current))
break;
dprintk("%s: thread wakeup\n", i2c_clientname(&chip->c));
diff -ruN 582-refrigerator-old/drivers/mtd/mtd_blkdevs.c 582-refrigerator-new/drivers/mtd/mtd_blkdevs.c
--- 582-refrigerator-old/drivers/mtd/mtd_blkdevs.c 2004-11-24 09:52:59.000000000 +1100
+++ 582-refrigerator-new/drivers/mtd/mtd_blkdevs.c 2004-11-24 17:56:06.984063456 +1100
@@ -113,6 +113,8 @@
schedule();
remove_wait_queue(&tr->blkcore_priv->thread_wq, &wait);

+ try_to_freeze(PF_FREEZE);
+
spin_lock_irq(rq->queue_lock);

continue;
diff -ruN 582-refrigerator-old/drivers/net/8139too.c 582-refrigerator-new/drivers/net/8139too.c
--- 582-refrigerator-old/drivers/net/8139too.c 2004-11-24 09:52:59.000000000 +1100
+++ 582-refrigerator-new/drivers/net/8139too.c 2004-11-24 17:56:06.987063000 +1100
@@ -108,7 +108,6 @@
#include <linux/mii.h>
#include <linux/completion.h>
#include <linux/crc32.h>
-#include <linux/suspend.h>
#include <asm/io.h>
#include <asm/uaccess.h>
#include <asm/irq.h>
@@ -1624,8 +1623,7 @@
do {
timeout = interruptible_sleep_on_timeout (&tp->thr_wait, timeout);
/* make swsusp happy with our thread */
- if (current->flags & PF_FREEZE)
- refrigerator(PF_FREEZE);
+ try_to_freeze(PF_FREEZE);
} while (!signal_pending (current) && (timeout > 0));

if (signal_pending (current)) {
diff -ruN 582-refrigerator-old/drivers/net/irda/sir_kthread.c 582-refrigerator-new/drivers/net/irda/sir_kthread.c
--- 582-refrigerator-old/drivers/net/irda/sir_kthread.c 2004-11-03 21:55:05.000000000 +1100
+++ 582-refrigerator-new/drivers/net/irda/sir_kthread.c 2004-11-24 17:56:06.988062848 +1100
@@ -19,7 +19,6 @@
#include <linux/smp_lock.h>
#include <linux/completion.h>
#include <linux/delay.h>
-#include <linux/suspend.h>

#include <net/irda/irda.h>

@@ -113,6 +112,7 @@
DECLARE_WAITQUEUE(wait, current);

daemonize("kIrDAd");
+ current->flags |= PF_NOFREEZE;

irda_rq_queue.thread = current;

@@ -135,10 +135,6 @@
__set_task_state(current, TASK_RUNNING);
remove_wait_queue(&irda_rq_queue.kick, &wait);

- /* make swsusp happy with our thread */
- if (current->flags & PF_FREEZE)
- refrigerator(PF_FREEZE);
-
run_irda_queue();
}

diff -ruN 582-refrigerator-old/drivers/net/irda/stir4200.c 582-refrigerator-new/drivers/net/irda/stir4200.c
--- 582-refrigerator-old/drivers/net/irda/stir4200.c 2004-11-24 09:53:01.000000000 +1100
+++ 582-refrigerator-new/drivers/net/irda/stir4200.c 2004-11-24 17:56:06.990062544 +1100
@@ -46,7 +46,6 @@
#include <linux/time.h>
#include <linux/skbuff.h>
#include <linux/netdevice.h>
-#include <linux/suspend.h>
#include <linux/slab.h>
#include <linux/delay.h>
#include <linux/usb.h>
diff -ruN 582-refrigerator-old/drivers/net/wireless/airo.c 582-refrigerator-new/drivers/net/wireless/airo.c
--- 582-refrigerator-old/drivers/net/wireless/airo.c 2004-11-24 09:53:02.000000000 +1100
+++ 582-refrigerator-new/drivers/net/wireless/airo.c 2004-11-24 17:56:06.998061328 +1100
@@ -33,7 +33,6 @@
#include <linux/string.h>
#include <linux/timer.h>
#include <linux/interrupt.h>
-#include <linux/suspend.h>
#include <linux/in.h>
#include <linux/bitops.h>
#include <asm/io.h>
@@ -2918,8 +2917,7 @@
flush_signals(current);

/* make swsusp happy with our thread */
- if (current->flags & PF_FREEZE)
- refrigerator(PF_FREEZE);
+ try_to_freeze(PF_FREEZE);

if (test_bit(JOB_DIE, &ai->flags))
break;
diff -ruN 582-refrigerator-old/drivers/pcmcia/cs.c 582-refrigerator-new/drivers/pcmcia/cs.c
--- 582-refrigerator-old/drivers/pcmcia/cs.c 2004-11-24 09:53:02.000000000 +1100
+++ 582-refrigerator-new/drivers/pcmcia/cs.c 2004-11-24 17:56:07.020057984 +1100
@@ -48,7 +48,6 @@
#include <linux/pm.h>
#include <linux/pci.h>
#include <linux/device.h>
-#include <linux/suspend.h>
#include <asm/system.h>
#include <asm/irq.h>

@@ -718,8 +717,7 @@
}

schedule();
- if (current->flags & PF_FREEZE)
- refrigerator(PF_FREEZE);
+ try_to_freeze(PF_FREEZE);

if (!skt->thread)
break;
diff -ruN 582-refrigerator-old/drivers/pcmcia/socket_sysfs.c 582-refrigerator-new/drivers/pcmcia/socket_sysfs.c
--- 582-refrigerator-old/drivers/pcmcia/socket_sysfs.c 2004-11-03 21:51:32.000000000 +1100
+++ 582-refrigerator-new/drivers/pcmcia/socket_sysfs.c 2004-11-24 17:56:07.035055704 +1100
@@ -25,7 +25,6 @@
#include <linux/pm.h>
#include <linux/pci.h>
#include <linux/device.h>
-#include <linux/suspend.h>
#include <asm/system.h>
#include <asm/irq.h>

diff -ruN 582-refrigerator-old/drivers/pnp/pnpbios/core.c 582-refrigerator-new/drivers/pnp/pnpbios/core.c
--- 582-refrigerator-old/drivers/pnp/pnpbios/core.c 2004-11-24 09:53:03.000000000 +1100
+++ 582-refrigerator-new/drivers/pnp/pnpbios/core.c 2004-11-24 17:58:33.769748640 +1100
@@ -179,6 +179,10 @@
* Poll every 2 seconds
*/
msleep_interruptible(2000);
+
+ if(current->flags & PF_FREEZE)
+ refrigerator(PF_FREEZE);
+
if(signal_pending(current))
break;

diff -ruN 582-refrigerator-old/drivers/scsi/libata-core.c 582-refrigerator-new/drivers/scsi/libata-core.c
--- 582-refrigerator-old/drivers/scsi/libata-core.c 2004-11-24 18:03:13.232263880 +1100
+++ 582-refrigerator-new/drivers/scsi/libata-core.c 2004-11-24 17:56:07.050053424 +1100
@@ -35,7 +35,6 @@
#include <linux/timer.h>
#include <linux/interrupt.h>
#include <linux/completion.h>
-#include <linux/suspend.h>
#include <linux/workqueue.h>
#include <scsi/scsi.h>
#include "scsi.h"
diff -ruN 582-refrigerator-old/drivers/usb/core/hub.c 582-refrigerator-new/drivers/usb/core/hub.c
--- 582-refrigerator-old/drivers/usb/core/hub.c 2004-11-24 09:53:05.000000000 +1100
+++ 582-refrigerator-new/drivers/usb/core/hub.c 2004-11-24 17:56:07.053052968 +1100
@@ -26,7 +26,6 @@
#include <linux/ioctl.h>
#include <linux/usb.h>
#include <linux/usbdevice_fs.h>
-#include <linux/suspend.h>

#include <asm/semaphore.h>
#include <asm/uaccess.h>
@@ -2713,8 +2712,7 @@
do {
hub_events();
wait_event_interruptible(khubd_wait, !list_empty(&hub_event_list));
- if (current->flags & PF_FREEZE)
- refrigerator(PF_FREEZE);
+ try_to_freeze(PF_FREEZE);
} while (!signal_pending(current));

pr_debug ("%s: khubd exiting\n", usbcore_name);
diff -ruN 582-refrigerator-old/drivers/w1/w1.c 582-refrigerator-new/drivers/w1/w1.c
--- 582-refrigerator-old/drivers/w1/w1.c 2004-11-24 09:53:07.000000000 +1100
+++ 582-refrigerator-new/drivers/w1/w1.c 2004-11-24 17:56:07.076049472 +1100
@@ -32,7 +32,6 @@
#include <linux/device.h>
#include <linux/slab.h>
#include <linux/sched.h>
-#include <linux/suspend.h>

#include "w1.h"
#include "w1_io.h"
@@ -628,8 +627,7 @@
timeout = w1_timeout*HZ;
do {
timeout = interruptible_sleep_on_timeout(&w1_control_wait, timeout);
- if (current->flags & PF_FREEZE)
- refrigerator(PF_FREEZE);
+ try_to_freeze(PF_FREEZE);
} while (!signal_pending(current) && (timeout > 0));

if (signal_pending(current))
@@ -701,8 +699,7 @@
timeout = w1_timeout*HZ;
do {
timeout = interruptible_sleep_on_timeout(&dev->kwait, timeout);
- if (current->flags & PF_FREEZE)
- refrigerator(PF_FREEZE);
+ try_to_freeze(PF_FREEZE);
} while (!signal_pending(current) && (timeout > 0));

if (signal_pending(current))
diff -ruN 582-refrigerator-old/fs/afs/kafsasyncd.c 582-refrigerator-new/fs/afs/kafsasyncd.c
--- 582-refrigerator-old/fs/afs/kafsasyncd.c 2004-11-03 21:55:01.000000000 +1100
+++ 582-refrigerator-new/fs/afs/kafsasyncd.c 2004-11-24 17:56:07.089047496 +1100
@@ -116,6 +116,8 @@
remove_wait_queue(&kafsasyncd_sleepq, &myself);
set_current_state(TASK_RUNNING);

+ try_to_freeze(PF_FREEZE);
+
/* discard pending signals */
afs_discard_my_signals();

diff -ruN 582-refrigerator-old/fs/afs/kafstimod.c 582-refrigerator-new/fs/afs/kafstimod.c
--- 582-refrigerator-old/fs/afs/kafstimod.c 2004-11-03 21:55:05.000000000 +1100
+++ 582-refrigerator-new/fs/afs/kafstimod.c 2004-11-24 17:56:07.092047040 +1100
@@ -91,6 +91,8 @@
complete_and_exit(&kafstimod_dead, 0);
}

+ try_to_freeze(PF_FREEZE);
+
/* discard pending signals */
afs_discard_my_signals();

diff -ruN 582-refrigerator-old/fs/buffer.c 582-refrigerator-new/fs/buffer.c
--- 582-refrigerator-old/fs/buffer.c 2004-11-24 09:53:07.000000000 +1100
+++ 582-refrigerator-new/fs/buffer.c 2004-11-24 17:59:03.766188488 +1100
@@ -38,6 +38,8 @@
#include <linux/bio.h>
#include <linux/notifier.h>
#include <linux/cpu.h>
+#include <linux/init.h>
+#include <linux/swap.h>
#include <linux/bitops.h>

static int fsync_buffers_list(spinlock_t *lock, struct list_head *list);
@@ -170,6 +172,16 @@
*/
int fsync_super(struct super_block *sb)
{
+ int ret;
+
+ /* A safety net. During suspend, we might overwrite
+ * memory containing filesystem info. We don't then
+ * want to sync it to disk. */
+ if (unlikely(test_suspend_state(SUSPEND_DISABLE_SYNCING)))
+ return 0;
+
+ current->flags |= PF_SYNCTHREAD;
+
sync_inodes_sb(sb, 0);
DQUOT_SYNC(sb);
lock_super(sb);
@@ -181,7 +193,10 @@
sync_blockdev(sb->s_bdev);
sync_inodes_sb(sb, 1);

- return sync_blockdev(sb->s_bdev);
+ ret = sync_blockdev(sb->s_bdev);
+
+ current->flags &= ~PF_SYNCTHREAD;
+ return ret;
}

/*
@@ -192,12 +207,22 @@
int fsync_bdev(struct block_device *bdev)
{
struct super_block *sb = get_super(bdev);
+ int ret;
+
+ if (unlikely(test_suspend_state(SUSPEND_DISABLE_SYNCING)))
+ return 0;
+
+ current->flags |= PF_SYNCTHREAD;
+
if (sb) {
int res = fsync_super(sb);
drop_super(sb);
+ current->flags &= ~PF_SYNCTHREAD;
return res;
}
- return sync_blockdev(bdev);
+ ret = sync_blockdev(bdev);
+ current->flags &= ~PF_SYNCTHREAD;
+ return ret;
}

/**
@@ -277,6 +302,14 @@
*/
static void do_sync(unsigned long wait)
{
+ /* A safety net. During suspend, we might overwrite
+ * memory containing filesystem info. We don't then
+ * want to sync it to disk. */
+ if (unlikely(test_suspend_state(SUSPEND_DISABLE_SYNCING)))
+ return;
+
+ current->flags |= PF_SYNCTHREAD;
+
wakeup_bdflush(0);
sync_inodes(0); /* All mappings, inodes and their blockdevs */
DQUOT_SYNC(NULL);
@@ -288,6 +321,8 @@
printk("Emergency Sync complete\n");
if (unlikely(laptop_mode))
laptop_sync_completion();
+
+ current->flags &= ~PF_SYNCTHREAD;
}

asmlinkage long sys_sync(void)
@@ -296,6 +331,8 @@
return 0;
}

+EXPORT_SYMBOL(sys_sync);
+
void emergency_sync(void)
{
pdflush_operation(do_sync, 0);
@@ -313,6 +350,11 @@
struct super_block * sb;
int ret;

+ if (unlikely(test_suspend_state(SUSPEND_DISABLE_SYNCING)))
+ return 0;
+
+ current->flags |= PF_SYNCTHREAD;
+
/* sync the inode to buffers */
write_inode_now(inode, 0);

@@ -325,6 +367,8 @@

/* .. finally sync the buffers to disk */
ret = sync_blockdev(sb->s_bdev);
+
+ current->flags &= ~PF_SYNCTHREAD;
return ret;
}

@@ -334,6 +378,8 @@
struct address_space *mapping;
int ret, err;

+ current->flags |= PF_SYNCTHREAD;
+
ret = -EBADF;
file = fget(fd);
if (!file)
@@ -363,6 +409,7 @@
out_putf:
fput(file);
out:
+ current->flags &= ~PF_SYNCTHREAD;
return ret;
}

@@ -372,6 +419,8 @@
struct address_space *mapping;
int ret, err;

+ current->flags |= PF_SYNCTHREAD;
+
ret = -EBADF;
file = fget(fd);
if (!file)
@@ -398,6 +447,7 @@
out_putf:
fput(file);
out:
+ current->flags &= ~PF_SYNCTHREAD;
return ret;
}

@@ -1062,6 +1112,8 @@
* async buffer heads in use.
*/
free_more_memory();
+ if (suspend_task == current->pid)
+ suspend2_cleanup_finished_io();
goto try_again;
}
EXPORT_SYMBOL_GPL(alloc_page_buffers);
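
The sync entry points in the fs/buffer.c hunks share one shape: return early if syncing is disabled, mark the caller with PF_SYNCTHREAD, do the work, and clear the mark on every exit path. A rough user-space sketch of that shape follows. It assumes a stand-in task structure; fake_task, fake_syncing_disabled, fake_backend and fake_sync are hypothetical names, not kernel API, and only the flag value is taken from the sched.h hunk.

```c
#define PF_SYNCTHREAD 0x00800000UL	/* value from the sched.h hunk */

/* Hypothetical stand-in for the flags word of a task_struct. */
struct fake_task {
	unsigned long flags;
};

/* Stand-in for test_suspend_state(SUSPEND_DISABLE_SYNCING). */
static int fake_syncing_disabled;

/* Stand-in for the real work (e.g. sync_blockdev()).
 * Returns 1 if the caller was marked as a sync thread, -1 if not. */
static int fake_backend(const struct fake_task *t)
{
	return (t->flags & PF_SYNCTHREAD) ? 1 : -1;
}

/* Guard, set the flag, do the work, clear the flag, return the result. */
int fake_sync(struct fake_task *t)
{
	int ret;

	if (fake_syncing_disabled)
		return 0;

	t->flags |= PF_SYNCTHREAD;
	ret = fake_backend(t);
	t->flags &= ~PF_SYNCTHREAD;
	return ret;
}
```

Because the mark is a single bit rather than a counter, a nested call clears it for the outer caller as well. In the patch, fsync_bdev() calls fsync_super(), and both clear the flag before returning.
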
diff -ruN 582-refrigerator-old/fs/jbd/journal.c 582-refrigerator-new/fs/jbd/journal.c
--- 582-refrigerator-old/fs/jbd/journal.c 2004-11-24 09:53:07.000000000 +1100
+++ 582-refrigerator-new/fs/jbd/journal.c 2004-11-24 17:56:07.115043544 +1100
@@ -130,6 +130,7 @@
current_journal = journal;

daemonize("kjournald");
+ current->flags |= PF_SYNCTHREAD;

/* Set up an interval timer which can be used to trigger a
commit wakeup after the commit interval expires */
diff -ruN 582-refrigerator-old/fs/jffs/intrep.c 582-refrigerator-new/fs/jffs/intrep.c
--- 582-refrigerator-old/fs/jffs/intrep.c 2004-11-03 21:53:49.000000000 +1100
+++ 582-refrigerator-new/fs/jffs/intrep.c 2004-11-24 17:56:07.127041720 +1100
@@ -3338,6 +3338,7 @@
D1(int i = 1);

daemonize("jffs_gcd");
+ current->flags |= PF_SYNCTHREAD;

c->gc_task = current;

@@ -3373,6 +3374,9 @@
siginfo_t info;
unsigned long signr = 0;

+ if (try_to_freeze(PF_FREEZE))
+ continue;
+
spin_lock_irq(&current->sighand->siglock);
signr = dequeue_signal(current, &current->blocked, &info);
spin_unlock_irq(&current->sighand->siglock);
diff -ruN 582-refrigerator-old/fs/jffs2/background.c 582-refrigerator-new/fs/jffs2/background.c
--- 582-refrigerator-old/fs/jffs2/background.c 2004-11-03 21:51:10.000000000 +1100
+++ 582-refrigerator-new/fs/jffs2/background.c 2004-11-24 17:56:07.150038224 +1100
@@ -15,7 +15,6 @@
#include <linux/jffs2.h>
#include <linux/mtd/mtd.h>
#include <linux/completion.h>
-#include <linux/suspend.h>
#include "nodelist.h"


@@ -93,12 +92,8 @@
schedule();
}

- if (current->flags & PF_FREEZE) {
- refrigerator(0);
- /* refrigerator() should recalc sigpending for us
- but doesn't. No matter - allow_signal() will. */
+ if (try_to_freeze(0))
continue;
- }

cond_resched();

diff -ruN 582-refrigerator-old/fs/jfs/jfs_logmgr.c 582-refrigerator-new/fs/jfs/jfs_logmgr.c
--- 582-refrigerator-old/fs/jfs/jfs_logmgr.c 2004-11-24 09:53:07.000000000 +1100
+++ 582-refrigerator-new/fs/jfs/jfs_logmgr.c 2004-11-24 17:56:07.168035488 +1100
@@ -2316,6 +2316,7 @@
struct lbuf *bp;

daemonize("jfsIO");
+ current->flags |= PF_SYNCTHREAD;

complete(&jfsIOwait);

diff -ruN 582-refrigerator-old/fs/jfs/jfs_txnmgr.c 582-refrigerator-new/fs/jfs/jfs_txnmgr.c
--- 582-refrigerator-old/fs/jfs/jfs_txnmgr.c 2004-11-24 09:53:07.000000000 +1100
+++ 582-refrigerator-new/fs/jfs/jfs_txnmgr.c 2004-11-24 17:56:07.172034880 +1100
@@ -47,7 +47,6 @@
#include <linux/vmalloc.h>
#include <linux/smp_lock.h>
#include <linux/completion.h>
-#include <linux/suspend.h>
#include <linux/module.h>
#include <linux/moduleparam.h>
#include "jfs_incore.h"
@@ -2727,6 +2726,7 @@
struct jfs_sb_info *sbi;

daemonize("jfsCommit");
+ current->flags |= PF_SYNCTHREAD;

complete(&jfsIOwait);

diff -ruN 582-refrigerator-old/fs/lockd/clntlock.c 582-refrigerator-new/fs/lockd/clntlock.c
--- 582-refrigerator-old/fs/lockd/clntlock.c 2004-11-03 21:55:00.000000000 +1100
+++ 582-refrigerator-new/fs/lockd/clntlock.c 2004-11-24 17:56:07.183033208 +1100
@@ -200,6 +200,7 @@
struct inode *inode;

daemonize("%s-reclaim", host->h_name);
+ current->flags |= PF_SYNCTHREAD;
allow_signal(SIGKILL);

/* This one ensures that our parent doesn't terminate while the
@@ -222,6 +223,7 @@

fl->fl_u.nfs_fl.flags &= ~NFS_LCK_RECLAIM;
nlmclnt_reclaim(host, fl);
+ try_to_freeze(PF_FREEZE);
if (signalled())
break;
goto restart;
diff -ruN 582-refrigerator-old/fs/lockd/clntproc.c 582-refrigerator-new/fs/lockd/clntproc.c
--- 582-refrigerator-old/fs/lockd/clntproc.c 2004-11-03 21:55:04.000000000 +1100
+++ 582-refrigerator-new/fs/lockd/clntproc.c 2004-11-24 17:56:07.185032904 +1100
@@ -310,6 +310,7 @@
prepare_to_wait(queue, &wait, TASK_INTERRUPTIBLE);
if (!signalled ()) {
schedule_timeout(NLMCLNT_GRACE_WAIT);
+ try_to_freeze(PF_FREEZE);
if (!signalled ())
status = 0;
}
diff -ruN 582-refrigerator-old/fs/lockd/svc.c 582-refrigerator-new/fs/lockd/svc.c
--- 582-refrigerator-old/fs/lockd/svc.c 2004-11-03 21:54:14.000000000 +1100
+++ 582-refrigerator-new/fs/lockd/svc.c 2004-11-24 17:56:07.191031992 +1100
@@ -112,6 +112,7 @@
up(&lockd_start);

daemonize("lockd");
+ current->flags |= PF_SYNCTHREAD;

/* Process request with signals blocked, but allow SIGKILL. */
allow_signal(SIGKILL);
@@ -135,6 +136,8 @@
while ((nlmsvc_users || !signalled()) && nlmsvc_pid == current->pid) {
long timeout = MAX_SCHEDULE_TIMEOUT;

+ try_to_freeze(PF_FREEZE);
+
if (signalled()) {
flush_signals(current);
if (nlmsvc_ops) {
diff -ruN 582-refrigerator-old/fs/nfsd/nfssvc.c 582-refrigerator-new/fs/nfsd/nfssvc.c
--- 582-refrigerator-old/fs/nfsd/nfssvc.c 2004-11-24 09:53:08.000000000 +1100
+++ 582-refrigerator-new/fs/nfsd/nfssvc.c 2004-11-24 17:59:23.732153200 +1100
@@ -180,6 +180,7 @@
/* Lock module and set up kernel thread */
lock_kernel();
daemonize("nfsd");
+ current->flags |= PF_SYNCTHREAD;

/* After daemonize() this kernel thread shares current->fs
* with the init process. We need to create files with a
diff -ruN 582-refrigerator-old/fs/reiserfs/journal.c 582-refrigerator-new/fs/reiserfs/journal.c
--- 582-refrigerator-old/fs/reiserfs/journal.c 2004-11-24 18:03:13.238262968 +1100
+++ 582-refrigerator-new/fs/reiserfs/journal.c 2004-11-24 17:56:07.211028952 +1100
@@ -50,7 +50,6 @@
#include <linux/stat.h>
#include <linux/string.h>
#include <linux/smp_lock.h>
-#include <linux/suspend.h>
#include <linux/buffer_head.h>
#include <linux/workqueue.h>
#include <linux/writeback.h>
diff -ruN 582-refrigerator-old/fs/xfs/linux-2.6/xfs_buf.c 582-refrigerator-new/fs/xfs/linux-2.6/xfs_buf.c
--- 582-refrigerator-old/fs/xfs/linux-2.6/xfs_buf.c 2004-11-24 18:03:13.239262816 +1100
+++ 582-refrigerator-new/fs/xfs/linux-2.6/xfs_buf.c 2004-11-24 17:56:07.214028496 +1100
@@ -51,7 +51,6 @@
#include <linux/sysctl.h>
#include <linux/proc_fs.h>
#include <linux/workqueue.h>
-#include <linux/suspend.h>
#include <linux/percpu.h>

#include "xfs_linux.h"
@@ -1657,7 +1656,7 @@

/* Set up the thread */
daemonize("xfsbufd");
- current->flags |= PF_MEMALLOC;
+ current->flags |= PF_MEMALLOC | PF_SYNCTHREAD;

pagebuf_daemon_task = current;
pagebuf_daemon_active = 1;
@@ -1665,9 +1664,7 @@

INIT_LIST_HEAD(&tmp);
do {
- /* swsusp */
- if (current->flags & PF_FREEZE)
- refrigerator(PF_FREEZE);
+ try_to_freeze(PF_FREEZE);

set_current_state(TASK_INTERRUPTIBLE);
schedule_timeout((xfs_buf_timer_centisecs * HZ) / 100);
diff -ruN 582-refrigerator-old/fs/xfs/linux-2.6/xfs_super.c 582-refrigerator-new/fs/xfs/linux-2.6/xfs_super.c
--- 582-refrigerator-old/fs/xfs/linux-2.6/xfs_super.c 2004-11-03 21:55:00.000000000 +1100
+++ 582-refrigerator-new/fs/xfs/linux-2.6/xfs_super.c 2004-11-24 17:56:07.230026064 +1100
@@ -71,7 +71,6 @@
#include <linux/namei.h>
#include <linux/init.h>
#include <linux/mount.h>
-#include <linux/suspend.h>
#include <linux/writeback.h>

STATIC struct quotactl_ops linvfs_qops;
@@ -472,6 +471,7 @@
struct vfs_sync_work *work, *n;

daemonize("xfssyncd");
+ current->flags |= PF_SYNCTHREAD;

vfsp->vfs_sync_work.w_vfs = vfsp;
vfsp->vfs_sync_work.w_syncer = vfs_sync_worker;
@@ -485,8 +485,7 @@
set_current_state(TASK_INTERRUPTIBLE);
timeleft = schedule_timeout(timeleft);
/* swsusp */
- if (current->flags & PF_FREEZE)
- refrigerator(PF_FREEZE);
+ try_to_freeze(PF_FREEZE);
if (vfsp->vfs_flag & VFS_UMOUNT)
break;

diff -ruN 582-refrigerator-old/include/linux/sched.h 582-refrigerator-new/include/linux/sched.h
--- 582-refrigerator-old/include/linux/sched.h 2004-11-24 18:03:13.123280448 +1100
+++ 582-refrigerator-new/include/linux/sched.h 2004-11-24 17:59:48.248426160 +1100
@@ -19,6 +19,7 @@
#include <asm/page.h>
#include <asm/ptrace.h>
#include <asm/mmu.h>
+#include <asm/current.h>

#include <linux/smp.h>
#include <linux/sem.h>
@@ -701,7 +702,7 @@
#define PF_MEMDIE 0x00001000 /* Killed for out-of-memory */
#define PF_FLUSHER 0x00002000 /* responsible for disk writeback */

-#define PF_FREEZE 0x00004000 /* this task should be frozen for suspend */
+#define PF_FREEZE 0x00004000 /* this task is being frozen for suspend now */
#define PF_NOFREEZE 0x00008000 /* this thread should not be frozen */
#define PF_FROZEN 0x00010000 /* frozen for system suspend */
#define PF_FSTRANS 0x00020000 /* inside a filesystem transaction */
@@ -710,6 +711,8 @@
#define PF_LESS_THROTTLE 0x00100000 /* Throttle me less: I clean memory */
#define PF_SYNCWRITE 0x00200000 /* I am doing a sync write */
#define PF_BORROWED_MM 0x00400000 /* I am a kthread doing use_mm */
+#define PF_SYNCTHREAD 0x00800000 /* this thread can start activity during the
+ early part of freezing processes */

#ifdef CONFIG_SMP
extern int set_cpus_allowed(task_t *p, cpumask_t new_mask);
@@ -720,6 +723,29 @@
}
#endif

+/* try_to_freeze
+ *
+ * Checks whether we need to enter the refrigerator
+ * and returns 1 if we did so.
+ */
+#ifdef CONFIG_PM
+extern void refrigerator(unsigned long);
+
+static inline int try_to_freeze(unsigned long refrigerator_flags)
+{
+ if (unlikely(current->flags & PF_FREEZE)) {
+ refrigerator(refrigerator_flags);
+ return 1;
+ } else
+ return 0;
+}
+#else
+static inline int try_to_freeze(unsigned long refrigerator_flags)
+{
+ return 0;
+}
+#endif
+
extern unsigned long long sched_clock(void);

/* sched_exec is called by processes performing an exec */
@@ -1119,6 +1145,14 @@

#endif

+#ifdef CONFIG_PM
+extern void refrigerator(unsigned long);
+extern unsigned int suspend_task;
+#else
+#define refrigerator(a) do { } while(0)
+#define suspend_task (0)
+#endif
+
#endif /* __KERNEL__ */

#endif
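
The driver and filesystem conversions above all rely on one contract: a thread calls try_to_freeze() after it wakes, and the helper reports whether the thread actually went into the refrigerator. A minimal user-space sketch of that contract follows. It assumes a stand-in task structure passed explicitly (the kernel helper uses current instead); fake_task, fake_refrigerator, fake_try_to_freeze and fake_demo are hypothetical names, not kernel API, and only the flag value is taken from the sched.h hunk.

```c
#include <assert.h>

#define PF_FREEZE 0x00004000UL	/* value from the sched.h hunk */

/* Hypothetical stand-in for the parts of task_struct used here. */
struct fake_task {
	unsigned long flags;
	int times_frozen;
};

/* Stand-in for refrigerator(): records the visit and clears PF_FREEZE
 * on the way out, as the patched refrigerator() does. */
static void fake_refrigerator(struct fake_task *t)
{
	t->times_frozen++;
	t->flags &= ~PF_FREEZE;
}

/* Mirrors try_to_freeze(): returns 1 if we entered the refrigerator,
 * 0 if no freeze was requested. */
static int fake_try_to_freeze(struct fake_task *t)
{
	if (t->flags & PF_FREEZE) {
		fake_refrigerator(t);
		return 1;
	}
	return 0;
}

/* Exercises the contract: 0 when not asked to freeze, then 1 exactly
 * once after PF_FREEZE is set. Returns the number of freezer visits. */
int fake_demo(void)
{
	struct fake_task t = { 0, 0 };

	assert(fake_try_to_freeze(&t) == 0);
	t.flags |= PF_FREEZE;
	assert(fake_try_to_freeze(&t) == 1);
	assert(fake_try_to_freeze(&t) == 0);
	return t.times_frozen;
}
```

The return value is what lets callers such as the jffs and jffs2 garbage-collection threads write "if (try_to_freeze(...)) continue;" and restart their loop after a resume.
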
diff -ruN 582-refrigerator-old/kernel/exit.c 582-refrigerator-new/kernel/exit.c
--- 582-refrigerator-old/kernel/exit.c 2004-11-24 18:03:13.124280296 +1100
+++ 582-refrigerator-new/kernel/exit.c 2004-11-24 17:56:07.236025152 +1100
@@ -15,6 +15,7 @@
#include <linux/module.h>
#include <linux/completion.h>
#include <linux/personality.h>
+#include <linux/suspend.h>
#include <linux/tty.h>
#include <linux/namespace.h>
#include <linux/key.h>
@@ -802,6 +803,8 @@
panic("Attempted to kill init!");
if (tsk->io_context)
exit_io_context();
+ if (unlikely(test_suspend_state(SUSPEND_FREEZER_ON)))
+ refrigerator(0);
tsk->flags |= PF_EXITING;
del_timer_sync(&tsk->real_timer);

diff -ruN 582-refrigerator-old/kernel/fork.c 582-refrigerator-new/kernel/fork.c
--- 582-refrigerator-old/kernel/fork.c 2004-11-24 18:03:13.126279992 +1100
+++ 582-refrigerator-new/kernel/fork.c 2004-11-24 17:56:07.238024848 +1100
@@ -39,6 +39,7 @@
#include <linux/audit.h>
#include <linux/profile.h>
#include <linux/rmap.h>
+#include <linux/suspend.h>

#include <asm/pgtable.h>
#include <asm/pgalloc.h>
@@ -1125,6 +1126,10 @@
int trace = 0;
long pid = alloc_pidmap();

+
+ if (unlikely(test_suspend_state(SUSPEND_FREEZER_ON)))
+ refrigerator(0);
+
if (pid < 0)
return -EAGAIN;
if (unlikely(current->ptrace)) {
diff -ruN 582-refrigerator-old/kernel/power/disk.c 582-refrigerator-new/kernel/power/disk.c
--- 582-refrigerator-old/kernel/power/disk.c 2004-11-24 09:53:12.000000000 +1100
+++ 582-refrigerator-new/kernel/power/disk.c 2004-11-24 17:56:07.253022568 +1100
@@ -116,7 +116,7 @@
device_resume();
platform_finish();
enable_nonboot_cpus();
- thaw_processes();
+ thaw_processes(FREEZER_ALL_THREADS);
pm_restore_console();
}

@@ -128,7 +128,7 @@
pm_prepare_console();

sys_sync();
- if (freeze_processes()) {
+ if (freeze_processes(1)) {
error = -EBUSY;
goto Thaw;
}
@@ -152,7 +152,7 @@
platform_finish();
Thaw:
enable_nonboot_cpus();
- thaw_processes();
+ thaw_processes(FREEZER_ALL_THREADS);
pm_restore_console();
return error;
}
diff -ruN 582-refrigerator-old/kernel/power/main.c 582-refrigerator-new/kernel/power/main.c
--- 582-refrigerator-old/kernel/power/main.c 2004-11-24 18:03:13.289255216 +1100
+++ 582-refrigerator-new/kernel/power/main.c 2004-11-24 17:56:07.254022416 +1100
@@ -55,7 +55,7 @@

pm_prepare_console();

- if (freeze_processes()) {
+ if (freeze_processes(1)) {
error = -EAGAIN;
goto Thaw;
}
@@ -72,7 +72,7 @@
if (pm_ops->finish)
pm_ops->finish(state);
Thaw:
- thaw_processes();
+ thaw_processes(FREEZER_ALL_THREADS);
pm_restore_console();
return error;
}
@@ -107,7 +107,7 @@
device_resume();
if (pm_ops && pm_ops->finish)
pm_ops->finish(state);
- thaw_processes();
+ thaw_processes(FREEZER_ALL_THREADS);
pm_restore_console();
}

diff -ruN 582-refrigerator-old/kernel/power/power.h 582-refrigerator-new/kernel/power/power.h
--- 582-refrigerator-old/kernel/power/power.h 2004-11-03 21:55:05.000000000 +1100
+++ 582-refrigerator-new/kernel/power/power.h 2004-11-24 17:56:07.264020896 +1100
@@ -45,8 +45,8 @@

extern struct subsystem power_subsys;

-extern int freeze_processes(void);
-extern void thaw_processes(void);
+extern int freeze_processes(int no_progress);
+extern void thaw_processes(int which_threads);

extern int pm_prepare_console(void);
extern void pm_restore_console(void);
diff -ruN 582-refrigerator-old/kernel/power/process.c 582-refrigerator-new/kernel/power/process.c
--- 582-refrigerator-old/kernel/power/process.c 2004-11-24 09:53:12.000000000 +1100
+++ 582-refrigerator-new/kernel/power/process.c 2004-11-24 18:03:09.613813968 +1100
@@ -1,121 +1,521 @@
/*
- * drivers/power/process.c - Functions for starting/stopping processes on
- * suspend transitions.
+ * kernel/power/process.c
*
- * Originally from swsusp.
+ * Copyright (C) 1998-2001 Gabor Kuti <[email protected]>
+ * Copyright (C) 1998,2001,2002 Pavel Machek <[email protected]>
+ * Copyright (C) 2002-2003 Florent Chabaud <[email protected]>
+ * Copyright (C) 2002-2004 Nigel Cunningham <[email protected]>
+ *
+ * This file is released under the GPLv2.
+ *
+ * Freeze_and_free contains the routines software suspend uses to freeze other
+ * processes during the suspend cycle and to (if necessary) free up memory in
+ * accordance with limitations on the image size.
+ *
+ * Ideally, the image saved to disk would be an atomic copy of the entire
+ * contents of all RAM and related hardware state. One of the first
+ * prerequisites for getting our approximation of this is stopping the activity
+ * of other processes. We can't stop all other processes, however, since some
+ * are needed in doing the I/O to save the image. Freeze_and_free.c contains
+ * the routines that control suspension and resuming of these processes.
+ *
+ * Under high I/O load, we need to be careful about the order in which we
+ * freeze processes. If we freeze processes in the wrong order, we could
+ * deadlock others. The freeze_order array thus specifies the order in which
+ * critical processes are frozen. All others are suspended after these have
+ * entered the refrigerator.
+ *
+ * Another complicating factor is that freeing memory requires the processes
+ * to not be frozen, but at the end of freeing memory, they need to be frozen
+ * so that we can be sure we actually have eaten enough memory. This is why
+ * freezing and freeing are in the one file. The freezer is not called from
+ * the main logic, but indirectly, via the code for eating memory. The eat
+ * memory logic is iterative, first freezing processes and checking the stats,
+ * then (if necessary) unfreezing them and eating more memory until it looks
+ * like the criteria are met (at which point processes are frozen & stats
+ * checked again).
*/

+#define SUSPEND_FREEZER_C

-#undef DEBUG
-
-#include <linux/smp_lock.h>
-#include <linux/interrupt.h>
-#include <linux/suspend.h>
#include <linux/module.h>
+#include <linux/suspend.h>
+#include <asm/tlbflush.h>

-/*
- * Timeout for stopping processes
+#include "suspend.h"
+
+volatile struct suspend2_core_ops * suspend2_core_ops = NULL;
+unsigned long suspend_action = 0;
+unsigned long suspend_result = 0;
+unsigned long suspend_debug_state = 0;
+unsigned long software_suspend_state = ((1 << SUSPEND_DISABLED) | (1 << SUSPEND_BOOT_TIME) |
+ (1 << SUSPEND_RESUME_NOT_DONE) | (1 << SUSPEND_IGNORE_LOGLEVEL));
+unsigned int suspend_task = 0;
+
+atomic_t __nosavedata suspend_cpu_counter = { 0 };
+
+/* Timeouts when freezing */
+#define FREEZER_TOTAL_TIMEOUT (5 * HZ)
+#define FREEZER_CHECK_TIMEOUT (HZ / 10)
+
+extern void suspend_relinquish_console(void);
+
+/* ------------------------------------------------------------------------ */
+
+/**
+ * refrigerator - idle routine for frozen processes
+ * @flag: unsigned long, non-zero if signals are to be flushed.
+ *
+ * A routine for kernel threads which should not do work during suspend
+ * to enter and spin in until the process is finished.
*/
-#define TIMEOUT (6 * HZ)

+void refrigerator(unsigned long flag)
+{
+ unsigned long flags;
+ long save;
+
+ if (unlikely(current->flags & PF_NOFREEZE)) {
+ current->flags &= ~PF_FREEZE;
+ spin_lock_irqsave(&current->sighand->siglock, flags);
+ recalc_sigpending();
+ spin_unlock_irqrestore(&current->sighand->siglock, flags);
+ return;
+ }
+
+ /* This needs correcting to work with real-time processes.
+ OTOH, this way one process may see (via /proc/) some other
+ process in stopped state (and thereby discover we were
+ suspended). We probably do not care.
+ */
+ if ((flag) && (current->flags & PF_FREEZE)) {
+
+ suspend_message(SUSPEND_FREEZER, SUSPEND_VERBOSE, 0,
+ "\n%s (%d) refrigerated and sigpending recalculated.",
+ current->comm, current->pid);
+ spin_lock_irqsave(&current->sighand->siglock, flags);
+ recalc_sigpending();
+ spin_unlock_irqrestore(&current->sighand->siglock, flags);
+ } else
+ suspend_message(SUSPEND_FREEZER, SUSPEND_VERBOSE, 0,
+ "\n%s (%d) refrigerated.",
+ current->comm, current->pid);
+
+ if (test_suspend_state(SUSPEND_FREEZER_ON)) {
+ save = current->state;
+ current->flags |= PF_FROZEN;
+ while (current->flags & PF_FROZEN) {
+ current->state = TASK_STOPPED;
+ schedule();
+ if (flag) {
+ spin_lock_irqsave(
+ &current->sighand->siglock, flags);
+ recalc_sigpending();
+ spin_unlock_irqrestore(
+ &current->sighand->siglock, flags);
+ }
+ }
+ current->state = save;
+ } else
+ suspend_message(SUSPEND_FREEZER, SUSPEND_VERBOSE, 0,
+ "No longer freezing processes. Dropping out.\n");
+ current->flags &= ~PF_FREEZE;
+ spin_lock_irqsave(&current->sighand->siglock, flags);
+ recalc_sigpending();
+ spin_unlock_irqrestore(&current->sighand->siglock, flags);
+}
+
+
+#ifdef CONFIG_SMP
+static void __smp_pause(void * data)
+{
+ atomic_inc(&suspend_cpu_counter);
+ while(test_suspend_state(SUSPEND_FREEZE_SMP)) {
+ cpu_relax();
+ barrier();
+ }
+ local_flush_tlb();
+ atomic_dec(&suspend_cpu_counter);
+}
+
+void smp_pause(void)
+{
+ set_suspend_state(SUSPEND_FREEZE_SMP);
+ smp_call_function(__smp_pause, NULL, 0, 0);
+
+ while (atomic_read(&suspend_cpu_counter) < (num_online_cpus() - 1)) {
+ cpu_relax();
+ barrier();
+ }
+}

-static inline int freezeable(struct task_struct * p)
+void smp_continue(void)
{
- if ((p == current) ||
+ clear_suspend_state(SUSPEND_FREEZE_SMP);
+
+ while (atomic_read(&suspend_cpu_counter)) {
+ cpu_relax();
+ barrier();
+ }
+}
+
+extern void __smp_suspend_lowlevel(void * info);
+
+void smp_suspend(void)
+{
+ set_suspend_state(SUSPEND_FREEZE_SMP);
+ smp_call_function(__smp_suspend_lowlevel, NULL, 0, 0);
+
+ while (atomic_read(&suspend_cpu_counter) < (num_online_cpus() - 1)) {
+ cpu_relax();
+ barrier();
+ }
+}
+#else
+#define smp_pause() do { } while(0)
+#define smp_continue() do { } while(0)
+#define smp_suspend() do { } while(0)
+#endif
+
+/*
+ * to_be_frozen
+ *
+ * Description: Determine whether a process should be frozen yet.
+ * Parameters: struct task_struct * The process to consider.
+ * int Which group of processes to consider.
+ * Returns: int 0 if the process should not be frozen yet, non-zero otherwise.
+ */
+static int to_be_frozen(struct task_struct * p, int type_being_frozen) {
+
+ if ((p == current) ||
(p->flags & PF_NOFREEZE) ||
(p->exit_state == EXIT_ZOMBIE) ||
(p->exit_state == EXIT_DEAD) ||
- (p->state == TASK_STOPPED) ||
- (p->state == TASK_TRACED))
+ (p->state == TASK_TRACED) ||
+ (p->state == TASK_STOPPED))
+ return 0;
+ if ((!(p->mm)) && (type_being_frozen < 3))
+ return 0;
+ if ((p->flags & PF_SYNCTHREAD) && (type_being_frozen == 1))
return 0;
return 1;
}

-/* Refrigerator is place where frozen processes are stored :-). */
-void refrigerator(unsigned long flag)
+/*
+ * num_to_be_frozen
+ *
+ * Description: Determine how many processes of our type are still to be
+ * frozen. As a side effect, update the progress bar too.
+ * Parameters: int Which type we are trying to freeze.
+ * int Whether we are displaying our progress.
+ */
+static int num_to_be_frozen(int type_being_frozen, int no_progress) {
+
+ struct task_struct *p, *g;
+ int todo_this_type = 0, total_todo = 0;
+ int total_threads = 0;
+
+ read_lock(&tasklist_lock);
+ do_each_thread(g, p) {
+ if (to_be_frozen(p, type_being_frozen)) {
+ todo_this_type++;
+ total_todo++;
+ } else if (to_be_frozen(p, 3))
+ total_todo++;
+ total_threads++;
+ } while_each_thread(g, p);
+ read_unlock(&tasklist_lock);
+
+ if ((!no_progress) && (suspend2_core_ops)) {
+ suspend2_core_ops->update_status(
+ total_threads - total_todo,
+ total_threads,
+ "%d/%d",
+ total_threads - total_todo,
+ total_threads);
+ }
+ return todo_this_type;
+}
+
+/*
+ * freeze_threads
+ *
+ * Freeze a set of threads having particular attributes.
+ *
+ * Types:
+ * 1: User threads not syncing.
+ * 2: Remaining user threads.
+ * 3: Kernel threads.
+ */
+extern void show_task(struct task_struct * p);
+
+static int freeze_threads(int type, int no_progress)
{
- /* Hmm, should we be allowed to suspend when there are realtime
- processes around? */
- long save;
- save = current->state;
- current->state = TASK_UNINTERRUPTIBLE;
- pr_debug("%s entered refrigerator\n", current->comm);
- printk("=");
- current->flags &= ~PF_FREEZE;
+ struct task_struct *p, *g;
+ unsigned long start_time = jiffies;
+ int result = 0, still_to_do;
+
+ suspend_message(SUSPEND_FREEZER, SUSPEND_VERBOSE, 1,
+ "\n STARTING TO FREEZE TYPE %d THREADS.\n",
+ type);

- spin_lock_irq(&current->sighand->siglock);
- recalc_sigpending(); /* We sent fake signal, clean it up */
- spin_unlock_irq(&current->sighand->siglock);
-
- current->flags |= PF_FROZEN;
- while (current->flags & PF_FROZEN)
- schedule();
- pr_debug("%s left refrigerator\n", current->comm);
- current->state = save;
-}
-
-/* 0 = success, else # of processes that we failed to stop */
-int freeze_processes(void)
-{
- int todo;
- unsigned long start_time;
- struct task_struct *g, *p;
-
- printk( "Stopping tasks: " );
- start_time = jiffies;
do {
- todo = 0;
+ int numsignalled = 0;
+
+ /*
+ * Pause the other processors so we can safely
+ * change threads' flags
+ */
+ smp_pause();
+
+ if (TEST_RESULT_STATE(SUSPEND_ABORTED)) {
+ smp_continue();
+ return 1;
+ }
+
+ preempt_disable();
+
+ local_irq_disable();
+
read_lock(&tasklist_lock);
+
+ /*
+ * Signal the processes.
+ *
+ * We signal them every time through. Otherwise pdflush -
+ * and maybe other processes - might never enter the
+ * fridge.
+ *
+ * NB: We're inside an SMP pause. Our printks are unsafe.
+ * They're only here for debugging.
+ *
+ */
+
do_each_thread(g, p) {
unsigned long flags;
- if (!freezeable(p))
- continue;
- if ((p->flags & PF_FROZEN) ||
- (p->state == TASK_TRACED) ||
- (p->state == TASK_STOPPED))
+ if (!to_be_frozen(p, type))
continue;
-
- /* FIXME: smp problem here: we may not access other process' flags
- without locking */
+
+ numsignalled++;
+ suspend_message(SUSPEND_FREEZER, SUSPEND_VERBOSE, 0,
+ "\n %s: pid %d",
+ p->comm, p->pid);
p->flags |= PF_FREEZE;
spin_lock_irqsave(&p->sighand->siglock, flags);
signal_wake_up(p, 0);
spin_unlock_irqrestore(&p->sighand->siglock, flags);
- todo++;
} while_each_thread(g, p);
+
+ if (numsignalled)
+ suspend_message(SUSPEND_FREEZER, SUSPEND_VERBOSE, 0,
+ "\n Number of threads signalled this iteration is %d.\n",
+ numsignalled);
+
read_unlock(&tasklist_lock);
- yield(); /* Yield is okay here */
- if (time_after(jiffies, start_time + TIMEOUT)) {
- printk( "\n" );
- printk(KERN_ERR " stopping tasks failed (%d tasks remaining)\n", todo );
- return todo;
+
+ /*
+ * Let the processes run.
+ */
+ smp_continue();
+
+ preempt_enable();
+
+ local_irq_enable();
+
+ /*
+ * Sleep.
+ */
+ set_task_state(current, TASK_INTERRUPTIBLE);
+ schedule_timeout(FREEZER_CHECK_TIMEOUT);
+
+ still_to_do = num_to_be_frozen(type, no_progress);
+ } while(still_to_do && (!TEST_RESULT_STATE(SUSPEND_ABORTED)) &&
+ !time_after(jiffies, start_time + FREEZER_TOTAL_TIMEOUT));
+
+ /*
+ * Did we time out? See if we failed to freeze processes as well.
+ *
+ */
+ if ((time_after(jiffies, start_time + FREEZER_TOTAL_TIMEOUT)) && (still_to_do)) {
+ read_lock(&tasklist_lock);
+ do_each_thread(g, p) {
+ if (!to_be_frozen(p, type))
+ continue;
+
+ if (!result) {
+ printk(KERN_ERR name_suspend
+ "Stopping tasks failed.\n");
+ printk(KERN_ERR "Tasks that refused to be refrigerated"
+ " and haven't since exited:\n");
+ result = 1;
+ }
+
+ if (p->flags & PF_FREEZE) {
+ printk(" - %s (#%d) signalled but "
+ "didn't enter refrigerator.\n",
+ p->comm, p->pid);
+ show_task(p);
+ } else
+ printk(" - %s (#%d) wasn't "
+ "signalled.\n",
+ p->comm, p->pid);
+ } while_each_thread(g, p);
+ read_unlock(&tasklist_lock);
+ } else
+ suspend_message(SUSPEND_FREEZER, SUSPEND_VERBOSE, 1,
+ "\n\nSuccessfully froze processes of type %d.\n",
+ type);
+ return result;
+}
+
+/*
+ * freeze_processes - Freeze processes prior to saving an image of memory.
+ *
+ * Return value: 0 = success, -1 if freezing failed or the suspend was aborted.
+ */
+extern asmlinkage long sys_sync(void);
+
+/* Freeze_processes.
+ * If the flag no_progress is non-zero, progress bars will not be updated.
+ * Debugging output is still printed.
+ */
+int freeze_processes(int no_progress)
+{
+ int showidlelist, result = 0, num_type[3];
+ struct task_struct *p, *g;
+
+ showidlelist = 1;
+
+ num_type[0] = num_type[1] = num_type[2] = 0;
+
+ set_suspend_state(SUSPEND_FREEZER_ON);
+
+ suspend_result = 0; /* Might be called from pm_disk or suspend -
+ ensure reset */
+
+ read_lock(&tasklist_lock);
+ do_each_thread(g, p) {
+ if (p->mm) {
+ if (p->flags & PF_SYNCTHREAD) {
+ suspend_message(SUSPEND_FREEZER, SUSPEND_MEDIUM, 0,
+ "%s (%d) is a syncthread at entrance to "
+ "fridge\n", p->comm, p->pid);
+ num_type[1]++;
+ } else
+ num_type[2]++;
+ } else {
+ if (p->flags & PF_NOFREEZE)
+ suspend_message(SUSPEND_FREEZER, SUSPEND_MEDIUM, 0,
+ "%s (%d) is NO_FREEZE.\n",
+ p->comm, p->pid);
+ else
+ num_type[2]++;
}
- } while(todo);
+ } while_each_thread(g, p);
+ read_unlock(&tasklist_lock);
+ suspend_message(SUSPEND_FREEZER, SUSPEND_MEDIUM, 0, "\n");
+
+ /* First, freeze all userspace, non syncing threads. */
+ if (freeze_threads(1, no_progress) || (TEST_RESULT_STATE(SUSPEND_ABORTED)))
+ goto aborting;

- printk( "|\n" );
- BUG_ON(in_atomic());
- return 0;
+ /* Now freeze processes that were syncing and are still running */
+ if (freeze_threads(2, no_progress) || (TEST_RESULT_STATE(SUSPEND_ABORTED)))
+ goto aborting;
+
+ /* Now do our own sync, just in case one wasn't running already */
+ if ((!no_progress) && (suspend2_core_ops))
+ suspend2_core_ops->prepare_status(1, 1,
+ "Freezing processes: Syncing remaining I/O.");
+
+ sys_sync();
+
+ set_suspend_state(SUSPEND_DISABLE_SYNCING);
+
+ /* Freeze kernel threads */
+ if (freeze_threads(3, no_progress) || (TEST_RESULT_STATE(SUSPEND_ABORTED)))
+ goto aborting;
+
+ if (TEST_ACTION_STATE(SUSPEND_FREEZE_TIMERS)) {
+ printk("Enabling timer freezer. If you get a hang, note " \
+ "the timer attempting to run, press T to disable " \
+ "the timer freezer. After resuming, please look up " \
+ "the address you recorded in System.map and report " \
+ "the routine to Nigel.\n");
+ set_suspend_state(SUSPEND_TIMER_FREEZER_ON);
+ }
+
+ suspend_task = current->pid;
+out:
+ suspend_message(SUSPEND_FREEZER, SUSPEND_VERBOSE, 1,
+ "Left freezer loop.\n");
+
+ clear_suspend_state(SUSPEND_FREEZE_SMP);
+
+ while (atomic_read(&suspend_cpu_counter)) {
+ cpu_relax();
+ barrier();
+ }
+
+ return result;
+aborting:
+ result = -1;
+ goto out;
}

-void thaw_processes(void)
+void thaw_processes(int which_threads)
{
- struct task_struct *g, *p;
+ struct task_struct *p, *g;
+ suspend_message(SUSPEND_FREEZER, SUSPEND_LOW, 1, "Thawing tasks\n");
+
+ suspend_task = 0;
+ if (which_threads != FREEZER_KERNEL_THREADS)
+ clear_suspend_state(SUSPEND_FREEZER_ON);
+
+ clear_suspend_state(SUSPEND_DISABLE_SYNCING);
+ clear_suspend_state(SUSPEND_TIMER_FREEZER_ON);
+
+ /*
+ * Pause the other processors so we can safely
+ * change threads' flags
+ */
+
+ smp_pause();
+
+ preempt_disable();
+
+ local_irq_disable();

- printk( "Restarting tasks..." );
read_lock(&tasklist_lock);
+
do_each_thread(g, p) {
- if (!freezeable(p))
- continue;
if (p->flags & PF_FROZEN) {
+ if ((which_threads == FREEZER_KERNEL_THREADS) &&
+ (p->mm))
+ continue;
+ suspend_message(SUSPEND_FREEZER, SUSPEND_VERBOSE, 0,
+ "Waking %5d: %s.\n", p->pid, p->comm);
p->flags &= ~PF_FROZEN;
wake_up_process(p);
- } else
- printk(KERN_INFO " Strange, %s not stopped\n", p->comm );
+ }
} while_each_thread(g, p);

read_unlock(&tasklist_lock);
- schedule();
- printk( " done\n" );
+
+ smp_continue();
+
+ preempt_enable();
+
+ local_irq_enable();
}

+EXPORT_SYMBOL(suspend_task);
+EXPORT_SYMBOL(suspend_action);
+EXPORT_SYMBOL(software_suspend_state);
+EXPORT_SYMBOL(freeze_processes);
+EXPORT_SYMBOL(thaw_processes);
+#ifdef CONFIG_SMP
+EXPORT_SYMBOL(smp_suspend);
+EXPORT_SYMBOL(smp_continue);
+#endif
EXPORT_SYMBOL(refrigerator);
diff -ruN 582-refrigerator-old/kernel/signal.c 582-refrigerator-new/kernel/signal.c
--- 582-refrigerator-old/kernel/signal.c 2004-11-24 18:03:13.005298384 +1100
+++ 582-refrigerator-new/kernel/signal.c 2004-11-24 17:56:07.270019984 +1100
@@ -2178,10 +2178,11 @@
sigandsets(&current->blocked, &current->blocked, &these);
recalc_sigpending();
spin_unlock_irq(&current->sighand->siglock);
-
current->state = TASK_INTERRUPTIBLE;
timeout = schedule_timeout(timeout);
-
+ if (current->flags & PF_FREEZE) {
+ refrigerator(PF_FREEZE);
+ }
spin_lock_irq(&current->sighand->siglock);
sig = dequeue_signal(current, &these, &info);
current->blocked = current->real_blocked;
diff -ruN 582-refrigerator-old/mm/pdflush.c 582-refrigerator-new/mm/pdflush.c
--- 582-refrigerator-old/mm/pdflush.c 2004-11-24 18:03:13.252260840 +1100
+++ 582-refrigerator-new/mm/pdflush.c 2004-11-24 17:56:07.271019832 +1100
@@ -17,7 +17,6 @@
#include <linux/gfp.h>
#include <linux/init.h>
#include <linux/module.h>
-#include <linux/suspend.h>
#include <linux/fs.h> // Needed by writeback.h
#include <linux/writeback.h> // Prototypes pdflush_operation()
#include <linux/kthread.h>
@@ -90,7 +89,7 @@

static int __pdflush(struct pdflush_work *my_work)
{
- current->flags |= PF_FLUSHER;
+ current->flags |= (PF_FLUSHER | PF_SYNCTHREAD);
my_work->fn = NULL;
my_work->who = current;
INIT_LIST_HEAD(&my_work->list);
@@ -106,8 +105,7 @@
spin_unlock_irq(&pdflush_lock);

schedule();
- if (current->flags & PF_FREEZE) {
- refrigerator(PF_FREEZE);
+ if (try_to_freeze(PF_FREEZE)) {
spin_lock_irq(&pdflush_lock);
continue;
}
diff -ruN 582-refrigerator-old/mm/vmscan.c 582-refrigerator-new/mm/vmscan.c
--- 582-refrigerator-old/mm/vmscan.c 2004-11-24 18:03:13.302253240 +1100
+++ 582-refrigerator-new/mm/vmscan.c 2004-11-24 17:56:07.273019528 +1100
@@ -21,7 +21,6 @@
#include <linux/highmem.h>
#include <linux/file.h>
#include <linux/writeback.h>
-#include <linux/suspend.h>
#include <linux/blkdev.h>
#include <linux/buffer_head.h> /* for try_to_release_page(),
buffer_heads_over_limit */
@@ -1170,8 +1169,7 @@
tsk->flags |= PF_MEMALLOC|PF_KSWAPD;

for ( ; ; ) {
- if (current->flags & PF_FREEZE)
- refrigerator(PF_FREEZE);
+ try_to_freeze(PF_FREEZE);
prepare_to_wait(&pgdat->kswapd_wait, &wait, TASK_INTERRUPTIBLE);
schedule();
finish_wait(&pgdat->kswapd_wait, &wait);
diff -ruN 582-refrigerator-old/net/bluetooth/rfcomm/core.c 582-refrigerator-new/net/bluetooth/rfcomm/core.c
--- 582-refrigerator-old/net/bluetooth/rfcomm/core.c 2004-11-03 21:51:10.000000000 +1100
+++ 582-refrigerator-new/net/bluetooth/rfcomm/core.c 2004-11-24 17:56:07.285017704 +1100
@@ -1726,6 +1726,8 @@
schedule();
}

+ try_to_freeze(PF_FREEZE);
+
/* Process stuff */
clear_bit(RFCOMM_SCHED_WAKEUP, &rfcomm_event);
rfcomm_process_sessions();
diff -ruN 582-refrigerator-old/net/rxrpc/krxiod.c 582-refrigerator-new/net/rxrpc/krxiod.c
--- 582-refrigerator-old/net/rxrpc/krxiod.c 2004-11-03 21:51:11.000000000 +1100
+++ 582-refrigerator-new/net/rxrpc/krxiod.c 2004-11-24 17:56:07.289017096 +1100
@@ -138,6 +138,8 @@

_debug("### End Work");

+ try_to_freeze(PF_FREEZE);
+
/* discard pending signals */
rxrpc_discard_my_signals();

diff -ruN 582-refrigerator-old/net/rxrpc/krxsecd.c 582-refrigerator-new/net/rxrpc/krxsecd.c
--- 582-refrigerator-old/net/rxrpc/krxsecd.c 2004-11-03 21:52:23.000000000 +1100
+++ 582-refrigerator-new/net/rxrpc/krxsecd.c 2004-11-24 17:56:07.291016792 +1100
@@ -107,6 +107,8 @@

_debug("### End Inbound Calls");

+ try_to_freeze(PF_FREEZE);
+
/* discard pending signals */
rxrpc_discard_my_signals();

diff -ruN 582-refrigerator-old/net/rxrpc/krxtimod.c 582-refrigerator-new/net/rxrpc/krxtimod.c
--- 582-refrigerator-old/net/rxrpc/krxtimod.c 2004-11-03 21:51:24.000000000 +1100
+++ 582-refrigerator-new/net/rxrpc/krxtimod.c 2004-11-24 17:56:07.292016640 +1100
@@ -90,6 +90,8 @@
complete_and_exit(&krxtimod_dead, 0);
}

+ try_to_freeze(PF_FREEZE);
+
/* discard pending signals */
rxrpc_discard_my_signals();

diff -ruN 582-refrigerator-old/net/sunrpc/sched.c 582-refrigerator-new/net/sunrpc/sched.c
--- 582-refrigerator-old/net/sunrpc/sched.c 2004-11-03 21:53:47.000000000 +1100
+++ 582-refrigerator-new/net/sunrpc/sched.c 2004-11-24 17:56:07.316012992 +1100
@@ -18,7 +18,6 @@
#include <linux/smp.h>
#include <linux/smp_lock.h>
#include <linux/spinlock.h>
-#include <linux/suspend.h>

#include <linux/sunrpc/clnt.h>
#include <linux/sunrpc/xprt.h>
diff -ruN 582-refrigerator-old/net/sunrpc/svcsock.c 582-refrigerator-new/net/sunrpc/svcsock.c
--- 582-refrigerator-old/net/sunrpc/svcsock.c 2004-11-03 21:51:16.000000000 +1100
+++ 582-refrigerator-new/net/sunrpc/svcsock.c 2004-11-24 17:56:07.324011776 +1100
@@ -31,7 +31,6 @@
#include <linux/slab.h>
#include <linux/netdevice.h>
#include <linux/skbuff.h>
-#include <linux/suspend.h>
#include <net/sock.h>
#include <net/checksum.h>
#include <net/ip.h>
@@ -1187,6 +1186,7 @@
arg->len = (pages-1)*PAGE_SIZE;
arg->tail[0].iov_len = 0;

+ try_to_freeze(PF_FREEZE);
if (signalled())
return -EINTR;

@@ -1227,8 +1227,7 @@

schedule_timeout(timeout);

- if (current->flags & PF_FREEZE)
- refrigerator(PF_FREEZE);
+ try_to_freeze(PF_FREEZE);

spin_lock_bh(&serv->sv_lock);
remove_wait_queue(&rqstp->rq_wait, &wait);


2004-11-24 14:13:04

by Nigel Cunningham

Subject: Suspend 2 merge: 36/51: Highlevel I/O routines.

High-level routines for doing I/O. These routines are designed to know
nothing about how or where the data is actually saved. Here, we just
focus on asking our writer to write data for us and to get it back from
storage.
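
As a rough illustration of that split (this is not code from the patch),
the sketch below shows a high-level routine that pushes pages through a
table of function pointers without knowing where they end up. All names
here (sketch_writer_ops, mem_writer, sketch_write_pages, SKETCH_PAGE_SIZE)
are hypothetical, and it is plain userspace C rather than kernel code; the
real plugin interface in the patch has more hooks and different signatures.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096	/* assumed page size, for illustration only */
#define SKETCH_MAX_PAGES 8

/* Hypothetical, cut-down stand-in for the writer side of a plugin's ops. */
struct sketch_writer_ops {
	int (*write_init)(void);
	int (*write_chunk)(const char *page);
	int (*write_cleanup)(void);
};

/* One possible backend: it simply keeps the pages in a static buffer. */
static char mem_store[SKETCH_MAX_PAGES][SKETCH_PAGE_SIZE];
static int mem_pages_stored;

static int mem_write_init(void)
{
	mem_pages_stored = 0;
	return 0;
}

static int mem_write_chunk(const char *page)
{
	if (mem_pages_stored >= SKETCH_MAX_PAGES)
		return -1;	/* backend is full */
	memcpy(mem_store[mem_pages_stored++], page, SKETCH_PAGE_SIZE);
	return 0;
}

static int mem_write_cleanup(void)
{
	return 0;
}

static struct sketch_writer_ops mem_writer = {
	.write_init	= mem_write_init,
	.write_chunk	= mem_write_chunk,
	.write_cleanup	= mem_write_cleanup,
};

/*
 * High-level routine: hands each page to the writer and never looks at
 * how or where the writer stores it. Returns 0 on success, -1 on failure.
 * Cleanup is always attempted once init has succeeded.
 */
static int sketch_write_pages(struct sketch_writer_ops *writer,
			      char pages[][SKETCH_PAGE_SIZE], int count)
{
	int i, error = 0;

	assert(writer && writer->write_chunk);

	if (writer->write_init && writer->write_init())
		return -1;

	for (i = 0; i < count; i++) {
		if (writer->write_chunk(pages[i])) {
			error = -1;
			break;
		}
	}

	if (writer->write_cleanup && writer->write_cleanup())
		error = -1;

	return error;
}
```

Swapping in a different backend (swap, a file, the network) would only
mean supplying another sketch_writer_ops; sketch_write_pages() itself
would not change.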

diff -ruN 826-io-old/kernel/power/io.c 826-io-new/kernel/power/io.c
--- 826-io-old/kernel/power/io.c 1970-01-01 10:00:00.000000000 +1000
+++ 826-io-new/kernel/power/io.c 2004-11-13 19:28:57.000000000 +1100
@@ -0,0 +1,1095 @@
+/*
+ * kernel/power/io.c
+ *
+ * Copyright (C) 1998-2001 Gabor Kuti <[email protected]>
+ * Copyright (C) 1998,2001,2002 Pavel Machek <[email protected]>
+ * Copyright (C) 2002-2003 Florent Chabaud <[email protected]>
+ * Copyright (C) 2002-2004 Nigel Cunningham <[email protected]>
+ *
+ * This file is released under the GPLv2.
+ *
+ * It contains high level IO routines for suspend.
+ *
+ */
+
+#define SUSPEND_IO_C
+
+#include <linux/suspend.h>
+#include <linux/version.h>
+#include <linux/mm.h>
+#include <linux/utsname.h>
+
+#include "suspend.h"
+#include "plugins.h"
+
+/* Variables saved in the suspend header */
+extern unsigned long orig_mem_free;
+extern int suspend_act_used;
+extern int suspend_lvl_used;
+extern int suspend_dbg_used;
+extern volatile int suspend_io_time[2][2];
+
+extern struct pagedir __nosavedata pagedir_resume;
+extern struct range * unused_ranges;
+extern int suspend2_prepare_console(void);
+
+/* Routines we call when reloading the original kernel */
+extern void warmup_collision_cache(void);
+extern int get_pageset1_load_addresses(void);
+
+extern void get_next_pbe(struct pbe2 * pbe);
+extern void get_first_pbe(struct pbe2 * pbe, struct pagedir * pagedir);
+
+static void noresume_reset_plugins(void);
+
+/* cleanup_finished_suspend_io
+ *
+ * Description: Very simple helper function to save #including all the
+ * suspend code in fs/buffer.c and anywhere else we might
+ * want to wait on suspend I/O in future.
+ */
+
+void cleanup_finished_suspend_io(void)
+{
+ active_writer->ops.writer.wait_on_io(0);
+}
+
+/* fill_suspend_header()
+ *
+ * Description: Fill the suspend header structure.
+ * Arguments: struct suspend_header: Header data structure to be filled.
+ */
+
+static __inline__ void fill_suspend_header(struct suspend_header *sh)
+{
+ int i;
+
+ memset((char *)sh, 0, sizeof(*sh));
+
+ sh->version_code = LINUX_VERSION_CODE;
+ sh->num_physpages = num_physpages;
+ sh->orig_mem_free = orig_mem_free;
+ strncpy(sh->machine, system_utsname.machine, 65);
+ strncpy(sh->version, system_utsname.version, 65);
+ sh->num_cpus = num_online_cpus();
+ sh->page_size = PAGE_SIZE;
+ sh->pagedir = pagedir1;
+ sh->pagedir.origranges.first = pagedir1.origranges.first;
+ sh->pagedir.destranges.first = pagedir1.destranges.first;
+ sh->pagedir.allocdranges.first = pagedir1.allocdranges.first;
+ sh->unused_ranges = unused_ranges;
+ sh->num_range_pages = num_range_pages;
+ sh->pageset_2_size = pagedir2.pageset_size;
+ sh->param0 = suspend_result;
+ sh->param1 = suspend_action;
+#ifdef CONFIG_SOFTWARE_SUSPEND_DEBUG
+ sh->param2 = suspend_debug_state;
+#endif
+ sh->param3 = console_loglevel;
+ for (i = 0; i < 4; i++)
+ sh->io_time[i/2][i%2] =
+ suspend_io_time[i/2][i%2];
+}
+
+/* write_pageset()
+ *
+ * Description: Write a pageset to disk.
+ * Arguments: pagedir: Pointer to the pagedir to be saved.
+ * whichtowrite: Controls what debugging output is printed.
+ * Returns: Zero on success or -1 on failure.
+ */
+
+int write_pageset(struct pagedir * pagedir, int whichtowrite)
+{
+ int nextupdate = 0, size, ret = 0, i, base = 0;
+ int barmax = pagedir1.pageset_size + pagedir2.pageset_size;
+ int start_time, end_time;
+ long error = 0;
+ struct pbe2 pbe;
+ unsigned int origfree = real_nr_free_pages();
+ struct suspend_plugin_ops * this_filter, * first_filter = get_next_filter(NULL);
+
+ PRINTFREEMEM("at start of write pageset");
+
+ size = pagedir->pageset_size;
+ if (!size)
+ return 0;
+
+ if (whichtowrite == 1) {
+ prepare_status(1, 0, "Writing kernel & process data...");
+ base = pagedir2.pageset_size;
+ } else {
+ prepare_status(1, 1, "Writing caches...");
+ }
+
+ start_time = jiffies;
+
+ /* Initialise page transformers */
+ list_for_each_entry(this_filter, &suspend_filters, ops.filter.filter_list) {
+ if (this_filter->disabled)
+ continue;
+ if (this_filter->ops.filter.write_init)
+ this_filter->ops.filter.write_init(whichtowrite);
+ }
+
+ PRINTFREEMEM("after initialising page transformers");
+
+ /* Initialise writer */
+ active_writer->ops.writer.write_init(whichtowrite);
+ PRINTFREEMEM("after initialising writer");
+
+ get_first_pbe(&pbe, pagedir);
+
+ /* Write the data */
+ for (i=0; i<size; i++) {
+ int was_mapped = 0;
+ /* Status update */
+ if (!(i&0x1FF))
+ suspend_message(SUSPEND_IO, SUSPEND_LOW, 1, ".");
+ if (((i+base) >= nextupdate) ||
+ (!(i%(1 << (20 - PAGE_SHIFT)))))
+ nextupdate = update_status(i + base, barmax,
+ " %d/%d MB ", MB(base+i+1), MB(barmax));
+ if ((i == (size - 5)) &&
+ TEST_ACTION_STATE(SUSPEND_PAUSE_NEAR_PAGESET_END))
+ check_shift_keys(1, "Five more pages to write.");
+ suspend_message(SUSPEND_IO, SUSPEND_VERBOSE, 1,
+ "Submitting page %d/%d.\n", i, size);
+
+ /* Write */
+ was_mapped = suspend_map_kernel_page(pbe.address, 1);
+ if (TEST_ACTION_STATE(SUSPEND_TEST_FILTER_SPEED))
+ ret = first_filter->ops.filter.write_chunk(pbe.origaddress);
+ else
+ ret = first_filter->ops.filter.write_chunk(pbe.address);
+ if (!was_mapped)
+ suspend_map_kernel_page(pbe.address, 0);
+
+ if (ret) {
+ printk("Write chunk returned %d.\n", ret);
+ abort_suspend("Failed to write a chunk of the "
+ "image.");
+ error = -1;
+ goto write_pageset_free_buffers;
+ }
+
+ /* Interactivity */
+ check_shift_keys(0, NULL);
+
+ if (TEST_RESULT_STATE(SUSPEND_ABORTED)) {
+ abort_suspend("Aborting as requested.");
+ error = -1;
+ goto write_pageset_free_buffers;
+ }
+
+ /* Prepare next */
+ get_next_pbe(&pbe);
+ }
+
+ update_status(base+size, barmax, " %d/%d MB ",
+ MB(base+size), MB(barmax));
+ suspend_message(SUSPEND_IO, SUSPEND_LOW, 1, "|\n");
+ PRINTFREEMEM("after writing data");
+
+write_pageset_free_buffers:
+
+ /* Flush data and cleanup */
+ list_for_each_entry(this_filter, &suspend_filters, ops.filter.filter_list) {
+ if (this_filter->disabled)
+ continue;
+ if (this_filter->ops.filter.write_cleanup)
+ this_filter->ops.filter.write_cleanup();
+ }
+ PRINTFREEMEM("after cleaning up transformers");
+ active_writer->ops.writer.write_cleanup();
+ PRINTFREEMEM("after cleaning up writer");
+
+ /* Statistics */
+ end_time = jiffies;
+
+ if ((end_time - start_time) && (!TEST_RESULT_STATE(SUSPEND_ABORTED))) {
+ suspend_message(SUSPEND_IO, SUSPEND_LOW, 1,
+ "Time to write data: %d pages in %d jiffies => "
+ "MB written per second: %lu.\n",
+ size,
+ (end_time - start_time),
+ (MB((unsigned long) size) * HZ / (end_time - start_time)));
+ suspend_io_time[0][0] += size;
+ suspend_io_time[0][1] += (end_time - start_time);
+ }
+
+ PRINTFREEMEM("at end of write pageset");
+
+ /* Sanity checking */
+ if (real_nr_free_pages() != origfree) {
+ abort_suspend("Number of free pages at start and end of write "
+ "pageset don't match! (%d != %d)",
+ origfree, real_nr_free_pages());
+ }
+
+ suspend_store_free_mem(SUSPEND_FREE_IO, 0);
+ return error;
+}
+
+/* read_pageset()
+ *
+ * Description: Read a pageset from disk.
+ * Arguments: pagedir: Pointer to the pagedir to be loaded.
+ * whichtoread: Controls what debugging output is printed.
+ * overwrittenpagesonly: Whether to read the whole pageset or
+ * only part.
+ * Returns: Zero on success or -1 on failure.
+ */
+
+int read_pageset(struct pagedir * pagedir, int whichtoread,
+ int overwrittenpagesonly)
+{
+ int nextupdate = 0, result = 0, base = 0;
+ int start_time, end_time, finish_at = pagedir->pageset_size;
+ int barmax = pagedir1.pageset_size + pagedir2.pageset_size;
+ int i;
+ struct pbe2 pbe;
+ struct suspend_plugin_ops * this_filter, * first_filter = get_next_filter(NULL);
+
+ PRINTFREEMEM("at start of read pageset");
+
+ if (whichtoread == 1) {
+ prepare_status(1, 1, "Reading kernel & process data...");
+ } else {
+ prepare_status(1, 0, "Reading caches...");
+ if (overwrittenpagesonly)
+ barmax = finish_at = min(pageset1_size, pageset2_size);
+ else {
+ base = pagedir1.pageset_size;
+ }
+ }
+
+ start_time=jiffies;
+
+ /* Initialise page transformers */
+ list_for_each_entry(this_filter, &suspend_filters, ops.filter.filter_list) {
+ if (this_filter->disabled)
+ continue;
+ if (this_filter->ops.filter.read_init &&
+ this_filter->ops.filter.read_init(whichtoread)) {
+ abort_suspend("Failed to initialise a filter.");
+ result = 1;
+ goto read_pageset_free_buffers;
+ }
+ }
+
+ /* Initialise writer */
+ if (active_writer->ops.writer.read_init(whichtoread)) {
+ abort_suspend("Failed to initialise the writer.");
+ result = 1;
+ goto read_pageset_free_buffers;
+ }
+
+ get_first_pbe(&pbe, pagedir);
+
+ suspend_message(SUSPEND_IO, SUSPEND_LOW, 1,
+ "Attempting to read %d pages.\n", finish_at);
+
+ /* Read the pages */
+ for (i=0; i< finish_at; i++) {
+ int was_mapped = 0;
+ /* Status */
+ if (!(i&0x1FF))
+ suspend_message(SUSPEND_IO, SUSPEND_LOW, 1, ".");
+ if (((i+base) >= nextupdate) ||
+ (!(i%(1 << (20 - PAGE_SHIFT)))))
+ nextupdate = update_status(i+base, barmax,
+ " %d/%d MB ", MB(base+i+1), MB(barmax));
+ if ((i == (finish_at - 5)) &&
+ TEST_ACTION_STATE(SUSPEND_PAUSE_NEAR_PAGESET_END))
+ check_shift_keys(1, "Five more pages to read.");
+ suspend_message(SUSPEND_IO, SUSPEND_VERBOSE, 1,
+ "Submitting page %d/%d.\n", i, finish_at);
+
+ was_mapped = suspend_map_kernel_page(pbe.address, 1);
+ result = first_filter->ops.filter.read_chunk(pbe.address, SUSPEND_ASYNC);
+ if (!was_mapped)
+ suspend_map_kernel_page(pbe.address, 0);
+
+ if (result) {
+ panic("Failed to read chunk %d/%d of the image.",
+ i, finish_at);
+ goto read_pageset_free_buffers;
+ }
+
+ /* Interactivity */
+ check_shift_keys(0, NULL);
+
+ /* Prepare next */
+ get_next_pbe(&pbe);
+ }
+
+ update_status(base+finish_at, barmax, " %d/%d MB ",
+ MB(base+finish_at), MB(barmax));
+ suspend_message(SUSPEND_IO, SUSPEND_LOW, 1, "|\n");
+
+read_pageset_free_buffers:
+
+ /* Finish I/O, flush data and cleanup reads. */
+ list_for_each_entry(this_filter, &suspend_filters, ops.filter.filter_list) {
+ if (this_filter->disabled)
+ continue;
+ if (this_filter->ops.filter.read_cleanup &&
+ this_filter->ops.filter.read_cleanup()) {
+ abort_suspend("Failed to cleanup a filter.");
+ result = 1;
+ }
+ }
+
+ if (active_writer->ops.writer.read_cleanup()) {
+ abort_suspend("Failed to cleanup the writer.");
+ result = 1;
+ }
+
+ /* Statistics */
+ end_time=jiffies;
+ if ((end_time - start_time) && (!TEST_RESULT_STATE(SUSPEND_ABORTED))) {
+ suspend_message(SUSPEND_IO, SUSPEND_LOW, 1,
+ "Time to read data: %d pages in %d jiffies => "
+ "MB read per second: %lu.\n",
+ finish_at,
+ (end_time - start_time),
+ (MB((unsigned long) finish_at) * HZ /
+ (end_time - start_time)));
+ suspend_io_time[1][0] += finish_at;
+ suspend_io_time[1][1] += (end_time - start_time);
+ }
+
+ PRINTFREEMEM("at end of read pageset");
+
+ suspend_store_free_mem(SUSPEND_FREE_IO, 1);
+ return result;
+}
+
+/* write_plugin_configs()
+ *
+ * Description: Store the configuration for each plugin in the image header.
+ * Returns: Int: Zero on success, Error value otherwise.
+ */
+static int write_plugin_configs(void)
+{
+ struct suspend_plugin_ops * this_plugin;
+ char * buffer = (char *) get_zeroed_page(GFP_ATOMIC);
+ int len, index = 1;
+ struct plugin_header plugin_header;
+
+ if (!buffer) {
+ printk("Failed to allocate a buffer for saving "
+ "plugin configuration info.\n");
+ return -ENOMEM;
+ }
+
+ /*
+ * We have to know which data goes with which plugin, so we at
+ * least write a length of zero for a plugin. Note that we are
+ * also assuming every plugin's config data takes <= PAGE_SIZE.
+ */
+
+ /* For each plugin (in registration order) */
+ list_for_each_entry(this_plugin, &suspend_plugins, plugin_list) {
+
+ /* Get the data from the plugin */
+ len = 0;
+ if (this_plugin->save_config_info)
+ len = this_plugin->save_config_info(buffer);
+
+ /* Save the details of the plugin */
+ plugin_header.disabled = this_plugin->disabled;
+ plugin_header.type = this_plugin->type;
+ plugin_header.index = index++;
+ strncpy(plugin_header.name, this_plugin->name,
+ sizeof(plugin_header.name));
+ active_writer->ops.writer.write_header_chunk(
+ (char *) &plugin_header,
+ sizeof(plugin_header));
+
+ /* Save the size of the data and any data returned */
+ active_writer->ops.writer.write_header_chunk((char *) &len,
+ sizeof(int));
+ if (len)
+ active_writer->ops.writer.write_header_chunk(
+ buffer, len);
+ }
+
+ /* Write a blank header to terminate the list */
+ plugin_header.name[0] = '\0';
+ active_writer->ops.writer.write_header_chunk(
+ (char *) &plugin_header,
+ sizeof(plugin_header));
+
+ free_pages((unsigned long) buffer, 0);
+ return 0;
+}
+
+/* read_plugin_configs()
+ *
+ * Description: Reload plugin configurations from the image header.
+ * Returns: Int. Zero on success, error value otherwise.
+ */
+
+static int read_plugin_configs(void)
+{
+ struct suspend_plugin_ops * this_plugin;
+ char * buffer = (char *) get_zeroed_page(GFP_ATOMIC);
+ int len, result = 0;
+ struct plugin_header plugin_header;
+
+ if (!buffer) {
+ printk("Failed to allocate a buffer for reloading plugin "
+ "configuration info.\n");
+ return -ENOMEM;
+ }
+
+ /* All plugins are initially disabled. That way, if we have a plugin
+ * loaded now that wasn't loaded when we suspended, it won't be used
+ * in trying to read the data.
+ */
+ list_for_each_entry(this_plugin, &suspend_plugins, plugin_list)
+ this_plugin->disabled = 1;
+
+ /* Get the first plugin header */
+ result = active_writer->ops.writer.read_header_chunk(
+ (char *) &plugin_header, sizeof(plugin_header));
+ if (!result) {
+ printk("Failed to read the next plugin header.\n");
+ free_pages((unsigned long) buffer, 0);
+ return -EINVAL;
+ }
+
+ /* For each plugin (in registration order) */
+ while (plugin_header.name[0]) {
+
+ /* Find the plugin */
+ this_plugin = find_plugin_given_name(plugin_header.name);
+
+ if (!this_plugin) {
+ /*
+ * Is it used? Only need to worry about filters. The active
+ * writer must be loaded!
+ */
+ if ((!plugin_header.disabled) && (plugin_header.type == FILTER_PLUGIN)) {
+ suspend_early_boot_message(1, "It looks like we need plugin %s for reading the image "
+ "but it hasn't been registered.\n",
+ plugin_header.name);
+ if (!(test_suspend_state(SUSPEND_CONTINUE_REQ))) {
+ active_writer->ops.writer.invalidate_image();
+ result = -EINVAL;
+ noresume_reset_plugins();
+ free_pages((unsigned long) buffer, 0);
+ return -EINVAL;
+ }
+ } else
+ printk("Plugin %s configuration data found, but the plugin "
+ "hasn't registered. Looks like it was disabled, so "
+ "we're ignoring its data.",
+ plugin_header.name);
+ }
+
+ /* Get the length of the data (if any) */
+ result = active_writer->ops.writer.read_header_chunk(
+ (char *) &len, sizeof(int));
+ if (!result) {
+ printk("Failed to read the length of the plugin %s's"
+ " configuration data.\n",
+ plugin_header.name);
+ free_pages((unsigned long) buffer, 0);
+ return -EINVAL;
+ }
+
+ /* Read any data and pass to the plugin (if we found one) */
+ if (len) {
+ active_writer->ops.writer.read_header_chunk(buffer, len);
+ if (this_plugin) {
+ if (!this_plugin->load_config_info) {
+ printk("Huh? Plugin %s appears to have a "
+ "save_config_info, but not a "
+ "load_config_info function!\n",
+ this_plugin->name);
+ } else
+ this_plugin->load_config_info(buffer, len);
+ }
+ }
+
+ if (this_plugin) {
+ /* Now move this plugin to the tail of its lists. This will put it
+ * in order. Any new plugins will end up at the top of the lists.
+ * They should have been set to disabled when loaded (people will
+ * normally not edit an initrd to load a new module and then
+ * suspend without using it!).
+ */
+
+ suspend_move_plugin_tail(this_plugin);
+
+ /*
+ * We apply the disabled state; plugins don't need to save whether they
+ * were disabled and if they do, we override them anyway.
+ */
+ this_plugin->disabled = plugin_header.disabled;
+ }
+
+ /* Get the next plugin header */
+ result = active_writer->ops.writer.read_header_chunk(
+ (char *) &plugin_header, sizeof(plugin_header));
+
+ if (!result) {
+ printk("Failed to read the next plugin header.\n");
+ free_pages((unsigned long) buffer, 0);
+ return -EINVAL;
+ }
+
+ }
+
+ free_pages((unsigned long) buffer, 0);
+ return 0;
+}
+
+/* write_image_header()
+ *
+ * Description: Write the image header after writing the image proper.
+ * Returns: Int. Zero on success or -1 on failure.
+ */
+
+int write_image_header(void)
+{
+ int i, nextupdate = 0, ret;
+ int total = pagedir1.pageset_size+pagedir2.pageset_size+2;
+ int progress = total-1;
+ char * header_buffer = NULL;
+
+ /* First, relativise all range information */
+ if (get_rangepages_list())
+ return -1;
+
+ if (unused_ranges)
+ unused_ranges = RANGE_RELATIVE(unused_ranges);
+
+ relativise_chain(&pagedir1.origranges);
+ relativise_chain(&pagedir1.destranges);
+ relativise_chain(&pagedir1.allocdranges);
+
+ if ((ret = active_writer->ops.writer.prepare_save_ranges())) {
+ abort_suspend("Active writer's prepare_save_ranges "
+ "function failed.");
+ goto write_image_header_abort1;
+ }
+
+ relativise_ranges();
+
+ /* Now prepare to write the header */
+ if ((ret = active_writer->ops.writer.write_header_init())) {
+ abort_suspend("Active writer's write_header_init"
+ " function failed.");
+ goto write_image_header_abort2;
+ }
+
+ /* Get a buffer */
+ header_buffer = (char *) get_zeroed_page(GFP_ATOMIC);
+ if (!header_buffer) {
+ abort_suspend("Out of memory when trying to get page "
+ "for header!");
+ goto write_image_header_abort3;
+ }
+
+ /* Write the meta data */
+ fill_suspend_header((struct suspend_header *) header_buffer);
+ active_writer->ops.writer.write_header_chunk(header_buffer,
+ sizeof(struct suspend_header));
+
+ /* Write plugin configurations */
+ if ((ret = write_plugin_configs())) {
+ abort_suspend("Failed to write plugin configs.");
+ goto write_image_header_abort3;
+ }
+
+ /* Write range pages */
+ suspend_message(SUSPEND_HEADER, SUSPEND_LOW, 1,
+ name_suspend "Writing %d range pages.\n",
+ num_range_pages);
+
+ for (i=1; i<=num_range_pages; i++) {
+ unsigned long * this_range_page = get_rangepages_list_entry(i);
+ /* Status update */
+ suspend_message(SUSPEND_HEADER, SUSPEND_VERBOSE, 1, "%d/%d: %p.\n",
+ i, num_range_pages, this_range_page);
+
+ if (i >= nextupdate)
+ nextupdate = update_status(progress + i, total, NULL);
+
+ /* Check for aborting/pausing */
+ check_shift_keys(0, NULL);
+
+ if (TEST_RESULT_STATE(SUSPEND_ABORTED)) {
+ abort_suspend("Aborting as requested.");
+ goto write_image_header_abort3;
+ }
+
+ /* Write one range page */
+ active_writer->ops.writer.write_header_chunk(
+ (char *) this_range_page, PAGE_SIZE);
+
+ if (ret) {
+ abort_suspend("Failed writing a page. "
+ "Error number was %d.", ret);
+ goto write_image_header_abort3;
+ }
+ }
+
+ update_status(total - 1, total, NULL);
+
+ /* Flush data and let writer cleanup */
+ if (active_writer->ops.writer.write_header_cleanup()) {
+ abort_suspend("Failed to cleanup writing header.");
+ goto write_image_header_abort2;
+ }
+
+ if (TEST_RESULT_STATE(SUSPEND_ABORTED))
+ goto write_image_header_abort2;
+
+ suspend_message(SUSPEND_IO, SUSPEND_VERBOSE, 1, "|\n");
+ update_status(total, total, NULL);
+
+ MDELAY(1000);
+ free_pages((unsigned long) header_buffer, 0);
+
+ return 0;
+
+ /*
+ * Aborting. We need to...
+ * - let the writer cleanup (if necessary)
+ * - revert ranges to absolute values
+ */
+write_image_header_abort3:
+ active_writer->ops.writer.write_header_cleanup();
+
+write_image_header_abort2:
+ absolutise_ranges();
+
+ put_rangepages_list();
+
+ if (active_writer->ops.writer.post_load_ranges)
+ active_writer->ops.writer.post_load_ranges();
+
+write_image_header_abort1:
+ if (get_rangepages_list())
+ panic("Unable to allocate rangepageslist.");
+
+ absolutise_chain(&pagedir1.origranges);
+ absolutise_chain(&pagedir1.destranges);
+ absolutise_chain(&pagedir1.allocdranges);
+
+ put_rangepages_list();
+
+ free_pages((unsigned long) header_buffer, 0);
+ return -1;
+}
+
+extern int suspend_early_boot_message(int can_erase_image, char *reason, ...);
+
+/* sanity_check()
+ *
+ * Description: Perform a few checks, seeking to ensure that the kernel being
+ * booted matches the one suspended. They need to match so we can
+ * be _sure_ things will work. It is not absolutely impossible for
+ * resuming from a different kernel to work, just not assured.
+ * Arguments: Struct suspend_header. The header which was saved at suspend
+ * time.
+ */
+static int sanity_check(struct suspend_header *sh)
+{
+ if (sh->version_code != LINUX_VERSION_CODE)
+ return suspend_early_boot_message(1, "Incorrect kernel version");
+
+ if (sh->num_physpages != num_physpages)
+ return suspend_early_boot_message(1, "Incorrect memory size");
+
+ if (strncmp(sh->machine, system_utsname.machine, 65))
+ return suspend_early_boot_message(1, "Incorrect machine type");
+
+ if (strncmp(sh->version, system_utsname.version, 65))
+ return suspend_early_boot_message(1, "Incorrect version");
+
+ if (sh->num_cpus != num_online_cpus())
+ return suspend_early_boot_message(1, "Incorrect number of cpus");
+
+ if (sh->page_size != PAGE_SIZE)
+ return suspend_early_boot_message(1, "Incorrect PAGE_SIZE");
+
+ return 0;
+}
+
+/* noresume_reset_plugins
+ *
+ * Description: When we read the start of an image, plugins (and especially the
+ * active writer) might need to reset data structures if we decide
+ * to invalidate the image rather than resuming from it.
+ */
+
+static void noresume_reset_plugins(void)
+{
+ struct suspend_plugin_ops * this_plugin;
+
+ list_for_each_entry(this_plugin, &suspend_plugins, plugin_list) {
+ if (this_plugin->ops.filter.noresume_reset)
+ this_plugin->ops.filter.noresume_reset();
+ }
+}
+
+/* __read_primary_suspend_image
+ *
+ * Description: Test for the existence of an image and attempt to load it.
+ * Returns: Int. Zero if image found and pageset1 successfully loaded.
+ * Error if no image found or loaded.
+ */
+static int __read_primary_suspend_image(void)
+{
+ int i, result = 0;
+ char * header_buffer = (char *) get_zeroed_page(GFP_ATOMIC);
+ struct suspend_header * suspend_header;
+ struct range * last_range_page = NULL;
+
+ if (!header_buffer)
+ return -ENOMEM;
+
+ set_suspend_state(SUSPEND_RUNNING);
+
+ /* Check for an image */
+ if (!(result = active_writer->ops.writer.image_exists())) {
+ result = -ENODATA;
+ noresume_reset_plugins();
+ goto out;
+ }
+
+ /* Check for noresume command line option */
+ if (test_suspend_state(SUSPEND_NORESUME_SPECIFIED)) {
+ active_writer->ops.writer.invalidate_image();
+ result = -EINVAL;
+ noresume_reset_plugins();
+ goto out;
+ }
+
+ /* Check whether we've resumed before */
+ if (test_suspend_state(SUSPEND_RESUMED_BEFORE)) {
+ suspend_early_boot_message(1, NULL);
+ if (!(test_suspend_state(SUSPEND_CONTINUE_REQ))) {
+ active_writer->ops.writer.invalidate_image();
+ result = -EINVAL;
+ noresume_reset_plugins();
+ goto out;
+ }
+ }
+
+ clear_suspend_state(SUSPEND_CONTINUE_REQ);
+
+ /*
+ * Prepare the active writer for reading the image header. The
+ * active writer might read its own configuration or set up
+ * a network connection here.
+ *
+ * NB: This call may never return because there might be a signature
+ * for a different image such that we warn the user and they choose
+ * to reboot. (The device ids might look erroneous (2.4 vs 2.6), or
+ * the location of the image might be unavailable if it was stored
+ * on a network connection.)
+ */
+
+ if ((result = active_writer->ops.writer.read_header_init())) {
+ noresume_reset_plugins();
+ goto out;
+ }
+
+ /* Read suspend header */
+ if ((result = active_writer->ops.writer.read_header_chunk(
+ header_buffer, sizeof(struct suspend_header))) < 0) {
+ noresume_reset_plugins();
+ goto out;
+ }
+
+ suspend_header = (struct suspend_header *) header_buffer;
+
+ /*
+ * NB: This call may also result in a reboot rather than returning.
+ */
+
+ if (sanity_check(suspend_header)) { /* Is this the same machine? */
+ active_writer->ops.writer.invalidate_image();
+ result = -EINVAL;
+ noresume_reset_plugins();
+ goto out;
+ }
+
+ /*
+ * ----------------------------------------------------
+ * We have an image and it looks like it will load okay.
+ * ----------------------------------------------------
+ */
+
+ /* Get metadata from header. Don't override commandline parameters.
+ *
+ * We don't need to save the image size limit because it's not used
+ * during resume and will be restored with the image anyway.
+ */
+
+ orig_mem_free = suspend_header->orig_mem_free;
+ memcpy((char *) &pagedir_resume,
+ (char *) &suspend_header->pagedir, sizeof(pagedir_resume));
+ unused_ranges = suspend_header->unused_ranges;
+ num_range_pages = suspend_header->num_range_pages;
+ suspend_result = suspend_header->param0;
+ if (!suspend_act_used)
+ suspend_action = suspend_header->param1;
+#ifdef CONFIG_SOFTWARE_SUSPEND_DEBUG
+ if (!suspend_dbg_used)
+ suspend_debug_state = suspend_header->param2;
+#endif
+ if (!suspend_lvl_used)
+ suspend_default_console_level = console_loglevel = suspend_header->param3;
+ clear_suspend_state(SUSPEND_IGNORE_LOGLEVEL);
+ pagedir1.pageset_size = pagedir_resume.pageset_size;
+ pagedir2.pageset_size = suspend_header->pageset_2_size;
+ for (i = 0; i < 4; i++)
+ suspend_io_time[i/2][i%2] =
+ suspend_header->io_time[i/2][i%2];
+
+ set_suspend_state(SUSPEND_NOW_RESUMING);
+
+ /* Read plugin configurations */
+ if ((result = read_plugin_configs())) {
+ noresume_reset_plugins();
+ num_range_pages = pagedir1.pageset_size =
+ pagedir2.pageset_size = 0;
+ unused_ranges = NULL;
+ goto out;
+ }
+
+ suspend2_prepare_console();
+
+ /* Read range pages */
+ check_shift_keys(1, "About to read pagedir.");
+
+ for (i=0; i < num_range_pages; i++) {
+ /* Get a page into which we will load the data */
+ struct range * this_range_page =
+ (struct range *) get_zeroed_page(GFP_ATOMIC);
+ if (!this_range_page) {
+ abort_suspend("Unable to allocate a pagedir.");
+ result = -ENOMEM;
+ noresume_reset_plugins();
+ goto outfreeingrangepages;
+ }
+
+ /* Link to previous page */
+ if (i == 0)
+ first_range_page = this_range_page;
+ else
+ *RANGEPAGELINK(last_range_page) =
+ (i | (unsigned long) this_range_page);
+
+ /* Read this page */
+ if ((result = active_writer->ops.writer.read_header_chunk(
+ (char *) this_range_page, PAGE_SIZE)) < 0) {
+ printk("Active writer's read_header_chunk routine "
+ "returned %d.\n", result);
+ free_page((unsigned long) this_range_page);
+ noresume_reset_plugins();
+ goto outfreeingrangepages;
+ }
+
+ last_range_page = this_range_page;
+ }
+
+ /* Set the last page's link to its index */
+ *RANGEPAGELINK(last_range_page) = i;
+
+ /* Clean up after reading the header */
+ if ((result = active_writer->ops.writer.read_header_cleanup())) {
+ noresume_reset_plugins();
+ goto outfreeingrangepages;
+ }
+
+ /* Okay.
+ *
+ * Now we need to move the range pages to a place such that they won't
+ * get overwritten while being used when copying the original kernel
+ * back. To achieve this, we need to absolutise them where they are
+ * now, prepare a bitmap of pages that collide and then relativise the
+ * range pages again. Having done that, we can relocate the range
+ * pages so that they don't collide with the image being restored,
+ * and absolutise them in that location.
+ */
+
+ if (get_rangepages_list()) {
+ result = -ENOMEM;
+ noresume_reset_plugins();
+ goto outfreeingrangepages;
+ }
+
+ /* Absolutise ranges so they can be used for building the map of
+ * pages that will be overwritten. */
+ absolutise_ranges();
+ absolutise_chain(&pagedir_resume.origranges);
+
+ /* Mark the pages used by the current and original kernels */
+ warmup_collision_cache();
+
+ /* Prepare to move the pages so they don't conflict */
+ relativise_chain(&pagedir_resume.origranges);
+ relativise_ranges();
+
+ /* Relocate the pages */
+ relocate_rangepages();
+
+ /* Make sure the rangepages list is correct */
+ put_rangepages_list();
+ get_rangepages_list();
+
+ /* Absolutise in final place */
+ absolutise_ranges();
+
+ /* Done.
+ *
+ * Now we can absolutise all the pointers to the range chains.
+ */
+
+ set_chain_names(&pagedir_resume);
+
+ absolutise_chain(&pagedir_resume.origranges);
+
+ /*
+ * We don't want the original destination ranges (the locations where
+ * the atomic copy of pageset1 was stored at suspend time); we release
+ * the chain's elements before getting new ones. (The kernel running
+ * right now could be using pages that were free when we suspended).
+ */
+
+ absolutise_chain(&pagedir_resume.destranges);
+
+ absolutise_chain(&pagedir_resume.allocdranges);
+
+ if (unused_ranges)
+ unused_ranges = RANGE_ABSOLUTE(unused_ranges);
+
+ put_rangepages_list();
+
+ /*
+ * The active writer should be using chains to record where it stored
+ * the data. Give it a chance to absolutise them.
+ */
+ if (active_writer->ops.writer.post_load_ranges)
+ active_writer->ops.writer.post_load_ranges();
+
+ /*
+ * Get the addresses of pages into which we will load the kernel to
+ * be copied back
+ */
+ put_range_chain(&pagedir_resume.destranges);
+
+ if (get_pageset1_load_addresses()) {
+ result = -ENOMEM;
+ noresume_reset_plugins();
+ goto outfreeingrangepages;
+ }
+
+ /* Read the original kernel back */
+ check_shift_keys(1, "About to read pageset 1.");
+
+ if (read_pageset(&pagedir_resume, 1, 0)) {
+ prepare_status(1, 1, "Failed to read pageset 1.");
+ result = -EPERM;
+ noresume_reset_plugins();
+ goto outfreeingrangepages;
+ }
+
+ PRINTFREEMEM("after loading image.");
+ check_shift_keys(1, "About to restore original kernel.");
+ result = 0;
+
+ if (active_writer->ops.writer.mark_resume_attempted)
+ active_writer->ops.writer.mark_resume_attempted();
+
+out:
+ free_pages((unsigned long) header_buffer, 0);
+ return result;
+outfreeingrangepages:
+ //FIXME Test i post loop and reset memory structures.
+ {
+ int j;
+ struct range * this_range_page = first_range_page;
+ struct range * next_range_page;
+ for (j = 0; j < i; j++) {
+ next_range_page = (struct range *)
+ (((unsigned long)
+ *RANGEPAGELINK(this_range_page)) & PAGE_MASK);
+ free_page((unsigned long) this_range_page);
+ this_range_page = next_range_page;
+ }
+ }
+ goto out;
+}
+
+/* read_primary_suspend_image()
+ *
+ * Description: Attempt to read the header and pageset1 of a suspend image.
+ * Handle the outcome, complaining where appropriate.
+ */
+int read_primary_suspend_image(void)
+{
+ int error;
+
+ error = __read_primary_suspend_image();
+
+ switch (error) {
+ case 0:
+ case -ENODATA:
+ case -EINVAL: /* non fatal error */
+ MDELAY(1000);
+ return error;
+ case -EIO:
+ printk(KERN_CRIT name_suspend "I/O error\n");
+ break;
+ case -ENOENT:
+ printk(KERN_CRIT name_suspend "No such file or directory\n");
+ break;
+ case -EPERM:
+ printk(KERN_CRIT name_suspend "Sanity check error\n");
+ break;
+ default:
+ printk(KERN_CRIT name_suspend "Error %d resuming\n", error);
+ break;
+ }
+ abort_suspend("Error %d in read_primary_suspend_image",error);
+ return error;
+}
+
+/* read_secondary_pagedir()
+ *
+ * Description: Read in part or all of pageset2 of an image, depending upon
+ * whether we are suspending and have only overwritten a portion
+ * with pageset1 pages, or are resuming and need to read them
+ * all.
+ * Arguments: Int. Boolean. Read only pages which would have been
+ * overwritten by pageset1?
+ * Returns: Int. Zero if no error, otherwise the error value.
+ */
+int read_secondary_pagedir(int overwrittenpagesonly)
+{
+ int result = 0;
+
+ if (!pageset2_size)
+ return 0;
+
+ suspend_message(SUSPEND_IO, SUSPEND_VERBOSE, 1,
+ "Beginning of read_secondary_pagedir: ");
+
+ result = read_pageset(&pagedir2, 2, overwrittenpagesonly);
+
+ update_status(100, 100, NULL);
+ check_shift_keys(1, "Pagedir 2 read.");
+
+ suspend_message(SUSPEND_IO, SUSPEND_VERBOSE, 1, "\n");
+ return result;
+}
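
The range-page loop in __read_primary_suspend_image() links pages by OR-ing a small index into a page-aligned address (`i | (unsigned long) this_range_page`) and later recovers the address by masking with PAGE_MASK. A minimal user-space sketch of that packing follows. The names pack_link, link_page and link_index and the 4096-byte page size are assumptions made for illustration; they are not part of the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed page size for this sketch; the kernel supplies its own PAGE_SIZE. */
#define SKETCH_PAGE_SIZE ((uintptr_t) 4096)
#define SKETCH_PAGE_MASK (~(SKETCH_PAGE_SIZE - 1))

/* Pack a page-aligned pointer and a small index into a single word. */
static uintptr_t pack_link(void *page, unsigned long index)
{
	/* The low bits must be free, and the index must fit into them. */
	assert(((uintptr_t) page & ~SKETCH_PAGE_MASK) == 0);
	assert(index < SKETCH_PAGE_SIZE);
	return (uintptr_t) page | index;
}

/* Recover the page address by clearing the low (index) bits. */
static void *link_page(uintptr_t link)
{
	return (void *) (link & SKETCH_PAGE_MASK);
}

/* Recover the index by keeping only the low bits. */
static unsigned long link_index(uintptr_t link)
{
	return (unsigned long) (link & ~SKETCH_PAGE_MASK);
}
```

The trick only works while the index stays below the page size, which is why the sketch asserts on it; whether the patch enforces a similar bound is not visible in this excerpt.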


2004-11-24 14:21:52

by Christoph Hellwig

[permalink] [raw]
Subject: Re: Suspend 2 merge: 14/51: Disable page alloc failure message when suspending

On Wed, Nov 24, 2004 at 11:57:55PM +1100, Nigel Cunningham wrote:
> While eating memory, we will potentially trigger this a lot. We
> therefore disable the message when suspending.

So call the allocator with __GFP_NOWARN

2004-11-24 14:33:12

by Nigel Cunningham

[permalink] [raw]
Subject: Suspend 2 merge: 49/51: Checksumming

A plugin for verifying the consistency of an image. Working with kdb, it
can look up the locations of variations. There will always be some
variations shown, simply because we're touching memory before we get
here and as we check the image.
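
As a rough illustration of the idea, here is a user-space sketch of the per-page checksum: the page is treated as an array of machine words and summed. It mirrors the shape of suspend_page_checksum() in the patch below, but the name sketch_page_checksum and the fixed 4096-byte page size are assumptions for the example, and there is no kmap_atomic() here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed page size for this sketch. */
#define SKETCH_PAGE_SIZE 4096

/* Sum the page as unsigned long words. Unsigned overflow wraps, which is
 * harmless for a checksum. */
static unsigned long sketch_page_checksum(const void *page)
{
	unsigned long value = 0, word;
	size_t i;

	assert(page != NULL);
	for (i = 0; i < SKETCH_PAGE_SIZE / sizeof(unsigned long); i++) {
		/* memcpy avoids alignment and aliasing assumptions. */
		memcpy(&word, (const char *) page + i * sizeof(word),
		       sizeof(word));
		value += word;
	}
	return value;
}
```

A plain sum is cheap but weak: swapping two words within a page leaves the result unchanged, so it detects most stray writes rather than every possible difference.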

diff -ruN 855-checksumming-old/kernel/power/Kconfig 855-checksumming-new/kernel/power/Kconfig
--- 855-checksumming-old/kernel/power/Kconfig 2004-11-11 15:24:19.166594960 +1100
+++ 855-checksumming-new/kernel/power/Kconfig 2004-11-09 12:45:35.000000000 +1100
@@ -206,6 +206,15 @@

For normal usage, this option can be turned off.

+ config SOFTWARE_SUSPEND_CHECKSUMS
+ tristate ' Compile checksum module'
+ depends on SOFTWARE_SUSPEND2_CORE
+ ---help---
+ This option enables compilation of a checksumming module, which can
+ be used to verify the correct operation of suspend.
+
+ For normal usage, this option can be turned off.
+
endif

endif
diff -ruN 855-checksumming-old/kernel/power/Makefile 855-checksumming-new/kernel/power/Makefile
--- 855-checksumming-old/kernel/power/Makefile 2004-11-11 15:24:19.166594960 +1100
+++ 855-checksumming-new/kernel/power/Makefile 2004-11-08 14:38:16.000000000 +1100
@@ -21,6 +21,7 @@
obj-$(CONFIG_SOFTWARE_SUSPEND_GZIP_COMPRESSION) += suspend_gzip.o
obj-$(CONFIG_SOFTWARE_SUSPEND_DEVICE_MAPPER) += suspend_dm.o
obj-$(CONFIG_SOFTWARE_SUSPEND_SWAPWRITER) += suspend_block_io.o suspend_swap.o
+obj-$(CONFIG_SOFTWARE_SUSPEND_CHECKSUMS) += suspend_checksums.o

obj-$(CONFIG_SOFTWARE_SUSPEND) += swsusp.o $(swsusp-smp-y) disk.o

diff -ruN 855-checksumming-old/kernel/power/suspend_checksums.c 855-checksumming-new/kernel/power/suspend_checksums.c
--- 855-checksumming-old/kernel/power/suspend_checksums.c 1970-01-01 10:00:00.000000000 +1000
+++ 855-checksumming-new/kernel/power/suspend_checksums.c 2004-11-11 07:31:01.000000000 +1100
@@ -0,0 +1,610 @@
+#include <linux/suspend.h>
+#include <linux/highmem.h>
+#ifdef CONFIG_KDB
+#include <linux/kdb.h>
+#include <linux/kdbprivate.h>
+#endif
+#include <linux/module.h>
+
+#include "suspend.h"
+#include "plugins.h"
+#include "pageflags.h"
+#include "proc.h"
+
+#define CHECKSUMS_PER_PAGE ((PAGE_SIZE - sizeof(void *)) / sizeof(unsigned long))
+#define NEXT_CHECKSUM_PAGE(page) *((unsigned long *) (((char *) (page)) + PAGE_SIZE - sizeof(void *)))
+static int checksum_pages;
+static unsigned long * first_checksum_page, *last_checksum_page;
+static int num_reload_pages = 0;
+
+struct reload_data
+{
+ int pageset;
+ int pagenumber;
+ struct page * page_address;
+ char * base_version;
+ char * compared_version;
+ struct reload_data * next;
+};
+
+static struct reload_data * first_reload_data, * last_reload_data;
+
+static unsigned long suspend_page_checksum(struct page * page)
+{
+ unsigned long * virt;
+ int i;
+ unsigned long value = 0;
+
+ virt = (unsigned long *) kmap_atomic(page, KM_USER0);
+ for (i = 0; i < (PAGE_SIZE / sizeof(unsigned long)); i++)
+ value += *(virt + i);
+ kunmap_atomic(virt, KM_USER0);
+ return value;
+}
+
+extern void get_first_pbe(struct pbe2 * pbe, struct pagedir * pagedir);
+extern void get_next_pbe(struct pbe2 * pbe);
+
+static void suspend_calculate_checksums(void)
+{
+ struct pbe2 pbe;
+ int i = 0, page_index = 0, whichpagedir = 1;
+ unsigned long * current_checksum_page = first_checksum_page;
+
+ if (!first_checksum_page) {
+ prepare_status(1, 0, "Unable to checksum at this point.");
+ return;
+ }
+
+ prepare_status(1, 0, "Calculating checksums... ");
+ //printk("First checksum page is %p.\n", current_checksum_page);
+
+ get_first_pbe(&pbe, &pagedir1);
+
+ do {
+ //printk("Page number %d... Orig address %p.", i, pbe.origaddress);
+ *(current_checksum_page + page_index) =
+ suspend_page_checksum(pbe.origaddress);
+ //printk("Checksum calculated as %lx.\n", *(current_checksum_page + page_index));
+ i++;
+ page_index++;
+ if (page_index == CHECKSUMS_PER_PAGE) {
+ page_index = 0;
+ current_checksum_page = (unsigned long *)
+ NEXT_CHECKSUM_PAGE(current_checksum_page);
+ //printk("Moving to new checksum page %p.\n", current_checksum_page);
+ }
+ if (whichpagedir == 1) {
+ if (pagedir1.pageset_size == i) {
+ //if (test_suspend_state(SUSPEND_PAGESET2_NOT_LOADED))
+ goto out;
+ get_first_pbe(&pbe, &pagedir2);
+ whichpagedir = 2;
+ i = 0;
+ }
+ } else {
+ if (pagedir2.pageset_size == i)
+ goto out;
+ }
+ get_next_pbe(&pbe);
+ } while(1);
+
+out:
+ prepare_status(1, 0, "Checksums done.");
+}
+
+void suspend_check_checksums(void)
+{
+ struct pbe2 pbe;
+ int i = 0, page_index = 0, whichpagedir = 1, num_differences = 0;
+ unsigned long * current_checksum_page = first_checksum_page;
+ unsigned long sum_now;
+ struct reload_data * next_reload_data = first_reload_data;
+
+ if (!first_checksum_page) {
+ prepare_status(1, 0, "Unable to checksum at this point.");
+ return;
+ }
+
+ //prepare_status(1, 0, "Checking checksums... ");
+
+ get_first_pbe(&pbe, &pagedir1);
+
+ do {
+ /* Also ignore the page containing our variables */
+ if (PageChecksumIgnore(pbe.origaddress) || (pbe.origaddress == virt_to_page(&i)))
+ goto skip;
+
+ sum_now = suspend_page_checksum(pbe.origaddress);
+ if (sum_now != *(current_checksum_page + page_index)) {
+ num_differences++;
+ if (next_reload_data) {
+ char * virt;
+ next_reload_data->pageset = whichpagedir;
+ next_reload_data->pagenumber = i;
+ next_reload_data->page_address = pbe.origaddress;
+ virt = kmap_atomic(pbe.origaddress, KM_USER0);
+ memcpy(next_reload_data->compared_version,
+ virt, PAGE_SIZE);
+ kunmap_atomic(virt, KM_USER0);
+ next_reload_data = next_reload_data->next;
+ }
+ }
+skip:
+ i++;
+ page_index++;
+ if (page_index == CHECKSUMS_PER_PAGE) {
+ page_index = 0;
+ current_checksum_page = (unsigned long *)
+ NEXT_CHECKSUM_PAGE(current_checksum_page);
+ }
+ if (whichpagedir == 1) {
+ if (pagedir1.pageset_size == i) {
+ //if (test_suspend_state(SUSPEND_PAGESET2_NOT_LOADED))
+ goto out;
+ get_first_pbe(&pbe, &pagedir2);
+ whichpagedir = 2;
+ i = 0;
+ }
+ } else {
+ if (pagedir2.pageset_size == i)
+ goto out;
+ }
+ get_next_pbe(&pbe);
+
+ } while(1);
+
+out:
+ //printk("%d/%d different.\n", num_differences, i);
+ //prepare_status(1, 0, "Differencing done.");
+ return;
+}
+
+/*
+ * free_reload_data.
+ *
+ * Reload data begins on a page boundary.
+ */
+static void suspend_free_reload_data(void)
+{
+ struct reload_data * this_data = first_reload_data;
+ struct reload_data *prev_reload_data = this_data;
+
+ while (this_data) {
+ if (this_data->compared_version) {
+ ClearPageNosave(virt_to_page(this_data->compared_version));
+ free_pages((unsigned long) this_data->compared_version, 0);
+ }
+
+ if (this_data->base_version) {
+ ClearPageNosave(virt_to_page(this_data->base_version));
+ free_pages((unsigned long) this_data->base_version, 0);
+ }
+
+ this_data = this_data->next;
+
+ if (!(((unsigned long) this_data) & ~PAGE_MASK)) {
+ //printk("Linking %p to %p.\n", prev_reload_data, this_data);
+ prev_reload_data->next = this_data;
+ prev_reload_data = this_data;
+ }
+ }
+
+ this_data = first_reload_data;
+ while (this_data) {
+ prev_reload_data = this_data;
+ this_data = this_data->next;
+ //printk("Freeing reload page %p.\n", prev_reload_data);
+ free_pages((unsigned long) prev_reload_data, 0);
+ num_reload_pages--;
+ }
+
+ first_reload_data = last_reload_data = NULL;
+
+}
+
+/* suspend_reread_pages()
+ *
+ * Description: Reread pages from an image for diagnosing differences.
+ * Arguments: page_list: A list containing information on pages
+ * to be reloaded, sorted by pageset and
+ * page index.
+ * Returns: Zero on success or -1 on failure.
+ */
+
+static int suspend_reread_pages(struct reload_data * page_list)
+{
+ int result = 0, whichtoread;
+ long i;
+ struct pbe2 pbe;
+ struct list_head *filter;
+ struct suspend_plugin_ops * this_filter, * first_filter = get_next_filter(NULL);
+
+ if (!page_list)
+ return 0;
+
+ PRINTFREEMEM("at start of read pageset");
+
+ for (whichtoread = page_list->pageset; whichtoread <= 2; whichtoread++) {
+ struct pagedir * pagedir;
+
+ switch (whichtoread) {
+ case 1:
+ pagedir = &pagedir1;
+ break;
+ case 2:
+ pagedir = &pagedir2;
+ break;
+ default:
+ goto out;
+ }
+
+ suspend_message(SUSPEND_IO, SUSPEND_LOW, 0,
+ "Reread pages from pagedir %d.\n", whichtoread);
+
+ /* Initialise page transformers */
+ list_for_each(filter, &suspend_filters) {
+ this_filter = list_entry(filter, struct suspend_plugin_ops,
+ ops.filter.filter_list);
+ if (this_filter->disabled)
+ continue;
+ if (this_filter->ops.filter.read_init &&
+ this_filter->ops.filter.read_init(whichtoread)) {
+ abort_suspend("Failed to initialise a filter.");
+ return 1;
+ }
+ }
+
+ /* Initialise writer */
+ if (active_writer->ops.writer.read_init(whichtoread)) {
+ abort_suspend("Failed to initialise the writer.");
+ result = 1;
+ goto reread_free_buffers;
+ }
+
+ get_first_pbe(&pbe, pagedir);
+
+ /* Read the pages */
+ for (i=0; i< pagedir->pageset_size; i++) {
+ /* Read */
+ result = first_filter->ops.filter.read_chunk(
+ virt_to_page(page_list->base_version),
+ SUSPEND_SYNC);
+
+ if (result) {
+ abort_suspend("Failed to read a chunk of the image.");
+ goto reread_free_buffers;
+ }
+
+ /* Interactivity*/
+ check_shift_keys(0, NULL);
+
+ /* Prepare next */
+ get_next_pbe(&pbe);
+
+ /* Got the one we're after? */
+ if (i == page_list->pagenumber)
+ page_list = page_list->next;
+
+ if (page_list->pageset != whichtoread)
+ break;
+ }
+
+reread_free_buffers:
+
+ /* Cleanup reads from this pageset. */
+ list_for_each_entry(this_filter, &suspend_filters, ops.filter.filter_list) {
+ if (this_filter->disabled)
+ continue;
+ if (this_filter->ops.filter.read_cleanup &&
+ this_filter->ops.filter.read_cleanup()) {
+ abort_suspend("Failed to cleanup a filter.");
+ result = 1;
+ }
+ }
+
+ if (active_writer->ops.writer.read_cleanup()) {
+ abort_suspend("Failed to cleanup the writer.");
+ result = 1;
+ }
+ }
+out:
+ printk("\n");
+
+ return result;
+}
+static void suspend_free_checksum_pages(void)
+{
+ unsigned long * next_checksum_page;
+
+ while(first_checksum_page) {
+ next_checksum_page =
+ (unsigned long *) NEXT_CHECKSUM_PAGE(first_checksum_page);
+ free_pages((unsigned long) first_checksum_page, 0);
+ first_checksum_page = next_checksum_page;
+ }
+ last_checksum_page = NULL;
+ checksum_pages = 0;
+ suspend_store_free_mem(SUSPEND_FREE_CHECKSUM_PAGES, 1);
+}
+
+#define PRINTABLE(a) (((a) < 32 || (a) > 122) ? '.' : (a))
+extern int PageRangePage(char * seeking);
+
+static void local_print_location(
+ unsigned char * real,
+ unsigned char * original,
+ unsigned char * resumetime)
+{
+ int i;
+
+ for (i = 0; i < 8; i++)
+ if (*(original + i) != *(resumetime + i))
+ break;
+ if (i == 8)
+ return;
+
+ suspend_message(SUSPEND_INTEGRITY, SUSPEND_HIGH, 1, "%p", real);
+ if (PageNosave(virt_to_page(real)))
+ suspend_message(SUSPEND_INTEGRITY, SUSPEND_HIGH, 1,
+ " [NoSave]");
+ if (PageRangePage(real))
+ suspend_message(SUSPEND_INTEGRITY, SUSPEND_HIGH, 1,
+ " [RangePage]");
+ if (PageSlab(virt_to_page(real)))
+ suspend_message(SUSPEND_INTEGRITY, SUSPEND_HIGH, 1,
+ " [Slab]");
+ suspend_message(SUSPEND_INTEGRITY, SUSPEND_HIGH, 1, "\n");
+
+#ifdef CONFIG_KDB
+ for (i = 0; i < 8; i++) {
+ static const char *last_sym = NULL;
+ if (*(original + i) != *(resumetime + i)) {
+ kdb_symtab_t symtab;
+
+ kdbnearsym((unsigned long) real + i,
+ &symtab);
+
+ if ((!symtab.sym_name) ||
+ (symtab.sym_name == last_sym))
+ continue;
+
+ last_sym = symtab.sym_name;
+
+ suspend_message(SUSPEND_INTEGRITY, SUSPEND_LOW, 1,
+ "%p = %s\n",
+ symtab.sym_start,
+ symtab.sym_name);
+ }
+ }
+#endif
+
+ for (i = 0; i < 8; i++)
+ suspend_message(SUSPEND_INTEGRITY, SUSPEND_HIGH, 1,
+ "%2x ", *(original + i));
+ suspend_message(SUSPEND_INTEGRITY, SUSPEND_HIGH, 1, " ");
+ for (i = 0; i < 8; i++)
+ suspend_message(SUSPEND_INTEGRITY, SUSPEND_HIGH, 1,
+ "%c", PRINTABLE(*(original + i)));
+ suspend_message(SUSPEND_INTEGRITY, SUSPEND_HIGH, 1, " ");
+
+ for (i = 0; i < 8; i++)
+ suspend_message(SUSPEND_INTEGRITY, SUSPEND_HIGH, 1,
+ "%2x ", *(resumetime + i));
+ suspend_message(SUSPEND_INTEGRITY, SUSPEND_HIGH, 1, " ");
+ for (i = 0; i < 8; i++)
+ suspend_message(SUSPEND_INTEGRITY, SUSPEND_HIGH, 1,
+ "%c", PRINTABLE(*(resumetime + i)));
+ suspend_message(SUSPEND_INTEGRITY, SUSPEND_HIGH, 1, "\n\n");
+}
+
+static int suspend_allocate_reload_data(int pages)
+{
+ struct reload_data * this_data;
+ unsigned long data_start;
+ int i;
+
+ if (num_reload_pages >= pages)
+ return 0;
+
+ for (i = 1; i <= pages; i++) {
+ data_start = suspend2_get_grabbed_pages(0);
+
+ if (!data_start)
+ return -ENOMEM;
+
+ SetPageChecksumIgnore(virt_to_page(data_start));
+ this_data = (struct reload_data *) data_start;
+ num_reload_pages++;
+
+ while (data_start ==
+ ((((unsigned long) (this_data + 1)) - 1) & PAGE_MASK)) {
+ struct page * page;
+ unsigned long virt;
+
+ virt = suspend2_get_grabbed_pages(0);
+ if (!virt) {
+ printk("Couldn't get a page in which to store "
+ "a changed page.\n");
+ return -ENOMEM;
+ }
+ page = virt_to_page(virt);
+
+ this_data->compared_version = (char *) virt;
+ SetPageNosave(page);
+ SetPageChecksumIgnore(page);
+
+ virt = suspend2_get_grabbed_pages(0);
+ if (!virt) {
+ printk("Couldn't get a page in which to store "
+ "a baseline page.\n");
+ return -ENOMEM;
+ }
+ page = virt_to_page(virt);
+
+ this_data->base_version = (char *) virt;
+ SetPageNosave(page);
+ SetPageChecksumIgnore(page);
+
+ if (last_reload_data)
+ last_reload_data->next = this_data;
+ else
+ first_reload_data = this_data;
+
+ last_reload_data = this_data;
+
+ this_data++;
+ }
+
+ check_shift_keys(0, NULL);
+ }
+
+ return 0;
+}
+
+static void suspend_print_differences(void)
+{
+ struct reload_data * this_data = first_reload_data;
+ int i;
+
+ suspend_reread_pages(first_reload_data);
+
+ if (get_rangepages_list())
+ return;
+
+ while (this_data) {
+ if (this_data->pageset &&
+ this_data->pagenumber) {
+ suspend_message(SUSPEND_INTEGRITY, SUSPEND_MEDIUM, 1,
+ "Pagedir %d. Page %d. Address %p."
+ " Base %p. Copy %p.\n",
+ this_data->pageset,
+ this_data->pagenumber,
+ page_address(this_data->page_address),
+ this_data->base_version,
+ this_data->compared_version);
+ for (i= 0; i < (PAGE_SIZE / 8); i++) {
+ local_print_location(
+ page_address(this_data->page_address) + i * 8,
+ this_data->base_version + i * 8,
+ this_data->compared_version + i * 8);
+ check_shift_keys(0, NULL);
+ }
+ check_shift_keys(1, NULL);
+ } else
+ break; /* still release the rangepages list below */
+ this_data = this_data->next;
+ }
+
+ put_rangepages_list();
+}
+
+int __suspend_allocate_checksum_pages(void)
+{
+ int pages_required =
+ (pageset1_size + pageset2_size) / CHECKSUMS_PER_PAGE;
+ unsigned long this_page;
+
+ while (checksum_pages <= pages_required) {
+ this_page = suspend2_get_grabbed_pages(0);
+ if (!this_page)
+ return -ENOMEM;
+
+ if (!first_checksum_page)
+ first_checksum_page =
+ (unsigned long *) this_page;
+ else
+ NEXT_CHECKSUM_PAGE(last_checksum_page) = this_page;
+
+ last_checksum_page = (unsigned long *) this_page;
+ SetPageChecksumIgnore(virt_to_page(this_page));
+ checksum_pages++;
+ }
+ suspend_store_free_mem(SUSPEND_FREE_CHECKSUM_PAGES, 0);
+
+ return suspend_allocate_reload_data(2);
+}
+
+static int suspend_checksum_init(void)
+{
+ if (allocate_local_pageflags(&checksum_map, 0))
+ return 1;
+ PRINTFREEMEM("after allocating checksum map");
+ suspend_store_free_mem(5, 0);
+ return 0;
+}
+
+
+static void suspend_checksum_cleanup(void)
+{
+ suspend_free_reload_data();
+ suspend_free_checksum_pages();
+
+ free_local_pageflags(&checksum_map);
+ PRINTFREEMEM("after freeing checksum map");
+ suspend_store_free_mem(SUSPEND_FREE_CHECKSUM_MAP, 1);
+}
+
+static struct suspend_plugin_ops checksum_ops =
+{
+ .name = "Checksum",
+ .type = CHECKSUM_PLUGIN,
+ .initialise = suspend_checksum_init,
+ .cleanup = suspend_checksum_cleanup,
+ .ops = {
+ .checksum = {
+ .calculate_checksums = suspend_calculate_checksums,
+ .check_checksums = suspend_check_checksums,
+ .print_differences = suspend_print_differences,
+ .allocate_pages = __suspend_allocate_checksum_pages,
+ }
+ }
+};
+
+static struct suspend_proc_data proc_params[] = {
+ { .filename = "disable_checksumming",
+ .permissions = PROC_RW,
+ .type = SUSPEND_PROC_DATA_INTEGER,
+ .data = {
+ .integer = {
+ .variable = &checksum_ops.disabled,
+ .minimum = 0,
+ .maximum = 1,
+ }
+ }
+ },
+};
+
+static __init int checksum_load(void)
+{
+ int i, numfiles = sizeof(proc_params) / sizeof(struct suspend_proc_data);
+ int result;
+
+ if (!(result = suspend_register_plugin(&checksum_ops))) {
+ printk("Software Suspend Checksum Module\n");
+ for (i=0; i< numfiles; i++)
+ suspend_register_procfile(&proc_params[i]);
+ }
+ return result;
+}
+
+#ifdef MODULE
+static __exit void checksum_unload(void)
+{
+ int i, numfiles = sizeof(proc_params) / sizeof(struct suspend_proc_data);
+
+ printk("Software Suspend Checksum module unloading.\n");
+
+ for (i=0; i< numfiles; i++)
+ suspend_unregister_procfile(&proc_params[i]);
+ suspend_unregister_plugin(&checksum_ops);
+}
+
+module_init(checksum_load);
+module_exit(checksum_unload);
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Nigel Cunningham");
+MODULE_DESCRIPTION("Suspend2 checksum module");
+#else
+late_initcall(checksum_load);
+#endif
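
The checksum storage in this patch is a chain of pages: CHECKSUMS_PER_PAGE values per page, with the last pointer-sized slot (NEXT_CHECKSUM_PAGE) holding the address of the next page. The following user-space sketch shows the same layout with a struct instead of macros. All names (sum_page, sum_chain_alloc, sum_chain_slot, sum_chain_free) and the 4096-byte page size are assumptions for illustration only.

```c
#include <assert.h>
#include <stdlib.h>

#define SKETCH_PAGE_SIZE 4096
#define SKETCH_SUMS_PER_PAGE \
	((SKETCH_PAGE_SIZE - sizeof(void *)) / sizeof(unsigned long))

/* One page of checksums; the final pointer-sized slot links to the next. */
struct sum_page {
	unsigned long sums[SKETCH_SUMS_PER_PAGE];
	struct sum_page *next;
};

/* Free every page in the chain. */
static void sum_chain_free(struct sum_page *first)
{
	while (first) {
		struct sum_page *next = first->next;
		free(first);
		first = next;
	}
}

/* Allocate a zeroed chain with room for at least `count` checksums. */
static struct sum_page *sum_chain_alloc(size_t count)
{
	struct sum_page *first = NULL, *last = NULL;
	size_t pages = count / SKETCH_SUMS_PER_PAGE + 1;

	while (pages--) {
		struct sum_page *page = calloc(1, sizeof(*page));
		if (!page) {
			sum_chain_free(first);
			return NULL;
		}
		if (last)
			last->next = page;
		else
			first = page;
		last = page;
	}
	return first;
}

/* Walk the chain to the slot for checksum number `index`, or return NULL
 * if the chain is too short. */
static unsigned long *sum_chain_slot(struct sum_page *first, size_t index)
{
	while (first && index >= SKETCH_SUMS_PER_PAGE) {
		first = first->next;
		index -= SKETCH_SUMS_PER_PAGE;
	}
	return first ? &first->sums[index] : NULL;
}
```

The patch keeps a running page pointer and in-page index instead of walking from the head for each slot, which is the better choice when every checksum is visited in order.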


2004-11-24 14:09:54

by Christoph Hellwig

[permalink] [raw]
Subject: Re: Suspend 2 merge

On Wed, Nov 24, 2004 at 11:56:35PM +1100, Nigel Cunningham wrote:
> Hi everyone.
>
> I know that I still have work to do on suspend2, but thought it was high
> time I got around to properly submitting the code for review, so here
> goes.
>
> I have it split up into 51 patches, of which most are less than 20k,
> although there are three 50k patches. Changes to the rest of the kernel
> tree come first, then the core. The full tree can be found at

Your way of merging looks rather wrong. Please submit changes against the
current swsusp code that introduce one feature after another to bring it
to the level you want. You'll surely have to rework it a lot until all
reviewers are happy.

And most importantly, for each patch explain exactly what feature it
implements and why, etc. "swsusp2" tells exactly nothing about the
changes you make.

2004-11-24 14:55:59

by Nigel Cunningham

[permalink] [raw]
Subject: Suspend 2 merge: 28/51: Suspend memory pool hooks.

We save the image in two parts (LRU and the rest). In order to maintain
a consistent image, we satisfy all page allocations from our own memory
pool while saving the image and reloading the LRU. This allows us to
safely use high level routines which might allocate slab etc and not
free it again by the time we do our atomic copy. We simply save all of
the pages in the pool when making our atomic copy of the non-LRU pages,
without having to worry about exactly how they were or weren't used.
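
To make the idea concrete, here is a small user-space sketch of an allocator that switches to a fixed pool while a flag is set. It is only an illustration under stated assumptions: use_pool stands in for the SUSPEND_USE_MEMORY_POOL state bit, the pool holds 8 pages of 4096 bytes, and all function names are made up for the example. Unlike the hooks below, which divert frees based on the state flag, the sketch routes frees by which allocator owns the page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define POOL_PAGES 8
#define POOL_PAGE_SIZE 4096

static unsigned char pool_mem[POOL_PAGES][POOL_PAGE_SIZE];
static void *pool_free_list[POOL_PAGES];
static size_t pool_free_count;
static bool use_pool;	/* stands in for SUSPEND_USE_MEMORY_POOL */

/* Put every pool page on the free list. */
static void pool_init(void)
{
	size_t i;

	for (i = 0; i < POOL_PAGES; i++)
		pool_free_list[i] = pool_mem[i];
	pool_free_count = POOL_PAGES;
}

/* Does this address lie inside the pool's backing memory? */
static bool pool_owns(const void *page)
{
	uintptr_t p = (uintptr_t) page;
	uintptr_t start = (uintptr_t) pool_mem;

	return p >= start && p < start + sizeof(pool_mem);
}

/* While the pool is active, every allocation comes from it. */
static void *sketch_alloc_page(void)
{
	if (use_pool)
		return pool_free_count ?
			pool_free_list[--pool_free_count] : NULL;
	return malloc(POOL_PAGE_SIZE);
}

/* Return a page to whichever allocator it came from. */
static void sketch_free_page(void *page)
{
	if (pool_owns(page)) {
		assert(pool_free_count < POOL_PAGES);
		pool_free_list[pool_free_count++] = page;
	} else {
		free(page);
	}
}
```

Because the whole pool is saved regardless of which pages are in use, the amount of memory to save stays constant no matter how many allocations happen while the flag is set.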

diff -ruN 815-add-suspend-memory-pool-hooks-old/mm/page_alloc.c 815-add-suspend-memory-pool-hooks-new/mm/page_alloc.c
--- 815-add-suspend-memory-pool-hooks-old/mm/page_alloc.c 2004-11-06 09:26:49.168250960 +1100
+++ 815-add-suspend-memory-pool-hooks-new/mm/page_alloc.c 2004-11-04 16:27:41.000000000 +1100
@@ -277,6 +277,11 @@

arch_free_page(page, order);

+ if (unlikely(test_suspend_state(SUSPEND_USE_MEMORY_POOL))) {
+ suspend2_free_pool_pages(page, order);
+ return;
+ }
+
mod_page_state(pgfree, 1 << order);
for (i = 0 ; i < (1 << order) ; ++i)
free_pages_check(__FUNCTION__, page + i);
@@ -507,6 +512,11 @@

arch_free_page(page, 0);

+ if (unlikely(test_suspend_state(SUSPEND_USE_MEMORY_POOL))) {
+ suspend2_free_pool_pages(page, 0);
+ return;
+ }
+
kernel_map_pages(page, 1, 0);
inc_page_state(pgfree);
if (PageAnon(page))
@@ -609,6 +619,20 @@
int do_retry;
int can_try_harder;

+ if (unlikely(test_suspend_state(SUSPEND_USE_MEMORY_POOL))) {
+ /*
+ * When pool enabled, processes get allocations
+ * from a special pool so the image size doesn't
+ * vary (all the pages in the pool are saved,
+ * used or not).
+ *
+ * The only process that should be running is
+ * suspend, so the demand should be very
+ * predictable.
+ */
+ return suspend2_get_pool_pages(gfp_mask, order);
+ }
+
might_sleep_if(wait);

/*


2004-11-24 14:58:19

by Nigel Cunningham

[permalink] [raw]
Subject: Suspend 2 merge: 43/51: Utility functions.

These are the routines that I think could possibly be useful elsewhere
too.

- A snprintf routine that returns the number of bytes actually put into
the buffer, not the number that would have been put in if the buffer was
big enough.
- Routine for finding a proc dir entry (we use it to find /proc/splash
when)
- Support routines for dynamically allocated pageflags. Save those
precious bits!
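
To make the first item concrete, here is a small user-space sketch of a
bounded formatter that reports what actually landed in the buffer. It is
not the patch code: the name bounded_snprintf is hypothetical, and this
sketch chooses to count stored characters excluding the terminating NUL,
whereas the routine in the patch clamps the result to the buffer size.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/*
 * Format into buf (at most size bytes including the NUL) and return the
 * number of characters actually stored, not counting the NUL.
 */
static int bounded_snprintf(char *buf, int size, const char *fmt, ...)
{
	va_list args;
	int n;

	if (size <= 0)
		return 0;

	va_start(args, fmt);
	n = vsnprintf(buf, (size_t)size, fmt, args);
	va_end(args);

	if (n < 0)		/* encoding error: nothing usable stored */
		return 0;
	if (n >= size)		/* output was truncated to size - 1 chars */
		return size - 1;
	return n;
}
```

A return value like this can be added to a running offset when several
strings are appended to one buffer, without the offset ever running past
the end.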

diff -ruN 834-utility-old/kernel/power/utility.c 834-utility-new/kernel/power/utility.c
--- 834-utility-old/kernel/power/utility.c 1970-01-01 10:00:00.000000000 +1000
+++ 834-utility-new/kernel/power/utility.c 2004-11-04 16:27:41.000000000 +1100
@@ -0,0 +1,152 @@
+/*
+ * kernel/power/utility.c
+ *
+ * Copyright (C) 2004 Nigel Cunningham <[email protected]>
+ *
+ * This file is released under the GPLv2.
+ *
+ * Routines that only suspend uses at the moment, but which might move
+ * when we merge because they're generic.
+ */
+
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/mm.h>
+#include <linux/proc_fs.h>
+#include <asm/string.h>
+
+#include "pageflags.h"
+
+extern int suspend_snprintf(char * buffer, int buffer_size, const char *fmt, ...);
+extern struct proc_dir_entry * find_proc_dir_entry(const char *name, struct proc_dir_entry *parent);
+
+/*
+ * suspend_snprintf
+ *
+ * Functionality : Print a string with parameters to a buffer of a
+ * limited size. Unlike vsnprintf, we return the number
+ * of bytes actually put in the buffer, not the number
+ * that would have been put in if it was big enough.
+ */
+int suspend_snprintf(char * buffer, int buffer_size, const char *fmt, ...)
+{
+ int result;
+ va_list args;
+
+ if (!buffer_size) {
+ return 0;
+ }
+
+ va_start(args, fmt);
+ result = vsnprintf(buffer, buffer_size, fmt, args);
+ va_end(args);
+
+ if (result > buffer_size) {
+ return buffer_size;
+ }
+
+ return result;
+}
+
+/*
+ * find_proc_dir_entry.
+ *
+ * Based on remove_proc_entry.
+ * This will go shortly, once user space utilities
+ * are updated to look at /proc/suspend/all_settings.
+ */
+
+struct proc_dir_entry * find_proc_dir_entry(const char *name, struct proc_dir_entry *parent)
+{
+ struct proc_dir_entry **p;
+ int len;
+
+ len = strlen(name);
+ for (p = &parent->subdir; *p; p=&(*p)->next ) {
+ if (proc_match(len, name, *p)) {
+ return *p;
+ }
+ }
+ return NULL;
+}
+
+/* ------------- Dynamically Allocated Page Flags --------------- */
+
+#define BITS_PER_PAGE (PAGE_SIZE * 8)
+#define PAGES_PER_BITMAP ((max_mapnr + BITS_PER_PAGE - 1) / BITS_PER_PAGE)
+#define BITMAP_ORDER (get_bitmask_order((PAGES_PER_BITMAP) - 1))
+
+/* clear_map
+ *
+ * Description: Clear an array used to store local page flags.
+ * Arguments: unsigned long *: The pagemap to be cleared.
+ */
+
+void clear_map(unsigned long * pagemap)
+{
+ int size = (1 << BITMAP_ORDER) * PAGE_SIZE;
+
+ memset(pagemap, 0, size);
+}
+
+/* allocate_local_pageflags
+ *
+ * Description: Allocate a bitmap for local page flags.
+ * Arguments: unsigned long **: Pointer to the bitmap.
+ * int: Whether to set nosave flags for the
+ * newly allocated pages.
+ * Note: This looks suboptimal, but remember that we might be allocating
+ * the Nosave bitmap here.
+ */
+int allocate_local_pageflags(unsigned long ** pagemap, int setnosave)
+{
+ unsigned long * check;
+ int i;
+ if (*pagemap) {
+ printk("Error. Local pageflags map already allocated.\n");
+ clear_map(*pagemap);
+ } else {
+ check = (unsigned long *) __get_free_pages(GFP_ATOMIC,
+ BITMAP_ORDER);
+ if (!check) {
+ printk("Error. Unable to allocate memory for local page flags.");
+ return 1;
+ }
+ clear_map(check);
+ *pagemap = check;
+ if (setnosave) {
+ struct page * firstpage =
+ virt_to_page((unsigned long) check);
+ for (i = 0; i < (1 << BITMAP_ORDER); i++)
+ SetPageNosave(firstpage + i);
+ }
+ }
+ return 0;
+}
+
+/* free_local_pageflags
+ *
+ * Description: Free a local pageflags bitmap.
+ * Arguments: unsigned long **: Pointer to the bitmap being freed.
+ * Note: Map being freed might be Nosave.
+ */
+int free_local_pageflags(unsigned long ** pagemap)
+{
+ int i;
+ if (!*pagemap)
+ return 1;
+ else {
+ struct page * firstpage =
+ virt_to_page((unsigned long) *pagemap);
+ for (i = 0; i < (1 << BITMAP_ORDER); i++)
+ ClearPageNosave(firstpage + i);
+ free_pages((unsigned long) *pagemap, BITMAP_ORDER);
+ *pagemap = NULL;
+ return 0;
+ }
+}
+
+EXPORT_SYMBOL(suspend_snprintf);
+EXPORT_SYMBOL(allocate_local_pageflags);
+EXPORT_SYMBOL(free_local_pageflags);
+EXPORT_SYMBOL(find_proc_dir_entry);


2004-11-24 14:09:51

by Christoph Hellwig

[permalink] [raw]
Subject: Re: Suspend 2 merge: 24/51: Keyboard and serial console hooks.

On Wed, Nov 24, 2004 at 11:59:02PM +1100, Nigel Cunningham wrote:
> Here we add simple hooks so that the user can interact with suspend
> while it is running. (Hmm. The serial console condition could be
> simplified :>). The hooks allow you to do such things as:
>
> - cancel suspending
> - change the amount of detail of debugging info shown
> - change what debugging info is shown
> - pause the process
> - single step
> - toggle rebooting instead of powering down

And why would we want this? If the user calls the suspend call,
he surely wants to suspend, right?

After all, we don't have in-kernel hooks to allow a user to read instead
of write after calling sys_write.

2004-11-24 14:03:46

by Nigel Cunningham

[permalink] [raw]
Subject: Suspend 2 merge: 15/51: Disable pdflush during suspend.

Here we disable pdflush once we've finished syncing data to disk. We
might be writing to a swap file, and don't want to corrupt the user's
disk by writing invalid data to their superblock.


diff -ruN 504-disable-pdflush-during-suspend-old/mm/page-writeback.c 504-disable-pdflush-during-suspend-new/mm/page-writeback.c
--- 504-disable-pdflush-during-suspend-old/mm/page-writeback.c 2004-11-03 21:54:16.000000000 +1100
+++ 504-disable-pdflush-during-suspend-new/mm/page-writeback.c 2004-11-04 16:27:40.000000000 +1100
@@ -29,6 +29,7 @@
#include <linux/sysctl.h>
#include <linux/cpu.h>
#include <linux/syscalls.h>
+#include <linux/suspend.h>

/*
* The maximum number of pages to writeout in a single bdflush/kupdate
@@ -369,6 +370,12 @@
.for_kupdate = 1,
};

+ if (test_suspend_state(SUSPEND_DISABLE_SYNCING)) {
+ start_jif = jiffies;
+ next_jif = start_jif + (dirty_writeback_centisecs * HZ) / 100;
+ goto out;
+ }
+
sync_supers();

get_writeback_state(&wbs);
@@ -389,6 +396,8 @@
}
nr_to_write -= MAX_WRITEBACK_PAGES - wbc.nr_to_write;
}
+
+out:
if (time_before(next_jif, jiffies + HZ))
next_jif = jiffies + HZ;
if (dirty_writeback_centisecs)


2004-11-24 13:49:47

by Nigel Cunningham

[permalink] [raw]
Subject: Suspend 2 merge: 37/51: Memory pool support.

This is the memory pool support. It handles all pages freed and
allocated between the preparation of the image and the completion of
resuming, except prior to restoring the original kernel at resume time.
It is designed for speed and to match the fact that suspend2 just about
exclusively uses order 0 allocations. ("Just about" is why a couple of
order one and two allocations are also available).
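
The per-order bookkeeping can be pictured with the following user-space
sketch. It only tracks counts, not real pages, and every name in it
(order_pool, ORDER_POOL_MAX_ORDER and so on) is hypothetical rather than
taken from the patch. It assumes a target of one block for each higher
order and a caller-chosen number of order 0 blocks, filled from the
highest order down.

```c
#include <assert.h>

#define ORDER_POOL_MAX_ORDER 3	/* orders 0, 1 and 2 only, for the sketch */

struct order_pool {
	int level[ORDER_POOL_MAX_ORDER];	/* blocks held per order */
	int limit[ORDER_POOL_MAX_ORDER];	/* blocks wanted per order */
};

/* One block of each higher order, order0_wanted blocks of order 0. */
static void order_pool_init(struct order_pool *p, int order0_wanted)
{
	int order;

	for (order = 0; order < ORDER_POOL_MAX_ORDER; order++) {
		p->level[order] = 0;
		p->limit[order] = 1;
	}
	p->limit[0] = order0_wanted;
}

/*
 * Fill the pool from a supply of `available` pages, highest order first.
 * Returns the number of pages consumed.
 */
static int order_pool_fill(struct order_pool *p, int available)
{
	int order, used = 0;

	for (order = ORDER_POOL_MAX_ORDER - 1; order >= 0; order--) {
		int cost = 1 << order;

		while (p->level[order] < p->limit[order] &&
		       available - used >= cost) {
			p->level[order]++;
			used += cost;
		}
	}
	return used;
}

/* Total pages held: each order N block counts as 1 << N pages. */
static int order_pool_pages(const struct order_pool *p)
{
	int order, sum = 0;

	for (order = 0; order < ORDER_POOL_MAX_ORDER; order++)
		sum += p->level[order] * (1 << order);
	return sum;
}

/* Take one block of the given order; returns 1 on success, 0 if none. */
static int order_pool_take(struct order_pool *p, int order)
{
	if (order < 0 || order >= ORDER_POOL_MAX_ORDER || !p->level[order])
		return 0;
	p->level[order]--;
	return 1;
}
```

Filling the higher orders first means the scarce contiguous blocks are
secured before the supply is spent on order 0 pages.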

diff -ruN 827-memory-pool-old/kernel/power/memory_pool.c 827-memory-pool-new/kernel/power/memory_pool.c
--- 827-memory-pool-old/kernel/power/memory_pool.c 1970-01-01 10:00:00.000000000 +1000
+++ 827-memory-pool-new/kernel/power/memory_pool.c 2004-11-16 22:17:29.000000000 +1100
@@ -0,0 +1,378 @@
+/*
+ * kernel/power/memory_pool.c
+ *
+ * Copyright (C) 2003,2004 Nigel Cunningham <[email protected]>
+ *
+ * This file is released under the GPLv2.
+ *
+ * It contains routines for managing the memory pool during software suspend
+ * operation.
+ *
+ * The memory pool is a pool of pages from which page allocations
+ * are satisfied while we are suspending, and into which freed pages are
+ * released. In this way, we can keep the image size static and consistent
+ * while still using normal I/O routines to save the image and while saving
+ * the image in two parts.
+ *
+ * During suspend, almost all of the page allocations are order zero. Provision
+ * is made for one order one and one order two allocation. This provision is
+ * utilised by the swapwriter for allocating memory which is used for structures
+ * containing header page. (It could be made to use order zero allocations; this
+ * just hasn't been done yet).
+ */
+
+#define SUSPEND_MEMORY_POOL_C
+
+#include <linux/suspend.h>
+#include <linux/module.h>
+#include <linux/highmem.h>
+
+#include "suspend.h"
+#include "plugins.h"
+#include "pageflags.h"
+
+/* We keep high memory pages that are freed, but don't use them */
+struct memory_pool {
+ struct list_head contents[MAX_ORDER];
+ int level[MAX_ORDER];
+};
+
+static struct memory_pool normal_pool, highmem_pool;
+
+static int suspend_pool_level_limit[MAX_ORDER];
+static spinlock_t suspend_memory_pool_lock = SPIN_LOCK_UNLOCKED;
+
+static int min_pool_level = 0;
+
+#ifdef CONFIG_SOFTWARE_SUSPEND_DEBUG
+/* display_memory_pool_pages()
+ *
+ * Description: Display the current contents of the memory pool.
+ */
+static void __display_memory_pool_pages(struct memory_pool * pool)
+{
+ int order;
+
+ for (order = 0; order < MAX_ORDER; order++) {
+ struct page * page;
+ int index = 0;
+ suspend_message(SUSPEND_MEM_POOL, SUSPEND_VERBOSE, 1,
+ "- Order %d:\n", order);
+ list_for_each_entry(page, &pool->contents[order], lru) {
+ suspend_message(SUSPEND_MEM_POOL, SUSPEND_VERBOSE, 1,
+ "[%p] ", page);
+ index++;
+ if (!(index%8))
+ suspend_message(SUSPEND_MEM_POOL, SUSPEND_VERBOSE, 1,
+ "\n");
+ }
+
+ if (pool->level[order])
+ suspend_message(SUSPEND_MEM_POOL, SUSPEND_VERBOSE, 1,
+ "(%d entries)\n", pool->level[order]);
+ else
+ suspend_message(SUSPEND_MEM_POOL, SUSPEND_VERBOSE, 1,
+ "(empty)\n");
+ }
+}
+
+void display_memory_pool_pages(void)
+{
+ if (!TEST_DEBUG_STATE(SUSPEND_MEM_POOL))
+ return;
+
+ suspend_message(SUSPEND_MEM_POOL, SUSPEND_VERBOSE, 1, "Memory pool:\n");
+ suspend_message(SUSPEND_MEM_POOL, SUSPEND_VERBOSE, 1, "Normal pages:\n");
+ __display_memory_pool_pages(&normal_pool);
+ suspend_message(SUSPEND_MEM_POOL, SUSPEND_VERBOSE, 1, "High pages:\n");
+ __display_memory_pool_pages(&highmem_pool);
+}
+#else
+#define display_memory_pool_pages() do { } while(0)
+#endif
+
+__init void initialise_pool(struct memory_pool * pool)
+{
+ int i;
+
+ for (i = 0; i < MAX_ORDER; i++) {
+ pool->level[i] = 0;
+ INIT_LIST_HEAD(&pool->contents[i]);
+ }
+
+ suspend_pool_level_limit[1] = 1;
+ suspend_pool_level_limit[2] = 1;
+}
+
+__init void suspend_memory_pool_init(void)
+{
+ /* Initialise lists */
+ initialise_pool(&normal_pool);
+ initialise_pool(&highmem_pool);
+}
+
+/* get_from_pool()
+ *
+ * Description: Remove head of a pool list
+ */
+
+static struct page * get_from_pool(struct memory_pool * pool, int order)
+{
+ struct page * page;
+ int j;
+
+ if (!pool->level[order])
+ return 0;
+
+ page = list_entry(pool->contents[order].next, struct page, lru);
+ list_del_init(&page->lru);
+ pool->level[order]--;
+
+ for (j = 0; j < (1 << order); j++)
+ ClearPageChecksumIgnore(page + j);
+
+ if (page_count(page) != 1)
+ printk("Error getting page %p from memory pool. "
+ "Page count is %d (should be 1).\n",
+ page,
+ page_count(page));
+
+ BUG_ON(PageLRU(page) || PageActive(page));
+
+ display_memory_pool_pages();
+ suspend_message(SUSPEND_MEM_POOL, SUSPEND_MEDIUM, 0,
+ "\r%4d %4d %4d.",
+ normal_pool.level[0],
+ normal_pool.level[1],
+ normal_pool.level[2]);
+
+ return page;
+}
+
+/* add_to_pool()
+ *
+ * Description: Insert new head in a pool list
+ */
+
+static void add_to_pool(struct memory_pool * pool, int order, struct page * this)
+{
+ int j;
+
+ pool->level[order]++;
+ list_add(&this->lru, &pool->contents[order]);
+ for (j = 0; j < (1 << order); j++)
+ SetPageChecksumIgnore(this + j);
+
+ display_memory_pool_pages();
+ suspend_message(SUSPEND_MEM_POOL, SUSPEND_MEDIUM, 0,
+ "\r%4d %4d %4d.",
+ normal_pool.level[0],
+ normal_pool.level[1],
+ normal_pool.level[2]);
+}
+
+/* suspend_memory_pool_level()
+ *
+ * Description: Returns the number of pages currently in the pool.
+ * Returns: Int. Number of pages in the pool.
+ */
+int suspend_memory_pool_level(int only_lowmem)
+{
+ int order, sum = 0;
+
+ for (order = 0; order < MAX_ORDER; order++)
+ sum += normal_pool.level[order] * (1 << order);
+
+ if (!only_lowmem)
+ for (order = 0; order < MAX_ORDER; order++)
+ sum += highmem_pool.level[order] * (1 << order);
+ return sum;
+}
+
+/* fill_suspend_memory_pool()
+ *
+ * Description: Fill the memory pool from the main free memory pool in the
+ * first instance, or grabbed pages if that fails.
+ * We allocate @sizesought order 0 pages, plus 1 each
+ * of the higher order allocations.
+ * Arguments: int. Number of order zero pages requested.
+ * Returns: int. Number of order zero pages obtained.
+ */
+int fill_suspend_memory_pool(int sizesought)
+{
+ int i = 0, order, orig_state =
+ test_suspend_state(SUSPEND_USE_MEMORY_POOL);
+ unsigned long *this = NULL;
+ unsigned long flags;
+
+ spin_lock_irqsave(&suspend_memory_pool_lock, flags);
+
+ /* Pools must not be active for this to work */
+ clear_suspend_state(SUSPEND_USE_MEMORY_POOL);
+
+ suspend_pool_level_limit[0] = sizesought;
+
+ for (order = MAX_ORDER - 1; order >= 0; order--) {
+ int wanted = suspend_pool_level_limit[order] -
+ normal_pool.level[order];
+ for (i = normal_pool.level[order];
+ i < suspend_pool_level_limit[order]; i++) {
+ this = (unsigned long *) get_grabbed_pages(order);
+ if (!this) {
+ suspend_message(SUSPEND_MEM_POOL, SUSPEND_ERROR, 1,
+ "%d order %d pages wanted for suspend "
+ "memory pool, got %d.\n",
+ wanted, order, i - 1);
+ break;
+ }
+ add_to_pool(&normal_pool, order, virt_to_page(this));
+ }
+ }
+
+ if (orig_state)
+ set_suspend_state(SUSPEND_USE_MEMORY_POOL);
+
+ min_pool_level = normal_pool.level[0];
+
+ spin_unlock_irqrestore(&suspend_memory_pool_lock, flags);
+
+ return 0;
+}
+
+/* empty_suspend_memory_pool()
+ *
+ * Description: Drain our memory pool.
+ */
+void __empty_suspend_memory_pool(struct memory_pool * pool)
+{
+ int order;
+ struct page * this;
+
+ for (order = 0; order < MAX_ORDER; order++)
+ while ((this = get_from_pool(pool, order)))
+ __free_pages(this, order);
+
+ suspend_message(SUSPEND_MEM_POOL, SUSPEND_LOW, 1,
+ "Min pool level was %d/%d.\n", min_pool_level, suspend_pool_level_limit[0]);
+}
+
+void empty_suspend_memory_pool(void)
+{
+ unsigned long flags;
+
+ spin_lock_irqsave(&suspend_memory_pool_lock, flags);
+
+ display_memory_pool_pages();
+
+ /* Pool must not be active for this to work */
+ clear_suspend_state(SUSPEND_USE_MEMORY_POOL);
+
+ __empty_suspend_memory_pool(&normal_pool);
+ __empty_suspend_memory_pool(&highmem_pool);
+
+ spin_unlock_irqrestore(&suspend_memory_pool_lock, flags);
+}
+
+/* get_suspend_pool_pages()
+ *
+ * Description: Our equivalent to __alloc_pages (minus zone mask).
+ * May be called from interrupt context.
+ * Arguments: unsigned int: Mask. We really only care about __GFP_WAIT.
+ * We're giving normal zone pages regardless.
+ * order: The number of pages (1 << order) wanted.
+ * Returns: struct page *: Pointer (possibly NULL) to pages allocated.
+ */
+struct page * get_suspend_pool_pages(unsigned int gfp_mask, unsigned int order)
+{
+ unsigned long flags;
+ struct page * page;
+
+ if (order > 0) {
+ spin_lock_irqsave(&suspend_memory_pool_lock, flags);
+ if (!normal_pool.level[order]) {
+ printk("No order %d allocation available.\n",
+ order);
+ display_memory_pool_pages();
+ spin_unlock_irqrestore(
+ &suspend_memory_pool_lock,
+ flags);
+ return NULL;
+ }
+ goto check_and_return;
+ }
+
+try_again:
+ if ((!normal_pool.level[order]) && (!(gfp_mask & __GFP_WAIT))) {
+ spin_lock_irqsave(&suspend_memory_pool_lock, flags);
+ display_memory_pool_pages();
+ spin_unlock_irqrestore(&suspend_memory_pool_lock, flags);
+ return NULL;
+ }
+
+ while(!normal_pool.level[order]) {
+ if (active_writer->ops.writer.wait_on_io)
+ active_writer->ops.writer.wait_on_io(0);
+ schedule();
+ }
+
+ spin_lock_irqsave(&suspend_memory_pool_lock, flags);
+ if (!normal_pool.level[order]) {
+ spin_unlock_irqrestore(&suspend_memory_pool_lock, flags);
+ goto try_again;
+ }
+check_and_return:
+ page = get_from_pool(&normal_pool, order);
+
+ if (normal_pool.level[0] < min_pool_level)
+ min_pool_level = normal_pool.level[0];
+ if (!normal_pool.level[0])
+ printk("Normal pool empty.\n");
+
+ spin_unlock_irqrestore(&suspend_memory_pool_lock, flags);
+
+ return page;
+}
+
+/* free_suspend_pool_pages()
+ *
+ * Description: Our equivalent to __free_pages. Put freed pages into the pool.
+ * HighMem pages do still get freed to the normal pool because they
+ * aren't going to affect the consistency of our image - worst case,
+ * we write a few free pages.
+ * Arguments: Struct page *: First page to be freed.
+ * Unsigned int: Size of allocation being freed.
+ */
+void free_suspend_pool_pages(struct page *page, unsigned int order)
+{
+ unsigned long flags;
+ int i;
+ struct memory_pool * pool = &normal_pool;
+
+ suspend_message(SUSPEND_MEM_POOL, SUSPEND_VERBOSE, 1,
+ "Freeing page %p (%p), order %d.\n",
+ page_address(page), page, order);
+
+ if (PageHighMem(page))
+ pool = &highmem_pool;
+
+#ifdef CONFIG_MMU
+ set_page_count(page, 1);
+#else
+ for (i = 0; i < (1 << order); i++)
+ set_page_count(page + i, 1);
+#endif
+ if (pool == &normal_pool) {
+ char * address = page_address(page);
+ for (i = 0; i < (1 << order); i++) {
+ clear_page(address);
+ address += PAGE_SIZE;
+ }
+ }
+
+ spin_lock_irqsave(&suspend_memory_pool_lock, flags);
+ add_to_pool(pool, order, page);
+ spin_unlock_irqrestore(&suspend_memory_pool_lock, flags);
+ return;
+}
+
+EXPORT_SYMBOL(suspend_memory_pool_level);


2004-11-24 13:49:44

by Nigel Cunningham

[permalink] [raw]
Subject: Suspend 2 merge: 42/51: Suspend.c

Here's the heart of the core :> (No, that's not a typo).

- Device suspend/resume calls
- Power down
- Highest level routine
- all_settings proc entry handling


diff -ruN 832-suspend-old/kernel/power/suspend.c 832-suspend-new/kernel/power/suspend.c
--- 832-suspend-old/kernel/power/suspend.c 1970-01-01 10:00:00.000000000 +1000
+++ 832-suspend-new/kernel/power/suspend.c 2004-11-21 20:00:10.000000000 +1100
@@ -0,0 +1,2019 @@
+/*
+ * kernel/power/suspend2.c
+ *
+ * Copyright (C) 1998-2001 Gabor Kuti <[email protected]>
+ * Copyright (C) 1998,2001,2002 Pavel Machek <[email protected]>
+ * Copyright (C) 2002-2003 Florent Chabaud <[email protected]>
+ * Copyright (C) 2002-2004 Nigel Cunningham <[email protected]>
+ *
+ * This file is released under the GPLv2.
+ *
+ * This file is to realize architecture-independent
+ * machine suspend feature using pretty near only high-level routines
+ *
+ * We'd like to thank the following people for their work:
+ *
+ * Pavel Machek <[email protected]>:
+ * Modifications, defectiveness pointing, being with Gabor at the very beginning,
+ * suspend to swap space, stop all tasks. Port to 2.4.18-ac and 2.5.17.
+ *
+ * Steve Doddi <[email protected]>:
+ * Support the possibility of hardware state restoring.
+ *
+ * Raph <[email protected]>:
+ * Support for preserving states of network devices and virtual console
+ * (including X and svgatextmode)
+ *
+ * Kurt Garloff <[email protected]>:
+ * Straightened the critical function in order to prevent compilers from
+ * playing tricks with local variables.
+ *
+ * Andreas Mohr <[email protected]>
+ *
+ * Alex Badea <[email protected]>:
+ * Fixed runaway init
+ *
+ * Jeff Snyder <[email protected]>
+ * ACPI patch
+ *
+ * Nathan Friess <[email protected]>
+ * Some patches.
+ *
+ * Michael Frank <[email protected]>
+ * Extensive testing and help with improving stability.
+ *
+ * Variable definitions which are needed if PM is enabled but
+ * SOFTWARE_SUSPEND is disabled are found near the top of process.c.
+ */
+
+#define SUSPEND_MAIN_C
+//#define DEBUG_DEVICE_TREE
+
+#include <linux/suspend.h>
+#include <linux/reboot.h>
+#include <linux/module.h>
+#include <linux/console.h>
+#include <linux/version.h>
+#include <linux/device.h>
+#include <linux/highmem.h>
+#include <asm/uaccess.h>
+
+#include "suspend.h"
+#include "block_io.h"
+#include "plugins.h"
+#include "proc.h"
+#include "pageflags.h"
+
+#ifdef CONFIG_X86
+#include <asm/i387.h> /* for kernel_fpu_end */
+#endif
+
+static unsigned long suspend_powerdown_method = 5; /* S5 = off */
+static int suspend_acpi_state_used = 0;
+
+static u32 pm_disk_mode_save;
+struct partial_device_tree * suspend_device_tree;
+EXPORT_SYMBOL(suspend_device_tree);
+
+#ifdef CONFIG_SMP
+static void ensure_on_processor_zero(void)
+{
+ set_cpus_allowed(current, cpumask_of_cpu(0));
+ BUG_ON(smp_processor_id() != 0);
+}
+#else
+#define ensure_on_processor_zero() do { } while(0)
+#endif
+
+#ifdef CONFIG_ACPI
+extern u32 acpi_leave_sleep_state (u8 sleep_state);
+#endif
+
+#ifdef DEBUG_DEVICE_TREE
+#include "../../drivers/base/power/power.h"
+#include <linux/device.h>
+
+void display_device_tree(struct partial_device_tree * tree, char * header)
+{
+ struct device * dev;
+
+ if (header)
+ printk(header);
+ printk(" === Tree %p ===\n\n", tree);
+
+ printk(" -- Active\n");
+ list_for_each_entry(dev, &tree->dpm_active, power.entry) {
+ printk(" %p->%p", dev, dev->parent);
+ printk(" %s\n", dev->kobj.k_name ? dev->kobj.k_name : "<null>");
+ }
+ printk("\n -- DPM Off\n");
+ list_for_each_entry(dev, &tree->dpm_off, power.entry) {
+ printk(" %p->%p", dev, dev->parent);
+ printk(" %s\n", dev->kobj.k_name ? dev->kobj.k_name : "<null>");
+ }
+
+ printk("\n -- DPM Off IRQ\n");
+ list_for_each_entry(dev, &tree->dpm_off_irq, power.entry) {
+ printk(" %p->%p", dev, dev->parent);
+ printk(" %s\n", dev->kobj.k_name ? dev->kobj.k_name : "<null>");
+ }
+ printk("\n--- Done ---\n");
+}
+#else
+#define display_device_tree(tree, header) do { } while(0)
+#endif
+
+/* suspend_drivers_resume
+ * @stage - One of...
+ */
+
+enum {
+ SUSPEND_DRIVERS_USED_DEVICES_IRQS_DISABLED,
+ SUSPEND_DRIVERS_USED_DEVICES_IRQS_ENABLED,
+ SUSPEND_DRIVERS_UNUSED_DEVICES_IRQS_DISABLED,
+ SUSPEND_DRIVERS_UNUSED_DEVICES_IRQS_ENABLED,
+ SUSPEND_DRIVERS_PRE_POWERDOWN,
+};
+
+void suspend_drivers_resume(int stage)
+{
+ switch (stage) {
+ case SUSPEND_DRIVERS_USED_DEVICES_IRQS_DISABLED:
+ BUG_ON(!irqs_disabled());
+ if (!TEST_ACTION_STATE(SUSPEND_DISABLE_SYSDEV_SUPPORT))
+ sysdev_resume();
+ dpm_power_up_tree(suspend_device_tree);
+ break;
+
+ case SUSPEND_DRIVERS_USED_DEVICES_IRQS_ENABLED:
+ BUG_ON(irqs_disabled());
+ display_device_tree(suspend_device_tree,
+ "suspend_drivers_resume stage 1.");
+ device_resume_tree(suspend_device_tree);
+ display_device_tree(suspend_device_tree,
+ "Post resume tree call.");
+ break;
+
+ case SUSPEND_DRIVERS_UNUSED_DEVICES_IRQS_DISABLED:
+ BUG_ON(!irqs_disabled());
+ display_device_tree(&default_device_tree,
+ "suspend_drivers_resume stage 2.\n");
+ dpm_power_up_tree(&default_device_tree);
+ break;
+
+ case SUSPEND_DRIVERS_UNUSED_DEVICES_IRQS_ENABLED:
+ BUG_ON(irqs_disabled());
+ device_resume_tree(&default_device_tree);
+ display_device_tree(&default_device_tree,
+ "Post power up.\n");
+#ifdef CONFIG_ACPI
+ if (suspend_acpi_state_used)
+ acpi_leave_sleep_state(suspend_acpi_state_used);
+#endif
+ display_device_tree(&default_device_tree,
+ "suspend_drivers_resume stage 3.\n");
+ device_resume_tree(&default_device_tree);
+ display_device_tree(&default_device_tree,
+ "Post resume default device tree.\n");
+#ifdef CONFIG_ACPI
+ if (suspend_acpi_state_used &&
+ pm_ops && pm_ops->finish)
+ pm_ops->finish(suspend_acpi_state_used);
+#endif
+ break;
+ }
+}
+
+/* suspend_drivers_suspend
+ * @stage - one of:
+ * 1: Power down drivers not used in writing the image
+ * 2: Quiesce drivers used in writing the image
+ * (prior to making atomic copy)
+ * 3: Power down drivers used in writing the image and
+ * enter the suspend mode if configured to do so.
+ *
+ */
+static int suspend_drivers_suspend(int stage)
+{
+ int result = 0;
+
+ switch (stage) {
+ case SUSPEND_DRIVERS_UNUSED_DEVICES_IRQS_DISABLED:
+ BUG_ON(!irqs_disabled());
+ if (!result)
+ result = device_power_down_tree(
+ PM_SUSPEND_DISK, &default_device_tree);
+ display_device_tree(&default_device_tree,
+ "Post suspend power down device tree.\n");
+ break;
+
+ case SUSPEND_DRIVERS_UNUSED_DEVICES_IRQS_ENABLED:
+ BUG_ON(irqs_disabled());
+ display_device_tree(&default_device_tree,
+ "suspend_drivers_suspend stage 1.\n");
+ result = device_suspend_tree(
+ PM_SUSPEND_DISK, &default_device_tree);
+ break;
+
+ case SUSPEND_DRIVERS_USED_DEVICES_IRQS_DISABLED:
+ BUG_ON(!irqs_disabled());
+ result = device_power_down_tree(PM_SUSPEND_DISK, suspend_device_tree);
+ if (!TEST_ACTION_STATE(SUSPEND_DISABLE_SYSDEV_SUPPORT))
+ sysdev_suspend(PM_SUSPEND_DISK);
+ break;
+
+ case SUSPEND_DRIVERS_USED_DEVICES_IRQS_ENABLED:
+ BUG_ON(irqs_disabled());
+ display_device_tree(suspend_device_tree,
+ "suspend_drivers_suspend stage 2.\n");
+ result = device_suspend_tree(
+ PM_SUSPEND_DISK, suspend_device_tree);
+ display_device_tree(suspend_device_tree,
+ "Post suspend device tree.\n");
+ break;
+
+ case SUSPEND_DRIVERS_PRE_POWERDOWN: /* Power down system */
+ BUG_ON(irqs_disabled());
+ display_device_tree(suspend_device_tree,
+ "suspend_drivers_suspend stage 3.\n");
+
+ result = device_suspend_tree(
+ PM_SUSPEND_DISK, suspend_device_tree);
+ break;
+ }
+ return result;
+}
+
+#ifdef CONFIG_SOFTWARE_SUSPEND_DEBUG
+void show_pcp_lists(void)
+{
+ int cpu, temperature;
+ struct zone *zone;
+
+ for_each_zone(zone) {
+ printk("%s per-cpu:", zone->name);
+
+ if (!zone->present_pages) {
+ printk(" empty\n");
+ continue;
+ } else
+ printk("\n");
+
+ for (cpu = 0; cpu < NR_CPUS; ++cpu) {
+ struct per_cpu_pageset *pageset;
+
+ if (!cpu_possible(cpu))
+ continue;
+
+ pageset = zone->pageset + cpu;
+
+ for (temperature = 0; temperature < 2; temperature++)
+ printk("cpu %d %s: low %d, high %d, batch %d, count %d.\n",
+ cpu,
+ temperature ? "cold" : "hot",
+ pageset->pcp[temperature].low,
+ pageset->pcp[temperature].high,
+ pageset->pcp[temperature].batch,
+ pageset->pcp[temperature].count);
+ }
+ }
+}
+#else
+#define show_pcp_lists() do { } while(0)
+#endif
+
+/* -------------------------------------------------------------------------- */
+
+static int suspend_version_specific_initialise(void)
+{
+ struct class * class;
+ struct class_device * class_dev;
+
+ suspend_save_avenrun();
+
+ PRINTFREEMEM("after draining local pages");
+ suspend_store_free_mem(SUSPEND_FREE_DRAIN_PCP, 0);
+
+ if (TEST_DEBUG_STATE(SUSPEND_FREEZER))
+ show_pcp_lists();
+
+ if (pm_ops) {
+ pm_disk_mode_save = pm_ops->pm_disk_mode;
+ pm_ops->pm_disk_mode = PM_DISK_PLATFORM;
+ }
+
+ BUG_ON(suspend_device_tree);
+ suspend_device_tree = device_create_tree();
+ if (IS_ERR(suspend_device_tree)) {
+ suspend_device_tree = NULL;
+ return -ENOMEM;
+ }
+
+ /* Now check for graphics class devices, so we can keep the display on while suspending */
+ class = class_find("graphics");
+ if (class) {
+ list_for_each_entry(class_dev, &class->children, node)
+ device_switch_trees(class_dev->dev, suspend_device_tree);
+ class_put(class);
+ }
+ return 0;
+}
+
+static void suspend_version_specific_cleanup(void)
+{
+ suspend_restore_avenrun();
+
+ if (pm_ops) pm_ops->pm_disk_mode = pm_disk_mode_save;
+
+ if (suspend_device_tree) {
+ device_merge_tree(suspend_device_tree, &default_device_tree);
+ device_destroy_tree(suspend_device_tree);
+ display_device_tree(&default_device_tree,
+ " ==== POST DEVICE MERGE TREE ====\n");
+ suspend_device_tree = NULL;
+ }
+}
+/* Variables to be preserved over suspend */
+int pageset1_sizelow = 0, pageset2_sizelow = 0;
+
+unsigned long orig_mem_free = 0;
+
+extern void do_suspend2_lowlevel(int resume);
+extern unsigned long header_storage_for_plugins(void);
+extern int suspend_initialise_plugin_lists(void);
+extern void suspend_relinquish_console(void);
+extern volatile int suspend_io_time[2][2];
+void empty_suspend_memory_pool(void);
+int read_primary_suspend_image(void);
+extern void display_nosave_pages(void);
+extern int num_writers;
+
+extern void suspend_console_proc_init(void);
+extern void suspend_console_proc_exit(void);
+extern int suspend2_prepare_console(void);
+extern void suspend2_cleanup_console(void);
+
+unsigned long * in_use_map = NULL;
+unsigned long * pageset2_map = NULL;
+unsigned long * checksum_map = NULL;
+#ifdef CONFIG_DEBUG_PAGEALLOC
+unsigned long * unmap_map = NULL;
+#endif
+
+char * debug_info_buffer;
+
+int image_size_limit = 0;
+int max_async_ios = 128;
+
+/* Pagedir.c */
+extern void copy_pageset1(void);
+extern int allocate_local_pageflags(unsigned long ** pagemap, int setnosave);
+extern void free_pagedir(struct pagedir * p);
+extern int free_local_pageflags(unsigned long ** pagemap);
+
+/* Prepare_image.c */
+
+extern int prepare_image(void);
+
+unsigned long forced_ps1_size = 0, forced_ps2_size = 0;
+
+/* proc.c */
+extern int suspend_cleanup_proc(void);
+
+extern int suspend2_register_core(struct suspend2_core_ops * ops_pointer);
+extern void suspend2_unregister_core(void);
+
+#ifdef CONFIG_SOFTWARE_SUSPEND_DEBUG
+
+int suspend_free_mem_values[MAX_FREEMEM_SLOTS][2];
+/* These should match the enumerated type in suspend.h */
+static char * suspend_free_mem_descns[MAX_FREEMEM_SLOTS] = {
+ "Start/End ", /* 0 */
+ "Console Allocn ",
+ "Drain pcp ",
+ "InUse map ",
+ "PS2 map ",
+ "Checksum map ", /* 5 */
+ "Reload pages ",
+ "Init plugins ",
+ "Memory pool ",
+ "Freezer ",
+ "Eat Memory ", /* 10 */
+ "Syncing ",
+ "Grabbed Memory ",
+ "Range Pages ",
+ "Extra PD1 pages",
+ "Writer storage ", /* 15 */
+ "Header storage ",
+ "Checksum pages ",
+ "KStat data ",
+ "Debug Info ",
+ "Remove Image ", /* 20 */
+ "I/O ",
+ "I/O info ",
+ "Start one ",
+};
+
+/* store_free_mem
+ */
+
+
+void suspend_store_free_mem(int slot, int side)
+{
+ static int last_free_mem;
+ int this_free_mem = real_nr_free_pages() + suspend_amount_grabbed +
+ suspend_memory_pool_level(0);
+ int i;
+
+ BUG_ON(slot >= MAX_FREEMEM_SLOTS);
+
+ suspend_message(SUSPEND_MEMORY, SUSPEND_HIGH, 0,
+ "Last free mem was %d. Is now %d. ",
+ last_free_mem, this_free_mem);
+
+ if (slot == 0) {
+ if (!side)
+ for (i = 1; i < MAX_FREEMEM_SLOTS; i++) {
+ suspend_free_mem_values[i][0] = 0;
+ suspend_free_mem_values[i][1] = 0;
+ }
+ suspend_free_mem_values[slot][side] = this_free_mem;
+ } else
+ suspend_free_mem_values[slot][side] += this_free_mem - last_free_mem;
+ last_free_mem = this_free_mem;
+ suspend_message(SUSPEND_MEMORY, SUSPEND_HIGH, 0,
+ "%s value %d now %d.\n",
+ suspend_free_mem_descns[slot],
+ side,
+ suspend_free_mem_values[slot][side]);
+}
+
+/*
+ * display_free_mem
+ */
+static void display_free_mem(void)
+{
+ int i;
+
+
+ if (!TEST_DEBUG_STATE(SUSPEND_MEMORY))
+ return;
+
+ suspend_message(SUSPEND_MEMORY, SUSPEND_HIGH, 0,
+ "Start: %7d End: %7d.\n",
+ suspend_free_mem_values[0][0],
+ suspend_free_mem_values[0][1]);
+
+ for (i = 1; i < MAX_FREEMEM_SLOTS; i++)
+ if (suspend_free_mem_values[i][0] + suspend_free_mem_values[i][1])
+ suspend_message(SUSPEND_MEMORY, SUSPEND_HIGH, 0,
+ "%s %7d %7d.\n",
+ suspend_free_mem_descns[i],
+ suspend_free_mem_values[i][0],
+ suspend_free_mem_values[i][1]);
+}
+#endif
+
+/*
+ * save_image
+ * Result code (int): Zero on success, non zero on failure.
+ * Functionality : High level routine which performs the steps necessary
+ * to prepare and save the image after preparatory steps
+ * have been taken.
+ * Key Assumptions : Processes frozen, sufficient memory available, drivers
+ * suspended.
+ * Called from : do_suspend2_suspend_2
+ */
+extern struct pageset_sizes_result recalculate_stats(void);
+extern int write_pageset(struct pagedir * pagedir, int whichtowrite);
+extern int write_image_header(void);
+extern int read_secondary_pagedir(int overwrittenpagesonly);
+
+static int save_image(void)
+{
+ int temp_result;
+
+ if (RAM_TO_SUSPEND > max_mapnr) {
+ prepare_status(1, 1,
+ "Couldn't get enough free pages: %ld pages short",
+ RAM_TO_SUSPEND - max_mapnr);
+ goto abort_saving;
+ }
+
+ suspend_message(SUSPEND_ANY_SECTION, SUSPEND_LOW, 1,
+ " - Final values: %d and %d.\n",
+ pageset1_size,
+ pageset2_size);
+
+ /* Suspend devices we're not going to use in writing the image */
+ if (active_writer && active_writer->dpm_set_devices)
+ active_writer->dpm_set_devices();
+ suspend_drivers_suspend(SUSPEND_DRIVERS_UNUSED_DEVICES_IRQS_ENABLED);
+ local_irq_disable();
+ suspend_drivers_suspend(SUSPEND_DRIVERS_UNUSED_DEVICES_IRQS_DISABLED);
+ local_irq_enable();
+
+ check_shift_keys(1, "About to write pagedir2.");
+
+ temp_result = write_pageset(&pagedir2, 2);
+
+ check_shift_keys(1, "About to copy pageset 1.");
+
+ if (temp_result == -1 || TEST_RESULT_STATE(SUSPEND_ABORTED))
+ goto abort_saving;
+
+ prepare_status(1, 0, "Doing atomic copy...");
+
+ do_suspend2_lowlevel(0);
+
+ return 0;
+abort_saving:
+ suspend_drivers_resume(SUSPEND_DRIVERS_UNUSED_DEVICES_IRQS_ENABLED);
+ local_irq_disable();
+ suspend_drivers_resume(SUSPEND_DRIVERS_UNUSED_DEVICES_IRQS_DISABLED);
+ local_irq_enable();
+
+ return -1;
+}
+
+int save_image_part1(void)
+{
+ int temp_result;
+
+ suspend_map_atomic_copy_pages();
+
+ suspend_checksum_calculate_checksums();
+
+ BUG_ON(!irqs_disabled());
+
+ if (!TEST_ACTION_STATE(SUSPEND_TEST_FILTER_SPEED))
+ copy_pageset1();
+
+ /*
+ * ---- FROM HERE ON, NEED TO REREAD PAGESET2 IF ABORTING!!! -----
+ *
+ */
+
+ suspend_unmap_atomic_copy_pages();
+
+ /*
+ * Other processors have waited for me to make the atomic copy of the
+ * kernel
+ */
+
+ smp_continue();
+
+#ifdef CONFIG_X86
+ kernel_fpu_end();
+#endif
+
+#ifdef CONFIG_PREEMPT
+ preempt_enable_no_resched();
+#endif
+
+ suspend_drivers_resume(SUSPEND_DRIVERS_USED_DEVICES_IRQS_DISABLED);
+
+ local_irq_enable();
+ suspend_drivers_resume(SUSPEND_DRIVERS_USED_DEVICES_IRQS_ENABLED);
+
+ update_status(pageset2_size, pageset1_size + pageset2_size, NULL);
+
+ if (TEST_RESULT_STATE(SUSPEND_ABORTED))
+ goto abort_reloading_pagedir_two;
+
+ check_shift_keys(1, "About to write pageset1.");
+
+ /*
+ * End of critical section.
+ */
+
+ suspend_message(SUSPEND_ANY_SECTION, SUSPEND_LOW, 1,
+ "-- Writing pageset1\n");
+
+ temp_result = write_pageset(&pagedir1, 1);
+
+ if (TEST_ACTION_STATE(SUSPEND_TEST_FILTER_SPEED)) {
+ /* We didn't overwrite any memory, so no reread needs to be done. */
+ return -1;
+ }
+
+ if (temp_result == -1 || TEST_RESULT_STATE(SUSPEND_ABORTED))
+ goto abort_reloading_pagedir_two;
+
+ check_shift_keys(1, "About to write header.");
+
+ if (TEST_RESULT_STATE(SUSPEND_ABORTED))
+ goto abort_reloading_pagedir_two;
+
+ temp_result = write_image_header();
+
+ if (temp_result || (TEST_RESULT_STATE(SUSPEND_ABORTED)))
+ goto abort_reloading_pagedir_two;
+
+ check_shift_keys(1, "About to power down or reboot.");
+
+ return 0;
+
+abort_reloading_pagedir_two:
+ temp_result = read_secondary_pagedir(1);
+
+ /* If that failed, we're sunk. Panic! */
+ if (temp_result)
+ panic("Attempt to reload pagedir 2 while aborting "
+ "a suspend failed.");
+
+ return -1;
+
+}
+
+static void suspend_power_off(void)
+{
+ sys_reboot(LINUX_REBOOT_MAGIC1, LINUX_REBOOT_MAGIC2,
+ LINUX_REBOOT_CMD_POWER_OFF, NULL);
+
+ prepare_status(1, 0, "Probably not capable of powering down.");
+ while (1)
+ cpu_relax();
+ /* NOTREACHED */
+}
+
+static void suspend_enter_acpi_state(u32 state)
+{
+ suspend_acpi_state_used = state;
+
+ if (pm_ops && pm_ops->prepare) {
+ if (!pm_ops->prepare(state)) {
+ if (pm_ops && pm_ops->enter)
+ pm_ops->enter(state);
+ else
+ printk("Failed to enter state.\n");
+ } else
+ printk("Prepare ops failed.\n");
+ } else
+ printk("No prepare ops.\n");
+}
+
+/*
+ * suspend_power_down
+ * Functionality : Powers down or reboots the computer once the image
+ * has been written to disk.
+ * Key Assumptions : Able to reboot/power down via code called or that
+ * the warning emitted if the calls fail will be visible
+ * to the user (i.e. printk resumes devices).
+ * Called From : do_suspend2_suspend_2
+ */
+
+extern asmlinkage long sys_reboot(int magic1, int magic2, unsigned int cmd,
+ void * arg);
+extern void apm_power_off(void);
+
+void suspend_power_down(void)
+{
+ if (TEST_ACTION_STATE(SUSPEND_REBOOT)) {
+ prepare_status(1, 0, "Ready to reboot.");
+ sys_reboot(LINUX_REBOOT_MAGIC1, LINUX_REBOOT_MAGIC2,
+ LINUX_REBOOT_CMD_RESTART, NULL);
+ }
+
+ suspend_drivers_suspend(SUSPEND_DRIVERS_PRE_POWERDOWN);
+
+ if (suspend_powerdown_method == 3) {
+ prepare_status(1, 0, "Seeking to suspend-to-ram.");
+ suspend_enter_acpi_state(PM_SUSPEND_MEM);
+ prepare_status(1, 0, "Entering S3 failed. Using normal powerdown.");
+ } else if (suspend_powerdown_method == 4) {
+ prepare_status(1, 0, "Seeking to enter ACPI suspend-to-disk state.");
+ suspend_enter_acpi_state(PM_SUSPEND_DISK);
+ prepare_status(1, 0, "Entering S4 failed. Using normal powerdown.");
+ } else
+ prepare_status(1, 0, "Powering down.");
+
+ /*
+ * FIXME: At resume, we'll still think we used S4 if we tried it.
+ * Does it matter?
+ */
+ suspend_acpi_state_used = 0;
+ suspend_power_off();
+}
+
+/*
+ * do_suspend2_resume_1
+ * Functionality : Preparatory steps for copying the original kernel back.
+ * Called From : do_suspend2_lowlevel
+ */
+
+static void do_suspend2_resume_1(void)
+{
+ suspend_message(SUSPEND_ANY_SECTION, SUSPEND_LOW, 1,
+ name_suspend "About to copy pageset1 back...\n");
+
+ if (active_writer && active_writer->dpm_set_devices)
+ active_writer->dpm_set_devices();
+
+ suspend_drivers_suspend(SUSPEND_DRIVERS_USED_DEVICES_IRQS_ENABLED);
+ suspend_drivers_suspend(SUSPEND_DRIVERS_UNUSED_DEVICES_IRQS_ENABLED);
+ local_irq_disable(); /* irqs might have been re-enabled on us */
+ suspend_drivers_suspend(SUSPEND_DRIVERS_USED_DEVICES_IRQS_DISABLED);
+ local_irq_disable();
+ suspend_drivers_suspend(SUSPEND_DRIVERS_UNUSED_DEVICES_IRQS_DISABLED);
+ local_irq_enable();
+
+ suspend_map_atomic_copy_pages();
+
+ /* Get other cpus ready to restore their original contexts */
+ smp_suspend();
+
+ local_irq_disable();
+
+#ifdef CONFIG_PREEMPT
+ preempt_disable();
+#endif
+
+ barrier();
+ mb();
+
+ MDELAY(2000);
+}
+
+/*
+ * do_suspend2_resume_2
+ * Functionality : Steps taken after copying back the original kernel at
+ * resume.
+ * Key Assumptions : Will be able to read back secondary pagedir (if
+ * applicable).
+ * Called From : do_suspend2_lowlevel
+ */
+
+static void do_suspend2_resume_2(void)
+{
+ set_suspend_state(SUSPEND_NOW_RESUMING);
+ set_suspend_state(SUSPEND_PAGESET2_NOT_LOADED);
+
+ suspend_unmap_atomic_copy_pages();
+
+#ifdef CONFIG_PREEMPT
+ preempt_enable();
+#endif
+
+ local_irq_disable();
+ suspend_drivers_resume(SUSPEND_DRIVERS_USED_DEVICES_IRQS_DISABLED);
+ local_irq_enable();
+
+ suspend_drivers_resume(SUSPEND_DRIVERS_USED_DEVICES_IRQS_ENABLED);
+
+ suspend_post_restore_redraw();
+
+ check_shift_keys(1, "About to reload secondary pagedir.");
+
+ read_secondary_pagedir(0);
+ clear_suspend_state(SUSPEND_PAGESET2_NOT_LOADED);
+
+ suspend_checksum_print_differences();
+
+ prepare_status(0, 0, "Cleaning up...");
+
+ clear_suspend_state(SUSPEND_USE_MEMORY_POOL);
+
+ local_irq_disable();
+ suspend_drivers_resume(SUSPEND_DRIVERS_UNUSED_DEVICES_IRQS_DISABLED);
+ local_irq_enable();
+ suspend_drivers_resume(SUSPEND_DRIVERS_UNUSED_DEVICES_IRQS_ENABLED);
+}
+
+/*
+ * do_suspend2_suspend_1
+ * Functionality : Steps taken prior to saving CPU state and the image
+ * itself.
+ * Called From : do_suspend2_lowlevel
+ */
+
+static void do_suspend2_suspend_1(void)
+{
+ /* Save other cpu contexts */
+ smp_suspend();
+
+ suspend_drivers_suspend(SUSPEND_DRIVERS_USED_DEVICES_IRQS_ENABLED);
+
+ mb();
+ barrier();
+
+#ifdef CONFIG_PREEMPT
+ preempt_disable();
+#endif
+ local_irq_disable();
+ suspend_drivers_suspend(SUSPEND_DRIVERS_USED_DEVICES_IRQS_DISABLED);
+}
+
+/*
+ * do_suspend2_suspend_2
+ * Functionality : Steps taken after saving CPU state to save the
+ * image and powerdown/reboot or recover on failure.
+ * Key Assumptions : save_image returns zero on success; otherwise we need to
+ * clean up and exit. The state on exiting this routine
+ * should be essentially the same as if we have suspended,
+ * resumed and reached the end of do_suspend2_resume_2.
+ * Called From : do_suspend2_lowlevel
+ */
+static void do_suspend2_suspend_2(void)
+{
+ if (!save_image_part1())
+ suspend_power_down();
+
+ if (!TEST_RESULT_STATE(SUSPEND_ABORT_REQUESTED) &&
+ !TEST_ACTION_STATE(SUSPEND_TEST_FILTER_SPEED) &&
+ suspend_powerdown_method != 3)
+ printk(KERN_EMERG name_suspend
+ "Suspend failed, trying to recover...\n");
+ MDELAY(1000);
+
+ local_irq_disable();
+ suspend_drivers_resume(SUSPEND_DRIVERS_UNUSED_DEVICES_IRQS_DISABLED);
+ local_irq_enable();
+ suspend_drivers_resume(SUSPEND_DRIVERS_UNUSED_DEVICES_IRQS_ENABLED);
+
+ barrier();
+ mb();
+}
+
+static inline void lru_check_page(struct page * page)
+{
+ if (!PageLRU(page))
+ printk("Page %p/%p in inactivelist but not marked LRU.\n",
+ page, page_address(page));
+}
+
+/* get_suspend_debug_info
+ * Functionality: Store debug info in a buffer.
+ * Called from: do_activate, debuginfo_read_proc.
+ */
+
+#define SNPRINTF(a...) len += suspend_snprintf(debug_info_buffer + len, \
+ PAGE_SIZE - len - 1, ## a)
+
+static int get_suspend_debug_info(void)
+{
+ int len = 0;
+ if (!debug_info_buffer) {
+ debug_info_buffer = (char *) get_zeroed_page(GFP_ATOMIC);
+ if (!debug_info_buffer) {
+ printk("Error! Unable to allocate buffer for "
+ "software suspend debug info.\n");
+ return 0;
+ }
+ }
+
+ SNPRINTF("Please include the following information in bug reports:\n");
+ SNPRINTF("- SUSPEND core : %s\n", SUSPEND_CORE_VERSION);
+ SNPRINTF("- Kernel Version : %s\n", UTS_RELEASE);
+ SNPRINTF("- Compiler vers. : %d.%d\n", __GNUC__, __GNUC_MINOR__);
+#ifdef CONFIG_MODULES
+ SNPRINTF("- Modules loaded : ");
+ {
+#if LINUX_VERSION_CODE < KERNEL_VERSION(2,5,0)
+ struct module *this_mod;
+ extern struct module *module_list;
+ this_mod = module_list;
+ while (this_mod) {
+ if (this_mod->name)
+ SNPRINTF("%s ", this_mod->name);
+ this_mod = this_mod->next;
+ }
+#else
+ extern int print_module_list_to_buffer(char * buffer, int size);
+ len+= print_module_list_to_buffer(debug_info_buffer + len,
+ PAGE_SIZE - len - 1);
+#endif
+ }
+ SNPRINTF("\n");
+#else
+ SNPRINTF("- No module support.\n");
+#endif
+ SNPRINTF("- Attempt number : %d\n", nr_suspends);
+ if (num_range_pages)
+ SNPRINTF("- Pageset sizes : %d (%d low) and %d (%d low).\n",
+ pagedir1.lastpageset_size,
+ pageset1_sizelow,
+ pagedir2.lastpageset_size,
+ pageset2_sizelow);
+#ifdef CONFIG_SOFTWARE_SUSPEND_DEBUG
+ SNPRINTF("- Parameters : %ld %ld %ld %d %d %d %ld\n",
+ suspend_result,
+ suspend_action,
+ suspend_debug_state,
+ suspend_default_console_level,
+ image_size_limit,
+ max_async_ios,
+ suspend_powerdown_method);
+#else
+ SNPRINTF("- Parameters : %ld %ld %d %d %ld\n",
+ suspend_result,
+ suspend_action,
+ image_size_limit,
+ max_async_ios,
+ suspend_powerdown_method);
+#endif
+ if (num_range_pages)
+ SNPRINTF("- Calculations : Image size: %lu. "
+ "Ram to suspend: %ld.\n",
+ STORAGE_NEEDED(1), RAM_TO_SUSPEND);
+ SNPRINTF("- Limits : %lu pages RAM. Initial boot: %lu.\n",
+ max_mapnr, orig_mem_free);
+ SNPRINTF("- Overall expected compression percentage: %d.\n",
+ 100 - expected_compression_ratio());
+ len+= print_plugin_debug_info(debug_info_buffer + len,
+ PAGE_SIZE - len - 1);
+#ifdef CONFIG_SOFTWARE_SUSPEND_DEBUG
+ SNPRINTF("- Debugging compiled in.\n");
+#endif
+#ifdef CONFIG_PREEMPT
+ SNPRINTF("- Preemptive kernel.\n");
+#endif
+#ifdef CONFIG_SMP
+ SNPRINTF("- SMP kernel.\n");
+#endif
+#ifdef CONFIG_HIGHMEM
+ SNPRINTF("- Highmem Support.\n");
+#endif
+ if (num_range_pages)
+ SNPRINTF("- Max ranges used: %d ranges in %d pages.\n",
+ max_ranges_used, num_range_pages);
+ if (suspend_io_time[0][1]) {
+ SNPRINTF("- I/O speed: Write %d MB/s",
+ (MB((unsigned long) suspend_io_time[0][0]) * HZ /
+ suspend_io_time[0][1]));
+ if (suspend_io_time[1][1])
+ SNPRINTF(", Read %d MB/s",
+ (MB((unsigned long) suspend_io_time[1][0]) * HZ /
+ suspend_io_time[1][1]));
+ SNPRINTF(".\n");
+ }
+ else if (num_range_pages)
+ SNPRINTF("- Suspend cancelled. No I/O speed stats.\n");
+
+ return len;
+}
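
Reviewer note on the SNPRINTF() accumulation above: adding the return value to len is only safe if suspend_snprintf() returns the number of bytes actually stored. That helper is not shown in this hunk, so this is an assumption on my part. Standard snprintf() returns the length that would have been written, which can push len past the end of the buffer. A userspace sketch of a bounded append that stays in range either way (append_fmt is a hypothetical name):

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Append formatted text at buf[len] without ever running past size.
 * Returns the new length, excluding the terminating NUL.
 */
size_t append_fmt(char *buf, size_t size, size_t len, const char *fmt, ...)
{
    va_list ap;
    int n;

    assert(buf != NULL && size > 0);

    if (len + 1 >= size) /* no room left for anything but the NUL */
        return len;

    va_start(ap, fmt);
    n = vsnprintf(buf + len, size - len, fmt, ap);
    va_end(ap);

    if (n < 0) /* encoding error: leave the length untouched */
        return len;
    if ((size_t)n >= size - len) /* output was truncated */
        return size - 1;
    return len + (size_t)n;
}
```

The caller writes len = append_fmt(buf, size, len, ...) each time, so a full buffer simply stops growing.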
+
+extern int PageInPagedir(struct pagedir * p, struct page * page);
+static unsigned long display_metadata_page;
+
+static char * state_string[3] = { "Source", "Destination", "Dest/Allocd" };
+
+void __display_metadata_state(int pagedir, int state)
+{
+ int i;
+
+ if (!state)
+ return;
+
+ printk("[ Pagedir %d:", pagedir);
+ for (i=0; i < 3; i++)
+ if (state & (1 << i))
+ printk("%s ", state_string[i]);
+ printk("]");
+}
+
+void display_metadata_state(struct page * page)
+{
+ __display_metadata_state(1, PageInPagedir(&pagedir1, page));
+ __display_metadata_state(2, PageInPagedir(&pagedir2, page));
+ if (PageNosave(page))
+ printk("[ NoSave ]");
+ if (PageReserved(page))
+ printk("[ Reserved ]");
+ if (PageHighMem(page))
+ printk("[ Highmem ]");
+}
+
+void proc_display_metadata_state(void)
+{
+ printk("Page number %lu. Struct page at %p, virt address %p:",
+ display_metadata_page,
+ mem_map + display_metadata_page,
+ page_address(mem_map + display_metadata_page));
+ display_metadata_state(mem_map + display_metadata_page);
+ printk("\n");
+}
+
+/*
+ * debuginfo_read_proc
+ * Functionality : Displays information that may be helpful in debugging
+ * software suspend.
+ */
+int debuginfo_read_proc(char * page, char ** start, off_t off, int count,
+ int *eof, void *data)
+{
+ int info_len, copy_len;
+
+ initialise_suspend_plugins();
+ info_len = get_suspend_debug_info();
+ cleanup_suspend_plugins();
+
+ copy_len = min(info_len - (int) off, count);
+ if (copy_len < 0)
+ copy_len = 0;
+
+ if (copy_len) {
+ memcpy(page, debug_info_buffer + off, copy_len);
+ *start = page;
+ }
+
+ if (copy_len + off == info_len)
+ *eof = 1;
+
+ free_pages((unsigned long) debug_info_buffer, 0);
+ debug_info_buffer = NULL;
+ return copy_len;
+}
+
+static int get_suspend_debug_info(void);
+
+static int allocate_bitmaps(void)
+{
+ suspend_message(SUSPEND_MEMORY, SUSPEND_VERBOSE, 1,
+ "Allocating in_use_map\n");
+ if (allocate_local_pageflags(&in_use_map, 1))
+ return 1;
+
+ suspend_store_free_mem(SUSPEND_FREE_IN_USE_MAP, 0);
+ PRINTFREEMEM("after allocating in_use_map");
+
+ if (allocate_local_pageflags(&pageset2_map, 1))
+ return 1;
+
+ suspend_store_free_mem(SUSPEND_FREE_PS2_MAP, 0);
+ PRINTFREEMEM("after allocating pageset2 map");
+
+#ifdef CONFIG_DEBUG_PAGEALLOC
+ if (allocate_local_pageflags(&unmap_map, 1))
+ return 1;
+
+ suspend_store_free_mem(SUSPEND_FREE_UNMAP_MAP, 0);
+ PRINTFREEMEM("after allocating unmap map");
+#endif
+
+ suspend_store_free_mem(4, 0);
+
+ return 0;
+}
+
+static void free_metadata(void)
+{
+ put_rangepages_list();
+
+ free_ranges();
+ suspend_store_free_mem(SUSPEND_FREE_RANGE_PAGES, 1);
+ PRINTFREEMEM("after freeing ranges");
+
+ free_local_pageflags(&pageset2_map);
+ PRINTFREEMEM("after freeing pageset2 map");
+ suspend_store_free_mem(SUSPEND_FREE_PS2_MAP, 1);
+
+ free_local_pageflags(&in_use_map);
+ PRINTFREEMEM("after freeing inuse map");
+ suspend_store_free_mem(SUSPEND_FREE_IN_USE_MAP, 1);
+}
+
+/*
+ * do_activate
+ * Functionality : First level of code for software suspend invocations.
+ * Stores and restores load averages (to avoid a spike),
+ * allocates bitmaps, freezes processes and eats memory
+ * as required before suspending drivers and invoking
+ * the 'low level' code to save the state to disk.
+ * By the time we return from do_suspend2_lowlevel, we
+ * have either failed to save the image or successfully
+ * suspended and reloaded the image. The difference can
+ * be discerned by checking SUSPEND_ABORTED.
+ * Called From :
+ */
+
+extern void free_pageset_size_bloat(void);
+
+void do_activate(void)
+{
+ int i;
+
+ /* Suspend always runs on processor 0 */
+ ensure_on_processor_zero();
+
+ display_nosave_pages();
+
+#ifdef CONFIG_SOFTWARE_SUSPEND_KEEP_IMAGE
+ if (TEST_RESULT_STATE(SUSPEND_KEPT_IMAGE)) {
+ if (TEST_ACTION_STATE(SUSPEND_KEEP_IMAGE)) {
+ printk("Image already stored:"
+ " powering down immediately.\n");
+ suspend_power_down();
+ return; /* It might not return, but just in case we're using S3 */
+ } else {
+ printk("Invalidating previous image.\n");
+ active_writer->ops.writer.invalidate_image();
+ }
+ }
+#endif
+
+ printk(name_suspend "Initiating a software suspend cycle.\n");
+ set_suspend_state(SUSPEND_RUNNING);
+
+ max_ranges_used = 0;
+ nr_suspends++;
+ clear_suspend_state(SUSPEND_NOW_RESUMING);
+
+ suspend_io_time[0][0] = suspend_io_time[0][1] = suspend_io_time[1][0] =
+ suspend_io_time[1][1] = 0;
+
+ PRINTFREEMEM("at start of do_activate");
+ suspend_store_free_mem(SUSPEND_FREE_BASE, 0);
+
+ suspend2_prepare_console();
+
+ free_metadata(); /* We might have kept it */
+
+ if (suspend_version_specific_initialise())
+ goto out;
+
+ if (allocate_bitmaps())
+ goto out;
+
+ PRINTFREEMEM("after allocating bitmaps");
+
+ display_nosave_pages();
+
+ set_chain_names(&pagedir1);
+ set_chain_names(&pagedir2);
+
+ if (initialise_suspend_plugins())
+ goto out;
+
+ PRINTFREEMEM("after initialising plugins");
+ suspend_store_free_mem(SUSPEND_FREE_INIT_PLUGINS, 0);
+
+ /* Free up memory if necessary */
+ suspend_message(SUSPEND_ANY_SECTION, SUSPEND_VERBOSE, 1,
+ "Preparing image.\n");
+ if (prepare_image() || TEST_RESULT_STATE(SUSPEND_ABORTED))
+ goto out;
+
+ PRINTFREEMEM("after preparing image");
+
+ if (TEST_ACTION_STATE(SUSPEND_FREEZER_TEST))
+ goto out;
+
+ display_nosave_pages();
+
+ if (!TEST_RESULT_STATE(SUSPEND_ABORTED)) {
+ prepare_status(1, 0, "Starting to save the image..");
+ save_image();
+ }
+
+out:
+ free_pageset_size_bloat();
+
+ PRINTFREEMEM("at 'out'");
+
+ i = get_suspend_debug_info();
+
+ suspend_store_free_mem(SUSPEND_FREE_DEBUG_INFO, 0);
+
+ clear_suspend_state(SUSPEND_USE_MEMORY_POOL);
+
+ free_pagedir(&pagedir2);
+ PRINTFREEMEM("after freeing pagedir 2");
+ free_pagedir(&pagedir1);
+ PRINTFREEMEM("after freeing pagedir 1");
+
+#ifdef CONFIG_SOFTWARE_SUSPEND_KEEP_IMAGE
+ if (TEST_ACTION_STATE(SUSPEND_KEEP_IMAGE) &&
+ !TEST_RESULT_STATE(SUSPEND_ABORTED)) {
+ suspend_message(SUSPEND_ANY_SECTION, SUSPEND_LOW, 1,
+ name_suspend "Not invalidating the image due "
+ "to Keep Image being enabled.\n");
+ SET_RESULT_STATE(SUSPEND_KEPT_IMAGE);
+ } else
+#endif
+ active_writer->ops.writer.invalidate_image();
+
+ empty_suspend_memory_pool();
+ PRINTFREEMEM("after freeing memory pool");
+ suspend_store_free_mem(SUSPEND_FREE_MEM_POOL, 1);
+
+ if (!TEST_ACTION_STATE(SUSPEND_KEEP_METADATA))
+ free_metadata();
+
+#ifdef CONFIG_DEBUG_PAGEALLOC
+ free_local_pageflags(&unmap_map);
+ PRINTFREEMEM("after freeing unmap map");
+ suspend_store_free_mem(SUSPEND_FREE_UNMAP_MAP, 1);
+#endif
+
+ if (debug_info_buffer) {
+ /* Printk can only handle 1023 bytes, including
+ * its level mangling. */
+ for (i = 0; i < 3; i++)
+ printk("%s", debug_info_buffer + (1023 * i));
+ free_pages((unsigned long) debug_info_buffer, 0);
+ debug_info_buffer = NULL;
+ }
+
+ PRINTFREEMEM("after freeing debug info buffer");
+ suspend_store_free_mem(SUSPEND_FREE_DEBUG_INFO, 1);
+
+ cleanup_suspend_plugins();
+
+ PRINTFREEMEM("after cleaning up suspend plugins");
+ suspend_store_free_mem(SUSPEND_FREE_INIT_PLUGINS, 1);
+
+ suspend_version_specific_cleanup();
+
+ display_nosave_pages();
+
+ thaw_processes(FREEZER_ALL_THREADS);
+
+ PRINTFREEMEM("after thawing processes");
+ suspend_store_free_mem(SUSPEND_FREE_FREEZER, 1);
+
+ suspend_store_free_mem(SUSPEND_FREE_BASE, 1);
+#ifdef CONFIG_SOFTWARE_SUSPEND_DEBUG
+ display_free_mem();
+#endif
+
+ clear_suspend_state(SUSPEND_RUNNING);
+ PRINTFREEMEM("at end of do_activate");
+ suspend2_cleanup_console();
+
+}
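
Reviewer note on the debug buffer printing in do_activate(): the loop emits a fixed three pieces of 1023 bytes each, which covers 3069 bytes of a buffer that can hold up to a page. A userspace sketch of chunked output that walks the whole string instead, under the assumption that the sink accepts an explicit length (emit_in_chunks, emit_fn and count_bytes are hypothetical names):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Sink for one piece of output; ctx is caller-defined state. */
typedef void (*emit_fn)(const char *piece, size_t len, void *ctx);

/*
 * Hand a NUL-terminated buffer to emit() in pieces of at most chunk bytes.
 * Returns the number of pieces emitted.
 */
size_t emit_in_chunks(const char *buf, size_t chunk, emit_fn emit, void *ctx)
{
    size_t remaining = strlen(buf);
    size_t pieces = 0;

    assert(chunk > 0);

    while (remaining > 0) {
        size_t n = remaining < chunk ? remaining : chunk;

        emit(buf, n, ctx);
        buf += n;
        remaining -= n;
        pieces++;
    }
    return pieces;
}

/* Example sink: add up the bytes seen. ctx points at a size_t total. */
void count_bytes(const char *piece, size_t len, void *ctx)
{
    (void)piece;
    *(size_t *)ctx += len;
}
```

In the kernel the sink would be something like a printk with a "%.*s" format; that pairing is a suggestion, not something the patch does.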
+
+int attempt_to_parse_resume_device(void)
+{
+ struct list_head *writer;
+ struct suspend_plugin_ops * this_writer;
+ int result = 0;
+ mm_segment_t oldfs;
+
+ oldfs = get_fs(); set_fs(KERNEL_DS);
+
+ active_writer = NULL;
+ clear_suspend_state(SUSPEND_RESUME_DEVICE_OK);
+ set_suspend_state(SUSPEND_DISABLED);
+
+ if (!num_writers) {
+ printk(name_suspend "No writers have been registered.\n");
+ goto out;
+ }
+
+ if (!resume2_file[0]) {
+ result = -EINVAL;
+ goto out;
+ }
+
+ list_for_each(writer, &suspend_writers) {
+ this_writer = list_entry(writer, struct suspend_plugin_ops,
+ ops.writer.writer_list);
+
+ /*
+ * Not sure why you'd want to disable a writer, but
+ * we should honour the flag if we're providing it
+ */
+ if (this_writer->disabled) {
+ printk(name_suspend
+ "Writer '%s' is disabled. Ignoring it.\n",
+ this_writer->name);
+ continue;
+ }
+
+ result = this_writer->ops.writer.parse_image_location(
+ resume2_file, (num_writers == 1));
+
+ switch (result) {
+ case -EINVAL:
+ /*
+ * For this writer, but not a valid
+ * configuration
+ */
+
+ printk(name_suspend
+ "Not able to successfully parse this "
+ "resume device. Suspending disabled.\n");
+ goto out;
+
+ case 0:
+ /*
+ * For this writer and valid.
+ */
+
+ active_writer = this_writer;
+
+ /* We may not have any filters compiled in */
+
+ set_suspend_state(SUSPEND_RESUME_DEVICE_OK);
+ clear_suspend_state(SUSPEND_DISABLED);
+ printk(name_suspend "Suspending enabled.\n");
+ goto out;
+
+ case 1:
+ /*
+ * Not for this writer. Try the next one.
+ */
+
+ break;
+ }
+ }
+ printk(name_suspend "No matching writer found. Suspending disabled.\n");
+ result = -EINVAL;
+out:
+ clear_suspend_state(SUSPEND_RUNNING);
+ set_fs(oldfs);
+ return result;
+}
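
Reviewer note on the selection protocol in attempt_to_parse_resume_device(): each writer answers 0 (location is mine and valid), 1 (not mine, try the next writer) or -EINVAL (mine, but not usable). A userspace sketch of that three-way contract follows. The "swap:" and "file:" prefixes and every name in it are hypothetical stand-ins; the real parse_image_location() takes a second argument that is omitted here.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

struct writer {
    const char *name;
    int disabled;
    int (*parse)(const char *location); /* 0, 1 or -EINVAL as described above */
};

/* Shared helper: claim locations starting with prefix, reject empty targets. */
static int parse_prefixed(const char *loc, const char *prefix)
{
    size_t n = strlen(prefix);

    if (strncmp(loc, prefix, n) != 0)
        return 1; /* not for this writer */
    return loc[n] != '\0' ? 0 : -EINVAL;
}

int parse_swap(const char *loc) { return parse_prefixed(loc, "swap:"); }
int parse_file(const char *loc) { return parse_prefixed(loc, "file:"); }

/*
 * Walk the writers in order. The first one that claims the location decides
 * the outcome; if nobody claims it, the result is -EINVAL.
 */
int select_writer(const struct writer *w, size_t n, const char *loc,
                  const struct writer **chosen)
{
    size_t i;

    assert(chosen != NULL);
    *chosen = NULL;

    for (i = 0; i < n; i++) {
        int r;

        if (w[i].disabled)
            continue;
        r = w[i].parse(loc);
        if (r == 0) {
            *chosen = &w[i];
            return 0;
        }
        if (r != 1)
            return r; /* claimed but invalid: stop searching */
    }
    return -EINVAL;
}
```

Stopping on the first -EINVAL mirrors the patch: a writer that recognises the location but cannot use it ends the search, so a later writer never gets a chance to misinterpret the same string.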
+
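+/*
+ * __suspend2_verify_checksums
+ * Functionality : Ask the checksum plugin, if one is loaded, to check
+ * its checksums.
+ */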
+
+static void __suspend2_verify_checksums(void)
+{
+ if (checksum_plugin)
+ checksum_plugin->ops.checksum.check_checksums();
+}
+
+#define ALL_SETTINGS_VERSION 2
+
+/*
+ * suspend_write_compat_proc.
+ *
+ * This entry allows all of the settings to be set at once.
+ * It was originally for compatibility with pre- /proc/suspend
+ * versions, but has been retained because it makes saving and
+ * restoring the configuration simpler.
+ */
+static int suspend_write_compat_proc(struct file *file, const char * buffer,
+ unsigned long count, void * data)
+{
+ char * buf1 = (char *) get_zeroed_page(GFP_ATOMIC), *curbuf, *lastbuf;
+ char * buf2 = (char *) get_zeroed_page(GFP_ATOMIC);
+ int i, file_offset = 0, used_size = 0, reparse_resume_device = 0;
+ unsigned long nextval;
+ struct suspend_plugin_ops * plugin;
+ struct plugin_header * plugin_header = NULL;
+
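+ if ((!buf1) || (!buf2)) {
+ /* Don't leak whichever allocation did succeed */
+ if (buf1)
+ free_pages((unsigned long) buf1, 0);
+ if (buf2)
+ free_pages((unsigned long) buf2, 0);
+ return -ENOMEM;
+ }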
+
+ while (file_offset < count) {
+ int length = count - file_offset;
+ if (length > (PAGE_SIZE - used_size))
+ length = PAGE_SIZE - used_size;
+
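+ if (copy_from_user(buf1 + used_size, buffer + file_offset, length)) {
+ /* Free both buffers before bailing out */
+ free_pages((unsigned long) buf1, 0);
+ free_pages((unsigned long) buf2, 0);
+ return -EFAULT;
+ }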
+
+ curbuf = buf1;
+
+ if (!file_offset) {
+ /* Integers first */
+ for (i = 0; i < 8; i++) {
+ if (!*curbuf)
+ break;
+ lastbuf = curbuf;
+ nextval = simple_strtoul(curbuf, &curbuf, 0);
+ if (curbuf == lastbuf)
+ break;
+ switch (i) {
+ case 0:
+ if (nextval != ALL_SETTINGS_VERSION) {
+ printk("Error loading saved settings. This data is for version %lu, but kernel module uses format %d.\n",
+ nextval, ALL_SETTINGS_VERSION);
+ goto out;
+ }
+ break;
+ case 1:
+ suspend_result = nextval;
+ break;
+ case 2:
+ suspend_action = nextval;
+ break;
+ case 3:
+#ifdef CONFIG_SOFTWARE_SUSPEND_DEBUG
+ suspend_debug_state = nextval;
+#endif
+ break;
+ case 4:
+ suspend_default_console_level = nextval;
+#ifndef CONFIG_SOFTWARE_SUSPEND_DEBUG
+ if (suspend_default_console_level > 1)
+ suspend_default_console_level = 1;
+#endif
+ break;
+ case 5:
+ image_size_limit = nextval;
+ break;
+ case 6:
+ max_async_ios = nextval;
+ if (max_async_ios > MAX_READAHEAD)
+ max_async_ios = MAX_READAHEAD;
+ if (max_async_ios < 1)
+ max_async_ios = 1;
+ break;
+ case 7:
+#ifdef CONFIG_ACPI
+ suspend_powerdown_method = nextval;
+ if (suspend_powerdown_method < 3)
+ suspend_powerdown_method = 3;
+ if (suspend_powerdown_method > 5)
+#endif
+ suspend_powerdown_method = 5;
+ break;
+ }
+
+ curbuf++;
+ while (*curbuf == ' ')
+ curbuf++;
+ }
+
+ if (count <= (curbuf - buf1))
+ goto out;
+ else {
+ list_for_each_entry(plugin, &suspend_plugins, plugin_list)
+ plugin->disabled = 1;
+ }
+ }
+
+ if (((unsigned long) curbuf & ~PAGE_MASK) + sizeof(struct plugin_header) > PAGE_SIZE)
+ goto shift_buffer;
+
+ /* Plugins */
+ plugin_header = (struct plugin_header *) curbuf;
+
+ if (((unsigned long) curbuf & ~PAGE_MASK) + sizeof(struct plugin_header) + plugin_header->data_length > PAGE_SIZE)
+ goto shift_buffer;
+
+ if (plugin_header->magic != 0xADEDC0DE) {
+ printk("Bad plugin data magic.\n");
+ break;
+ }
+
+ plugin = find_plugin_given_name(plugin_header->name);
+
+ if (plugin) { /* May validly have config saved for a plugin not now loaded */
+ if ((plugin->type == WRITER_PLUGIN) &&
+ ((!active_writer && plugin->disabled && !plugin_header->disabled) ||
+ (active_writer == plugin && plugin_header->disabled)))
+ reparse_resume_device = 1;
+ plugin->disabled = plugin_header->disabled;
+ if (plugin_header->data_length)
+ plugin->load_config_info(curbuf + sizeof(struct plugin_header),
+ plugin_header->data_length);
+ } else
+ printk("Data for plugin %s not used because not currently loaded.\n", plugin_header->name);
+
+ curbuf += sizeof(struct plugin_header) + plugin_header->data_length;
+
+shift_buffer:
+ if (!(curbuf - buf1))
+ break;
+
+ file_offset += curbuf - buf1;
+
+ used_size = PAGE_SIZE + buf1 - curbuf;
+ memcpy(buf2, curbuf, used_size);
+ memcpy(buf1, buf2, used_size);
+ }
+out:
+ free_pages((unsigned long) buf1, 0);
+ free_pages((unsigned long) buf2, 0);
+
+ if (reparse_resume_device) {
+ printk("Active writer disabled or no active writer and one or more just enabled. Reparsing resume device.\n");
+ attempt_to_parse_resume_device();
+ }
+
+ return count;
+}
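
Reviewer note on the integer section of suspend_write_compat_proc(): the loop reads up to eight numbers, stops as soon as the parser makes no progress, and clamps some values to a range. A userspace sketch of those two pieces, using strtoul() in place of the kernel's simple_strtoul() (parse_leading_integers and clamp_ul are hypothetical names, and unlike the patch this version only skips spaces between values):

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Read up to max unsigned integers from the start of text (base detected
 * from the prefix, so "0x10" is 16). Stops at the first token that is not
 * a number. Returns how many values were stored in out.
 */
size_t parse_leading_integers(const char *text, unsigned long *out, size_t max)
{
    size_t n = 0;

    assert(text != NULL && out != NULL);

    while (n < max && *text) {
        char *end;
        unsigned long v = strtoul(text, &end, 0);

        if (end == text) /* no digits consumed: not a number */
            break;
        out[n++] = v;
        text = end;
        while (*text == ' ')
            text++;
    }
    return n;
}

/* Force v into [lo, hi]. */
unsigned long clamp_ul(unsigned long v, unsigned long lo, unsigned long hi)
{
    assert(lo <= hi);
    if (v < lo)
        return lo;
    if (v > hi)
        return hi;
    return v;
}
```

Checking end == text is what keeps the loop from spinning on binary plugin data that follows the numbers.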
+
+/*
+ * suspend_read_compat_proc.
+ *
+ * Like its _write_ sibling, this entry allows all of the settings
+ * to be read at once.
+ * It too was originally for compatibility with pre- /proc/suspend
+ * versions, but has been retained because it makes saving and
+ * restoring the configuration simpler.
+ */
+static int suspend_read_compat_proc(char * page, char ** start, off_t off, int count,
+ int *eof, void *data)
+{
+ struct suspend_plugin_ops * this_plugin;
+ char * buffer = (char *) get_zeroed_page(GFP_ATOMIC);
+ int index = 1, file_pos = 0, page_offset = 0, len;
+ int copy_len = 0;
+ struct plugin_header plugin_header;
+
+ if (!buffer) {
+ printk("Failed to allocate a buffer for getting "
+ "plugin configuration info.\n");
+ return -ENOMEM;
+ }
+
+ plugin_header.magic = 0xADEDC0DE;
+
+ len = sprintf(buffer, "%d %ld %ld %ld %d %d %d %ld\n",
+ ALL_SETTINGS_VERSION,
+ suspend_result,
+ suspend_action,
+ suspend_debug_state,
+ suspend_default_console_level,
+ image_size_limit,
+ max_async_ios,
+ suspend_powerdown_method);
+
+ if (len >= off) {
+ copy_len = (len < off + count) ? len - off : count;
+ memcpy(page, buffer + off, copy_len);
+ page_offset+= copy_len;
+ }
+
+ file_pos += len;
+
+ /*
+ * We have to know which data goes with which plugin, so we at
+ * least write a length of zero for a plugin. Note that we are
+ * also assuming every plugin's config data takes <= PAGE_SIZE.
+ */
+
+ /* For each plugin (in registration order) */
+ list_for_each_entry(this_plugin, &suspend_plugins, plugin_list) {
+
+ /* Get the data from the plugin */
+ if (this_plugin->save_config_info) {
+ plugin_header.data_length = this_plugin->save_config_info(buffer);
+ } else
+ plugin_header.data_length = 0;
+
+ if (file_pos > (off + count)) {
+ file_pos += sizeof(struct plugin_header) + plugin_header.data_length;
+ continue;
+ }
+
+ len = 0;
+ if ((file_pos + sizeof(struct plugin_header) >= off) &&
+ (file_pos < (off + count))) {
+
+ /* Save the details of the plugin */
+ memcpy(plugin_header.name, this_plugin->name,
+ SUSPEND_MAX_PLUGIN_NAME_LENGTH);
+ plugin_header.disabled = this_plugin->disabled;
+ plugin_header.type = this_plugin->type;
+ plugin_header.index = index++;
+
+ copy_len = sizeof(struct plugin_header);
+
+ if (copy_len + page_offset > count)
+ copy_len = count - page_offset;
+
+ memcpy(page + page_offset,
+ ((char *) &plugin_header) + off + page_offset - file_pos,
+ copy_len);
+
+ page_offset += copy_len;
+ }
+
+ file_pos += sizeof(struct plugin_header);
+
+ if (plugin_header.data_length && (file_pos >= off) && (file_pos < (off + count))) {
+ copy_len = plugin_header.data_length;
+
+ if (copy_len + page_offset > count)
+ copy_len = count - page_offset;
+
+ memcpy(page + page_offset,
+ buffer,
+ copy_len);
+
+ page_offset += copy_len;
+
+ }
+
+ file_pos += plugin_header.data_length;
+
+ }
+ free_pages((unsigned long) buffer, 0);
+ if (page_offset < count)
+ *eof = 1;
+ return page_offset;
+}
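
Reviewer note on the offset arithmetic in suspend_read_compat_proc(): the file is a sequence of pieces (settings line, then a header and data per plugin), and each read wants the bytes in [off, off + count). One way to keep that arithmetic in a single place is to clip every piece against the window. A userspace sketch of that approach; it is an alternative formulation rather than what the patch does, and read_window and window_add are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct read_window {
    char *page;    /* destination supplied by the caller */
    size_t off;    /* file offset of the first byte wanted */
    size_t count;  /* maximum number of bytes wanted */
    size_t copied; /* bytes placed in page so far */
};

/*
 * Offer one piece of the file, located at file_pos and len bytes long.
 * Only the part that overlaps the window is copied.
 */
void window_add(struct read_window *w, size_t file_pos,
                const void *src, size_t len)
{
    size_t start, stop, limit;

    assert(w != NULL && w->page != NULL && src != NULL);

    limit = w->off + w->count;
    start = file_pos > w->off ? file_pos : w->off;
    stop = file_pos + len < limit ? file_pos + len : limit;

    if (start >= stop) /* piece lies entirely outside the window */
        return;

    memcpy(w->page + (start - w->off),
           (const char *)src + (start - file_pos),
           stop - start);
    w->copied += stop - start;
}
```

The caller walks every piece in order, advancing file_pos by each piece's length, and returns w.copied at the end.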
+
+extern int initialise_suspend_plugins(void);
+extern void cleanup_suspend_plugins(void);
+static char suspend_core_version[] = SUSPEND_CORE_VERSION;
+
+static int resume2_write_proc(void)
+{
+ mm_segment_t oldfs;
+
+ oldfs = get_fs(); set_fs(KERNEL_DS);
+ initialise_suspend_plugins();
+ attempt_to_parse_resume_device();
+ cleanup_suspend_plugins();
+ set_fs(oldfs);
+ return 0;
+}
+
+/*
+ * Core proc entries that aren't built in.
+ *
+ * This array contains entries that are automatically registered at
+ * boot. Plugins and the console code register their own entries separately.
+ */
+
+static struct suspend_proc_data proc_params[] = {
+ { .filename = "all_settings",
+ .permissions = PROC_RW,
+ .type = SUSPEND_PROC_DATA_CUSTOM,
+ .data = {
+ .special = {
+ .read_proc = suspend_read_compat_proc,
+ .write_proc = suspend_write_compat_proc,
+ }
+ }
+ },
+
+ { .filename = "debug_info",
+ .permissions = PROC_READONLY,
+ .type = SUSPEND_PROC_DATA_CUSTOM,
+ .data = {
+ .special = {
+ .read_proc = debuginfo_read_proc,
+ }
+ }
+ },
+
+ { .filename = "disable_sysdev_suspend",
+ .permissions = PROC_RW,
+ .type = SUSPEND_PROC_DATA_BIT,
+ .data = {
+ .bit = {
+ .bit_vector = &suspend_action,
+ .bit = SUSPEND_DISABLE_SYSDEV_SUPPORT,
+ }
+ }
+ },
+
+ { .filename = "freeze_timers",
+ .permissions = PROC_RW,
+ .type = SUSPEND_PROC_DATA_BIT,
+ .data = {
+ .bit = {
+ .bit_vector = &suspend_action,
+ .bit = SUSPEND_FREEZE_TIMERS,
+ }
+ }
+ },
+
+ { .filename = "image_size_limit",
+ .permissions = PROC_RW,
+ .type = SUSPEND_PROC_DATA_INTEGER,
+ .data = {
+ .integer = {
+ .variable = &image_size_limit,
+ .minimum = -2,
+ .maximum = 32767,
+ }
+ }
+ },
+
+ { .filename = "last_result",
+ .permissions = PROC_READONLY,
+ .type = SUSPEND_PROC_DATA_UL,
+ .data = {
+ .ul = {
+ .variable = &suspend_result,
+ }
+ }
+ },
+
+ { .filename = "reboot",
+ .permissions = PROC_RW,
+ .type = SUSPEND_PROC_DATA_BIT,
+ .data = {
+ .bit = {
+ .bit_vector = &suspend_action,
+ .bit = SUSPEND_REBOOT,
+ }
+ }
+ },
+
+ { .filename = "resume2",
+ .permissions = PROC_RW,
+ .type = SUSPEND_PROC_DATA_STRING,
+ .data = {
+ .string = {
+ .variable = resume2_file,
+ .max_length = 255,
+ }
+ },
+ .write_proc = resume2_write_proc,
+ },
+
+
+ { .filename = "version",
+ .permissions = PROC_READONLY,
+ .type = SUSPEND_PROC_DATA_STRING,
+ .data = {
+ .string = {
+ .variable = suspend_core_version,
+ }
+ }
+ },
+
+#ifdef CONFIG_SOFTWARE_SUSPEND_DEBUG
+ { .filename = "freezer_test",
+ .permissions = PROC_RW,
+ .type = SUSPEND_PROC_DATA_BIT,
+ .data = {
+ .bit = {
+ .bit_vector = &suspend_action,
+ .bit = SUSPEND_FREEZER_TEST,
+ }
+ }
+ },
+
+ { .filename = "keep_metadata",
+ .permissions = PROC_RW,
+ .type = SUSPEND_PROC_DATA_BIT,
+ .data = {
+ .bit = {
+ .bit_vector = &suspend_action,
+ .bit = SUSPEND_KEEP_METADATA,
+ }
+ }
+ },
+
+ { .filename = "test_filter_speed",
+ .permissions = PROC_RW,
+ .type = SUSPEND_PROC_DATA_BIT,
+ .data = {
+ .bit = {
+ .bit_vector = &suspend_action,
+ .bit = SUSPEND_TEST_FILTER_SPEED,
+ }
+ }
+ },
+
+ { .filename = "slow",
+ .permissions = PROC_RW,
+ .type = SUSPEND_PROC_DATA_BIT,
+ .data = {
+ .bit = {
+ .bit_vector = &suspend_action,
+ .bit = SUSPEND_SLOW,
+ }
+ }
+ },
+
+ { .filename = "display_metadata_page",
+ .permissions = PROC_RW,
+ .type = SUSPEND_PROC_DATA_UL,
+ .data = {
+ .ul = {
+ .variable = &display_metadata_page,
+ }
+ }
+ },
+#endif
+
+ { .filename = "forced_pageset1_size",
+ .permissions = PROC_RW,
+ .type = SUSPEND_PROC_DATA_UL,
+ .data = {
+ .ul = {
+ .variable = &forced_ps1_size,
+ }
+ }
+ },
+
+ { .filename = "forced_pageset2_size",
+ .permissions = PROC_RW,
+ .type = SUSPEND_PROC_DATA_UL,
+ .data = {
+ .ul = {
+ .variable = &forced_ps2_size,
+ }
+ }
+ },
+
+#if defined(CONFIG_ACPI)
+ { .filename = "powerdown_method",
+ .permissions = PROC_RW,
+ .type = SUSPEND_PROC_DATA_UL,
+ .data = {
+ .ul = {
+ .variable = &suspend_powerdown_method,
+ .minimum = 3,
+ .maximum = 5,
+ }
+ }
+ },
+#endif
+
+#ifdef CONFIG_SOFTWARE_SUSPEND_KEEP_IMAGE
+ { .filename = "keep_image",
+ .permissions = PROC_RW,
+ .type = SUSPEND_PROC_DATA_BIT,
+ .data = {
+ .bit = {
+ .bit_vector = &suspend_action,
+ .bit = SUSPEND_KEEP_IMAGE,
+ }
+ }
+ },
+#endif
+};
+
+extern int debuginfo_read_proc(char * page, char ** start, off_t off, int count,
+ int *eof, void *data);
+
+/*
+ * Called from init kernel_thread.
+ * We check if we have an image and if so we try to resume.
+ * We also start ksuspendd if configuration looks right.
+ */
+
+extern int freeze_processes(int no_progress);
+
+static int do_resume(void)
+{
+ int ret = 0;
+ int read_image_result = 0;
+
+ /* Suspend always runs on processor 0 */
+ ensure_on_processor_zero();
+
+ if (sizeof(swp_entry_t) != sizeof(long)) {
+ printk(KERN_WARNING name_suspend
+ "The size of swp_entry_t != size of long. "
+ "Please report this!\n");
+ return ret;
+ }
+
+ set_suspend_state(SUSPEND_RUNNING);
+
+ if (!resume2_file[0])
+ printk(KERN_WARNING name_suspend
+ "You need to use a resume2= command line parameter to "
+ "tell Software Suspend 2 where to look for an image.\n");
+
+ if (!(test_suspend_state(SUSPEND_RESUME_DEVICE_OK)))
+ attempt_to_parse_resume_device();
+
+ if (!(test_suspend_state(SUSPEND_RESUME_DEVICE_OK))) {
+ /*
+ * Without a usable storage device we can do nothing -
+ * even if noresume is given
+ */
+
+ if (!num_writers)
+ printk(KERN_ALERT name_suspend
+ "No writers have been registered.\n");
+ else
+ printk(KERN_ALERT name_suspend
+ "Missing or invalid storage location "
+ "(resume2= parameter). Please correct and "
+ "rerun lilo (or equivalent) before "
+ "suspending.\n");
+ clear_suspend_state(SUSPEND_RUNNING);
+ return ret;
+ }
+
+ /* We enable the possibility of machine suspend */
+ orig_mem_free = real_nr_free_pages();
+
+ suspend_task = current->pid;
+
+ read_image_result = read_primary_suspend_image(); /* non fatal error ignored */
+
+ if (test_suspend_state(SUSPEND_NORESUME_SPECIFIED))
+ printk(KERN_WARNING name_suspend "Resuming disabled as requested.\n");
+
+ if (read_image_result) {
+ suspend_task = 0;
+ clear_suspend_state(SUSPEND_RUNNING);
+ return ret;
+ }
+
+ /*
+ * Ensure our suspend device tree is configured (2.6) as
+ * at suspend time
+ */
+
+ suspend_version_specific_initialise();
+
+ freeze_processes(1);
+
+ prepare_status(0, 0,
+ "Copying original kernel back (no status - sensitive!)...");
+
+ do_suspend2_lowlevel(1);
+ BUG();
+
+ return ret;
+}
+
+extern int suspend_plugin_keypress(unsigned int keycode);
+extern void request_abort_suspend(void);
+extern void schedule_suspend_message(int message_number);
+
+int suspend_keypress(unsigned int keycode)
+{
+ /* These keys work even if no output is enabled.
+ * (To get this far, we must be suspending or resuming).
+ */
+ switch (keycode) {
+ case 27:
+ /* Abort suspend */
+ if (TEST_ACTION_STATE(SUSPEND_CAN_CANCEL))
+ request_abort_suspend();
+ break;
+ case 114:
+ /* Otherwise, if R pressed, toggle rebooting */
+ suspend_action ^= (1 << SUSPEND_REBOOT);
+ schedule_suspend_message(2);
+ break;
+ default:
+ return suspend_plugin_keypress(keycode);
+ }
+ return 1;
+}
+
+extern void suspend_early_boot_message_plugins(void);
+extern void cleanup_finished_suspend_io(void);
+
+struct suspend2_core_ops core_ops = {
+ .do_suspend = do_activate,
+ .do_resume = do_resume,
+ .resume1 = do_suspend2_resume_1,
+ .resume2 = do_suspend2_resume_2,
+ .suspend1 = do_suspend2_suspend_1,
+ .suspend2 = do_suspend2_suspend_2,
+ .free_pool_pages = free_suspend_pool_pages,
+ .get_pool_pages = get_suspend_pool_pages,
+ .get_grabbed_pages = get_grabbed_pages,
+ .cleanup_finished_io = cleanup_finished_suspend_io,
+ .suspend_message = __suspend_message,
+ .update_status = update_status,
+ .prepare_status = prepare_status,
+ .schedule_message = schedule_suspend_message,
+ .early_boot_plugins = suspend_early_boot_message_plugins,
+ .keypress = suspend_keypress,
+ .verify_checksums = __suspend2_verify_checksums,
+};
+
+static struct proc_dir_entry *compat_parent;
+extern void suspend_memory_pool_init(void);
+
+static __init int core_load(void)
+{
+ int i, numfiles = sizeof(proc_params) / sizeof(struct suspend_proc_data);
+
+ if (suspend2_register_core(&core_ops))
+ return -EBUSY;
+
+ printk("Software Suspend Core.\n");
+ for (i=0; i< numfiles; i++)
+ suspend_register_procfile(&proc_params[i]);
+
+ suspend_console_proc_init();
+
+ suspend_memory_pool_init();
+
+#if LINUX_VERSION_CODE < KERNEL_VERSION(2,5,0)
+ return 0;
+#else
+ return (!!compat_parent);
+#endif
+}
+
+#ifdef MODULE
+static __exit void core_unload(void)
+{
+ int i, numfiles = sizeof(proc_params) / sizeof(struct suspend_proc_data);
+
+ printk("Software Suspend Core unloading.\n");
+ suspend_console_proc_exit();
+
+ for (i=0; i< numfiles; i++)
+ suspend_unregister_procfile(&proc_params[i]);
+
+ suspend2_unregister_core();
+}
+
+module_init(core_load);
+module_exit(core_unload);
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Nigel Cunningham");
+MODULE_DESCRIPTION("Suspend2 core");
+#else
+late_initcall(core_load);
+#endif
+EXPORT_SYMBOL(checksum_map);
+EXPORT_SYMBOL(attempt_to_parse_resume_device);


2004-11-24 15:29:56

by Nigel Cunningham

Subject: Suspend 2 merge: 29/51: Clear swapfile bdev in swapoff.

Suspend uses the bdev field as its means of telling which swap devices
are in use. (This info needs to be used at resume time without actually
doing the swapon[s] again). In order to avoid an oops in the suspend
code if the user turns off a swap device, this small addition is
necessary. (If you want the long explanation, feel free to ask!)
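
A minimal userspace sketch of the idea, assuming a simplified stand-in for the kernel's swap_info array (the struct layout, MAX_SWAPFILES value and the model_* helper names are illustrative, not the kernel's definitions). A scan that treats a non-NULL bdev as "this swap device is in use" only gives the right answer if swapoff resets the field:

```c
#include <stddef.h>

#define MAX_SWAPFILES 4                 /* illustrative; not the kernel's value */

struct block_device { int id; };        /* stand-in for the kernel type */

struct swap_info_model {
	struct block_device *bdev;      /* NULL is read as "not in use" */
	unsigned int max;
};

static struct swap_info_model swap_info[MAX_SWAPFILES];

/* Model of swapon: remember the backing device for this slot. */
static void model_swapon(int slot, struct block_device *bdev, unsigned int max)
{
	swap_info[slot].bdev = bdev;
	swap_info[slot].max = max;
}

/* Model of swapoff with the one-line fix: clear bdev as well as max. */
static void model_swapoff(int slot)
{
	swap_info[slot].max = 0;
	swap_info[slot].bdev = NULL;
}

/* Count the devices a bdev-based scan would consider active. */
static int count_active_swap_devices(void)
{
	int i, n = 0;

	for (i = 0; i < MAX_SWAPFILES; i++)
		if (swap_info[i].bdev)
			n++;
	return n;
}
```

Without the assignment to NULL in model_swapoff, the scan would keep counting a device that has already been turned off, and in the real code would go on to use a stale pointer.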

diff -ruN 816-clear-swapfile-bdev-in-swapoff-old/mm/swapfile.c 816-clear-swapfile-bdev-in-swapoff-new/mm/swapfile.c
--- 816-clear-swapfile-bdev-in-swapoff-old/mm/swapfile.c 2004-11-06 09:26:59.372699648 +1100
+++ 816-clear-swapfile-bdev-in-swapoff-new/mm/swapfile.c 2004-11-04 16:27:41.000000000 +1100
@@ -1179,6 +1179,7 @@
swap_file = p->swap_file;
p->swap_file = NULL;
p->max = 0;
+ p->bdev = NULL;
swap_map = p->swap_map;
p->swap_map = NULL;
p->flags = 0;


2004-11-24 15:29:56

by Nigel Cunningham

Subject: Suspend 2 merge: 27/51: Block I/O module.

This is the code that does all the really hard work in reading and
writing the image. It provides full asynchronous I/O, and in combination
with the layer above, readahead where I/O needs to be synchronous (image
decompression). I/O is also batched to further improve throughput. Some
of the key parameters are user tunable, as the best setting varies from
computer to computer.
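
The batching idea can be sketched in plain C as follows. This is a simplified userspace model, not the code in the patch: the io_request type, the fixed-size array and the function names are assumptions made for illustration, and the batch size is kept small (the patch defaults to 32).

```c
#include <stddef.h>

#define BATCH_SIZE 4                    /* illustrative; the patch uses 32 */

struct io_request {
	long block;
	int submitted;                  /* set once handed to the device */
};

static struct io_request *batch[BATCH_SIZE];
static int batch_count;                 /* requests queued but not yet submitted */
static int submissions;                 /* total requests handed to the device */

/* Hand every queued request to the (pretend) block layer in one go. */
static void flush_batch(void)
{
	int i;

	for (i = 0; i < batch_count; i++) {
		batch[i]->submitted = 1;
		submissions++;
	}
	batch_count = 0;
}

/* Queue a request; submit the whole batch once it is full. */
static void queue_request(struct io_request *req)
{
	batch[batch_count++] = req;
	if (batch_count == BATCH_SIZE)
		flush_batch();
}
```

Anything that needs to wait for I/O has to call flush_batch() first, otherwise it may wait on requests that were queued but never submitted.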

diff -ruN 812-suspend2-block-io-module-old/kernel/power/suspend_block_io.c 812-suspend2-block-io-module-new/kernel/power/suspend_block_io.c
--- 812-suspend2-block-io-module-old/kernel/power/suspend_block_io.c 1970-01-01 10:00:00.000000000 +1000
+++ 812-suspend2-block-io-module-new/kernel/power/suspend_block_io.c 2004-11-23 22:18:49.000000000 +1100
@@ -0,0 +1,827 @@
+/*
+ * suspend_block_io.c
+ *
+ * Copyright 2004 Nigel Cunningham <[email protected]>
+ *
+ * Distributed under GPLv2.
+ *
+ * This file contains block io functions for suspend2. These are
+ * used by the swapwriter and it is planned that they will also
+ * be used by the NFSwriter.
+ *
+ */
+
+#include <linux/suspend.h>
+#include <linux/module.h>
+#include <linux/highmem.h>
+#include <linux/blkdev.h>
+#include <linux/bio.h>
+#include <linux/kthread.h>
+
+#include "suspend.h"
+#include "block_io.h"
+#include "proc.h"
+#include "plugins.h"
+
+/* Bits in struct io_info->flags */
+#define IO_WRITING 1
+#define IO_RESTORE_PAGE_PROT 2
+#define IO_AWAITING_READ 3
+#define IO_AWAITING_WRITE 4
+#define IO_CLEANUP_IN_PROGRESS 5
+#define IO_HANDLE_PAGE_PROT 6
+
+#define USE_KEVENTD
+//#define TUNE_BATCHING
+
+/*
+ * ---------------------------------------------------------------
+ *
+ * IO in progress information storage and helpers
+ *
+ * ---------------------------------------------------------------
+ */
+
+struct io_info {
+ struct bio * sys_struct;
+ long blocks[PAGE_SIZE/512];
+ struct page * buffer_page;
+ struct page * data_page;
+ unsigned long flags;
+ struct block_device * dev;
+ int blocks_used;
+ int block_size;
+ struct list_head list;
+ int readahead_index;
+ struct work_struct work;
+};
+
+static LIST_HEAD(ioinfo_free);
+static LIST_HEAD(ioinfo_ready_for_cleanup);
+static LIST_HEAD(ioinfo_busy);
+static LIST_HEAD(ioinfo_submit_batch);
+static spinlock_t ioinfo_lists_lock = SPIN_LOCK_UNLOCKED;
+
+static int submit_batch = 0, submit_batch_size = 32;
+static void submit_batched(void);
+
+struct task_struct * suspend_bio_task;
+
+/* [Max] number of I/O operations pending */
+static atomic_t outstanding_io;
+static int max_outstanding_io = 0;
+static int buffer_allocs, buffer_frees;
+
+/* [Max] number of pages used for above struct */
+static int infopages = 0;
+static int maxinfopages = 0;
+
+static volatile unsigned long suspend_readahead_flags[((MAX_READAHEAD + (8 * sizeof(unsigned long) - 1)) / (8 * sizeof(unsigned long)))];
+static spinlock_t suspend_readahead_flags_lock = SPIN_LOCK_UNLOCKED;
+static struct page * suspend_readahead_pages[MAX_READAHEAD];
+
+static unsigned long nr_schedule_calls[6];
+static unsigned long bio_jiffies = 0;
+
+static char * sch_caller[] = {
+ "get_io_info_struct ",
+ "suspend_finish_all_io ",
+ "wait_on_one_page ",
+ "submit ",
+ "start_one ",
+ "suspend_wait_on_readahead",
+};
+
+static void suspend_io_cleanup(void * data);
+
+static void do_bio_wait(int caller)
+{
+#ifndef USE_KEVENTD
+ int num_cleaned = 0;
+ struct io_info * this, * next = NULL;
+#endif
+ int device;
+
+ nr_schedule_calls[caller]++;
+
+ /* Don't want to wait on I/O we haven't submitted! */
+ submit_batched();
+
+#ifndef USE_KEVENTD
+ if (!list_empty(&ioinfo_ready_for_cleanup))
+ list_for_each_entry_safe(this, next, &ioinfo_ready_for_cleanup, list) {
+ suspend_io_cleanup((void *) this);
+ num_cleaned++;
+ if (num_cleaned == 32)
+ break;
+ }
+#endif
+ for (device = 0; device < MAX_SWAPFILES; device++) {
+ struct block_device * bdev = swap_info[device].bdev;
+ if (bdev) {
+ request_queue_t * q = bdev_get_queue(bdev);
+ if (q && q->unplug_fn)
+ q->unplug_fn(q);
+ }
+ /* kblockd_flush(); io_schedule(); */
+ }
+ schedule();
+}
+
+/*
+ * cleanup_one
+ *
+ * Description: Clean up after completing I/O on a page.
+ * Arguments: struct io_info: Data for I/O to be completed.
+ */
+static inline void cleanup_one(struct io_info * io_info)
+{
+ struct page * buffer_page;
+ struct page * data_page;
+ char *buffer_address, *data_address;
+ int reading;
+
+ buffer_page = io_info->buffer_page;
+ data_page = io_info->data_page;
+
+ /*
+ * Already being cleaned up? Can't happen while we're single
+ * threaded, but a good check for later.
+ */
+
+ if (test_and_set_bit(IO_CLEANUP_IN_PROGRESS, &io_info->flags))
+ return;
+
+ reading = test_bit(IO_AWAITING_READ, &io_info->flags);
+ suspend_message(SUSPEND_WRITER, SUSPEND_HIGH, 0,
+ "Cleanup IO: [%p]\n",
+ io_info);
+
+ if (reading && io_info->readahead_index == -1) {
+ /*
+ * Copy the page we read into the buffer our caller provided.
+ */
+ data_address = (char *) kmap(data_page);
+ buffer_address = (char *) kmap(buffer_page);
+ memcpy(data_address, buffer_address, PAGE_SIZE);
+ flush_dcache_page(data_page);
+ kunmap(data_page);
+ kunmap(buffer_page);
+
+ }
+
+ if (!reading || io_info->readahead_index == -1) {
+ /* Sanity check */
+ if (page_count(buffer_page) != 2)
+ printk(KERN_EMERG "Cleanup IO: Page count is %d. Not good!\n",
+ page_count(buffer_page));
+ put_page(buffer_page);
+ __free_pages(buffer_page, 0);
+ buffer_frees++;
+ } else
+ put_page(buffer_page);
+
+ atomic_dec(&outstanding_io);
+ bio_put(io_info->sys_struct);
+ io_info->sys_struct = NULL;
+ io_info->flags = 0;
+}
+
+/*
+ * get_io_info_struct
+ *
+ * Description: Get an I/O struct.
+ * Returns: Pointer to the struct prepared for use.
+ */
+static struct io_info * get_io_info_struct(void)
+{
+ unsigned long newpage = 0, flags;
+ struct io_info * this = NULL;
+ int remaining = 0;
+
+ do {
+ /* Have we reached our number-of-IOs-active-at-once limit? */
+ if ((max_async_ios) && (atomic_read(&outstanding_io) >= max_async_ios)) {
+ do_bio_wait(0);
+ continue;
+ }
+
+ /* Can start a new I/O. Is there a free one? */
+ if (!list_empty(&ioinfo_free)) {
+ /* Yes. Grab it. */
+ spin_lock_irqsave(&ioinfo_lists_lock, flags);
+ break;
+ }
+
+ /* No. Need to allocate a new page for I/O info structs. */
+ newpage = get_zeroed_page(GFP_ATOMIC);
+ if (!newpage)
+ continue;
+
+ suspend_message(SUSPEND_MEMORY, SUSPEND_VERBOSE, 0,
+ "[NewIOPage %lx]", newpage);
+ infopages++;
+ if (infopages > maxinfopages)
+ maxinfopages++;
+
+ /* Prepare the new page for use. */
+ this = (struct io_info *) newpage;
+ remaining = PAGE_SIZE;
+ spin_lock_irqsave(&ioinfo_lists_lock, flags);
+ while (remaining >= (sizeof(struct io_info))) {
+ list_add_tail(&this->list, &ioinfo_free);
+ this = (struct io_info *) (((char *) this) +
+ sizeof(struct io_info));
+ remaining -= sizeof(struct io_info);
+ }
+ break;
+ } while (1);
+
+ /* We have an I/O info struct. Move it to the busy list. */
+ this = list_entry(ioinfo_free.next, struct io_info, list);
+ list_move_tail(&this->list, &ioinfo_busy);
+ spin_unlock_irqrestore(&ioinfo_lists_lock, flags);
+ return this;
+}
+
+/*
+ * suspend_finish_all_io
+ *
+ * Description: Finishes all IO and frees all IO info struct pages.
+ */
+static void suspend_finish_all_io(void)
+{
+ struct io_info * this, * next = NULL;
+ unsigned long flags;
+
+ /* Submit any pending write batch */
+ submit_batched();
+
+ /* Wait for all I/O to complete. */
+ while (atomic_read(&outstanding_io))
+ do_bio_wait(1);
+
+ /*
+ * We're single threaded and all I/O is completed, so we shouldn't
+ * need to use the spinlock, but let's be safe.
+ */
+ spin_lock_irqsave(&ioinfo_lists_lock, flags);
+
+ /*
+ * Two stages, to avoid using freed pages.
+ *
+ * First free all io_info structs on a page except the first.
+ */
+ list_for_each_entry_safe(this, next, &ioinfo_free, list) {
+ if (((unsigned long) this) & ~PAGE_MASK)
+ list_del(&this->list);
+ }
+
+ /*
+ * Now we have only one reference to each page, and can safely
+ * free pages, knowing we're not going to be trying to access the
+ * same page after freeing it.
+ */
+ list_for_each_entry_safe(this, next, &ioinfo_free, list) {
+ list_del(&this->list);
+ free_pages((unsigned long) this, 0);
+ infopages--;
+ suspend_message(SUSPEND_MEMORY, SUSPEND_VERBOSE, 0,
+ "[FreedIOPage %lx]", (unsigned long) this);
+ }
+
+ spin_unlock_irqrestore(&ioinfo_lists_lock, flags);
+}
+
+/*
+ * wait_on_one_page
+ *
+ * Description: Wait for a particular I/O to complete.
+ */
+static void wait_on_one_page(struct io_info * io_info)
+{
+ do { do_bio_wait(2); } while (io_info->flags);
+}
+
+/*
+ * suspend_reset_io_stats
+ *
+ * Description: Reset all our sanity-checking statistics.
+ */
+static void suspend_reset_io_stats(void)
+{
+ int i;
+
+ max_outstanding_io = 0;
+ maxinfopages = 0;
+ buffer_allocs = buffer_frees = 0;
+
+ for (i = 0; i < 6; i++)
+ nr_schedule_calls[i] = 0;
+ bio_jiffies = 0;
+}
+
+/*
+ * suspend_check_io_stats
+ *
+ * Description: Check that our statistics look right and print
+ * any debugging info wanted.
+ */
+static void suspend_check_io_stats(void)
+{
+ int i;
+
+ BUG_ON(atomic_read(&outstanding_io));
+ BUG_ON(infopages);
+ BUG_ON(buffer_allocs != buffer_frees);
+ BUG_ON(!list_empty(&ioinfo_busy));
+ BUG_ON(!list_empty(&ioinfo_ready_for_cleanup));
+ BUG_ON(!list_empty(&ioinfo_free));
+
+ if (atomic_read(&outstanding_io))
+ suspend_message(SUSPEND_WRITER, SUSPEND_MEDIUM, 0,
+ "Outstanding_io after writing is %d.\n",
+ atomic_read(&outstanding_io));
+ suspend_message(SUSPEND_WRITER, SUSPEND_LOW, 0,
+ "Maximum outstanding_io was %d.\n",
+ max_outstanding_io);
+ if (infopages)
+ suspend_message(SUSPEND_WRITER, SUSPEND_MEDIUM, 0,
+ "Info pages is %d.\n",
+ infopages);
+ suspend_message(SUSPEND_WRITER, SUSPEND_LOW, 0,
+ "Max info pages was %d.\n",
+ maxinfopages);
+ if (buffer_allocs != buffer_frees)
+ suspend_message(SUSPEND_WRITER, SUSPEND_MEDIUM, 0,
+ "Buffer allocs (%d) != buffer frees (%d)",
+ buffer_allocs,
+ buffer_frees);
+ for(i = 0; i < 6; i++)
+ suspend_message(SUSPEND_WRITER, SUSPEND_MEDIUM, 0,
+ "Nr schedule calls %s: %lu.\n", sch_caller[i], nr_schedule_calls[i]);
+ suspend_message(SUSPEND_WRITER, SUSPEND_MEDIUM, 0,
+ "Jiffies waiting for bio calls:%lu.\n", bio_jiffies);
+}
+
+/* suspend_io_cleanup
+ */
+
+static void suspend_io_cleanup(void * data)
+{
+ struct io_info * io_info = (struct io_info *) data;
+ int readahead_index;
+ unsigned long flags;
+
+ /*
+ * If this I/O was a readahead, remember its index.
+ */
+ readahead_index = io_info->readahead_index;
+
+ /*
+ * Do the cleanup.
+ */
+ cleanup_one(io_info);
+
+ /*
+ * Record the readahead as done.
+ */
+ if (readahead_index > -1) {
+ int index = readahead_index/(8 * sizeof(unsigned long));
+ int bit = readahead_index - (index * 8 * sizeof(unsigned long));
+ spin_lock_irqsave(&suspend_readahead_flags_lock, flags);
+ set_bit(bit, &suspend_readahead_flags[index]);
+ spin_unlock_irqrestore(&suspend_readahead_flags_lock, flags);
+ }
+
+ /*
+ * Add it to the free list.
+ */
+ spin_lock_irqsave(&ioinfo_lists_lock, flags);
+ list_move_tail(&io_info->list, &ioinfo_free);
+ spin_unlock_irqrestore(&ioinfo_lists_lock, flags);
+}
+
+/*
+ * suspend_end_bio
+ *
+ * Description: Function called by block driver from interrupt context when I/O
+ * is completed. This is the reason we use spinlocks in
+ * manipulating the io_info lists.
+ * Nearly the fs/buffer.c version, but we want to mark the page as
+ * done in our own structures too.
+ */
+
+static int suspend_end_bio(struct bio * bio, unsigned int num, int err)
+{
+ struct io_info *io_info = (struct io_info *) bio->bi_private;
+ unsigned long flags;
+
+ spin_lock_irqsave(&ioinfo_lists_lock, flags);
+ list_move_tail(&io_info->list, &ioinfo_ready_for_cleanup);
+ spin_unlock_irqrestore(&ioinfo_lists_lock, flags);
+
+#ifdef USE_KEVENTD
+ INIT_WORK(&io_info->work, suspend_io_cleanup, (void *) io_info);
+ schedule_work(&io_info->work);
+#endif
+ return 0;
+}
+
+/**
+ * submit - submit BIO request.
+ * @rw: READ or WRITE.
+ * @io_info: IO info structure.
+ *
+ * Straight from the textbook - allocate and initialize the bio.
+ * If we're writing, make sure the page is marked as dirty.
+ * Then submit it and carry on.
+ */
+
+static int submit(int rw, struct io_info * io_info)
+{
+ int error = 0;
+ struct bio * bio = NULL;
+ unsigned long j1 = jiffies;
+
+ while (!bio) {
+ bio = bio_alloc(GFP_ATOMIC,1);
+ if (!bio)
+ do_bio_wait(3);
+ }
+
+ bio->bi_sector = io_info->blocks[0] << (PAGE_SHIFT - 9);
+ bio->bi_bdev = io_info->dev;
+ bio->bi_private = io_info;
+ bio->bi_end_io = suspend_end_bio;
+ io_info->sys_struct = bio;
+
+ if (bio_add_page(bio, io_info->buffer_page, PAGE_SIZE, 0) < PAGE_SIZE) {
+ printk("ERROR: adding page to bio at %ld\n",
+ io_info->blocks[0]);
+ bio_put(bio);
+ return -EFAULT;
+ }
+
+ if (rw == WRITE)
+ bio_set_pages_dirty(bio);
+ submit_bio(rw,bio);
+ bio_jiffies += jiffies - j1;
+ return error;
+}
+
+/*
+ * suspend_set_block_size
+ *
+ * Description: Set the blocksize for a bdev. This is a separate function
+ * because we have different versions for 2.4 and 2.6.
+ */
+static int suspend_set_block_size(struct block_device * bdev, int size)
+{
+ return set_blocksize(bdev, size);
+}
+
+static int suspend_get_block_size(struct block_device * bdev)
+{
+ return block_size(bdev);
+}
+
+/*
+ * We don't need to worry about new requests being added to the list;
+ * we're called from process context
+ */
+static void submit_batched(void)
+{
+ unsigned long flags;
+ struct io_info * this, * next = NULL;
+
+ list_for_each_entry_safe(this, next, &ioinfo_submit_batch, list) {
+ spin_lock_irqsave(&ioinfo_lists_lock, flags);
+ list_move_tail(&this->list, &ioinfo_busy);
+ spin_unlock_irqrestore(&ioinfo_lists_lock, flags);
+ if (test_bit(IO_AWAITING_READ, &this->flags))
+ submit(READ, this);
+ else
+ submit(WRITE, this);
+ }
+ submit_batch = 0;
+}
+static void add_to_batch(struct io_info * io_info)
+{
+ unsigned long flags;
+
+ spin_lock_irqsave(&ioinfo_lists_lock, flags);
+ /* We have an I/O info struct. Move it to the batch list. */
+ list_move_tail(&io_info->list, &ioinfo_submit_batch);
+ spin_unlock_irqrestore(&ioinfo_lists_lock, flags);
+
+ submit_batch++;
+
+ if (submit_batch == submit_batch_size)
+ submit_batched();
+}
+/*
+ * start_one
+ *
+ * Description: Prepare and start a read or write operation.
+ * Note that we use our own buffer for reading or writing.
+ * This simplifies doing readahead and asynchronous writing.
+ * We can begin a read without knowing the location into which
+ * the data will eventually be placed, and the buffer passed
+ * for a write can be reused immediately (essential for the
+ * plugins system).
+ * Failure? What's that?
+ * Returns: The io_info struct created.
+ */
+static struct io_info * start_one(int rw, struct submit_params * submit_info)
+{
+ struct io_info * io_info = get_io_info_struct();
+ unsigned long buffer_virt = 0;
+ char * to, * from;
+ struct page * buffer_page;
+ int i;
+
+ if (!io_info)
+ return NULL;
+
+ /* Get our local buffer */
+ suspend_message(SUSPEND_WRITER, SUSPEND_HIGH, 1,
+ "Start_IO: [%p]", io_info);
+
+ /* Copy settings to the io_info struct */
+ io_info->data_page = submit_info->page;
+ io_info->readahead_index = submit_info->readahead_index;
+
+ if (io_info->readahead_index == -1) {
+ while (!(buffer_virt = get_zeroed_page(GFP_ATOMIC)))
+ do_bio_wait(4);
+
+ buffer_allocs++;
+ suspend_message(SUSPEND_WRITER, SUSPEND_HIGH, 0,
+ "[ALLOC BUFFER]->%d",
+ real_nr_free_pages());
+ buffer_page = virt_to_page(buffer_virt);
+
+ io_info->buffer_page = buffer_page;
+ } else {
+ unsigned long flags;
+ int index = io_info->readahead_index/(8 * sizeof(unsigned long));
+ int bit = io_info->readahead_index - index * 8 * sizeof(unsigned long);
+
+ spin_lock_irqsave(&suspend_readahead_flags_lock, flags);
+ clear_bit(bit, &suspend_readahead_flags[index]);
+ spin_unlock_irqrestore(&suspend_readahead_flags_lock, flags);
+
+ io_info->buffer_page = buffer_page = submit_info->page;
+ }
+
+ /* If writing, copy our data. The data is probably in
+ * lowmem, but we cannot be certain. If there is no
+ * compression/encryption, we might be passed the
+ * actual source page's address. */
+ if (rw == WRITE) {
+ set_bit(IO_WRITING, &io_info->flags);
+
+ to = (char *) buffer_virt;
+ from = kmap_atomic(io_info->data_page, KM_USER1);
+ memcpy(to, from, PAGE_SIZE);
+ flush_dcache_page(io_info->data_page);
+ flush_dcache_page(buffer_page);
+ kunmap_atomic(from, KM_USER1);
+ }
+
+ /* Submit the page */
+ get_page(buffer_page);
+
+ io_info->dev = submit_info->dev;
+ for (i = 0; i < submit_info->blocks_used; i++)
+ io_info->blocks[i] = submit_info->blocks[i];
+ io_info->blocks_used = submit_info->blocks_used;
+ io_info->block_size = PAGE_SIZE / submit_info->blocks_used;
+
+ if (rw == READ)
+ set_bit(IO_AWAITING_READ, &io_info->flags);
+ else
+ set_bit(IO_AWAITING_WRITE, &io_info->flags);
+
+ suspend_message(SUSPEND_WRITER, SUSPEND_HIGH, 1,
+ "-> (PRE BRW) %d\n",
+ real_nr_free_pages());
+
+ if (submit_batch_size > 1)
+ add_to_batch(io_info);
+ else
+ submit(rw, io_info);
+
+ atomic_inc(&outstanding_io);
+ if (atomic_read(&outstanding_io) > max_outstanding_io)
+ max_outstanding_io++;
+
+ return io_info;
+}
+
+static int suspend_do_io(int rw,
+ struct submit_params * submit_info, int syncio)
+{
+ struct io_info * io_info = start_one(rw, submit_info);
+ if (!io_info)
+ return 1;
+ else if (syncio)
+ wait_on_one_page(io_info);
+
+ /* If we were the only one, clean everything up */
+ if (!atomic_read(&outstanding_io))
+ suspend_finish_all_io();
+ return 0;
+}
+
+/* We used to use bread here, but it doesn't correctly handle
+ * blocksize != PAGE_SIZE. Now we create a submit_info to get the data we
+ * want and use our normal routines (synchronously).
+ */
+
+static int suspend_bdev_page_io(int rw, struct block_device * bdev, long pos,
+ struct page * page)
+{
+ struct submit_params submit_info;
+
+ submit_info.page = page;
+ submit_info.dev = bdev;
+
+ submit_info.blocks[0] = pos;
+ submit_info.blocks_used = 1;
+ submit_info.readahead_index = -1;
+ return suspend_do_io(rw, &submit_info, 1);
+}
+
+/*
+ * suspend_wait_on_readahead
+ *
+ * Wait until a particular readahead is ready.
+ */
+static void suspend_wait_on_readahead(int readahead_index)
+{
+ int index = readahead_index/(8 * sizeof(unsigned long));
+ int bit = readahead_index - index * 8 * sizeof(unsigned long);
+
+ /* read_ahead_index is the one we want to return */
+ while (!test_bit(bit, &suspend_readahead_flags[index]))
+ do_bio_wait(5);
+}
+
+/*
+ * suspend_readahead_ready
+ *
+ * Returns whether the readahead requested is ready.
+ */
+
+static int suspend_readahead_ready(int readahead_index)
+{
+ int index = readahead_index/(8 * sizeof(unsigned long));
+ int bit = readahead_index - (index * 8 * sizeof(unsigned long));
+
+ return test_bit(bit, &suspend_readahead_flags[index]);
+}
+
+/* suspend_prepare_readahead
+ * Set up for doing readahead on an image */
+static int suspend_prepare_readahead(int index)
+{
+ unsigned long new_page = get_zeroed_page(GFP_ATOMIC);
+
+ if(!new_page) {
+ printk("No page for readahead %d.\n", index);
+ return -ENOMEM;
+ }
+
+ suspend_bio_ops.readahead_pages[index] = virt_to_page(new_page);
+ return 0;
+}
+
+/* suspend_cleanup_readahead
+ * Clean up structures used for readahead */
+static void suspend_cleanup_readahead(int page)
+{
+ __free_pages(suspend_bio_ops.readahead_pages[page], 0);
+ suspend_bio_ops.readahead_pages[page] = 0;
+ return;
+}
+
+static unsigned long suspend_bio_memory_needed(void)
+{
+ return (REAL_MAX_ASYNC * (PAGE_SIZE + sizeof(struct request) +
+ sizeof(struct bio) + sizeof(struct io_info)));
+}
+
+#if 0
+static int suspend_bio_kthread(void * data)
+{
+ return 0;
+}
+
+static int start_suspend_bio_thread(void)
+{
+
+ suspend_bio_task = kthread_run(suspend_bio_kthread, NULL,
+ PF_NOFREEZE, "suspend_bio");
+
+ if (IS_ERR(suspend_bio_task)) {
+ printk("suspend_bio thread could not be started.\n");
+ return -EPERM;
+ }
+
+ return 0;
+}
+
+static void end_suspend_bio_thread(void)
+{
+}
+#endif
+
+static struct suspend_proc_data proc_params[] = {
+ { .filename = "async_io_limit",
+ .permissions = PROC_RW,
+ .type = SUSPEND_PROC_DATA_INTEGER,
+ .data = {
+ .integer = {
+ .variable = &max_async_ios,
+ .minimum = 1,
+ .maximum = MAX_READAHEAD,
+ }
+ }
+ },
+
+#ifdef CONFIG_SOFTWARE_SUSPEND_DEBUG
+#ifdef TUNE_BATCHING
+ { .filename = "submit_batch_size",
+ .permissions = PROC_RW,
+ .type = SUSPEND_PROC_DATA_INTEGER,
+ .data = {
+ .integer = {
+ .variable = &submit_batch_size,
+ .minimum = 1,
+ .maximum = 512,
+ }
+ }
+ },
+#endif
+#endif
+};
+
+struct suspend_bio_ops suspend_bio_ops = {
+ .set_block_size = suspend_set_block_size,
+ .get_block_size = suspend_get_block_size,
+ .submit_io = suspend_do_io,
+ .bdev_page_io = suspend_bdev_page_io,
+ .prepare_readahead = suspend_prepare_readahead,
+ .cleanup_readahead = suspend_cleanup_readahead,
+ .readahead_pages = suspend_readahead_pages,
+ .wait_on_readahead = suspend_wait_on_readahead,
+ .check_io_stats = suspend_check_io_stats,
+ .reset_io_stats = suspend_reset_io_stats,
+ .finish_all_io = suspend_finish_all_io,
+ .readahead_ready = suspend_readahead_ready,
+};
+
+EXPORT_SYMBOL(suspend_bio_ops);
+
+static struct suspend_plugin_ops suspend_blockwriter_ops =
+{
+ .name = "Block I/O",
+ .type = MISC_PLUGIN,
+ //.initialise = start_suspend_bio_thread,
+ //.cleanup = end_suspend_bio_thread,
+ .memory_needed = suspend_bio_memory_needed,
+};
+
+static __init int suspend_block_io_load(void)
+{
+ int i, numfiles = sizeof(proc_params) / sizeof(struct suspend_proc_data);
+ int result;
+
+ if (!(result = suspend_register_plugin(&suspend_blockwriter_ops))) {
+ for (i=0; i< numfiles; i++)
+ suspend_register_procfile(&proc_params[i]);
+ }
+
+ return result;
+}
+
+#ifdef MODULE
+static __exit void suspend_block_io_unload(void)
+{
+ int i, numfiles = sizeof(proc_params) / sizeof(struct suspend_proc_data);
+
+ for (i=0; i< numfiles; i++)
+ suspend_unregister_procfile(&proc_params[i]);
+ suspend_unregister_plugin(&suspend_blockwriter_ops);
+}
+
+module_init(suspend_block_io_load);
+module_exit(suspend_block_io_unload);
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Nigel Cunningham");
+MODULE_DESCRIPTION("Suspend2 block io functions");
+#else
+late_initcall(suspend_block_io_load);
+#endif
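
The readahead bookkeeping in the patch above keeps one "ready" bit per readahead slot, packed into an array of unsigned longs. A minimal userspace sketch of that index/bit arithmetic follows; the names and the MAX_READAHEAD value are illustrative assumptions, and plain bit operations stand in for the kernel's atomic set_bit/clear_bit/test_bit, so this version is not safe against concurrent updates.

```c
#include <limits.h>

#define MAX_READAHEAD 100               /* illustrative value */
#define BITS_PER_WORD (CHAR_BIT * sizeof(unsigned long))
#define FLAG_WORDS ((MAX_READAHEAD + BITS_PER_WORD - 1) / BITS_PER_WORD)

static unsigned long readahead_flags[FLAG_WORDS];

/* I/O for this slot has been started: clear its "ready" bit. */
static void mark_pending(unsigned int slot)
{
	readahead_flags[slot / BITS_PER_WORD] &= ~(1UL << (slot % BITS_PER_WORD));
}

/* I/O for this slot has completed: set its "ready" bit. */
static void mark_ready(unsigned int slot)
{
	readahead_flags[slot / BITS_PER_WORD] |= 1UL << (slot % BITS_PER_WORD);
}

/* Returns 1 if the slot's data can be consumed, 0 otherwise. */
static int is_ready(unsigned int slot)
{
	return (int) ((readahead_flags[slot / BITS_PER_WORD] >>
		       (slot % BITS_PER_WORD)) & 1UL);
}
```

A reader that needs slot N would loop, pushing pending I/O along, until is_ready(N) returns 1.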


2004-11-24 15:34:37

by Christoph Hellwig

Subject: Re: Suspend 2 merge: 34/51: Includes


Please submit header changes together with the matching code changes.
And all these plugin thingies in here look like overengineering.

2004-11-24 15:46:44

by Martin J. Bligh

Subject: Re: Suspend 2 merge: 31/51: Export tlb flushing

--Nigel Cunningham <[email protected]> wrote (on Wednesday, November 24, 2004 23:59:50 +1100):

> This patch adds a do_flush_tlb_all function that does the
> SMP-appropriate thing for suspend after the image is restored.

Is software suspend only designed for i386, or is that the only arch that
didn't have such a function already? Seems like too low a level to be
exporting to me.

M.

> diff -ruN 818-tlb-flushing-functions-old/arch/i386/kernel/smp.c 818-tlb-flushing-functions-new/arch/i386/kernel/smp.c
> --- 818-tlb-flushing-functions-old/arch/i386/kernel/smp.c 2004-11-06 09:27:19.225681536 +1100
> +++ 818-tlb-flushing-functions-new/arch/i386/kernel/smp.c 2004-11-04 16:27:41.000000000 +1100
> @@ -476,7 +476,7 @@
> preempt_enable();
> }
>
> -static void do_flush_tlb_all(void* info)
> +void do_flush_tlb_all(void* info)
> {
> unsigned long cpu = smp_processor_id();
>
> diff -ruN 818-tlb-flushing-functions-old/include/asm-i386/tlbflush.h 818-tlb-flushing-functions-new/include/asm-i386/tlbflush.h
> --- 818-tlb-flushing-functions-old/include/asm-i386/tlbflush.h 2004-11-03 21:55:01.000000000 +1100
> +++ 818-tlb-flushing-functions-new/include/asm-i386/tlbflush.h 2004-11-04 16:27:41.000000000 +1100
> @@ -82,6 +82,7 @@
> #define flush_tlb() __flush_tlb()
> #define flush_tlb_all() __flush_tlb_all()
> #define local_flush_tlb() __flush_tlb()
> +#define local_flush_tlb_all() __flush_tlb_all();
>
> static inline void flush_tlb_mm(struct mm_struct *mm)
> {
> @@ -114,6 +115,10 @@
> extern void flush_tlb_current_task(void);
> extern void flush_tlb_mm(struct mm_struct *);
> extern void flush_tlb_page(struct vm_area_struct *, unsigned long);
> +extern void do_flush_tlb_all(void * info);
> +
> +#define local_flush_tlb_all() \
> + do_flush_tlb_all(NULL);
>
> #define flush_tlb() flush_tlb_current_task()


2004-11-24 13:24:46

by Nigel Cunningham

Subject: Suspend 2 merge: 26/51: Kconfig and makefile.

Here are the changes to kernel/power/Makefile|Kconfig

diff -ruN 811-Kconfig-and-Makefile-for-suspend2-old/kernel/power/Kconfig 811-Kconfig-and-Makefile-for-suspend2-new/kernel/power/Kconfig
--- 811-Kconfig-and-Makefile-for-suspend2-old/kernel/power/Kconfig 2004-11-24 09:53:12.000000000 +1100
+++ 811-Kconfig-and-Makefile-for-suspend2-new/kernel/power/Kconfig 2004-11-24 18:51:15.213707144 +1100
@@ -30,6 +30,8 @@
bool "Software Suspend (EXPERIMENTAL)"
depends on EXPERIMENTAL && PM && SWAP
---help---
+ Pavel's original version.
+
Enable the possibility of suspending the machine.
It doesn't need APM.
You may suspend your machine by 'swsusp' or 'shutdown -z <time>'
@@ -73,3 +75,138 @@
suspended image to. It will simply pick the first available swap
device.

+menu "Software Suspend 2"
+
+config SOFTWARE_SUSPEND2_CORE
+ tristate "Software Suspend 2"
+ depends on PM
+ select SOFTWARE_SUSPEND2
+ ---help---
+ Software Suspend 2 is the 'new and improved' suspend support. You
+ can now build it as modules, but be aware that this requires
+ initrd support (the modules you use in saving the image have to
+ be loaded in order for you to be able to resume!)
+
+ See the Software Suspend home page (softwaresuspend.berlios.de)
+ for FAQs, HOWTOs and other documentation.
+
+ config SOFTWARE_SUSPEND2
+ bool
+
+ if SOFTWARE_SUSPEND2
+ config SOFTWARE_SUSPEND2_WRITER
+ bool
+
+ comment 'Image Storage (you need at least one writer)'
+ depends on SOFTWARE_SUSPEND2_CORE
+
+ config SOFTWARE_SUSPEND_SWAPWRITER
+ tristate ' Swap Writer'
+ depends on SWAP && SOFTWARE_SUSPEND2_CORE
+ select SOFTWARE_SUSPEND2_WRITER
+ ---help---
+ This option enables support for storing an image in your
+ swap space. Swap partitions are supported. Swap file
+ support is currently broken (16 April 2004).
+
+ comment 'Page Transformers'
+ depends on SOFTWARE_SUSPEND2_WRITER
+
+ if SOFTWARE_SUSPEND2_WRITER
+ config SOFTWARE_SUSPEND_LZF_COMPRESSION
+ tristate ' LZF image compression (Preferred)'
+ ---help---
+ This option enables compression of pages stored during suspending
+ to disk, using LZF compression. LZF compression is fast and
+ still achieves a good compression ratio.
+
+ You probably want to say 'Y'.
+
+ config SOFTWARE_SUSPEND_GZIP_COMPRESSION
+ tristate ' GZIP image Compression (Slow)'
+ depends on SOFTWARE_SUSPEND2_CORE
+ select ZLIB_DEFLATE
+ select ZLIB_INFLATE
+ ---help---
+ This option enables compression of pages stored during Software Suspend
+ process. Pages are compressed using the zlib library, with a default
+ setting (in code) of fastest compression (still VERY slow!). If your swap
+ device is painfully slow compared to your CPU, you might possibly want
+ this. Then again, you might just want to upgrade your storage (if you
+ can).
+
+ Just in case you haven't gotten the hint yet, this option should be off
+ for most people. It will make your computer take a minute to suspend
+ when it could take seconds.
+
+ config SOFTWARE_SUSPEND_DEVICE_MAPPER
+ tristate ' Device Mapper support'
+ depends on SOFTWARE_SUSPEND2_CORE && BLK_DEV_DM
+ ---help---
+ This option creates a module which allows Suspend to tell the
+ device mapper code to allocate enough memory for its work while
+ suspending. It doesn't do anything else, but without it, dm-crypt
+ won't work properly.
+
+ This option should be off for most people.
+
+ comment 'User Interface Options'
+
+ config SOFTWARE_SUSPEND_BOOTSPLASH
+ tristate ' Bootsplash support'
+ depends on SOFTWARE_SUSPEND2_CORE && BOOTSPLASH
+ ---help---
+ This option enables support for Bootsplash (bootsplash.org). Suspend
+ can set the progress bar value and switch between silent and verbose
+ modes. (Silent mode is used when the debug level is 0 or 1).
+
+ config SOFTWARE_SUSPEND_TEXT_MODE
+ tristate ' Text mode console support'
+ depends on SOFTWARE_SUSPEND2_CORE && VT
+ ---help---
+ This option enables support for a text mode 'nice display'. If you don't
+ have/want bootsplash support, you probably want this.
+
+ comment 'General Options'
+
+ config SOFTWARE_SUSPEND_DEFAULT_RESUME2
+ string ' Default resume device name'
+ ---help---
+ You normally need to add a resume2= parameter to your lilo.conf or
+ equivalent. With this option properly set, the kernel has a value
+ to default to. No damage will be done if the value is invalid.
+
+ config SOFTWARE_SUSPEND_KEEP_IMAGE
+ bool ' Allow Keep Image Mode'
+ ---help---
+ This option allows you to keep an image and reuse it. It is intended
+ __ONLY__ for use with systems where all filesystems are mounted read-
+ only (kiosks, for example). To use it, compile this option in and boot
+ normally. Set the KEEP_IMAGE flag in /proc/software_suspend and suspend.
+ When you resume, the image will not be removed. You will be unable to turn
+ off swap partitions (assuming you are using the swap writer), but future
+ suspends simply do a power-down. The image can be updated using the
+ kernel command line parameter suspend_act= to turn off the keep image
+ bit. Keep image mode is a little less user friendly on purpose - it
+ should not be used without thought!
+
+ comment 'Debugging'
+
+ config SOFTWARE_SUSPEND_DEBUG
+ bool ' Compile in debugging output'
+ ---help---
+ This option enables the inclusion of debugging info in the software
+ suspend code. Turning it off will reduce the kernel size but make
+ debugging suspend & resume issues harder to do.
+
+ For normal usage, this option can be turned off.
+
+ endif
+
+ endif
+
+endmenu
+
+comment 'Suspend2 depends on EXPERIMENTAL and PM support.'
+ depends on !EXPERIMENTAL || !PM
+
diff -ruN 811-Kconfig-and-Makefile-for-suspend2-old/kernel/power/Makefile 811-Kconfig-and-Makefile-for-suspend2-new/kernel/power/Makefile
--- 811-Kconfig-and-Makefile-for-suspend2-old/kernel/power/Makefile 2004-11-03 21:55:05.000000000 +1100
+++ 811-Kconfig-and-Makefile-for-suspend2-new/kernel/power/Makefile 2004-11-24 18:50:24.503416280 +1100
@@ -6,6 +6,22 @@
swsusp-smp-$(CONFIG_SMP) += smp.o

obj-y := main.o process.o console.o pm.o
+
+ifeq ($(CONFIG_SOFTWARE_SUSPEND2),y)
+obj-y += suspend_builtin.o proc.o
+endif
+
+suspend_core-objs := io.o memory_pool.o pagedir.o prepare_image.o \
+ range.o suspend.o plugins.o suspend_ui.o utility.o
+
+obj-$(CONFIG_SOFTWARE_SUSPEND2_CORE) += suspend_core.o
+obj-$(CONFIG_SOFTWARE_SUSPEND_BOOTSPLASH) += suspend_bootsplash.o
+obj-$(CONFIG_SOFTWARE_SUSPEND_TEXT_MODE) += suspend_text.o
+obj-$(CONFIG_SOFTWARE_SUSPEND_LZF_COMPRESSION) += suspend_lzf.o
+obj-$(CONFIG_SOFTWARE_SUSPEND_GZIP_COMPRESSION) += suspend_gzip.o
+obj-$(CONFIG_SOFTWARE_SUSPEND_DEVICE_MAPPER) += suspend_dm.o
+obj-$(CONFIG_SOFTWARE_SUSPEND_SWAPWRITER) += suspend_block_io.o suspend_swap.o
+
obj-$(CONFIG_SOFTWARE_SUSPEND) += swsusp.o $(swsusp-smp-y) disk.o

obj-$(CONFIG_MAGIC_SYSRQ) += poweroff.o


2004-11-24 13:24:46

by Nigel Cunningham

[permalink] [raw]
Subject: Suspend 2 merge: 25/51: Documentation

Here is the kernel tree documentation. I have a word document that I'm
also working on, which will provide detail not given here, including an
explanation of the API for people who want to write new extensions.

diff -ruN 810-documentation-old/Documentation/kernel-parameters.txt 810-documentation-new/Documentation/kernel-parameters.txt
--- 810-documentation-old/Documentation/kernel-parameters.txt 2004-11-03 21:55:04.000000000 +1100
+++ 810-documentation-new/Documentation/kernel-parameters.txt 2004-11-04 16:27:40.000000000 +1100
@@ -804,6 +804,8 @@

noresume [SWSUSP] Disables resume and restore original swap space.

+ noresume2 [SWSUSP2] Disables resuming and restores original swap signature.
+
no-scroll [VGA] Disables scrollback.
This is required for the Braillex ib80-piezo Braille
reader made by F.H. Papenmeier (Germany).
@@ -1011,7 +1013,12 @@

reserve= [KNL,BUGS] Force the kernel to ignore some iomem area

- resume= [SWSUSP] Specify the partition device for software suspension
+ resume= [SWSUSP] Specify the partition device for software suspension.
+
+ resume2= [SWSUSP2] Specify the storage device for software suspend.
+ Format: <writer>:<writer-parameters>.
+ See Documentation/power/suspend2.txt for details of the formats
+ for available image writers.

rhash_entries= [KNL,NET]
Set number of hash buckets for route cache
diff -ruN 810-documentation-old/Documentation/power/suspend2.txt 810-documentation-new/Documentation/power/suspend2.txt
--- 810-documentation-old/Documentation/power/suspend2.txt 1970-01-01 10:00:00.000000000 +1000
+++ 810-documentation-new/Documentation/power/suspend2.txt 2004-11-04 16:27:40.000000000 +1100
@@ -0,0 +1,506 @@
+ --- Software Suspend for Linux, version 2.0 ---
+
+1. What is it?
+2. Why would you want it?
+3. What do you need to use it?
+4. How do you use it?
+5. What do all those entries in /proc/software_suspend do?
+6. How do you get support?
+7. I think I've found a bug. What should I do?
+8. When will XXX be supported?
+9. How does it work?
+10. Who wrote Software Suspend?
+
+1. What is it?
+
+ Imagine you're sitting at your computer, working away. For some reason, you
+ need to turn off your computer for a while - perhaps it's time to go home
+ for the day. When you come back to your computer next, you're going to want
+ to carry on where you left off. Now imagine that you could push a button and
+ have your computer store the contents of its memory to disk and power down.
+ Then, when you next start up your computer, it loads that image back into
+ memory and you can carry on from where you were, just as if you'd never
+ turned the computer off. Far less time to start up, no reopening
+ applications and finding what directory you put that file in yesterday.
+ That's what Software Suspend does.
+
+2. Why would you want it?
+
+ Why wouldn't you want it?
+
+ Being able to save the state of your system and quickly restore it improves
+ your productivity - you get a useful system in far less time than through
+ the normal boot process.
+
+3. What do you need to use it?
+
+ a. Kernel Support.
+
+ Software Suspend is part of the Linux Kernel. This version is not part of Linus's
+ 2.6 tree at the moment, so you will need to download the kernel source and
+ apply the latest patch. Having done that, enable the appropriate options in
+ make [menu|x]config (under General Setup), compile and install your kernel.
+ Software Suspend works with SMP, Highmem, preemption, x86-32, PPC and mac.
+ x86-64 support is coming.
+
+ Software Suspend patches are available from http://softwaresuspend.berlios.de.
+
+ You may also want to apply the optional patches. At the time of writing,
+ option patches are available to support Bootsplash (http://www.bootsplash.org,
+ for an even nicer display during suspend) and Win4Lin. The Win4Lin option
+ patch is only needed for 2.4.
+
+ Option patches should be applied after the main patch and after Win4Lin
+ or Bootsplash.
+
+ As of version 2.0.0.102, suspend can be built as modules. To use this
+ configuration, you need to have initrd support, because resuming needs
+ to occur before any filesystems that were mounted when you suspended
+ are remounted. For details on setting up suspend-as-modules, please
+ read the FAQs on suspend's web site.
+
+ b. Swapspace.
+
+ Software Suspend can store the suspend image in your swap partition,
+ a swap file or a combination thereof. Whichever combination you choose, you
+ will probably want to create enough swap space to store the largest image
+ you could have, plus the space you'd normally use for swap. A good rule of
+ thumb would be to calculate the amount of swap you'd want without using
+ Software Suspend, and then add the amount of memory you have. This swap
+ space can be arranged in any way you'd like. It can be in one partition or
+ file, or spread over a number. The only requirement is that they be active
+ when you start a suspend cycle.
+
+ There is one exception to this requirement. Software Suspend has
+ the ability to turn on one swap file or partition at the start of
+ suspending and turn it back off at the end. If you want to ensure you have
+ enough memory to store an image when your memory is fully used, you might
+ want to make one swap partition/file for 'normal' use, and another for
+ Software Suspend to activate & deactivate automatically. (Further details
+ below).
+
+ c. Bootloader configuration.
+
+ Using Software Suspend also requires that you add an extra parameter to
+ your lilo.conf or equivalent. Here's an example for a swap partition:
+
+ append="resume2=/dev/hda1"
+
+ This would tell Software Suspend that /dev/hda1 is a swap partition you
+ have. Software Suspend will use the swap signature of this partition as a
+ pointer to your data when you suspend. This means that (in this example)
+ /dev/hda1 doesn't need to be _the_ swap partition where all of your data
+ is actually stored. It just needs to be a swap partition that has a
+ valid signature.
+
+ You don't need to have a swap partition for this purpose. Software Suspend
+ can also use a swap file, but usage is a little more complex. Having made
+ your swap file, turn it on and do
+
+ cat /proc/software_suspend/header_locations
+
+ (this assumes you've already compiled your kernel with Software Suspend
+ support and booted it). The results of the cat command will tell you
+ what you need to put in lilo.conf:
+
+ For swap partitions like /dev/hda1, simply use resume2=/dev/hda1.
+ For swapfile `swapfile`, use resume2=/dev/hda2:0x242d@4096.
+
+ If the swapfile changes for any reason (it is moved to a different
+ location, it is deleted and recreated, or the filesystem is
+ defragmented) then you will have to check
+ /proc/software_suspend/header_locations for a new resume_block value.
+
+ Once you've compiled and installed the kernel, adjusted your lilo.conf
+ and rerun lilo, you should only need to reboot for the most basic part
+ of Software Suspend to be ready.
+
+ d. A suspend script.
+
+ Since the driver model in 2.6 kernels is still being developed, you may need
+ to do more, however. Users of Software Suspend usually start the process
+ via a script which prepares for the suspend, tells the kernel to do its
+ stuff and then restore things afterwards. This script might involve:
+
+ - Switching to a text console and back if X doesn't like the video card
+ status on resume.
+ - Running /sbin/hwclock [--directisa] to update the clock on resume
+ - Un/reloading PCMCIA support since it doesn't play well with suspend.
+
+ Note that you might not be able to unload some drivers if there are
+ processes using them. You might have to kill off processes that hold
+ devices open. Hint: if your X server accesses a USB mouse, doing a
+ 'chvt' to a text console releases the device and you can unload the
+ module.
+
+ Check out the latest script (available on Berlios).
+
+4. How do you use it?
+
+ Once your script is properly set up, you should just be able to start it
+ and everything should go like clockwork. Of course things aren't always
+ that easy out of the box.
+
+ Check out (in the kernel source tree) include/linux/suspend-debug.h for
+ settings you can use to get detailed information about what suspend is doing.
+ /proc/sys/kernel/swsusp and the kernel parameters suspend_act, suspend_dbg
+ and suspend_lvl allow you to set the action and debugging parameters prior
+ to starting a suspend and/or at the lilo prompt before resuming. There is
+ also a nice little program that should be available from Berlios which
+ makes it easier to turn these debugging settings on and off. Note that to
+ get any debugging output, you need to enable it when compiling the kernel.
+
+ A neat feature of Software Suspend is that you can press Escape at any time
+ during suspending, and the process will be aborted.
+
+ Due to the way suspend works, this means you'll have your system back and
+ perfectly usable almost instantly. The only exception is when it's at
+ the very end of writing the image. Then it will need to reload a small
+ (usually 4-50MB) portion first.
+
+ If you run into problems with resuming, adding the "noresume2" option to
+ the kernel command line will let you skip the resume step and
+ (hopefully) recover your system.
+
+5. What do all those entries in /proc/software_suspend do?
+
+ /proc/software_suspend is the directory which contains files you can use to
+ tune and configure Software Suspend to your liking. The exact contents of
+ the directory will depend upon the version of Software Suspend you're
+ running, the options you selected at compile time and which modules you have
+ inserted at the moment (where appropriate). In the following
+ descriptions, names in brackets refer to compile time options and modules
+ that control whether the file exists. (Note that they're all dependent upon
+ you having selected CONFIG_SOFTWARE_SUSPEND2 in the first place!)
+
+ Since the values of these settings can open potential security risks, they
+ are usually accessible only to the root user. You can, however, enable a
+ compile time option which makes all of these files world-accessible. This
+ should only be done if you trust everyone with shell access to this
+ computer!
+
+ - activate:
+
+ When anything is written to this file suspend will be activated and suspend
+ the system. The value is completely ignored. It is just the fact that you
+ write to the file that initiates the suspend.
+
+ When accessed from an initrd, the software instead checks whether it needs
+ to initiate a resume. If it doesn't, the echo returns almost immediately.
+ If a resume is needed, the echo never returns.
+
+ - async_io_limit: (module suspend_block_io)
+
+ This value is the limit on the number of pages Software Suspend will submit
+ for reading or writing at once. The ideal value depends upon the speed of
+ your hard disks, but the default (and maximum) of 256 should be fine.
+
+ - debug_info:
+
+ This file returns information about your configuration that may be helpful
+ in diagnosing problems with suspending.
+
+ - debug_sections (CONFIG_SOFTWARE_SUSPEND_DEBUG, module suspend_core):
+
+ This value, together with the console log level, controls what debugging
+ information is displayed. The console log level determines the level of
+ detail, and this value determines what detail is displayed. This value is
+ a bit vector, and the meaning of the bits can be found in the kernel tree
+ in include/linux/suspend-debug.h. It can be over-ridden using the kernel's
+ command line option suspend_dbg.
+
+ - default_console_level (CONFIG_SOFTWARE_SUSPEND_DEBUG, module suspend_core):
+
+ This determines the value of the console log level at the start of a
+ suspend cycle. If debugging is compiled in, the console log level can be
+ changed during a cycle by pressing the digit keys. Meanings are:
+
+ 0: Nice display.
+ 1: Nice display plus numerical progress.
+ 2: Errors only.
+ 3: Low level debugging info.
+ 4: Medium level debugging info.
+ 5: High level debugging info.
+ 6: Verbose debugging info.
+
+ This value can be over-ridden using the kernel command line option
+ suspend_lvl.
+
+ - disable_gzip_compression (CONFIG_SOFTWARE_SUSPEND_GZIP_COMPRESSION,
+ module suspend_gzip):
+
+ If gzip compression support is compiled in, this option can be used to
+ disable this plugin.
+
+ - disable_lzf_compression (CONFIG_SOFTWARE_SUSPEND_LZF_COMPRESSION,
+ module suspend_lzf):
+
+ If lzf compression support is compiled in, this option can be used to
+ disable this plugin.
+
+ - enable_escape (module suspend_core):
+
+ Setting this to "1" will enable you to abort a suspend by
+ pressing escape; "0" (default) disables this feature. Note that enabling
+ this option means that you cannot initiate a suspend and then walk away
+ from your computer, expecting it to be secure. With this feature disabled,
+ you can validly have this expectation once Suspend begins to write the
+ image to disk. (Prior to this point, it is possible that Suspend might
+ abort because of failure to freeze all processes or because constraints
+ on its ability to save the image are not met).
+
+ - expected_gzip_compression (CONFIG_SOFTWARE_SUSPEND_GZIP_COMPRESSION,
+ module suspend_gzip):
+ - expected_lzf_compression (CONFIG_SOFTWARE_SUSPEND_LZF_COMPRESSION,
+ module suspend_lzf):
+
+ These values allow you to set an expected compression ratio, which Software
+ Suspend will use in calculating whether it meets constraints on the image
+ size. If this expected compression ratio is not attained, the suspend will
+ abort, so it is wise to allow some spare. You can see what compression
+ ratio is achieved in the logs after suspending.
+
+ Note that the values are cumulative. If you compile in both gzip and lzf
+ compression, have both enabled, and set both expected compression ratios
+ to 20, Suspend will expect that the storage required will be at most
+ .8 * .8 = 64% of the number of pages to be written.
+
+ - header_locations:
+
+ This option tells you the resume2= options to use for swap devices you
+ currently have activated. It is particularly useful when you only want to
+ use a swap file to store your image. See above for further details.
+
+ - image_size_limit:
+
+ The maximum size of suspend image written to disk, measured in megabytes
+ (1024*1024).
+
+ - interface_version:
+
+ The value returned by this file can be used by scripts and configuration
+ tools to determine what entries should be looked for. The value is
+ incremented whenever an entry in /proc/software_suspend is obsoleted or
+ added.
+
+ - last_result:
+
+ The result of the last suspend, as defined in
+ include/linux/suspend-debug.h with the values SUSPEND_ABORTED to
+ SUSPEND_KEPT_IMAGE. This is a bitmask.
+
+ - log_everything (CONFIG_SOFTWARE_SUSPEND_DEBUG):
+
+ Setting this option results in all messages printed being logged. Normally,
+ only a subset are logged, so as to not slow the process and not clutter the
+ logs. Useful for debugging. It can be toggled during a cycle by pressing
+ 'L'.
+
+ - no_output:
+
+ Setting this to "1" disables all output from suspend. It may be useful if a
+ distribution wants to implement a static display while suspending.
+
+ - pause_between_steps (CONFIG_SOFTWARE_SUSPEND_DEBUG):
+
+ This option is used during debugging, to make Software Suspend pause between
+ each step of the process. It is ignored when the nice display is on.
+
+ - progressbar_granularity_limit (CONFIG_FBCON_SPLASHSCREEN):
+
+ This option can be used to limit the granularity of the progress bar
+ displayed with a bootsplash screen. The value is the maximum number of
+ steps. That is, 10 will make the progress bar jump in 10% increments.
+
+ - reboot (CONFIG_SOFTWARE_SUSPEND_DEBUG):
+
+ This option causes Software Suspend to reboot rather than powering down
+ at the end of saving an image. It can be toggled during a cycle by pressing
+ 'R'.
+
+ - slow:
+
+ This option inserts a couple of one+ second delays in the code. It should
+ not be needed, and may disappear in a future version.
+
+ - swapfile:
+
+ This entry is used to specify the swapfile or partition that
+ Software Suspend will attempt to swapon/swapoff automatically. Thus, if
+ I normally use /dev/hda1 for swap, and want to use /dev/hda2 specifically
+ for my suspend image, I would
+
+ echo /dev/hda2 > /proc/software_suspend/swapfile
+
+ /dev/hda2 would then be automatically swapon'd and swapoff'd. Note that the
+ swapon and swapoff occur while other processes are frozen (including kswapd)
+ so this swap file will not be used up when attempting to free memory. The
+ partition/file is also given the highest priority, so other swapfiles/partitions
+ will only be used to save the image when this one is filled.
+
+ The value of this file is used by header_locations along with any currently
+ activated swapfiles/partitions.
+
+ - version:
+
+ The version of suspend you have compiled into the currently running kernel.
+
+6. How do you get support?
+
+ Glad you asked. Software Suspend is being actively maintained and supported,
+ both by Nigel (the guy doing most of the coding at the moment) and its
+ users. You can find the mailing list via the Sourceforge project page.
+
+7. I think I've found a bug. What should I do?
+
+ If you're seeing Software Suspend hang at some point, and especially if
+ lights are flashing on your keyboard, you should compile in debugging
+ support and try...
+
+ echo 1 > /proc/software_suspend/debug_sections
+ echo 3 > /proc/software_suspend/default_console_level
+ echo > /proc/software_suspend/activate
+
+ You should then see low level debugging information and eventually an
+ oops.
+
+ Good information on how to provide us with useful information from an
+ oops is found in the file REPORTING-BUGS, in the top level directory
+ of the kernel tree. If you get an oops, please especially note the
+ information about running what is printed on the screen through ksymoops.
+ The raw information is useless.
+
+ You might also read the FAQ and HOWTO on the web site for known issues,
+ and subscribe to the mailing list.
+
+ Beginning with 1.1rc10, you should include the contents of
+ /proc/software_suspend/debug_info in your report. Prior to this version,
+ similar information is written to /var/log/messages at the end of a
+ successful resume and should be sent. It is also a good idea to check
+ /var/log/messages for relevant information as well. Information from the
+ unloading and reloading of drivers and modules prior to and after
+ suspending is sometimes helpful.
+
+8. When will XXX be supported?
+
+ Software Suspend currently lacks support for x86-64.
+
+ Patches for the other items (and anything that's been missed) are welcome.
+ Please send to the list.
+
+ Because Nigel's main task is definitely not Software Suspend and he doesn't
+ have the hardware, he will be unlikely to develop support for any of these
+ in the near future. His development work to date has been driven by the
+ desire to be a user of a more feature complete Software Suspend.
+
+9. How does it work?
+
+ Software Suspend does its work in a number of steps.
+
+ a. Freezing system activity.
+
+ The first main stage in suspending is to stop all other activity. This is
+ achieved in stages. First, we stop tasks from submitting new I/O using hooks
+ in the system calls for reading, writing and at a number of other places as
+ well as at the kernel threads that start I/O. If any tasks are syncing,
+ we wait for them to complete. We then do our own sync, just in case no
+ syncs were running. Next, we stop all the other tasks. Some are signalled
+ and put in a 'refrigerator'. Others are simply not scheduled again until we
+ decide to wake them up.
+
+ b. Eating memory.
+
+ For a successful suspend, you need to have enough disk space to store the
+ image and enough memory for the various limitations of Software Suspend's
+ algorithm. You can also specify a maximum image size. In order to meet
+ those constraints, Software Suspend may 'eat' memory. If, after freezing
+ processes, the constraints aren't met, Software Suspend will thaw all the
+ other processes and begin to eat memory until its calculations indicate
+ the constraints are met. It will then freeze processes again and recheck
+ its calculations.
+
+ c. Suspending drivers and storing processor context.
+
+ Software Suspend then calls the power management functions to notify
+ drivers of the suspend, and saves the processor state.
+
+ d. Storage of meta data and image.
+
+ Next, Software Suspend allocates the swap pages that will be used to save
+ the image and stores their locations, along with the locations of the pages
+ to be saved in what we call pagesets or pagedirs. Software Suspend stores
+ data in two pagesets. Pageset 2 contains pages on the active and inactive
+ lists; essentially the page cache. Pageset 1 contains all other pages,
+ including the kernel. We use two pagesets for one important reason: We
+ need to make an atomic copy of the kernel to ensure consistency of the
+ image. Without a second pagedir, that would limit us to an image that was
+ at most half the amount of memory available. Using two pagesets allows us
+ to store a full image. Since pageset 2 pages won't be needed in saving
+ pageset 1, we first save pageset 2 pages. We can then make our atomic copy
+ of the remaining pages using both pageset 2 pages and any other pages that
+ are free. While saving both pagesets, we are careful not to corrupt the
+ image. We immediately shoot down pages that are added to the page cache,
+ and we allocate a special memory pool of extra pages that can be used
+ during suspending. All of the pages in this pool are saved along with the
+ rest of the pageset 1 pages, even if they're not used. This saves us having
+ to worry about the image becoming inconsistent while we're saving it.
+
+ e. Save a second copy of the pagedirs.
+
+ To reload pagedir 1 at resume time, we need to know where the data is
+ stored. This requires the saving of a second copy of the pagedirs.
+
+ f. Save the suspend header.
+
+ Nearly there! We save our settings and other parameters needed for
+ reloading pagedir 1 in a 'suspend header', which is a single swap page.
+
+ g. Set the swap header.
+
+ Finally, we edit the swap header for our resume2= swap file/partition. The
+ swap signature is changed to record what kind of header it originally was
+ (swapspace 1 or 2) and the bdev and first block and block size details of
+ the suspend header.
+
+ h. Power down.
+
+ Or reboot if we're debugging and the appropriate option is selected.
+
+ Whew!
+
+ Reloading the image.
+ --------------------
+
+ Reloading the image is essentially the reverse of all the above. We load
+ our copy of pagedir 1, being careful to choose locations that aren't going
+ to be overwritten as we copy it back (We start very early in the boot
+ process, so there are no other processes to quiesce here). We then copy
+ pagedir 1 back to its original location in memory and restore the process
+ context. We are now running with the original kernel. Next, we reload the
+ pageset 2 pages, free the memory and swap used by Software Suspend, restore
+ the pagedir header and restart processes. Sounds easy in comparison to
+ suspending, doesn't it!
+
+ There is of course more to Software Suspend than this, but this explanation
+ should be a good start. If there's interest, I'll write further
+ documentation on range pages and the low level I/O.
+
+10. Who wrote Software Suspend?
+
+ (Answer based on the writings of Florent Chabaud, credits in files and
+ Nigel's limited knowledge; apologies to anyone missed out!)
+
+ The main developers of Software Suspend have been...
+
+ Gabor Kuti
+ Pavel Machek
+ Florent Chabaud
+ Nigel Cunningham
+
+ They have been aided in their efforts by a host of hundreds, if not thousands
+ of testers and people who have submitted bug fixes & suggestions. Of special
+ note are the efforts of Michael Frank, who had his computers repetitively
+ suspend and resume for literally tens of thousands of cycles and developed
+ scripts to stress the system and test Software Suspend far beyond the point
+ most of us (Nigel included!) would consider testing. His efforts have
+ contributed as much to Software Suspend as any of the names above.
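
An aside on the expected_gzip_compression and expected_lzf_compression entries
described in section 5 of the document above: the cumulative arithmetic can be
sketched in a few lines of C. This is only an illustration, not code from the
suspend2 patches. The function name expected_storage_pages and the choice to
round up are assumptions made for the example.

```c
#include <stddef.h>

/*
 * Illustrative only: estimate how many pages of storage an image needs
 * when several compressors are stacked. Each entry in expected_percent
 * is the expected saving in percent (20 means "expect 20% smaller").
 * Hypothetical helper, not part of the suspend2 patch set.
 */
size_t expected_storage_pages(size_t pages,
                              const unsigned int *expected_percent,
                              size_t count)
{
    size_t i;

    for (i = 0; i < count; i++) {
        /* Clamp so that a bogus setting above 100 cannot underflow. */
        unsigned int saved = expected_percent[i] > 100 ? 100 : expected_percent[i];
        unsigned int keep = 100 - saved;

        /* Round up so the estimate errs on the side of more storage. */
        pages = (pages * keep + 99) / 100;
    }
    return pages;
}
```

With 1000 pages and both ratios set to 20, this gives 640 pages, which matches
the .8 * .8 = 64% figure in the text.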

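The freeze / eat memory / refreeze cycle from step 9b of the document can be
modelled roughly as below. This is a user-space sketch under stated
assumptions, not the suspend2 implementation: struct sketch_system, its fields
and both function names are hypothetical, and "eating" memory is reduced to
subtracting a fixed number of pages per round.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the quantities checked in step 9b. */
struct sketch_system {
    size_t image_pages;    /* pages that would go into the image */
    size_t storage_pages;  /* pages of storage available */
    size_t limit_pages;    /* user-set image size limit, 0 = no limit */
    bool frozen;           /* are the other processes frozen? */
};

bool sketch_constraints_met(const struct sketch_system *s)
{
    if (s->image_pages > s->storage_pages)
        return false;
    return s->limit_pages == 0 || s->image_pages <= s->limit_pages;
}

/*
 * Freeze, check, and if needed thaw, eat memory and try again.
 * eat_step is how many pages one round of eating frees.
 * Returns true with processes frozen on success, false (thawed)
 * if no progress can be made.
 */
bool sketch_prepare_image(struct sketch_system *s, size_t eat_step)
{
    for (;;) {
        s->frozen = true;               /* freeze processes */
        if (sketch_constraints_met(s))
            return true;

        s->frozen = false;              /* thaw so memory can be freed */
        if (eat_step == 0)
            return false;

        /* Eat memory until the calculations say the constraints hold. */
        while (!sketch_constraints_met(s) && s->image_pages > 0)
            s->image_pages -= s->image_pages < eat_step ?
                              s->image_pages : eat_step;
        /* Loop round: freeze again and recheck. */
    }
}
```

The recheck after refreezing is the point of the outer loop. In a real system
the numbers can change while processes are thawed, which this model does not
try to simulate.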

2004-11-24 16:02:36

by Ingo Molnar

[permalink] [raw]
Subject: Re: Suspend 2 merge: 10/51: Exports for suspend built as modules.


* Nigel Cunningham <[email protected]> wrote:

> New exports for suspend. I've cut them down some as a result of the
> last review, but could perhaps do more? Would people prefer to see a
> single struct wrapping exported functions?

> --- 400-exports-old/kernel/sched.c 2004-11-06 09:23:53.364977120 +1100
> +++ 400-exports-new/kernel/sched.c 2004-11-06 09:23:56.627481144 +1100
> @@ -3798,6 +3798,7 @@
>
> read_unlock(&tasklist_lock);
> }
> +EXPORT_SYMBOL(show_state);

this one is ok i think, but make it EXPORT_SYMBOL_GPL() please.

Ingo

2004-11-24 13:16:24

by Nigel Cunningham

[permalink] [raw]
Subject: Suspend 2 merge: 20/51: Timer freezer (experimental).

Here's experimental support for freezing timers. It doesn't really
freeze them, but rather reschedules the call for a little later. One
issue I need to think about is staggering the invocations at resume
time, as some of the per CPU timer code seeks to do.

There is debugging code in here so that when the timer freezer is in
use (it's off by default at the moment) and a timer that needs to be
no_freeze holds up I/O, the user can find out which timer it is. Other
code allows the timer freezer to be disabled on the fly (so you don't
have to reboot because of this).

(This could replace the patches above for MCE checking and slab
reaping).

diff -ruN 550-timer-freezer-old/drivers/block/ll_rw_blk.c 550-timer-freezer-new/drivers/block/ll_rw_blk.c
--- 550-timer-freezer-old/drivers/block/ll_rw_blk.c 2004-11-24 17:55:30.776567832 +1100
+++ 550-timer-freezer-new/drivers/block/ll_rw_blk.c 2004-11-24 17:23:36.145077968 +1100
@@ -254,6 +254,7 @@

q->unplug_timer.function = blk_unplug_timeout;
q->unplug_timer.data = (unsigned long)q;
+ q->unplug_timer.no_freeze = 1;

/*
* by default assume old behaviour and bounce for any highmem page
diff -ruN 550-timer-freezer-old/drivers/input/serio/i8042.c 550-timer-freezer-new/drivers/input/serio/i8042.c
--- 550-timer-freezer-old/drivers/input/serio/i8042.c 2004-11-24 09:52:58.000000000 +1100
+++ 550-timer-freezer-new/drivers/input/serio/i8042.c 2004-11-24 17:23:36.154076600 +1100
@@ -1039,6 +1039,7 @@
dbg_init();

init_timer(&i8042_timer);
+ i8042_timer.no_freeze = 1;
i8042_timer.function = i8042_timer_func;

if (i8042_platform_init())
diff -ruN 550-timer-freezer-old/include/linux/timer.h 550-timer-freezer-new/include/linux/timer.h
--- 550-timer-freezer-old/include/linux/timer.h 2004-11-03 21:54:17.000000000 +1100
+++ 550-timer-freezer-new/include/linux/timer.h 2004-11-24 17:23:36.169074320 +1100
@@ -19,6 +19,8 @@
unsigned long data;

struct tvec_t_base_s *base;
+
+ int no_freeze;
};

#define TIMER_MAGIC 0x4b87ad6e
diff -ruN 550-timer-freezer-old/kernel/timer.c 550-timer-freezer-new/kernel/timer.c
--- 550-timer-freezer-old/kernel/timer.c 2004-11-24 17:55:30.863554608 +1100
+++ 550-timer-freezer-new/kernel/timer.c 2004-11-24 17:55:22.021898744 +1100
@@ -31,6 +31,7 @@
#include <linux/time.h>
#include <linux/jiffies.h>
#include <linux/cpu.h>
+#include <linux/suspend.h>
#include <linux/syscalls.h>

#include <asm/uaccess.h>
@@ -429,6 +430,9 @@
*/
#define INDEX(N) (base->timer_jiffies >> (TVR_BITS + N * TVN_BITS)) & TVN_MASK

+#define FN_CACHE_SIZE 15
+static void * recent_fns[FN_CACHE_SIZE];
+
static inline void __run_timers(tvec_base_t *base)
{
struct timer_list *timer;
@@ -463,7 +467,23 @@
smp_wmb();
timer->base = NULL;
spin_unlock_irq(&base->lock);
- fn(data);
+ if (unlikely(test_suspend_state(SUSPEND_TIMER_FREEZER_ON) && (!timer->no_freeze))) {
+ int shown = 0, i, copy_start = 0;
+ for (i = 0; i < FN_CACHE_SIZE; i++)
+ if (fn == recent_fns[i]) {
+ shown = i + 1;
+ copy_start = i;
+ break;
+ }
+ for (i = copy_start; i < (FN_CACHE_SIZE - 1); i++)
+ recent_fns[i] = recent_fns[i+1];
+ recent_fns[FN_CACHE_SIZE - 1] = fn;
+ if (!shown) {
+ printk("Timed call of %p delayed while freezer on.\n", fn);
+ }
+ mod_timer(timer, jiffies + HZ);
+ } else
+ fn(data);
spin_lock_irq(&base->lock);
goto repeat;
}
diff -ruN 550-timer-freezer-old/net/sched/sch_generic.c 550-timer-freezer-new/net/sched/sch_generic.c
--- 550-timer-freezer-old/net/sched/sch_generic.c 2004-11-24 09:53:13.000000000 +1100
+++ 550-timer-freezer-new/net/sched/sch_generic.c 2004-11-24 17:23:36.189071280 +1100
@@ -210,6 +210,7 @@
init_timer(&dev->watchdog_timer);
dev->watchdog_timer.data = (unsigned long)dev;
dev->watchdog_timer.function = dev_watchdog;
+ dev->watchdog_timer.no_freeze = 1;
}

void __netdev_watchdog_up(struct net_device *dev)
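
The recent_fns bookkeeping in the __run_timers hunk above can be tried
outside the kernel. What follows is a minimal user-space sketch, not the
patch itself: note_delayed_fn is a hypothetical name, printf stands in
for printk, and the mod_timer() rescheduling is left out. It shows only
how a callback is reported the first time it is delayed and then moved
to the newest slot on later hits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define FN_CACHE_SIZE 15

/* Most recently delayed callbacks, oldest first, newest in the last slot. */
static void *recent_fns[FN_CACHE_SIZE];

/*
 * Record fn as the most recent entry.
 * Returns 1 if fn was already cached (no message printed), 0 if it is new.
 */
static int note_delayed_fn(void *fn)
{
	int seen = 0, i, copy_start = 0;

	/* NULL would match the empty slots, so it is not a valid key. */
	assert(fn != NULL);

	for (i = 0; i < FN_CACHE_SIZE; i++)
		if (recent_fns[i] == fn) {
			seen = 1;
			copy_start = i;
			break;
		}

	/* Close the gap left by fn (or drop the oldest entry), then append. */
	for (i = copy_start; i < FN_CACHE_SIZE - 1; i++)
		recent_fns[i] = recent_fns[i + 1];
	recent_fns[FN_CACHE_SIZE - 1] = fn;

	if (!seen)
		printf("Timed call of %p delayed while freezer on.\n", fn);
	return seen;
}
```

With a 15-entry cache, a callback that keeps firing stays in the cache
and is reported once, while one that has not fired for 15 distinct
callbacks is reported again the next time it is delayed.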


2004-11-24 13:10:55

by Nigel Cunningham

[permalink] [raw]
Subject: Suspend 2 merge: 16/51: Disable cache reaping during suspend.

I have to admit to being a little unsure as to why this is needed, but
suspend's reliability is helped a lot by disabling cache reaping while
suspending. Perhaps one of the mm guys will be able to enlighten me
here. Might be SMP related.

diff -ruN 505-disable-cache-reaping-during-suspend-old/mm/slab.c 505-disable-cache-reaping-during-suspend-new/mm/slab.c
--- 505-disable-cache-reaping-during-suspend-old/mm/slab.c 2004-11-03 21:55:05.000000000 +1100
+++ 505-disable-cache-reaping-during-suspend-new/mm/slab.c 2004-11-06 09:25:01.972547184 +1100
@@ -92,6 +92,7 @@
#include <linux/sysctl.h>
#include <linux/module.h>
#include <linux/rcupdate.h>
+#include <linux/suspend.h>

#include <asm/uaccess.h>
#include <asm/cacheflush.h>
@@ -2730,7 +2731,9 @@
{
struct list_head *walk;

- if (down_trylock(&cache_chain_sem)) {
+ if ((unlikely(test_suspend_state(SUSPEND_RUNNING))) ||
+ (down_trylock(&cache_chain_sem)))
+ {
/* Give up. Setup the next iteration. */
schedule_delayed_work(&__get_cpu_var(reap_work), REAPTIMEOUT_CPUC + smp_processor_id());
return;


2004-11-24 16:18:53

by Christoph Hellwig

[permalink] [raw]
Subject: Re: Suspend 2 merge: 8/51: /proc/acpi/sleep hook.

On Wed, Nov 24, 2004 at 11:57:25PM +1100, Nigel Cunningham wrote:
> Same thing as the previous patch, but for /proc/acpi/sleep.

And again totally bogus. Make sure swsusp and swsusp2 export the same
interface. Preferably the old one, but if it absolutely doesn't fit
your needs submit a patch to switch the old code to the new interface
first.

2004-11-24 13:07:48

by Nigel Cunningham

[permalink] [raw]
Subject: Suspend2 merge: 1/51: Device trees

This patch allows the device tree to be split up into multiple trees. I
don't really expect it to be merged, but it is an important part of
suspend at the moment, and I certainly want to see something like it
that will allow us to suspend some parts of the device tree and not
others. Suspend2 uses it to keep alive the hard drive (or equivalent)
that we're writing the image to while suspending other devices, thus
improving the consistency of the image written.

I remember from last time this was posted that someone commented on
exporting the default device tree; I haven't changed that yet.

diff -ruN 205-device-pm-trees-old/drivers/base/power/main.c 205-device-pm-trees-new/drivers/base/power/main.c
--- 205-device-pm-trees-old/drivers/base/power/main.c 2004-11-24 09:52:56.000000000 +1100
+++ 205-device-pm-trees-new/drivers/base/power/main.c 2004-11-24 19:48:29.133671960 +1100
@@ -4,6 +4,9 @@
* Copyright (c) 2003 Patrick Mochel
* Copyright (c) 2003 Open Source Development Lab
*
+ * Partial tree additions
+ * Copyright (c) 2004 Nigel Cunningham
+ *
* This file is released under the GPLv2
*
*
@@ -23,10 +26,18 @@
#include <linux/device.h>
#include "power.h"

-LIST_HEAD(dpm_active);
-LIST_HEAD(dpm_off);
-LIST_HEAD(dpm_off_irq);
-
+struct partial_device_tree default_device_tree =
+{
+ .dpm_active = LIST_HEAD_INIT(default_device_tree.dpm_active),
+ .dpm_off = LIST_HEAD_INIT(default_device_tree.dpm_off),
+ .dpm_off_irq = LIST_HEAD_INIT(default_device_tree.dpm_off_irq),
+};
+EXPORT_SYMBOL(default_device_tree);
+
+/*
+ * One mutex for all trees because we can be moving items
+ * between trees.
+ */
DECLARE_MUTEX(dpm_sem);
DECLARE_MUTEX(dpm_list_sem);

@@ -77,7 +88,9 @@
dev->bus ? dev->bus->name : "No Bus", dev->kobj.name);
atomic_set(&dev->power.pm_users, 0);
down(&dpm_list_sem);
- list_add_tail(&dev->power.entry, &dpm_active);
+ list_add_tail(&dev->power.entry, &default_device_tree.dpm_active);
+ dev->current_list = DEVICE_LIST_DPM_ACTIVE;
+ dev->tree = &default_device_tree;
device_pm_set_parent(dev, dev->parent);
if ((error = dpm_sysfs_add(dev)))
list_del(&dev->power.entry);
@@ -93,6 +106,7 @@
dpm_sysfs_remove(dev);
device_pm_release(dev->power.pm_parent);
list_del_init(&dev->power.entry);
+ dev->current_list = DEVICE_LIST_NONE;
up(&dpm_list_sem);
}

diff -ruN 205-device-pm-trees-old/drivers/base/power/Makefile 205-device-pm-trees-new/drivers/base/power/Makefile
--- 205-device-pm-trees-old/drivers/base/power/Makefile 2004-11-03 21:51:55.000000000 +1100
+++ 205-device-pm-trees-new/drivers/base/power/Makefile 2004-11-24 19:48:29.134671808 +1100
@@ -1,5 +1,5 @@
obj-y := shutdown.o
-obj-$(CONFIG_PM) += main.o suspend.o resume.o runtime.o sysfs.o
+obj-$(CONFIG_PM) += main.o suspend.o resume.o runtime.o sysfs.o tree.o

ifeq ($(CONFIG_DEBUG_DRIVER),y)
EXTRA_CFLAGS += -DDEBUG
diff -ruN 205-device-pm-trees-old/drivers/base/power/power.h 205-device-pm-trees-new/drivers/base/power/power.h
--- 205-device-pm-trees-old/drivers/base/power/power.h 2004-11-24 09:52:56.000000000 +1100
+++ 205-device-pm-trees-new/drivers/base/power/power.h 2004-11-24 19:48:29.135671656 +1100
@@ -35,10 +35,22 @@
/*
* The PM lists.
*/
-extern struct list_head dpm_active;
-extern struct list_head dpm_off;
-extern struct list_head dpm_off_irq;

+struct partial_device_tree
+{
+ struct list_head dpm_active;
+ struct list_head dpm_off;
+ struct list_head dpm_off_irq;
+};
+
+enum {
+ DEVICE_LIST_NONE,
+ DEVICE_LIST_DPM_ACTIVE,
+ DEVICE_LIST_DPM_OFF,
+ DEVICE_LIST_DPM_OFF_IRQ,
+};
+
+extern struct partial_device_tree default_device_tree;

static inline struct dev_pm_info * to_pm_info(struct list_head * entry)
{
@@ -64,7 +76,9 @@
* resume.c
*/

+extern void dpm_resume_tree(struct partial_device_tree * tree);
extern void dpm_resume(void);
+extern void dpm_power_up_tree(struct partial_device_tree * tree);
extern void dpm_power_up(void);
extern int resume_device(struct device *);

diff -ruN 205-device-pm-trees-old/drivers/base/power/resume.c 205-device-pm-trees-new/drivers/base/power/resume.c
--- 205-device-pm-trees-old/drivers/base/power/resume.c 2004-11-24 09:52:56.000000000 +1100
+++ 205-device-pm-trees-new/drivers/base/power/resume.c 2004-11-24 19:48:29.136671504 +1100
@@ -29,16 +29,17 @@



-void dpm_resume(void)
+void dpm_resume_tree(struct partial_device_tree * tree)
{
down(&dpm_list_sem);
- while(!list_empty(&dpm_off)) {
- struct list_head * entry = dpm_off.next;
+ while(!list_empty(&tree->dpm_off)) {
+ struct list_head * entry = tree->dpm_off.next;
struct device * dev = to_device(entry);

get_device(dev);
list_del_init(entry);
- list_add_tail(entry, &dpm_active);
+ list_add_tail(entry, &tree->dpm_active);
+ dev->current_list = DEVICE_LIST_DPM_ACTIVE;

up(&dpm_list_sem);
if (!dev->power.prev_state)
@@ -50,6 +51,11 @@
}


+void dpm_resume(void)
+{
+ dpm_resume_tree(&default_device_tree);
+}
+
/**
* device_resume - Restore state of each device in system.
*
@@ -66,6 +72,14 @@

EXPORT_SYMBOL_GPL(device_resume);

+void device_resume_tree(struct partial_device_tree * tree)
+{
+ down(&dpm_sem);
+ dpm_resume_tree(tree);
+ up(&dpm_sem);
+}
+
+EXPORT_SYMBOL(device_resume_tree);

/**
* device_power_up_irq - Power on some devices.
@@ -78,20 +92,27 @@
* Interrupts must be disabled when calling this.
*/

-void dpm_power_up(void)
+void dpm_power_up_tree(struct partial_device_tree * tree)
{
- while(!list_empty(&dpm_off_irq)) {
- struct list_head * entry = dpm_off_irq.next;
+ while(!list_empty(&tree->dpm_off_irq)) {
+ struct list_head * entry = tree->dpm_off_irq.next;
struct device * dev = to_device(entry);

get_device(dev);
list_del_init(entry);
- list_add_tail(entry, &dpm_active);
+ list_add_tail(entry, &tree->dpm_active);
+ dev->current_list = DEVICE_LIST_DPM_ACTIVE;
resume_device(dev);
put_device(dev);
}
}
+EXPORT_SYMBOL(dpm_power_up_tree);
+

+void dpm_power_up(void)
+{
+ dpm_power_up_tree(&default_device_tree);
+}

/**
* device_pm_power_up - Turn on all devices that need special attention.
diff -ruN 205-device-pm-trees-old/drivers/base/power/shutdown.c 205-device-pm-trees-new/drivers/base/power/shutdown.c
--- 205-device-pm-trees-old/drivers/base/power/shutdown.c 2004-11-03 21:54:14.000000000 +1100
+++ 205-device-pm-trees-new/drivers/base/power/shutdown.c 2004-11-24 19:48:29.137671352 +1100
@@ -65,3 +65,4 @@
sysdev_shutdown();
}

+EXPORT_SYMBOL(device_shutdown);
diff -ruN 205-device-pm-trees-old/drivers/base/power/suspend.c 205-device-pm-trees-new/drivers/base/power/suspend.c
--- 205-device-pm-trees-old/drivers/base/power/suspend.c 2004-11-24 09:52:56.000000000 +1100
+++ 205-device-pm-trees-new/drivers/base/power/suspend.c 2004-11-24 19:51:15.776338424 +1100
@@ -51,7 +51,7 @@


/**
- * device_suspend - Save state and stop all devices in system.
+ * device_suspend_tree - Save state and stop all devices in system.
* @state: Power state to put each device in.
*
* Walk the dpm_active list, call ->suspend() for each device, and move
@@ -60,19 +60,19 @@
* the device to the dpm_off list. If it returns -EAGAIN, we move it to
* the dpm_off_irq list. If we get a different error, try and back out.
*
- * If we hit a failure with any of the devices, call device_resume()
+ * If we hit a failure with any of the devices, call device_resume_tree()
* above to bring the suspended devices back to life.
*
*/

-int device_suspend(u32 state)
+int device_suspend_tree(u32 state, struct partial_device_tree * tree)
{
int error = 0;

down(&dpm_sem);
down(&dpm_list_sem);
- while (!list_empty(&dpm_active) && error == 0) {
- struct list_head * entry = dpm_active.prev;
+ while (!list_empty(&tree->dpm_active) && error == 0) {
+ struct list_head * entry = tree->dpm_active.prev;
struct device * dev = to_device(entry);

get_device(dev);
@@ -87,10 +87,12 @@
/* Move it to the dpm_off or dpm_off_irq list */
if (!error) {
list_del(&dev->power.entry);
- list_add(&dev->power.entry, &dpm_off);
+ list_add(&dev->power.entry, &tree->dpm_off);
+ dev->current_list = DEVICE_LIST_DPM_OFF;
} else if (error == -EAGAIN) {
list_del(&dev->power.entry);
- list_add(&dev->power.entry, &dpm_off_irq);
+ list_add(&dev->power.entry, &tree->dpm_off_irq);
+ dev->current_list = DEVICE_LIST_DPM_OFF_IRQ;
error = 0;
}
}
@@ -101,11 +103,17 @@
}
up(&dpm_list_sem);
if (error)
- dpm_resume();
+ dpm_resume_tree(tree);
up(&dpm_sem);
return error;
}

+EXPORT_SYMBOL(device_suspend_tree);
+
+int device_suspend(u32 state)
+{
+ return device_suspend_tree(state, &default_device_tree);
+}
EXPORT_SYMBOL_GPL(device_suspend);


@@ -118,25 +126,28 @@
* done, power down system devices.
*/

-int device_power_down(u32 state)
+int device_power_down_tree(u32 state, struct partial_device_tree * tree)
{
int error = 0;
struct device * dev;

- list_for_each_entry_reverse(dev, &dpm_off_irq, power.entry) {
+ list_for_each_entry_reverse(dev, &tree->dpm_off_irq, power.entry) {
if ((error = suspend_device(dev, state)))
break;
}
if (error)
- goto Error;
- if ((error = sysdev_suspend(state)))
- goto Error;
- Done:
+ dpm_power_up();
return error;
- Error:
- dpm_power_up();
- goto Done;
}

-EXPORT_SYMBOL_GPL(device_power_down);
+EXPORT_SYMBOL_GPL(device_power_down_tree);

+int device_power_down(u32 state)
+{
+ int error;
+
+ if (!(error = device_power_down_tree(state, &default_device_tree)))
+ error = sysdev_suspend(state);
+ return error;
+}
+EXPORT_SYMBOL(device_power_down);
diff -ruN 205-device-pm-trees-old/drivers/base/power/tree.c 205-device-pm-trees-new/drivers/base/power/tree.c
--- 205-device-pm-trees-old/drivers/base/power/tree.c 1970-01-01 10:00:00.000000000 +1000
+++ 205-device-pm-trees-new/drivers/base/power/tree.c 2004-11-24 19:48:29.139671048 +1100
@@ -0,0 +1,105 @@
+/*
+ * tree.c - Functions for moving devices between trees.
+ *
+ * Copyright (c) 2004 Nigel Cunningham
+ *
+ * This file is released under the GPLv2
+ *
+ */
+
+#include <linux/device.h>
+#include <linux/err.h>
+#include "power.h"
+
+/*
+ * device_merge_tree - Move an entire tree into another tree
+ * @source: The tree to be moved
+ * @dest : The destination tree
+ */
+
+void device_merge_tree( struct partial_device_tree * source,
+ struct partial_device_tree * dest)
+{
+ down(&dpm_sem);
+ list_splice_init(&source->dpm_active, &dest->dpm_active);
+ list_splice_init(&source->dpm_off, &dest->dpm_off);
+ list_splice_init(&source->dpm_off_irq, &dest->dpm_off_irq);
+ up(&dpm_sem);
+}
+EXPORT_SYMBOL(device_merge_tree);
+
+/*
+ * device_switch_trees - Move a device and its ancestors to a new tree
+ * @dev: The lowest device to be moved.
+ * @tree: The destination tree.
+ *
+ * Note that siblings can be left in the original tree. This is because
+ * we want to be able to keep part of a tree in one state while part is
+ * in another.
+ *
+ * Since we iterate all the way back to the top, and may move entries
+ * already in the destination tree, we will never violate the depth
+ * first property of the destination tree.
+ */
+
+void device_switch_trees(struct device * dev, struct partial_device_tree * tree)
+{
+ down(&dpm_sem);
+ while (dev) {
+ list_del(&dev->power.entry);
+ switch (dev->current_list) {
+ case DEVICE_LIST_DPM_ACTIVE:
+ list_add(&dev->power.entry, &tree->dpm_active);
+ break;
+ case DEVICE_LIST_DPM_OFF:
+ list_add(&dev->power.entry, &tree->dpm_off);
+ break;
+ case DEVICE_LIST_DPM_OFF_IRQ:
+ list_add(&dev->power.entry, &tree->dpm_off_irq);
+ break;
+ }
+
+ dev = dev->parent;
+ }
+ up(&dpm_sem);
+}
+
+EXPORT_SYMBOL(device_switch_trees);
+
+/*
+ * device_create_tree - Create a new device tree
+ */
+
+struct partial_device_tree * device_create_tree(void)
+{
+ struct partial_device_tree * new_tree;
+
+ new_tree = (struct partial_device_tree *)
+ kmalloc(sizeof(struct partial_device_tree), GFP_ATOMIC);
+
+ if (new_tree) { /* kmalloc returns NULL on failure, not an ERR_PTR */
+ INIT_LIST_HEAD(&new_tree->dpm_active);
+ INIT_LIST_HEAD(&new_tree->dpm_off);
+ INIT_LIST_HEAD(&new_tree->dpm_off_irq);
+ }
+
+ return new_tree;
+}
+EXPORT_SYMBOL(device_create_tree);
+
+/*
+ * device_destroy_tree - Destroy a dynamically created tree
+ */
+
+void device_destroy_tree(struct partial_device_tree * tree)
+{
+ BUG_ON(tree == &default_device_tree);
+
+ BUG_ON(!list_empty(&tree->dpm_active));
+ BUG_ON(!list_empty(&tree->dpm_off));
+ BUG_ON(!list_empty(&tree->dpm_off_irq));
+
+ kfree(tree);
+}
+
+EXPORT_SYMBOL(device_destroy_tree);
diff -ruN 205-device-pm-trees-old/drivers/base/sys.c 205-device-pm-trees-new/drivers/base/sys.c
--- 205-device-pm-trees-old/drivers/base/sys.c 2004-11-24 09:52:56.000000000 +1100
+++ 205-device-pm-trees-new/drivers/base/sys.c 2004-11-24 19:48:29.140670896 +1100
@@ -337,7 +337,7 @@
}
return 0;
}
-
+EXPORT_SYMBOL(sysdev_suspend);

/**
* sysdev_resume - Bring system devices back to life.
@@ -384,6 +384,7 @@
}
return 0;
}
+EXPORT_SYMBOL(sysdev_resume);


int __init system_bus_init(void)
diff -ruN 205-device-pm-trees-old/include/linux/device.h 205-device-pm-trees-new/include/linux/device.h
--- 205-device-pm-trees-old/include/linux/device.h 2004-11-24 09:53:11.000000000 +1100
+++ 205-device-pm-trees-new/include/linux/device.h 2004-11-24 21:31:45.988606816 +1100
@@ -285,6 +285,11 @@
override */

void (*release)(struct device * dev);
+
+ struct partial_device_tree * tree; /* Which tree of devices this
+ device is in */
+ int current_list; /* Which list within the tree the
+ device is on (speeds moving) */
};

static inline struct device *
diff -ruN 205-device-pm-trees-old/include/linux/pm.h 205-device-pm-trees-new/include/linux/pm.h
--- 205-device-pm-trees-old/include/linux/pm.h 2004-11-24 09:53:11.000000000 +1100
+++ 205-device-pm-trees-new/include/linux/pm.h 2004-11-24 19:48:29.144670288 +1100
@@ -235,12 +235,25 @@

extern void device_pm_set_parent(struct device * dev, struct device * parent);

+struct partial_device_tree;
+extern struct partial_device_tree default_device_tree;
+
extern int device_suspend(u32 state);
+extern int device_suspend_tree(u32 state, struct partial_device_tree * tree);
extern int device_power_down(u32 state);
+extern int device_power_down_tree(u32 state, struct partial_device_tree * tree);
extern void device_power_up(void);
+extern void device_power_up_tree(struct partial_device_tree * tree);
extern void device_resume(void);
-
-
+extern void device_resume_tree(struct partial_device_tree * tree);
+extern void device_merge_tree( struct partial_device_tree * source,
+ struct partial_device_tree * dest);
+extern void device_switch_trees(struct device * dev, struct partial_device_tree * tree);
+extern void dpm_power_up_tree(struct partial_device_tree * tree);
+extern int sysdev_suspend(u32 state);
+extern int sysdev_resume(void);
+extern struct partial_device_tree * device_create_tree(void);
+extern void device_destroy_tree(struct partial_device_tree * tree);
#endif /* __KERNEL__ */

#endif /* _LINUX_PM_H */
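
To make the ordering argument in the device_switch_trees comment
concrete, here is a small user-space sketch. It is an illustration under
assumptions, not the patch: struct tree, list_init, device_register and
switch_trees are hypothetical stand-ins, each tree has a single list
instead of three, and there is no locking. Unlike the patch, the sketch
also updates the device's tree pointer when it moves.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal circular doubly linked list, modelled on the kernel's list_head. */
struct list_head { struct list_head *next, *prev; };

static void list_init(struct list_head *h) { h->next = h->prev = h; }

static void list_del(struct list_head *e)
{
	e->prev->next = e->next;
	e->next->prev = e->prev;
	e->next = e->prev = e;
}

/* Insert e directly after h; with h a list head this is the list front. */
static void list_add(struct list_head *e, struct list_head *h)
{
	e->next = h->next;
	e->prev = h;
	h->next->prev = e;
	h->next = e;
}

struct tree { struct list_head active; };

struct device {
	struct list_head entry;
	struct device *parent;
	struct tree *tree;
};

/* Register dev at the tail of t, so parents precede their children. */
static void device_register(struct device *dev, struct device *parent,
			    struct tree *t)
{
	dev->parent = parent;
	dev->tree = t;
	list_init(&dev->entry);
	list_add(&dev->entry, t->active.prev);
}

/*
 * Move dev and all of its ancestors to the front of dest. Each ancestor
 * is added after its descendant and so lands in front of it, which keeps
 * the parents-first order in dest. Siblings stay where they were.
 */
static void switch_trees(struct device *dev, struct tree *dest)
{
	while (dev) {
		list_del(&dev->entry);
		list_add(&dev->entry, &dest->active);
		dev->tree = dest;
		dev = dev->parent;
	}
}
```

Moving a disk this way pulls its bus controller along with it, while a
network card on the same bus remains in the default tree and can be
suspended independently.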


2004-11-24 13:07:48

by Nigel Cunningham

[permalink] [raw]
Subject: Suspend2 merge: 2/51: Find class by name

A second patch that shouldn't be needed but is at the moment. It is used
to find any framebuffer drivers and put them in the kept-alive device
tree. Once Pavel's improvements to the PM_ states are merged, I hope
this will be able to go away.

Not sure if it would be helpful elsewhere?

diff -ruN 207-find-class-by-name-old/drivers/base/class.c 207-find-class-by-name-new/drivers/base/class.c
--- 207-find-class-by-name-old/drivers/base/class.c 2004-11-24 09:52:56.000000000 +1100
+++ 207-find-class-by-name-new/drivers/base/class.c 2004-11-24 17:20:21.385685896 +1100
@@ -497,6 +497,25 @@
kobject_put(&class_dev->kobj);
}

+struct class * class_find(char * name)
+{
+ struct class * this_class;
+
+ if (!name)
+ return NULL;
+
+ down_read(&class_subsys.rwsem);
+ list_for_each_entry(this_class, &class_subsys.kset.list, subsys.kset.kobj.entry) {
+ if (!(strcmp(this_class->name, name))) {
+ class_get(this_class);
+ up_read(&class_subsys.rwsem);
+ return this_class;
+ }
+ }
+ up_read(&class_subsys.rwsem);
+
+ return NULL;
+}

int class_interface_register(struct class_interface *class_intf)
{
@@ -579,3 +598,5 @@

EXPORT_SYMBOL_GPL(class_interface_register);
EXPORT_SYMBOL_GPL(class_interface_unregister);
+
+EXPORT_SYMBOL_GPL(class_find);
diff -ruN 207-find-class-by-name-old/include/linux/device.h 207-find-class-by-name-new/include/linux/device.h
--- 207-find-class-by-name-old/include/linux/device.h 2004-11-24 17:20:41.583615344 +1100
+++ 207-find-class-by-name-new/include/linux/device.h 2004-11-24 17:19:56.510467504 +1100
@@ -164,6 +164,7 @@

extern struct class * class_get(struct class *);
extern void class_put(struct class *);
+extern struct class * class_find(char * name);


struct class_attribute {
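
The lookup contract of class_find (return a referenced object, or NULL
for a NULL or unknown name) can be sketched in user space as follows.
All names here (struct named_class, class_find_by_name, class_put_ref)
are hypothetical, a plain singly linked list replaces the kset list, and
the rwsem locking from the patch is omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct named_class {
	const char *name;
	int refcount;
	struct named_class *next;
};

/*
 * Return the first class called name with its reference count raised,
 * or NULL if name is NULL or no such class is registered. The caller
 * drops the reference with class_put_ref() when it is done.
 */
static struct named_class *class_find_by_name(struct named_class *head,
					      const char *name)
{
	struct named_class *c;

	if (!name)
		return NULL;

	for (c = head; c; c = c->next)
		if (strcmp(c->name, name) == 0) {
			c->refcount++;
			return c;
		}
	return NULL;
}

/* Drop a reference taken by class_find_by_name(). */
static void class_put_ref(struct named_class *c)
{
	assert(c && c->refcount > 0);
	c->refcount--;
}
```

Taking the reference before returning matters in the real code because
the class could otherwise be unregistered between the lookup and its
first use by the caller.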


2004-11-24 13:07:47

by Nigel Cunningham

[permalink] [raw]
Subject: Suspend 2 merge: 3/51: e820 table support

The first of the 'real' candidates for merging.

This adds support for setting and clearing the Nosave status of pages
based on the contents of the e820 table, and clearing Nosave for __init
pages when they're freed.

diff -ruN 208-e820-table-support-old/arch/i386/mm/init.c 208-e820-table-support-new/arch/i386/mm/init.c
--- 208-e820-table-support-old/arch/i386/mm/init.c 2004-11-03 21:54:38.000000000 +1100
+++ 208-e820-table-support-new/arch/i386/mm/init.c 2004-11-04 16:27:39.000000000 +1100
@@ -27,6 +27,7 @@
#include <linux/slab.h>
#include <linux/proc_fs.h>
#include <linux/efi.h>
+#include <linux/suspend.h>

#include <asm/processor.h>
#include <asm/system.h>
@@ -266,12 +267,15 @@
{
if (page_is_ram(pfn) && !(bad_ppro && page_kills_ppro(pfn))) {
ClearPageReserved(page);
+ ClearPageNosave(page);
set_bit(PG_highmem, &page->flags);
set_page_count(page, 1);
__free_page(page);
totalhigh_pages++;
- } else
+ } else {
SetPageReserved(page);
+ SetPageNosave(page);
+ }
}

#ifndef CONFIG_DISCONTIGMEM
@@ -349,7 +353,7 @@
#endif
}

-#if defined(CONFIG_PM_DISK) || defined(CONFIG_SOFTWARE_SUSPEND)
+#ifdef CONFIG_PM
/*
* Swap suspend & friends need this for resume because things like the intel-agp
* driver might have split up a kernel 4MB mapping.
@@ -569,6 +573,7 @@
int codesize, reservedpages, datasize, initsize;
int tmp;
int bad_ppro;
+ void * addr;

#ifndef CONFIG_DISCONTIGMEM
if (!mem_map)
@@ -599,12 +604,25 @@
totalram_pages += __free_all_bootmem();

reservedpages = 0;
- for (tmp = 0; tmp < max_low_pfn; tmp++)
- /*
- * Only count reserved RAM pages
- */
- if (page_is_ram(tmp) && PageReserved(pfn_to_page(tmp)))
- reservedpages++;
+ addr = __va(0);
+ for (tmp = 0; tmp < max_low_pfn; tmp++, addr += PAGE_SIZE) {
+ if (page_is_ram(tmp)) {
+ /*
+ * Only count reserved RAM pages
+ */
+ if (PageReserved(mem_map+tmp))
+ reservedpages++;
+ /*
+ * Mark nosave pages
+ */
+ if (addr >= (void *)&__nosave_begin && addr < (void *)&__nosave_end)
+ SetPageNosave(mem_map+tmp);
+ } else
+ /*
+ * Non-RAM pages are always nosave
+ */
+ SetPageNosave(mem_map+tmp);
+ }

set_highmem_pages_init(bad_ppro);

@@ -703,6 +721,7 @@
addr = (unsigned long)(&__init_begin);
for (; addr < (unsigned long)(&__init_end); addr += PAGE_SIZE) {
ClearPageReserved(virt_to_page(addr));
+ ClearPageNosave(virt_to_page(addr));
set_page_count(virt_to_page(addr), 1);
free_page(addr);
totalram_pages++;
@@ -717,6 +736,7 @@
printk (KERN_INFO "Freeing initrd memory: %ldk freed\n", (end - start) >> 10);
for (; start < end; start += PAGE_SIZE) {
ClearPageReserved(virt_to_page(start));
+ ClearPageNosave(virt_to_page(start));
set_page_count(virt_to_page(start), 1);
free_page(start);
totalram_pages++;
diff -ruN 208-e820-table-support-old/mm/bootmem.c 208-e820-table-support-new/mm/bootmem.c
--- 208-e820-table-support-old/mm/bootmem.c 2004-11-03 21:53:00.000000000 +1100
+++ 208-e820-table-support-new/mm/bootmem.c 2004-11-04 16:27:39.000000000 +1100
@@ -279,11 +279,13 @@

count += BITS_PER_LONG;
__ClearPageReserved(page);
+ ClearPageNosave(page);
set_page_count(page, 1);
for (j = 1; j < BITS_PER_LONG; j++) {
if (j + 16 < BITS_PER_LONG)
prefetchw(page + j + 16);
__ClearPageReserved(page + j);
+ ClearPageNosave(page + j);
}
__free_pages(page, ffs(BITS_PER_LONG)-1);
i += BITS_PER_LONG;
@@ -294,6 +296,7 @@
if (v & m) {
count++;
__ClearPageReserved(page);
+ ClearPageNosave(page);
set_page_count(page, 1);
__free_page(page);
}
@@ -314,6 +317,7 @@
for (i = 0; i < ((bdata->node_low_pfn-(bdata->node_boot_start >> PAGE_SHIFT))/8 + PAGE_SIZE-1)/PAGE_SIZE; i++,page++) {
count++;
__ClearPageReserved(page);
+ ClearPageNosave(page);
set_page_count(page, 1);
__free_page(page);
}
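
The marking rule in the mem_init hunk above boils down to two cases:
non-RAM pages are always Nosave, and RAM pages are Nosave only inside
the __nosave_begin..__nosave_end range. The sketch below models that in
user space under stated assumptions: a bool array stands in for the
PG_nosave page flag, page frame numbers replace virtual addresses, and
mark_nosave_pages and free_page_clear_nosave are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NPAGES 16

/* Stand-in for the per-page Nosave flag. */
static bool nosave[NPAGES];

/*
 * Mark every page that must not go into the image: pages that are not
 * RAM at all, and RAM pages inside [nosave_begin, nosave_end).
 */
static void mark_nosave_pages(const bool *is_ram, size_t npages,
			      size_t nosave_begin, size_t nosave_end)
{
	size_t pfn;

	assert(npages <= NPAGES);
	for (pfn = 0; pfn < npages; pfn++)
		nosave[pfn] = !is_ram[pfn] ||
			      (pfn >= nosave_begin && pfn < nosave_end);
}

/* A page handed back to the allocator becomes saveable again. */
static void free_page_clear_nosave(size_t pfn)
{
	assert(pfn < NPAGES);
	nosave[pfn] = false;
}
```

The second helper mirrors the ClearPageNosave calls the patch adds next
to each ClearPageReserved on the free paths (init memory, initrd and
bootmem).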


2004-11-24 13:07:46

by Nigel Cunningham

[permalink] [raw]
Subject: Suspend 2 merge: 4/51: Get module list.

This provides access to the list of loaded modules for suspend's
debugging output. When a cycle finishes, suspend outputs something like
the following:

> Please include the following information in bug reports:
> - SUSPEND core : 2.1.5.7
> - Kernel Version : 2.6.9
> - Compiler vers. : 3.3
> - Modules loaded : tuner bttv videodev snd_seq_oss snd_seq_midi_event
> snd_seq snd_pcm_oss snd_mixer_oss snd_intel8x0 snd_ac97_codec snd_pcm
> snd_timer snd_page_alloc snd_mpu401_uart snd_rawmidi snd_seq_device
> snd soundcore visor usbserial usblp joydev evdev usbmouse usbhid
> uhci_hcd usbcore ppp_deflate zlib_deflate zlib_inflate bsd_comp
> ipt_LOG ipt_state ipt_MASQUERADE iptable_nat ip_conntrack
> ipt_multiport ipt_REJECT iptable_filter ip_tables ppp_async
> ppp_generic slhc crc_ccitt video_buf v4l2_common btcx_risc Win4Lin
> mki_adapter radeon agpgart parport_pc lp parport sg ide_cd sr_mod
> cdrom floppy af_packet e1000 loop dm_mod tsdev suspend_bootsplash
> suspend_text suspend_swap suspend_block_io suspend_lzf suspend_core
> - Attempt number : 9
> - Parameters : 0 2304 32768 1 0 4096 5
> - Limits : 261680 pages RAM. Initial boot: 252677.
> - Overall expected compression percentage: 0.
> - LZF Compressor enabled.
> Compressed 922112000 bytes into 437892038 (52 percent compression).
> - Swapwriter active.
> Swap available for image: 294868 pages.
> - Debugging compiled in.
> - Preemptive kernel.
> - SMP kernel.
> - Highmem Support.
> - I/O speed: Write 72 MB/s, Read 119 MB/s.

Including the modules loaded is very helpful for debugging problems.


diff -ruN 209-get-module-list-old/kernel/module.c 209-get-module-list-new/kernel/module.c
--- 209-get-module-list-old/kernel/module.c 2004-11-24 17:21:28.892423312 +1100
+++ 209-get-module-list-new/kernel/module.c 2004-11-24 17:21:20.619680960 +1100
@@ -2105,6 +2105,18 @@
}
EXPORT_SYMBOL(module_remove_driver);

+int print_module_list_to_buffer(char * buffer, int size)
+{
+ struct module *mod;
+ int pos = 0;
+
+ list_for_each_entry(mod, &modules, list)
+ if (mod->name)
+ pos += snprintf(buffer+pos, size-pos-1,
+ "%s ", mod->name);
+ return pos;
+}
+
#ifdef CONFIG_MODVERSIONS
/* Generate the signature for struct module here, too, for modversions. */
void struct_module(struct module *mod) { return; }
@@ -2115,3 +2127,5 @@
struct list_head *kdb_modules = &modules; /* kdb needs the list of modules */
#endif /* CONFIG_KDB */

+/* For Suspend2 */
+EXPORT_SYMBOL(print_module_list_to_buffer);
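
One thing to watch in print_module_list_to_buffer: snprintf returns the
length the output would have had without truncation, so adding its
return value to pos can push pos past size once the buffer fills. In the
kernel, scnprintf() is probably the simplest fix, since it returns the
number of characters actually stored. The user-space sketch below shows
the same clamping by hand; print_names_to_buffer and its array argument
are hypothetical stand-ins for the walk over the module list.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Append each name followed by a space to buf, never writing more than
 * size bytes including the terminating NUL. Returns the number of
 * characters stored, which is always less than size.
 */
static int print_names_to_buffer(char *buf, int size,
				 const char *const *names, int count)
{
	int pos = 0, i;

	assert(buf && size > 0);
	buf[0] = '\0';

	for (i = 0; i < count && pos < size - 1; i++) {
		int n = snprintf(buf + pos, (size_t)(size - pos),
				 "%s ", names[i]);

		if (n < 0)
			break;
		/* snprintf reports the untruncated length; clamp it. */
		if (n >= size - pos)
			n = size - pos - 1;
		pos += n;
	}
	return pos;
}
```

With an 8-byte buffer and the names "e1000", "loop" and "dm_mod", the
function stores "e1000 l" and returns 7 rather than a position beyond
the end of the buffer.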


2004-11-24 16:38:09

by Roman Zippel

[permalink] [raw]
Subject: Re: Suspend 2 merge: 26/51: Kconfig and makefile.

Hi,

On Wed, 24 Nov 2004, Nigel Cunningham wrote:

> +menu "Software Suspend 2"
> +
> +config SOFTWARE_SUSPEND2_CORE
> + tristate "Software Suspend 2"
> + depends on PM
> + select SOFTWARE_SUSPEND2
> + ---help---
> + Software Suspend 2 is the 'new and improved' suspend support. You
> + can now build it as modules, but be aware that this requires
> + initrd support (the modules you use in saving the image have to
> + be loaded in order for you to be able to resume!)
> +
> + See the Software Suspend home page (softwaresuspend.berlios.de)
> + for FAQs, HOWTOs and other documentation.
> +
> + config SOFTWARE_SUSPEND2
> + bool
> +
> + if SOFTWARE_SUSPEND2
> + config SOFTWARE_SUSPEND2_WRITER
> + bool
> +

Please don't use such indentations.
There is no need to use select here either. If you really want to make
it modular (and you can convince Christoph), you want to do something like
this:

config SOFTWARE_SUSPEND2
tristate "Software Suspend 2"
depends on PM

config SOFTWARE_SUSPEND2_BUILTIN
def_bool SOFTWARE_SUSPEND2

and let everything else depend on SOFTWARE_SUSPEND2.

> + config SOFTWARE_SUSPEND_SWAPWRITER
> + tristate ' Swap Writer'
> + depends on SWAP && SOFTWARE_SUSPEND2_CORE
> + select SOFTWARE_SUSPEND2_WRITER

This select is also bogus.

> +
> +ifeq ($(CONFIG_SOFTWARE_SUSPEND2),y)
> +obj-y += suspend_builtin.o proc.o
> +endif

Use SOFTWARE_SUSPEND2_BUILTIN here without the ifeq.

bye, Roman

2004-11-24 13:04:37

by Nigel Cunningham

[permalink] [raw]
Subject: Suspend 2 merge: 9/51: init/* changes.

This patch includes all of the changes to init/*.

There are two main parts:

1) Make name_to_dev_t non init. Why should you need to reboot if all you
want to do is change the device you're using to suspend? That's M$'s way
:>. Suspend2 allows the user to change the resume2= parameter after
booting via a proc entry. Making this code non-__init allows us to use
the same code that will be used at boot time to parse that string.

2) Hooks for resuming. Suspend2 functionality can be compiled as modules
or built in. Resuming can be activated via an initrd. These hooks allow
for all of the combinations of the above. Allowing resuming from within
an initrd is important because then you can set up LVM volumes
(including encrypted devices), compile drivers for your resume device as
modules and so on.

diff -ruN 302-init-hooks-old/init/do_mounts.c 302-init-hooks-new/init/do_mounts.c
--- 302-init-hooks-old/init/do_mounts.c 2004-11-24 19:47:36.680646032 +1100
+++ 302-init-hooks-new/init/do_mounts.c 2004-11-24 19:50:37.257194224 +1100
@@ -52,7 +52,7 @@
__setup("ro", readonly);
__setup("rw", readwrite);

-static dev_t __init try_name(char *name, int part)
+static dev_t try_name(char *name, int part)
{
char path[64];
char buf[32];
@@ -134,16 +134,21 @@
* is mounted on rootfs /sys.
*/

-dev_t __init name_to_dev_t(char *name)
+dev_t name_to_dev_t(char *name)
{
char s[32];
char *p;
dev_t res = 0;
- int part;
+ int part, mount_result;

#ifdef CONFIG_SYSFS
int mkdir_err = sys_mkdir("/sys", 0700);
- if (sys_mount("sysfs", "/sys", "sysfs", 0, NULL) < 0)
+ /*
+ * When changing resume2 parameter for Software Suspend, sysfs may
+ * already be mounted.
+ */
+ mount_result = sys_mount("sysfs", "/sys", "sysfs", 0, NULL);
+ if (mount_result < 0 && mount_result != -EBUSY)
goto out;
#endif

@@ -195,7 +200,8 @@
res = try_name(s, part);
done:
#ifdef CONFIG_SYSFS
- sys_umount("/sys", 0);
+ if (mount_result >= 0)
+ sys_umount("/sys", 0);
out:
if (!mkdir_err)
sys_rmdir("/sys");
@@ -205,6 +211,8 @@
res = 0;
goto done;
}
+/* Exported for Software Suspend */
+EXPORT_SYMBOL(name_to_dev_t);

static int __init root_dev_setup(char *line)
{
@@ -398,9 +406,25 @@

is_floppy = MAJOR(ROOT_DEV) == FLOPPY_MAJOR;

+ /* Suspend2:
+ * By this point, suspend_early_init has been called to initialise our
+ * proc interface. If modules are built in, they have registered (all
+ * of the above via late_initcalls).
+ *
+ * We have not yet looked to see if an image exists, however. If we
+ * have an initrd, it is expected that the user will have set it up
+ * to echo > /proc/software_suspend/activate and thus initiate any
+ * resume. If they don't do that, we do it immediately after the initrd
+ * is finished (major issues if they mount filesystems rw from the
+ * initrd! - they are warned). If there's no usable initrd, we do our
+ * check next.
+ */
if (initrd_load())
goto out;

+ if (test_suspend_state(SUSPEND_RESUME_NOT_DONE))
+ software_suspend_try_resume();
+
if (is_floppy && rd_doload && rd_load_disk(0))
ROOT_DEV = Root_RAM0;

diff -ruN 302-init-hooks-old/init/do_mounts_initrd.c 302-init-hooks-new/init/do_mounts_initrd.c
--- 302-init-hooks-old/init/do_mounts_initrd.c 2004-11-03 21:51:15.000000000 +1100
+++ 302-init-hooks-new/init/do_mounts_initrd.c 2004-11-24 19:48:29.282649312 +1100
@@ -7,6 +7,7 @@
#include <linux/romfs_fs.h>
#include <linux/initrd.h>
#include <linux/sched.h>
+#include <linux/suspend.h>

#include "do_mounts.h"

@@ -58,10 +59,16 @@

pid = kernel_thread(do_linuxrc, "/linuxrc", SIGCHLD);
if (pid > 0) {
- while (pid != sys_wait4(-1, &i, 0, NULL))
+ while (pid != sys_wait4(-1, &i, 0, NULL)) {
yield();
+ if (current->flags & PF_FREEZE)
+ refrigerator(PF_FREEZE);
+ }
}

+ if (test_suspend_state(SUSPEND_RESUME_NOT_DONE))
+ software_suspend_try_resume();
+
/* move initrd to rootfs' /old */
sys_fchdir(old_fd);
sys_mount("/", ".", NULL, MS_MOVE, NULL);
diff -ruN 302-init-hooks-old/init/main.c 302-init-hooks-new/init/main.c
--- 302-init-hooks-old/init/main.c 2004-11-24 19:50:46.268824248 +1100
+++ 302-init-hooks-new/init/main.c 2004-11-24 19:48:29.293647640 +1100
@@ -46,6 +46,7 @@
#include <linux/rmap.h>
#include <linux/mempolicy.h>
#include <linux/key.h>
+#include <linux/suspend.h>

#include <asm/io.h>
#include <asm/bugs.h>
@@ -758,6 +759,8 @@
else
prepare_namespace();

+ clear_suspend_state(SUSPEND_BOOT_TIME);
+
/*
* Ok, we have completed the initial bootup, and
* we're essentially up and running. Get rid of the
diff -ruN 302-init-hooks-old/kernel/power/swsusp.c 302-init-hooks-new/kernel/power/swsusp.c
--- 302-init-hooks-old/kernel/power/swsusp.c 2004-11-24 09:53:12.000000000 +1100
+++ 302-init-hooks-new/kernel/power/swsusp.c 2004-11-24 19:48:29.294647488 +1100
@@ -1168,7 +1168,7 @@

}

-extern dev_t __init name_to_dev_t(const char *line);
+extern dev_t name_to_dev_t(char *line);

static int __init read_pagedir(void)
{


2004-11-24 13:04:36

by Nigel Cunningham

Subject: Suspend 2 merge: 13/51: Disable highmem tlb flush for copyback.

When we're making/restoring our atomic copy of the image, secondary
processors are frozen. Trying an SMP call at that time could thus lead
to deadlock. Secondary processors have their TLBs unconditionally
flushed when leaving the processor refrigerator, so this doesn't come
back to bite us.
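
The reasoning above can be modelled in userspace as follows. This is only a sketch: `model_cpu`, `model_flush` and `model_thaw` are hypothetical names, and "would deadlock" is represented by a `false` return rather than an actual hang.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_cpu {
	bool frozen;	/* parked in the processor refrigerator */
	bool tlb_stale;	/* still holds old kmap translations */
};

/*
 * cpus[0] is the processor doing the atomic copy. Returns false if the
 * flush would have to wait on a frozen processor (i.e. deadlock).
 */
static bool model_flush(struct model_cpu *cpus, size_t n, bool smp_frozen_flag)
{
	size_t i;

	assert(cpus != NULL && n > 0);

	/* The mappings changed, so every TLB is stale until flushed. */
	for (i = 0; i < n; i++)
		cpus[i].tlb_stale = true;

	/* The local flush is always safe. */
	cpus[0].tlb_stale = false;

	/* With the flag set, leave the others stale until they thaw. */
	if (smp_frozen_flag)
		return true;

	/* Otherwise a cross-processor flush needs every CPU to respond. */
	for (i = 1; i < n; i++) {
		if (cpus[i].frozen)
			return false;
		cpus[i].tlb_stale = false;
	}
	return true;
}

/* Leaving the refrigerator flushes unconditionally. */
static void model_thaw(struct model_cpu *cpu)
{
	cpu->frozen = false;
	cpu->tlb_stale = false;
}
```

In this model the stale entries on a frozen processor are harmless because nothing runs there until `model_thaw()` has cleared them.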


diff -ruN 502-disable-highmem-tlb-flush-for-copyback-old/mm/highmem.c 502-disable-highmem-tlb-flush-for-copyback-new/mm/highmem.c
--- 502-disable-highmem-tlb-flush-for-copyback-old/mm/highmem.c 2004-11-03 21:54:14.000000000 +1100
+++ 502-disable-highmem-tlb-flush-for-copyback-new/mm/highmem.c 2004-11-04 16:27:40.000000000 +1100
@@ -26,6 +26,7 @@
#include <linux/init.h>
#include <linux/hash.h>
#include <linux/highmem.h>
+#include <linux/suspend.h>
#include <asm/tlbflush.h>

static mempool_t *page_pool, *isa_page_pool;
@@ -94,7 +95,10 @@

set_page_address(page, NULL);
}
- flush_tlb_kernel_range(PKMAP_ADDR(0), PKMAP_ADDR(LAST_PKMAP));
+ if (test_suspend_state(SUSPEND_FREEZE_SMP))
+ __flush_tlb();
+ else
+ flush_tlb_kernel_range(PKMAP_ADDR(0), PKMAP_ADDR(LAST_PKMAP));
}

static inline unsigned long map_new_virtual(struct page *page)


2004-11-24 16:48:16

by Nigel Cunningham

Subject: Suspend 2 merge: 40/51: Prepare image

This file contains the main routines used to prepare an image. Note that
this is potentially an iterative process: allocating metadata for the
image we know about may change the characteristics of the image and
require the allocation of a few extra pages. The number of iterations is
definitely bounded (and the user can always press escape, if they've
enabled it, to cancel if I was wrong here).
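
A rough illustration of why the iteration settles. It assumes each metadata page can describe a fixed number of image pages and that metadata pages themselves become part of the image. The function name and the `entries_per_meta_page` parameter are hypothetical and do not appear in the patch.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Repeat until the metadata already allocated is enough to describe the
 * data pages plus the metadata pages themselves. Returns the number of
 * metadata pages; *iterations (if non-NULL) receives the pass count.
 */
static int settle_metadata_pages(int data_pages, int entries_per_meta_page,
				 int *iterations)
{
	int meta_pages = 0;
	int passes = 0;

	/* With more than one entry per page, growth shrinks each pass. */
	assert(data_pages >= 0 && entries_per_meta_page > 1);

	for (;;) {
		int total = data_pages + meta_pages;
		int needed = (total + entries_per_meta_page - 1) /
			     entries_per_meta_page;

		passes++;
		if (needed <= meta_pages)
			break;		/* nothing new to allocate */
		meta_pages = needed;	/* allocate more, then recount */
	}

	if (iterations)
		*iterations = passes;
	return meta_pages;
}
```

For example, 1000 data pages at 100 entries per metadata page settle at 11 metadata pages after three passes: the first pass allocates 10, the second finds that 1010 pages need 11, and the third confirms nothing more is needed.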

We account for every page of memory. They are either:

- LRU -> pageset 2
- allocated for memory pool -> pageset 1
- otherwise used -> pageset 1
- unused and not allocated for memory pool -> not saved
- used but marked NoSave -> not saved
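
The accounting above can be written as a small classifier. This is a sketch only: the `page_info` fields and the order in which the checks are applied are assumptions made for illustration, not the patch's actual page flags or precedence.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum page_fate { FATE_PAGESET1, FATE_PAGESET2, FATE_NOT_SAVED };

/* Hypothetical per-page summary; the kernel uses page flags instead. */
struct page_info {
	bool nosave;	/* marked NoSave */
	bool in_pool;	/* allocated for the suspend memory pool */
	bool in_use;	/* not on any free list */
	bool on_lru;	/* on an LRU list */
};

static enum page_fate classify_page(const struct page_info *p)
{
	assert(p != NULL);

	if (p->nosave)
		return FATE_NOT_SAVED;	/* used but marked NoSave */
	if (p->in_pool)
		return FATE_PAGESET1;	/* memory pool pages */
	if (!p->in_use)
		return FATE_NOT_SAVED;	/* unused and not in the pool */
	if (p->on_lru)
		return FATE_PAGESET2;	/* LRU pages */
	return FATE_PAGESET1;		/* otherwise used */
}
```

Every page falls into exactly one of the three outcomes, which is what lets the totals be cross-checked against the number of physical pages.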

Plugins tell us how much memory they need, and we put that much plus a
little more in the memory pool. (We can't account for what device
drivers will need.)
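
As a sketch of the sizing rule: sum what each plugin asks for and add a fixed margin. The patch computes `100 + memory_for_plugins()`; the `model_plugin` struct and the `pool_pages_sought` helper below are hypothetical stand-ins for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical plugin descriptor; only the memory request matters here. */
struct model_plugin {
	const char *name;
	int pages_needed;
};

/* Pool size = sum of the plugins' requests + a safety margin, in pages. */
static int pool_pages_sought(const struct model_plugin *plugins, size_t count,
			     int margin)
{
	size_t i;
	int total = margin;

	assert(margin >= 0);
	assert(plugins != NULL || count == 0);

	for (i = 0; i < count; i++)
		total += plugins[i].pages_needed;
	return total;
}
```

With requests of 3, 5 and 0 pages and a margin of 100, the pool would be sized at 108 pages.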

diff -ruN 830-prepare-image-old/kernel/power/prepare_image.c 830-prepare-image-new/kernel/power/prepare_image.c
--- 830-prepare-image-old/kernel/power/prepare_image.c 1970-01-01 10:00:00.000000000 +1000
+++ 830-prepare-image-new/kernel/power/prepare_image.c 2004-11-18 11:52:47.000000000 +1100
@@ -0,0 +1,1050 @@
+/*
+ * kernel/power/prepare_image.c
+ *
+ * Copyright (C) 2003-2004 Nigel Cunningham <[email protected]>
+ *
+ * This file is released under the GPLv2.
+ *
+ * We need to eat memory until we can:
+ * 1. Perform the save without changing anything (RAM_NEEDED < max_mapnr)
+ * 2. Fit it all in available space (active_writer->available_space() >= STORAGE_NEEDED)
+ * 3. Reload the pagedir and pageset1 to places that don't collide with their
+ * final destinations, not knowing to what extent the resumed kernel will
+ * overlap with the one loaded at boot time. I think the resumed kernel should overlap
+ * completely, but I don't want to rely on this as it is an unproven assumption. We
+ * therefore assume there will be no overlap at all (worst case).
+ * 4. Meet the user's requested limit (if any) on the size of the image.
+ * The limit is in MB, so pages/256 (assuming 4K pages).
+ *
+ * (Final test in save_image doesn't use EATEN_ENOUGH_MEMORY)
+ */
+
+#define SUSPEND_PREPARE_IMAGE_C
+
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/suspend.h>
+#include <linux/highmem.h>
+#include <linux/notifier.h>
+
+#include "suspend.h"
+#include "pageflags.h"
+#include "plugins.h"
+#include "proc.h"
+
+extern int pageset1_sizelow, pageset2_sizelow;
+extern unsigned long orig_mem_free;
+extern void mark_pages_for_pageset2(void);
+extern int image_size_limit;
+extern int fill_suspend_memory_pool(int sizesought);
+
+int suspend_amount_grabbed = 0;
+static int arefrozen = 0, numnosave = 0;
+static int header_space_allocated = 0;
+extern unsigned long forced_ps1_size, forced_ps2_size;
+
+/*
+ * generate_free_page_map
+ *
+ * Description: This routine generates a bitmap of free pages from the
+ * lists used by the memory manager. We then use the bitmap
+ * to quickly calculate which pages to save and in which
+ * pagesets.
+ */
+static void generate_free_page_map(void)
+{
+ int i, order, loop, cpu;
+ struct page * page;
+ unsigned long flags;
+ struct zone *zone;
+ struct per_cpu_pageset *pset;
+
+ for(i=0; i < max_mapnr; i++)
+ SetPageInUse(mem_map+i);
+
+ for_each_zone(zone) {
+ if (!zone->present_pages)
+ continue;
+ spin_lock_irqsave(&zone->lock, flags);
+ for (order = MAX_ORDER - 1; order >= 0; --order) {
+ list_for_each_entry(page, &zone->free_area[order].free_list, lru)
+ for(loop=0; loop < (1 << order); loop++) {
+ ClearPageInUse(page+loop);
+ ClearPagePageset2(page+loop);
+ }
+ }
+
+
+ for (cpu = 0; cpu < NR_CPUS; cpu++) {
+ if (!cpu_possible(cpu))
+ continue;
+
+ pset = &zone->pageset[cpu];
+
+ for (i = 0; i < ARRAY_SIZE(pset->pcp); i++) {
+ struct per_cpu_pages *pcp;
+ struct page * page;
+
+ pcp = &pset->pcp[i];
+ list_for_each_entry(page, &pcp->list, lru) {
+ ClearPageInUse(page);
+ ClearPagePageset2(page);
+ }
+ }
+ }
+
+ spin_unlock_irqrestore(&zone->lock, flags);
+ }
+}
+
+/* size_of_free_region
+ *
+ * Description: Return the number of pages that are free, beginning with and
+ * including this one.
+ */
+static int size_of_free_region(struct page * page)
+{
+ struct page * posn = page;
+
+ while (((posn-mem_map) < max_mapnr) && (!PageInUse(posn)))
+ posn++;
+ return (posn - page);
+}
+
+static void display_reserved_pages(void)
+{
+ int loop;
+ int rangemin = -1;
+
+ for (loop = 0; loop < max_mapnr; loop++) {
+ if (PageReserved(mem_map+loop)) {
+ if (rangemin == -1)
+ rangemin = loop;
+ } else {
+ if (rangemin > -1) {
+ printk("Reserved pages from %p to %p.\n",
+ page_address(mem_map+rangemin),
+ ((char *) page_address(mem_map + loop)) - 1);
+ rangemin = -1;
+ }
+ }
+ }
+
+ if (rangemin > -1)
+ printk("Reserved pages from %p to %p.\n",
+ page_address(mem_map+rangemin),
+ ((char *) page_address(mem_map + max_mapnr)) - 1);
+}
+
+/*
+ * Description: Display which pages are marked Nosave.
+ */
+void display_nosave_pages(void)
+{
+ int loop;
+ int rangemin = -1;
+
+ if (!TEST_DEBUG_STATE(SUSPEND_NOSAVE))
+ return;
+
+ display_reserved_pages();
+
+ for (loop = 0; loop < max_mapnr; loop++) {
+ if (PageNosave(mem_map+loop)) {
+ if (rangemin == -1)
+ rangemin = loop;
+ } else {
+ if (rangemin > -1) {
+ printk("Nosave pages from %p to %p.\n",
+ page_address(mem_map+rangemin),
+ ((char *) page_address(mem_map + loop)) - 1);
+ rangemin = -1;
+ }
+ }
+ }
+
+ if (rangemin > -1)
+ printk("Nosave pages from %p to %p.\n",
+ page_address(mem_map+rangemin),
+ ((char *) page_address(mem_map + max_mapnr)) - 1);
+}
+
+/*
+ * count_data_pages
+ *
+ * This routine generates our lists of pages to be stored in each
+ * pageset. Since we store the data using ranges, and adding new
+ * ranges might allocate a new range page, this routine may well
+ * be called more than once.
+ */
+static struct pageset_sizes_result count_data_pages(void)
+{
+ int chunk_size, loop, numfree = 0;
+ int ranges = 0, currentrange = 0;
+ int usepagedir2;
+ int rangemin = 0;
+ struct pageset_sizes_result result;
+ struct range * rangepointer;
+ unsigned long value;
+#ifdef CONFIG_HIGHMEM
+ unsigned long highstart_pfn = get_highstart_pfn();
+#endif
+
+ result.size1 = 0;
+ result.size1low = 0;
+ result.size2 = 0;
+ result.size2low = 0;
+ result.needmorespace = 0;
+
+ numnosave = 0;
+
+ put_range_chain(&pagedir1.origranges);
+ put_range_chain(&pagedir1.destranges);
+ put_range_chain(&pagedir2.origranges);
+ pagedir2.destranges.first = NULL;
+ pagedir2.destranges.size = 0;
+
+ generate_free_page_map();
+
+ if (TEST_RESULT_STATE(SUSPEND_ABORTED)) {
+ result.size1 = -1;
+ result.size1low = -1;
+ result.size2 = -1;
+ result.size2low = -1;
+ result.needmorespace = 0;
+ return result;
+ }
+
+ if (max_mapnr != num_physpages) {
+ abort_suspend("Max_mapnr is not equal to num_physpages.");
+ result.size1 = -1;
+ result.size1low = -1;
+ result.size2 = -1;
+ result.size2low = -1;
+ result.needmorespace = 0;
+ return result;
+ }
+ /*
+ * Pages not to be saved are marked Nosave irrespective of being reserved
+ */
+ for (loop = 0; loop < max_mapnr; loop++) {
+ if (PageNosave(mem_map+loop)) {
+ numnosave++;
+ if (currentrange) {
+ append_to_range_chain(currentrange, rangemin, loop - 1);
+ rangemin = loop;
+ currentrange = 0;
+ }
+ continue;
+ }
+
+ if (!PageReserved(mem_map+loop)) {
+ if ((chunk_size=size_of_free_region(mem_map+loop))!=0) {
+ if (currentrange) {
+ append_to_range_chain(currentrange, rangemin, loop - 1);
+ rangemin = loop;
+ currentrange = 0;
+ }
+ numfree += chunk_size;
+ loop += chunk_size - 1;
+ continue;
+ }
+ } else {
+#ifdef CONFIG_HIGHMEM
+ if (loop >= highstart_pfn) {
+ /* HighMem pages may be marked Reserved. We ignore them. */
+ numnosave++;
+ if (currentrange) {
+ append_to_range_chain(currentrange, rangemin, loop - 1);
+ rangemin = loop;
+ currentrange = 0;
+ }
+ continue;
+ }
+#endif
+ }
+
+ usepagedir2 = !!PagePageset2(mem_map+loop);
+
+ if (currentrange != (1 + usepagedir2)) {
+ if (currentrange)
+ append_to_range_chain(currentrange, rangemin, loop - 1);
+ currentrange = usepagedir2 + 1;
+ rangemin = loop;
+ ranges++;
+ }
+
+ if (usepagedir2) {
+ result.size2++;
+ if (!PageHighMem(mem_map+loop))
+ result.size2low++;
+ } else {
+ result.size1++;
+ if (!PageHighMem(mem_map+loop))
+ result.size1low++;
+ }
+ }
+
+ if (currentrange)
+ append_to_range_chain(currentrange, rangemin, loop - 1);
+
+ if ((pagedir1.pageset_size) && (result.size1 > pagedir1.pageset_size))
+ result.needmorespace = 1;
+ if ((pagedir2.pageset_size) && (result.size2 > pagedir2.pageset_size))
+ result.needmorespace = 1;
+ suspend_message(SUSPEND_RANGES, SUSPEND_MEDIUM, 0, "Counted %d ranges.\n", ranges);
+ pagedir2.destranges.first = pagedir2.origranges.first;
+ pagedir2.destranges.size = pagedir2.origranges.size;
+ range_for_each(&pagedir1.allocdranges, rangepointer, value) {
+ add_to_range_chain(&pagedir1.destranges, value);
+ }
+
+ suspend_message(SUSPEND_EAT_MEMORY, SUSPEND_MEDIUM, 0,
+ "Count data pages: Set1 (%d) + Set2 (%d) + Nosave (%d) + NumFree (%d) = %d.\n",
+ result.size1, result.size2, numnosave, numfree,
+ result.size1 + result.size2 + numnosave + numfree);
+ return result;
+}
+
+/* amount_needed
+ *
+ * Calculates the amount by which the image size needs to be reduced to meet
+ * our constraints.
+ */
+static int amount_needed(int use_image_size_limit)
+{
+
+ int max1 = max( (int) (RAM_TO_SUSPEND - real_nr_free_pages() -
+ nr_free_highpages() - suspend_amount_grabbed),
+ ((int) (STORAGE_NEEDED(1) -
+ active_writer->ops.writer.storage_available())));
+ if (use_image_size_limit)
+ return max( max1,
+ (image_size_limit > 0) ?
+ ((int) (STORAGE_NEEDED(1) - (image_size_limit << 8))) : 0);
+ return max1;
+}
+
+#define EATEN_ENOUGH_MEMORY() (amount_needed(1) < 1)
+unsigned long storage_available = 0;
+
+/* display_stats
+ *
+ * Display the vital statistics.
+ */
+#ifdef CONFIG_SOFTWARE_SUSPEND_DEBUG
+static void display_stats(void)
+{
+ unsigned long storage_allocated = active_writer->ops.writer.storage_allocated();
+ suspend_message(SUSPEND_EAT_MEMORY, SUSPEND_MEDIUM, 1,
+ "Free:%d+%d+%d=%d(%d). Sets:%d(%d),%d(%d). Header:%d. Nosave:%d-%d-%d=%d. Storage:%d/%lu(%lu). Needed:%d|%d|%d.\n",
+
+ /* Free */
+ real_nr_free_pages(), suspend_amount_grabbed, suspend_memory_pool_level(0),
+ real_nr_free_pages() + suspend_amount_grabbed + suspend_memory_pool_level(0),
+ real_nr_free_pages() - nr_free_highpages(),
+
+ /* Sets */
+ pageset1_size, pageset1_sizelow,
+ pageset2_size, pageset2_sizelow,
+
+ /* Header */
+ num_range_pages,
+
+ /* Nosave */
+ numnosave, pagedir1.allocdranges.size, suspend_amount_grabbed,
+ numnosave - pagedir1.allocdranges.size - suspend_amount_grabbed,
+
+ /* Storage - converted to pages for comparison */
+ storage_allocated,
+ STORAGE_NEEDED(1),
+ storage_available,
+
+ /* Needed */
+ RAM_TO_SUSPEND - real_nr_free_pages() - nr_free_highpages() - suspend_amount_grabbed,
+ STORAGE_NEEDED(1) - storage_available,
+ (image_size_limit > 0) ? (STORAGE_NEEDED(1) - (image_size_limit << 8)) : 0);
+}
+#else
+#define display_stats() do { } while(0)
+#endif
+
+struct bloat_pages {
+ struct bloat_pages * next;
+ int order;
+};
+
+static struct bloat_pages * bloat_pages = NULL;
+
+void free_pageset_size_bloat(void)
+{
+ while (bloat_pages) {
+ struct bloat_pages * next = bloat_pages->next;
+ free_pages((unsigned long) bloat_pages, bloat_pages->order);
+ bloat_pages = next;
+ }
+}
+
+#define redo_counts() \
+{ \
+ suspend_message(SUSPEND_PAGESETS, SUSPEND_VERBOSE, 1, \
+ "Recalculating counts. Currently %ld & %ld. ", \
+ ps1_get, ps2_get); \
+ result = count_data_pages(); \
+ if (forced_ps1_size) \
+ ps1_get = forced_ps1_size - result.size1 - drop_one; \
+ if (forced_ps2_size) \
+ ps2_get = forced_ps2_size - result.size2; \
+ suspend_message(SUSPEND_PAGESETS, SUSPEND_VERBOSE, 1, \
+ "Now %ld and %ld.\n", ps1_get, ps2_get); \
+}
+
+void increase_pageset_size(struct pageset_sizes_result result)
+{
+ long ps1_get = 0, ps2_get = 0, order, j;
+ int drop_one = 0;
+
+ if (forced_ps1_size)
+ ps1_get = forced_ps1_size - result.size1;
+
+ suspend_message(SUSPEND_PAGESETS, SUSPEND_HIGH, 1,
+ "1: Forced size = %ld. Have %d -> ps1_get = %ld.\n",
+ forced_ps1_size, result.size1, ps1_get);
+
+ /*
+ * We can make ps2 size exactly what was requested, but
+ * not both.
+ */
+ if (forced_ps2_size) {
+ ps2_get = forced_ps2_size - result.size2;
+ suspend_message(SUSPEND_PAGESETS, SUSPEND_HIGH, 1,
+ "2: Forced size = %ld. Have %d -> ps2_get = %ld.\n",
+ forced_ps2_size, result.size2, ps2_get);
+
+ if (ps2_get > 0) {
+ order = generic_fls(ps2_get);
+ if (order >= MAX_ORDER)
+ order = MAX_ORDER - 1;
+
+ while(ps2_get > 0) {
+ struct page * newpage;
+ unsigned long virt;
+ struct bloat_pages * link;
+
+ if ((ps1_get - (1 << order)) < (1 << order))
+ redo_counts();
+
+ while ((1 << order) > (ps2_get))
+ order--;
+
+ virt = get_grabbed_pages(order);
+
+ while ((!virt) && (order > 0)) {
+ order--;
+ if ((ps1_get - (1 << order)) < (1 << order))
+ redo_counts();
+ virt = get_grabbed_pages(order);
+ }
+
+ if (!virt) {
+ suspend_message(SUSPEND_PAGESETS, SUSPEND_MEDIUM, 1,
+ " Failed to allocate enough memory for"
+ " requested pageset sizes.\n");
+ return;
+ }
+
+ newpage = virt_to_page(virt);
+ for (j = 0; j < (1 << order); j++)
+ SetPagePageset2(newpage + j);
+
+ link = (struct bloat_pages *) virt;
+ link->next = bloat_pages;
+ link->order = order;
+ bloat_pages = link;
+
+ ps2_get -= (1 << order);
+ suspend_message(SUSPEND_PAGESETS, SUSPEND_VERBOSE, 1,
+ "Allocated %d for ps2. To get %ld.\n",
+ 1 << order, ps2_get);
+ }
+ } else
+ {
+ /* Here, we're making ps2 pages into ps1 pages */
+ int i;
+
+ suspend_message(SUSPEND_PAGESETS, SUSPEND_HIGH, 1,
+ "Moving %ld ps2 pages to ps1.\n", -ps2_get);
+ for (i = 0; i < max_mapnr; i++) {
+ if (PagePageset2(mem_map + i)) {
+ ClearPagePageset2(mem_map + i);
+ ps2_get++;
+ ps1_get--;
+ }
+ if (!ps2_get)
+ break;
+ }
+ }
+ } else {
+ suspend_message(SUSPEND_PAGESETS, SUSPEND_HIGH, 1,
+ "2: Forced size = %ld. Have %d -> ps2_get = %ld.\n",
+ forced_ps2_size, result.size2, ps2_get);
+ }
+
+ if (ps1_get > 0) {
+
+ suspend_message(SUSPEND_PAGESETS, SUSPEND_HIGH, 1,
+ "Still to get %ld pages for ps1.\n", ps1_get);
+
+ /* We might allocate an extra range page later. */
+ if (ps1_get > 1) {
+ suspend_message(SUSPEND_PAGESETS, SUSPEND_VERBOSE, 1,
+ "Reducing ps1_get by one.\n");
+ drop_one = 1;
+ ps1_get--;
+ }
+
+ order = generic_fls(ps1_get);
+ if (order >= MAX_ORDER)
+ order = MAX_ORDER - 1;
+
+ while(ps1_get > 0) {
+ unsigned long virt;
+ struct bloat_pages * link;
+
+ if ((ps1_get - (1 << order)) < (1 << order))
+ redo_counts();
+
+ while ((1 << order) > (ps1_get))
+ order--;
+
+ virt = get_grabbed_pages(order);
+
+ while ((!virt) && (order > 0)) {
+ order--;
+ if ((ps1_get - (1 << order)) < (1 << order))
+ redo_counts();
+ virt = get_grabbed_pages(order);
+ }
+
+ if (!virt) {
+ suspend_message(SUSPEND_PAGESETS, SUSPEND_VERBOSE, 1,
+ "Couldn't get enough pages. Need %ld more.\n",
+ ps1_get);
+ return;
+ }
+
+ link = (struct bloat_pages *) virt;
+ link->next = bloat_pages;
+ link->order = order;
+ bloat_pages = link;
+
+ ps1_get -= (1 << order);
+ suspend_message(SUSPEND_PAGESETS, SUSPEND_VERBOSE, 1,
+ "Allocated %d for ps1. To get %ld.\n", 1 << order, ps1_get);
+ }
+ }
+ suspend_message(SUSPEND_PAGESETS, SUSPEND_VERBOSE, 1,
+ "Exiting increase pageset size.\n\n");
+}
+
+/*
+ * Eaten is the number of pages which have been eaten.
+ * Pagedirincluded is the number of pages which have been allocated for the pagedir.
+ */
+extern int allocate_extra_pagedir_memory(struct pagedir * p, int pageset_size, int alloc_from);
+
+struct pageset_sizes_result recalculate_stats(void)
+{
+ struct pageset_sizes_result result;
+
+ mark_pages_for_pageset2(); /* Need to call this before getting pageset1_size! */
+ result = count_data_pages();
+ suspend_message(SUSPEND_PAGESETS, SUSPEND_VERBOSE, 1,
+ "Forced sizes %ld and %ld. Result %d and %d.\n",
+ forced_ps1_size, forced_ps2_size,
+ result.size1, result.size2);
+ if ((forced_ps1_size && forced_ps1_size != result.size1) ||
+ (forced_ps2_size && forced_ps2_size != result.size2)) {
+ increase_pageset_size(result);
+ result = count_data_pages();
+ }
+ pageset1_sizelow = result.size1low;
+ pageset2_sizelow = result.size2low;
+ pagedir1.lastpageset_size = pageset1_size = result.size1;
+ pagedir2.lastpageset_size = pageset2_size = result.size2;
+ storage_available = active_writer->ops.writer.storage_available();
+ suspend_store_free_mem(SUSPEND_FREE_RANGE_PAGES, 0);
+ return result;
+}
+
+/* update_image
+ *
+ * Allocate [more] memory and storage for the image.
+ * Remember, this is iterative!
+ */
+static int update_image(void)
+{
+ struct pageset_sizes_result result;
+ int iteration = 0, orig_num_range_pages;
+
+ result = recalculate_stats();
+
+ suspend_store_free_mem(SUSPEND_FREE_RANGE_PAGES, 0);
+
+ do {
+ iteration++;
+
+ orig_num_range_pages = num_range_pages;
+
+ suspend_message(SUSPEND_ANY_SECTION, SUSPEND_LOW, 1,
+ "-- Iteration %d.\n", iteration);
+
+ if (suspend_allocate_checksum_pages()) {
+ suspend_message(SUSPEND_ANY_SECTION, SUSPEND_LOW, 1,
+ "Still need to get more pages for checksum pages.\n");
+ return 1;
+ }
+
+ if (allocate_extra_pagedir_memory(&pagedir1, pageset1_size, pageset2_sizelow)) {
+ suspend_message(SUSPEND_ANY_SECTION, SUSPEND_LOW, 1,
+ "Still need to get more pages for pagedir 1.\n");
+ return 1;
+ }
+
+ if (active_writer->ops.writer.allocate_storage(MAIN_STORAGE_NEEDED(1))) {
+ suspend_message(SUSPEND_ANY_SECTION, SUSPEND_LOW, 1,
+ "Still need to get more storage space for the image proper.\n");
+ suspend_store_free_mem(SUSPEND_FREE_WRITER_STORAGE, 0);
+ return 1;
+ }
+
+ suspend_store_free_mem(SUSPEND_FREE_WRITER_STORAGE, 0);
+
+ set_suspend_state(SUSPEND_SLAB_ALLOC_FALLBACK);
+
+ if (active_writer->ops.writer.allocate_header_space(HEADER_STORAGE_NEEDED)) {
+ suspend_message(SUSPEND_ANY_SECTION, SUSPEND_LOW, 1,
+ "Still need to get more storage space for header.\n");
+ return 1;
+ }
+
+ header_space_allocated = HEADER_STORAGE_NEEDED;
+
+ clear_suspend_state(SUSPEND_SLAB_ALLOC_FALLBACK);
+
+ /*
+ * Allocate remaining storage space, if possible, up to the
+ * maximum we know we'll need. It's okay to allocate the
+ * maximum if the writer is the swapwriter, but
+ * we don't want to grab all available space on an NFS share.
+ * We therefore ignore the expected compression ratio here,
+ * thereby trying to allocate the maximum image size we could
+ * need (assuming compression doesn't expand the image), but
+ * don't complain if we can't get the full amount we're after.
+ */
+
+ active_writer->ops.writer.allocate_storage(
+ max((long)(active_writer->ops.writer.storage_available() -
+ active_writer->ops.writer.storage_allocated()),
+ (long)(HEADER_STORAGE_NEEDED + MAIN_STORAGE_NEEDED(1))));
+
+ suspend_store_free_mem(SUSPEND_FREE_WRITER_STORAGE, 0);
+
+ result = recalculate_stats();
+ display_stats();
+
+ } while (((orig_num_range_pages < num_range_pages) ||
+ result.needmorespace ||
+ header_space_allocated < HEADER_STORAGE_NEEDED ||
+ active_writer->ops.writer.storage_allocated() < (HEADER_STORAGE_NEEDED + MAIN_STORAGE_NEEDED(1)))
+ && (!TEST_RESULT_STATE(SUSPEND_ABORTED)));
+
+ suspend_message(SUSPEND_ANY_SECTION, SUSPEND_MEDIUM, 1, "-- Exit loop.\n");
+
+ return (amount_needed(0) > 0);
+}
+
+/* ----------------------- Memory grabbing --------------------------
+ *
+ * All of the memory that is available, we grab.
+ * This enables us to get the image size down, even when other
+ * processes might be trying to increase their memory usage. (We
+ * have a hook to disable the OOM killer).
+ *
+ * At the same time, suspend's own routines get memory from this
+ * pool, and so does slab growth. Only get_zeroed_page and siblings
+ * see no memory available.
+ */
+
+static spinlock_t suspend_grabbed_memory_lock = SPIN_LOCK_UNLOCKED;
+
+struct eaten_memory_t
+{
+ void * next;
+};
+
+struct eaten_memory_t *eaten_memory[MAX_ORDER];
+
+static void __grab_free_memory(void)
+{
+ int order, k;
+
+ /*
+ * First, quickly eat all memory that's already free.
+ */
+
+ for (order = MAX_ORDER - 1; order > -1; order--) {
+ struct eaten_memory_t *prev = eaten_memory[order];
+ eaten_memory[order] = (struct eaten_memory_t *) __get_free_pages(GFP_ATOMIC, order);
+ while (eaten_memory[order]) {
+ struct page * page = virt_to_page(eaten_memory[order]);
+ eaten_memory[order]->next = prev;
+ prev = eaten_memory[order];
+ suspend_amount_grabbed += (1 << order);
+ for (k=0; k < (1 << order); k++) {
+ SetPageNosave(page + k);
+ ClearPagePageset2(page + k);
+ }
+ eaten_memory[order] = (struct eaten_memory_t *) __get_free_pages(GFP_ATOMIC, order);
+ }
+ eaten_memory[order] = prev;
+ }
+}
+
+static void grab_free_memory(void)
+{
+ unsigned long flags;
+
+ spin_lock_irqsave(&suspend_grabbed_memory_lock, flags);
+ __grab_free_memory();
+ spin_unlock_irqrestore(&suspend_grabbed_memory_lock, flags);
+}
+
+static void free_grabbed_memory(void)
+{
+ struct eaten_memory_t *next = NULL, *this = NULL;
+ int j, num_freed = 0, order;
+ unsigned long flags;
+
+ spin_lock_irqsave(&suspend_grabbed_memory_lock, flags);
+
+ /* Free all eaten pages immediately */
+ for (order = MAX_ORDER - 1; order > -1; order--) {
+ this=eaten_memory[order];
+ while(this) {
+ struct page * page = virt_to_page(this);
+ next = this->next;
+ for (j=0; j < (1 << order); j++)
+ ClearPageNosave(page + j);
+ free_pages((unsigned long) this, order);
+ num_freed+= (1 << order);
+ this = next;
+ }
+ eaten_memory[order] = NULL;
+ }
+ suspend_amount_grabbed -= num_freed;
+ BUG_ON(suspend_amount_grabbed);
+ spin_unlock_irqrestore(&suspend_grabbed_memory_lock, flags);
+}
+
+unsigned long get_grabbed_pages(int order)
+{
+ unsigned long this = (unsigned long) eaten_memory[order];
+ int alternative, j;
+ unsigned long flags;
+ struct page * page;
+
+ /* Get grabbed lowmem pages for suspend's use */
+ spin_lock_irqsave(&suspend_grabbed_memory_lock, flags);
+
+try_again:
+ if (this) {
+ page = virt_to_page(this);
+ eaten_memory[order] = eaten_memory[order]->next;
+ for (j=0; j < (1 << order); j++) {
+ ClearPageNosave(page + j);
+ ClearPagePageset2(page + j);
+ clear_page(page_address(page + j));
+ }
+ suspend_amount_grabbed -= (1 << order);
+ spin_unlock_irqrestore(&suspend_grabbed_memory_lock, flags);
+ return this;
+ }
+
+ alternative = order+1;
+ while ((alternative < MAX_ORDER) && (!eaten_memory[alternative]))
+ alternative++;
+
+ /* Maybe we didn't eat any memory - try normal get */
+ if (alternative == MAX_ORDER) {
+ this = __get_free_pages(GFP_ATOMIC, order);
+ if (this) {
+ page = virt_to_page(this);
+ for (j=0; j < (1 << order); j++) {
+ clear_page((char *) this + j * PAGE_SIZE);
+ ClearPagePageset2(page + j);
+ }
+ }
+ spin_unlock_irqrestore(&suspend_grabbed_memory_lock, flags);
+ return this;
+ }
+
+ {
+ unsigned long virt = (unsigned long) eaten_memory[alternative];
+ page = virt_to_page(eaten_memory[alternative]);
+ eaten_memory[alternative] = eaten_memory[alternative]->next;
+ for (j=0; j < (1 << (alternative)); j++) {
+ ClearPageNosave(page + j);
+ clear_page(page_address(page + j));
+ ClearPagePageset2(page + j);
+ }
+ free_pages(virt, alternative);
+ suspend_amount_grabbed -= (1 << alternative);
+ }
+
+ /* Get the chunk we want to return. May fail if something grabs
+ * the memory before us. */
+ this = __get_free_pages(GFP_ATOMIC, order);
+ if (!this)
+ goto try_again;
+
+ page = virt_to_page(this);
+
+ /* Grab the rest */
+ __grab_free_memory();
+
+ spin_unlock_irqrestore(&suspend_grabbed_memory_lock, flags);
+
+ return this;
+}
+
+/* --------------------------------------------------------------------------- */
+
+extern int freeze_processes(int no_progress);
+
+static int attempt_to_freeze(void)
+{
+ int result;
+
+ /* Stop processes before checking again */
+ thaw_processes(FREEZER_ALL_THREADS);
+ prepare_status(1, 1, "Freezing processes");
+ result = freeze_processes(0);
+ suspend_message(SUSPEND_FREEZER, SUSPEND_VERBOSE, 0, "- Freeze_processes returned %d.\n",
+ result);
+
+ if (result) {
+ SET_RESULT_STATE(SUSPEND_ABORTED);
+ SET_RESULT_STATE(SUSPEND_FREEZING_FAILED);
+ } else
+ arefrozen = 1;
+
+ return result;
+}
+
+extern asmlinkage long sys_sync(void);
+
+static int eat_memory(void)
+{
+ int orig_memory_still_to_eat, last_amount_needed = 0, times_criteria_met = 0;
+ int free_flags = 0, did_eat_memory = 0;
+
+ /*
+ * Note that if we have enough storage space and enough free memory, we may
+ * exit without eating anything. We give up once 10 consecutive iterations
+ * have each reduced the amount needed by fewer than 10 pages, because we're
+ * not going to get much more anyway, and the few pages we would get take a
+ * lot of time.
+ *
+ * We freeze processes before beginning, and then unfreeze them if we
+ * need to eat memory until we think we have enough. If our attempts
+ * to freeze fail, we give up and abort.
+ */
+
+ /* ----------- Stage 1: Freeze Processes ------------- */
+
+
+ prepare_status(0, 1, "Eating memory.");
+
+ recalculate_stats();
+ display_stats();
+
+ orig_memory_still_to_eat = amount_needed(1);
+ last_amount_needed = orig_memory_still_to_eat;
+
+ switch (image_size_limit) {
+ case -1: /* Don't eat any memory */
+ if (orig_memory_still_to_eat) {
+ SET_RESULT_STATE(SUSPEND_ABORTED);
+ SET_RESULT_STATE(SUSPEND_WOULD_EAT_MEMORY);
+ }
+ break;
+ case -2: /* Free caches only */
+ free_flags = GFP_NOIO | __GFP_HIGHMEM;
+ break;
+ default:
+ free_flags = GFP_ATOMIC | __GFP_HIGHMEM;
+ }
+
+ /* ----------- Stage 2: Eat memory ------------- */
+
+ while (((!EATEN_ENOUGH_MEMORY()) || (image_size_limit == -2)) && (!TEST_RESULT_STATE(SUSPEND_ABORTED)) && (times_criteria_met < 10)) {
+ int amount_freed;
+ int amount_wanted = orig_memory_still_to_eat - amount_needed(1);
+ if (amount_wanted < 1)
+ amount_wanted = 1; /* image_size_limit == -2 */
+
+ suspend_message(SUSPEND_EAT_MEMORY, SUSPEND_VERBOSE, 1,
+ "Times met criteria is %d.\n", times_criteria_met);
+ if (orig_memory_still_to_eat)
+ update_status(orig_memory_still_to_eat - amount_needed(1), orig_memory_still_to_eat, " Image size %d ", MB(STORAGE_NEEDED(1)));
+ else
+ update_status(0, 1, "Image size %d ", MB(STORAGE_NEEDED(1)));
+
+ if ((last_amount_needed - amount_needed(1)) < 10)
+ times_criteria_met++;
+ else
+ times_criteria_met = 0;
+ last_amount_needed = amount_needed(1);
+ amount_freed = shrink_all_memory(last_amount_needed);
+ suspend_message(SUSPEND_EAT_MEMORY, SUSPEND_VERBOSE, 1,
+ "Given %d, shrink_all_memory returned %d.\n", last_amount_needed, amount_freed);
+ grab_free_memory();
+ recalculate_stats();
+ display_stats();
+
+ did_eat_memory = 1;
+
+ check_shift_keys(0, NULL);
+ }
+
+ grab_free_memory();
+
+ suspend_message(SUSPEND_EAT_MEMORY, SUSPEND_VERBOSE, 1,
+ "Out of main eat memory loop.\n");
+
+ if (did_eat_memory) {
+ unsigned long orig_state = get_suspend_state();
+ suspend_message(SUSPEND_EAT_MEMORY, SUSPEND_VERBOSE, 1,
+ "Ate memory; letting kjournald etc run.\n");
+ clear_suspend_state(SUSPEND_USE_MEMORY_POOL);
+ thaw_processes(FREEZER_KERNEL_THREADS);
+ /* Freeze_processes will call sys_sync too */
+ freeze_processes(1);
+ grab_free_memory();
+ restore_suspend_state(orig_state);
+ recalculate_stats();
+ display_stats();
+ }
+
+ suspend_message(SUSPEND_EAT_MEMORY, SUSPEND_VERBOSE, 1, "\n");
+
+ suspend_message(SUSPEND_EAT_MEMORY, SUSPEND_VERBOSE, 1,
+ "(Freezer exit:) Swap needed calculated as (%d+%d)*%d/100+%d+1+%d=%d.\n",
+ pageset1_size,
+ pageset2_size,
+ expected_compression_ratio(),
+ num_range_pages,
+ HEADER_STORAGE_NEEDED,
+ STORAGE_NEEDED(1));
+
+ /* Blank out image size display */
+ update_status(100, 100, " ");
+
+ /* Include image size limit when checking what to report */
+ if (amount_needed(1) > 0)
+ SET_RESULT_STATE(SUSPEND_UNABLE_TO_FREE_ENOUGH_MEMORY);
+
+ /* But don't include it when deciding whether to abort (soft limit) */
+ if ((amount_needed(0) > 0)) {
+ printk("Unable to free sufficient memory to suspend. Still need %d pages. "
+ "You may be able to avoid this problem by reducing the async_io_limit\n",
+ amount_needed(1));
+ SET_RESULT_STATE(SUSPEND_ABORTED);
+ }
+
+ check_shift_keys(1, "Memory eating completed.");
+ return 0;
+}
+
+/* prepare_image
+ *
+ * Entry point to the whole image preparation section.
+ *
+ * We do four things:
+ * - Freeze processes;
+ * - Ensure image size constraints are met;
+ * - Complete all the preparation for saving the image,
+ * including allocation of storage. The only memory
+ * that should be needed when we're finished is that
+ * for actually storing the image (and we know how
+ * much is needed for that because the plugins tell
+ * us).
+ * - Make sure that all dirty buffers are written out.
+ */
+int prepare_image(void)
+{
+ int result = 1, sizesought;
+
+ arefrozen = 0;
+
+ header_space_allocated = 0;
+
+ sizesought = 100 + memory_for_plugins();
+
+ PRINTFREEMEM("prior to filling the memory pool");
+
+ if (fill_suspend_memory_pool(sizesought))
+ return 1;
+
+ PRINTFREEMEM("after filling the memory pool");
+ suspend_store_free_mem(SUSPEND_FREE_MEM_POOL, 0);
+
+ if (attempt_to_freeze())
+ return 1;
+
+ PRINTFREEMEM("after freezing processes");
+ suspend_store_free_mem(SUSPEND_FREE_FREEZER, 0);
+
+ if (!active_writer->ops.writer.storage_available()) {
+ printk(KERN_ERR "You need some storage available to be able to suspend.\n");
+ SET_RESULT_STATE(SUSPEND_ABORTED);
+ SET_RESULT_STATE(SUSPEND_NOSTORAGE_AVAILABLE);
+ return 1;
+ }
+
+ do {
+ if (eat_memory() || TEST_RESULT_STATE(SUSPEND_ABORTED))
+ break;
+
+ PRINTFREEMEM("after eating memory");
+ suspend_store_free_mem(SUSPEND_FREE_EAT_MEMORY, 0);
+
+ /* Top up */
+ if (fill_suspend_memory_pool(sizesought))
+ continue;
+
+ PRINTFREEMEM("after refilling memory pool");
+ suspend_store_free_mem(SUSPEND_FREE_MEM_POOL, 0);
+
+ result = update_image();
+ PRINTFREEMEM("after updating the image");
+
+ } while ((result) && (!TEST_RESULT_STATE(SUSPEND_ABORTED)) &&
+ (!TEST_RESULT_STATE(SUSPEND_UNABLE_TO_FREE_ENOUGH_MEMORY)));
+
+ PRINTFREEMEM("after preparing image");
+
+ /* Release memory that has been eaten */
+ free_grabbed_memory();
+
+ PRINTFREEMEM("after freeing grabbed memory");
+ suspend_store_free_mem(SUSPEND_FREE_GRABBED_MEMORY, 1);
+
+ set_suspend_state(SUSPEND_USE_MEMORY_POOL);
+
+ check_shift_keys(1, "Image preparation complete.");
+
+ return result;
+}
+
+EXPORT_SYMBOL(suspend_amount_grabbed);
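
The "(Freezer exit:)" debug message above spells out how the swap requirement is put together. A minimal sketch of that arithmetic follows, going only by the format string "(%d+%d)*%d/100+%d+1+%d". The real STORAGE_NEEDED() macro is not part of this patch, so the helper name and parameter names here are hypothetical.

```c
/*
 * Hypothetical helper mirroring the arithmetic printed by the
 * "(Freezer exit:)" message. All quantities are in pages, except
 * ratio_percent, which is the expected compression ratio (100 = none).
 */
static long demo_storage_needed(long pageset1_pages, long pageset2_pages,
				int ratio_percent, long range_pages,
				long header_pages)
{
	/* Integer division truncates, as it would in the kernel. */
	return (pageset1_pages + pageset2_pages) * ratio_percent / 100
		+ range_pages + 1 + header_pages;
}
```

With 20000 and 60000 pages in the two pagesets, an expected ratio of 50, 12 range pages and 3 header pages, this comes to 40016 pages.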


2004-11-24 13:04:35

by Nigel Cunningham

[permalink] [raw]
Subject: Suspend 2 merge: 12/51: Disable OOM killer when suspending.

When preparing the image, suspend eats all the memory in sight, both to
reduce the image size and to improve the reliability of our stats (We've
worked hard to make it work reliably under heavy load - 100+). Of course
this can result in the OOM killer being triggered, so this simple test
stops that happening.
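
The idea is an early return guarded by a state bit. A minimal userspace sketch, assuming a plain bitmask stands in for the suspend state, is shown here. Every demo_ name is hypothetical and none of this is the kernel code itself.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the suspend state word and its freezer bit. */
#define DEMO_FREEZER_ON (1UL << 0)

static unsigned long demo_state;	/* current state bits */
static int demo_kills;			/* how often the "killer" ran */

static bool demo_test_state(unsigned long bit)
{
	return (demo_state & bit) != 0;
}

/* Stand-in for the OOM handler: do nothing while the freezer is on. */
static void demo_out_of_memory(void)
{
	if (demo_test_state(DEMO_FREEZER_ON))
		return;

	demo_kills++;	/* a real handler would pick and kill a task here */
}
```

While the freezer bit is set, calls to demo_out_of_memory() leave demo_kills untouched. Once the bit is cleared, the handler behaves as before.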

diff -ruN 501-disable-oom-killer-when-suspending-old/mm/oom_kill.c 501-disable-oom-killer-when-suspending-new/mm/oom_kill.c
--- 501-disable-oom-killer-when-suspending-old/mm/oom_kill.c 2004-11-03 21:53:47.000000000 +1100
+++ 501-disable-oom-killer-when-suspending-new/mm/oom_kill.c 2004-11-04 16:27:40.000000000 +1100
@@ -20,6 +20,7 @@
#include <linux/swap.h>
#include <linux/timex.h>
#include <linux/jiffies.h>
+#include <linux/suspend.h>

/* #define DEBUG */

@@ -237,6 +238,9 @@
static unsigned long first, last, count, lastkill;
unsigned long now, since;

+ if (test_suspend_state(SUSPEND_FREEZER_ON))
+ return;
+
spin_lock(&oom_lock);
now = jiffies;
since = now - last;


2004-11-24 13:04:35

by Nigel Cunningham

[permalink] [raw]
Subject: Suspend 2 merge: 11/51: Export vt functions.

On wide > 128 char displays, the text display gets messed up if gotoxy's
and gotoxay's variables are signed (I should confirm that this is still
the case - it's been a while).

We need to modify kmsg_redirect to see our messages when debugging :>


diff -ruN 401-export-vt-functions-old/drivers/char/vt.c 401-export-vt-functions-new/drivers/char/vt.c
--- 401-export-vt-functions-old/drivers/char/vt.c 2004-11-06 09:24:03.326462744 +1100
+++ 401-export-vt-functions-new/drivers/char/vt.c 2004-11-04 16:27:40.000000000 +1100
@@ -913,7 +913,7 @@
*/
static void gotoxy(int currcons, int new_x, int new_y)
{
- int min_y, max_y;
+ unsigned int min_y, max_y;

if (new_x < 0)
x = 0;
@@ -940,7 +940,7 @@
}

/* for absolute user moves, when decom is set */
-static void gotoxay(int currcons, int new_x, int new_y)
+static void gotoxay(int currcons, unsigned int new_x, unsigned int new_y)
{
gotoxy(currcons, new_x, decom ? (top+new_y) : new_y);
}
@@ -3312,6 +3312,7 @@
* Visible symbols for modules
*/

+EXPORT_SYMBOL(kmsg_redirect);
EXPORT_SYMBOL(color_table);
EXPORT_SYMBOL(default_red);
EXPORT_SYMBOL(default_grn);


2004-11-24 16:57:29

by Dave Hansen

[permalink] [raw]
Subject: Re: Suspend 2 merge: 18/51: Debug page_alloc support.

On Wed, 2004-11-24 at 04:58, Nigel Cunningham wrote:
> +#ifdef CONFIG_HIGHMEM
> + if (page >= highmem_start_page)
> + return 0;
> +#endif

There's a patch pending in -mm to kill highmem_start_page. Please use
PageHighMem().

-- Dave

2004-11-24 17:05:10

by Nigel Cunningham

[permalink] [raw]
Subject: Suspend 2 merge: 6/51

Ooh.

6/51 is missing.

Let's just say I can't count. I've checked the directory and the reboot
handler is the next one after workthreads :>

Nigel
--
Nigel Cunningham
Pastoral Worker
Christian Reformed Church of Tuggeranong
PO Box 1004, Tuggeranong, ACT 2901

You see, at just the right time, when we were still powerless, Christ
died for the ungodly. -- Romans 5:6

2004-11-24 17:08:37

by Nigel Cunningham

[permalink] [raw]
Subject: Suspend 2 merge: 39/51: Plugins support.

A plugin is an extension to suspend, but not necessarily a module (I'm
trying to avoid confusing the terms). Plugins can make transformations
on pages of memory in the image (compress/encrypt...), write the image
(swapwriter, eg), provide I/O facilities (bootsplash/textmode) or be
'miscellaneous' (blockwriter that does the hard work for the swapwriter,
device mapper plugin that simply ensures enough memory is available for
the device mapper given the async I/O limit set).

This file handles registration & removal of plugins, as well as
invocation of some of the routines.
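
For readers who want the shape of the thing before reading the diff, here is a minimal userspace sketch of a registry along these lines. It uses a fixed array instead of list_head, and every demo_ name, the 4096-byte page size and the sample callbacks are assumptions made for illustration only.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

enum demo_type { DEMO_FILTER, DEMO_WRITER, DEMO_UI, DEMO_MISC, DEMO_TYPE_COUNT };

struct demo_plugin {
	const char *name;
	enum demo_type type;
	int disabled;
	unsigned long (*memory_needed)(void);	/* bytes; may be NULL */
};

#define DEMO_MAX_PLUGINS 16
#define DEMO_PAGE_SIZE 4096UL

static struct demo_plugin *demo_plugins[DEMO_MAX_PLUGINS];
static int demo_num_plugins;

/* Look a plugin up by name; NULL if it is not registered. */
static struct demo_plugin *demo_find(const char *name)
{
	for (int i = 0; i < demo_num_plugins; i++)
		if (!strcmp(name, demo_plugins[i]->name))
			return demo_plugins[i];
	return NULL;
}

/* Register a plugin, refusing duplicates and unknown types. */
static int demo_register(struct demo_plugin *p)
{
	if (demo_find(p->name))
		return -EBUSY;
	if ((int)p->type < 0 || (int)p->type >= DEMO_TYPE_COUNT)
		return -EINVAL;
	if (demo_num_plugins == DEMO_MAX_PLUGINS)
		return -ENOMEM;
	demo_plugins[demo_num_plugins++] = p;
	return 0;
}

/* Sum the requests of all enabled plugins, rounded up to whole pages. */
static unsigned long demo_memory_pages(void)
{
	unsigned long bytes = 0;

	for (int i = 0; i < demo_num_plugins; i++) {
		struct demo_plugin *p = demo_plugins[i];

		if (p->disabled || !p->memory_needed)
			continue;
		bytes += p->memory_needed();
	}
	return (bytes + DEMO_PAGE_SIZE - 1) / DEMO_PAGE_SIZE;
}

/* Sample callbacks so the sketch can be exercised. */
static unsigned long demo_need_5000(void) { return 5000; }
static unsigned long demo_need_4000(void) { return 4000; }
```

Registering one plugin that asks for 5000 bytes and another that asks for 4000 gives a request of 3 pages. Disabling the second drops that to 2.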

diff -ruN 829-plugins-old/kernel/power/plugins.c 829-plugins-new/kernel/power/plugins.c
--- 829-plugins-old/kernel/power/plugins.c 1970-01-01 10:00:00.000000000 +1000
+++ 829-plugins-new/kernel/power/plugins.c 2004-11-11 06:41:52.000000000 +1100
@@ -0,0 +1,372 @@
+/*
+ * kernel/power/plugins.c
+ *
+ * Copyright (C) 2004 Nigel Cunningham <[email protected]>
+ *
+ */
+
+#include <linux/suspend.h>
+#include <linux/module.h>
+
+#include "suspend.h"
+#include "plugins.h"
+
+struct list_head suspend_filters, suspend_writers, suspend_plugins, suspend_ui;
+int num_filters = 0, num_writers = 0, num_ui = 0, num_plugins = 0;
+struct suspend_plugin_ops * active_writer = NULL;
+struct suspend_plugin_ops * checksum_plugin = NULL;
+
+/*
+ * header_storage_for_plugins
+ *
+ * Returns the amount of space needed to store configuration
+ * data needed by the plugins prior to copying back the original
+ * kernel. We can exclude data for pageset2 because it will be
+ * available anyway once the kernel is copied back.
+ */
+unsigned long header_storage_for_plugins(void)
+{
+ struct suspend_plugin_ops * this_plugin;
+ unsigned long bytes = 0;
+
+ list_for_each_entry(this_plugin, &suspend_plugins, plugin_list) {
+ if (this_plugin->disabled)
+ continue;
+ if (this_plugin->storage_needed)
+ bytes += this_plugin->storage_needed();
+ }
+
+ return bytes;
+}
+
+/*
+ * expected_compression_ratio
+ *
+ * Returns the expected ratio between the amount of memory
+ * to be saved and the amount of space required on the
+ * storage device.
+ */
+int expected_compression_ratio(void)
+{
+ struct suspend_plugin_ops * this_filter;
+ unsigned long ratio = 100;
+
+ list_for_each_entry(this_filter, &suspend_filters, ops.filter.filter_list) {
+ if (this_filter->disabled)
+ continue;
+ if (this_filter->ops.filter.expected_compression)
+ ratio = ratio * this_filter->ops.filter.expected_compression() / 100;
+ }
+
+ return (int) ratio;
+}
+
+/*
+ * memory_for_plugins
+ *
+ * Returns the amount of memory requested by plugins for
+ * doing their work during the cycle.
+ */
+
+unsigned long memory_for_plugins(void)
+{
+ unsigned long bytes = 0;
+ struct suspend_plugin_ops * this_plugin;
+
+ list_for_each_entry(this_plugin, &suspend_plugins, plugin_list) {
+ if (this_plugin->disabled)
+ continue;
+ if (this_plugin->memory_needed)
+ bytes += this_plugin->memory_needed();
+ }
+
+ return ((bytes + PAGE_SIZE - 1) >> PAGE_SHIFT);
+}
+
+/* suspend_early_boot_message_plugins
+ *
+ * Call early_boot_message methods for plugins.
+ */
+void suspend_early_boot_message_plugins(void)
+{
+ struct suspend_plugin_ops * this_plugin;
+
+ list_for_each_entry(this_plugin, &suspend_ui, ops.ui.ui_list) {
+ if (this_plugin->disabled)
+ continue;
+ if (this_plugin->ops.ui.early_boot_message_prep)
+ this_plugin->ops.ui.early_boot_message_prep();
+ }
+}
+
+/* suspend_plugin_keypress
+ *
+ * Pass the keycode to plugins until one handles it.
+ */
+void suspend_plugin_keypress(unsigned int keycode)
+{
+ struct suspend_plugin_ops * this_plugin;
+
+ list_for_each_entry(this_plugin, &suspend_ui, ops.ui.ui_list) {
+ if (this_plugin->disabled)
+ continue;
+ if (this_plugin->ops.ui.keypress)
+ if (this_plugin->ops.ui.keypress(keycode))
+ return;
+ }
+}
+
+/* post_kernel_restore_redraw
+ *
+ * Call UI plugins to allow them to redraw the screen after a restoration
+ * of the original kernel
+ */
+
+void suspend_post_restore_redraw(void)
+{
+ struct suspend_plugin_ops * this_plugin;
+
+ list_for_each_entry(this_plugin, &suspend_ui, ops.ui.ui_list) {
+ if (this_plugin->disabled)
+ continue;
+ if (this_plugin->ops.ui.post_kernel_restore_redraw)
+ this_plugin->ops.ui.post_kernel_restore_redraw();
+ }
+}
+
+
+/* find_plugin_given_name
+ * Functionality : Return a plugin (if found), given a pointer
+ * to its name
+ */
+
+struct suspend_plugin_ops * find_plugin_given_name(char * name)
+{
+ struct suspend_plugin_ops * this_plugin, * found_plugin = NULL;
+
+ list_for_each_entry(this_plugin, &suspend_plugins, plugin_list) {
+ if (!strcmp(name, this_plugin->name)) {
+ found_plugin = this_plugin;
+ break;
+ }
+ }
+
+ return found_plugin;
+}
+
+/*
+ * print_plugin_debug_info
+ * Functionality : Get debugging info from plugins into a buffer.
+ */
+int print_plugin_debug_info(char * buffer, int buffer_size)
+{
+ struct suspend_plugin_ops *this_plugin;
+ int len = 0;
+
+ list_for_each_entry(this_plugin, &suspend_plugins, plugin_list) {
+ if (this_plugin->disabled)
+ continue;
+ if (this_plugin->print_debug_info) {
+ int result;
+ result = this_plugin->print_debug_info(buffer + len,
+ buffer_size - len);
+ len += result;
+ }
+ }
+
+ return len;
+}
+
+extern int attempt_to_parse_resume_device(void);
+
+int suspend_initialise_plugin_lists(void) {
+ INIT_LIST_HEAD(&suspend_filters);
+ INIT_LIST_HEAD(&suspend_writers);
+ INIT_LIST_HEAD(&suspend_ui);
+ INIT_LIST_HEAD(&suspend_plugins);
+ return 0;
+}
+
+int suspend_register_plugin(struct suspend_plugin_ops * plugin)
+{
+ if (!num_plugins)
+ suspend_initialise_plugin_lists();
+
+ if (find_plugin_given_name(plugin->name))
+ return -EBUSY;
+
+ switch (plugin->type) {
+ case FILTER_PLUGIN:
+ list_add_tail(&plugin->ops.filter.filter_list,
+ &suspend_filters);
+ num_filters++;
+ break;
+
+ case WRITER_PLUGIN:
+ list_add_tail(&plugin->ops.writer.writer_list,
+ &suspend_writers);
+ num_writers++;
+ if ((!active_writer) &&
+ (!(test_suspend_state(SUSPEND_BOOT_TIME))))
+ attempt_to_parse_resume_device();
+ break;
+
+ case UI_PLUGIN:
+ list_add_tail(&plugin->ops.ui.ui_list,
+ &suspend_ui);
+ num_ui++;
+ break;
+
+ case MISC_PLUGIN:
+ break;
+
+ case CHECKSUM_PLUGIN:
+ if (!checksum_plugin)
+ checksum_plugin = plugin;
+ else
+ printk("Checksum plugin already registered!\n");
+ break;
+
+ default:
+ printk("Hmmm. Plugin '%s' has an invalid type."
+ " It has been ignored.\n", plugin->name);
+ return -EINVAL;
+ }
+ list_add_tail(&plugin->plugin_list, &suspend_plugins);
+ num_plugins++;
+
+ return 0;
+}
+
+void suspend_unregister_plugin(struct suspend_plugin_ops * plugin)
+{
+ switch (plugin->type) {
+ case FILTER_PLUGIN:
+ list_del(&plugin->ops.filter.filter_list);
+ num_filters--;
+ break;
+
+ case WRITER_PLUGIN:
+ list_del(&plugin->ops.writer.writer_list);
+ num_writers--;
+ if (active_writer == plugin)
+ attempt_to_parse_resume_device();
+ break;
+
+ case UI_PLUGIN:
+ list_del(&plugin->ops.ui.ui_list);
+ num_ui--;
+ break;
+
+ case MISC_PLUGIN:
+ break;
+
+ case CHECKSUM_PLUGIN:
+ if (plugin == checksum_plugin)
+ checksum_plugin = NULL;
+ break;
+ default:
+ printk("Hmmm. Plugin '%s' has an invalid type."
+ " It has been ignored.\n", plugin->name);
+ return;
+ }
+ list_del(&plugin->plugin_list);
+ num_plugins--;
+}
+
+void suspend_move_plugin_tail(struct suspend_plugin_ops * plugin)
+{
+ switch (plugin->type) {
+ case FILTER_PLUGIN:
+ if (num_filters > 1)
+ list_move_tail(&plugin->ops.filter.filter_list,
+ &suspend_filters);
+ break;
+
+ case WRITER_PLUGIN:
+ if (num_writers > 1)
+ list_move_tail(&plugin->ops.writer.writer_list,
+ &suspend_writers);
+ break;
+
+ case UI_PLUGIN:
+ if (num_ui > 1)
+ list_move_tail(&plugin->ops.ui.ui_list,
+ &suspend_ui);
+ break;
+
+ case MISC_PLUGIN:
+ break;
+ default:
+ printk("Hmmm. Plugin '%s' has an invalid type."
+ " It has been ignored.\n", plugin->name);
+ return;
+ }
+ if ((num_filters + num_writers + num_ui) > 1)
+ list_move_tail(&plugin->plugin_list, &suspend_plugins);
+}
+
+int initialise_suspend_plugins(void)
+{
+ struct suspend_plugin_ops * this_plugin;
+ int result;
+
+ list_for_each_entry(this_plugin, &suspend_plugins, plugin_list) {
+ if (this_plugin->disabled)
+ continue;
+ if (this_plugin->initialise) {
+ suspend_message(SUSPEND_MEMORY, SUSPEND_MEDIUM, 1,
+ "Initialising plugin %s.\n",
+ this_plugin->name);
+ if ((result = this_plugin->initialise()))
+ return result;
+ }
+ PRINTFREEMEM("after initialising plugin");
+ }
+
+ return 0;
+}
+
+void cleanup_suspend_plugins(void)
+{
+ struct suspend_plugin_ops * this_plugin;
+
+ list_for_each_entry(this_plugin, &suspend_plugins, plugin_list) {
+ if (this_plugin->disabled)
+ continue;
+ if (this_plugin->cleanup) {
+ suspend_message(SUSPEND_MEMORY, SUSPEND_MEDIUM, 1,
+ "Cleaning up plugin %s.\n",
+ this_plugin->name);
+ this_plugin->cleanup();
+ PRINTFREEMEM("after cleaning up plugin");
+ }
+ }
+}
+
+struct suspend_plugin_ops *
+get_next_filter(struct suspend_plugin_ops * filter_sought)
+{
+ struct suspend_plugin_ops * last_filter = NULL, *this_filter = NULL;
+
+ list_for_each_entry(this_filter, &suspend_filters, ops.filter.filter_list) {
+ if (this_filter->disabled)
+ continue;
+ if ((last_filter == filter_sought) || (!filter_sought))
+ return this_filter;
+ last_filter = this_filter;
+ }
+
+ return active_writer;
+}
+
+EXPORT_SYMBOL(get_next_filter);
+EXPORT_SYMBOL(suspend_register_plugin);
+EXPORT_SYMBOL(suspend_unregister_plugin);
+EXPORT_SYMBOL(max_async_ios);
+EXPORT_SYMBOL(active_writer);
+EXPORT_SYMBOL(suspend_filters);
+#ifdef CONFIG_SOFTWARE_SUSPEND_DEBUG
+EXPORT_SYMBOL(suspend_store_free_mem);
+#endif
+EXPORT_SYMBOL(suspend_post_restore_redraw);
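
A note on expected_compression_ratio() above: each enabled filter reports a percentage, and the percentages are multiplied together starting from 100. A minimal sketch of that chaining, with a plain array standing in for the filter list (the function name and array interface are hypothetical):

```c
/*
 * Combine per-filter expected ratios (in percent) the way the loop in
 * expected_compression_ratio() does: start at 100 and scale by each
 * filter in turn, truncating after every step.
 */
static int demo_combined_ratio(const int *filter_percent, int num_filters)
{
	unsigned long ratio = 100;

	for (int i = 0; i < num_filters; i++)
		ratio = ratio * (unsigned long)filter_percent[i] / 100;

	return (int)ratio;
}
```

Two filters reporting 50 and 80 combine to 40. Because the division truncates at each step, two filters at 33 give 10 rather than 10.89.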


2004-11-24 17:11:46

by Nigel Cunningham

[permalink] [raw]
Subject: Suspend 2 merge: 32/51: Make show task non-static.

This is used to show the state of a task when a process fails to enter
the refrigerator.

diff -ruN 819-export-show-task-old/kernel/sched.c 819-export-show-task-new/kernel/sched.c
--- 819-export-show-task-old/kernel/sched.c 2004-11-06 09:27:29.549112136 +1100
+++ 819-export-show-task-new/kernel/sched.c 2004-11-04 16:27:41.000000000 +1100
@@ -32,7 +32,6 @@
#include <linux/security.h>
#include <linux/notifier.h>
#include <linux/profile.h>
-#include <linux/suspend.h>
#include <linux/blkdev.h>
#include <linux/delay.h>
#include <linux/smp.h>
@@ -3719,7 +3718,7 @@
return list_entry(p->sibling.next,struct task_struct,sibling);
}

-static void show_task(task_t * p)
+void show_task(task_t * p)
{
task_t *relative;
unsigned state;


2004-11-24 16:54:44

by Zwane Mwaikambo

[permalink] [raw]
Subject: Re: Suspend 2 merge: 42/51: Suspend.c

On Thu, 25 Nov 2004, Nigel Cunningham wrote:

> Here's the heart of the core :> (No, that's not a typo).
>
> - Device suspend/resume calls
> - Power down
> - Highest level routine
> - all_settings proc entry handling

This isn't the only patch (the utility.c file is another one) which
introduces functions/helpers which are subsystem specific (like ACPI) but
somehow land up in the same file with a suspend_ prefix. I understand that
it'll be more work but can you get them integrated with the subsystem in
question?

Thanks,
Zwane

2004-11-24 17:14:44

by Nigel Cunningham

[permalink] [raw]
Subject: Suspend 2 merge: 50/51: Device mapper support.

This is the device mapper support plugin. Its sole purpose is to ensure
that the device mapper allocates enough memory to process all of the I/O
we want to throw at it.
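
The pattern is reserve on initialise, release on cleanup, and only release what was actually obtained. A minimal userspace sketch follows, with a counter standing in for the device mapper's pool. All demo_ names and the pool limit are assumptions. Unlike the patch, the sketch starts the saved result at a non-zero value so that a cleanup without a prior init releases nothing.

```c
#include <errno.h>

/* Hypothetical stand-in for the device mapper's io pool. */
static unsigned int demo_pool_limit = 64;
static unsigned int demo_pool_reserved;

static int demo_io_get(unsigned int ios)
{
	if (ios > demo_pool_limit - demo_pool_reserved)
		return -ENOMEM;
	demo_pool_reserved += ios;
	return 0;
}

static void demo_io_put(unsigned int ios)
{
	demo_pool_reserved -= ios;
}

static unsigned int demo_max_async_ios = 32;
static int demo_get_result = -1;	/* nothing reserved yet */

/* Reserve enough pool entries for the configured async I/O limit. */
static int demo_dm_init(void)
{
	demo_get_result = demo_io_get(demo_max_async_ios);
	return demo_get_result;
}

/* Release the reservation, but only if init actually obtained one. */
static void demo_dm_cleanup(void)
{
	if (!demo_get_result) {
		demo_io_put(demo_max_async_ios);
		demo_get_result = -1;
	}
}
```

Calling demo_dm_cleanup() twice, or after a failed demo_dm_init(), leaves the pool count unchanged.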

diff -ruN 856-suspend-dm-old/drivers/md/dm-io.c 856-suspend-dm-new/drivers/md/dm-io.c
--- 856-suspend-dm-old/drivers/md/dm-io.c 2004-11-03 21:55:01.000000000 +1100
+++ 856-suspend-dm-new/drivers/md/dm-io.c 2004-11-11 15:25:13.549327528 +1100
@@ -214,15 +214,6 @@
*---------------------------------------------------------------*/
static struct bio_set _bios;

-/* FIXME: can we shrink this ? */
-struct io {
- unsigned long error;
- atomic_t count;
- struct task_struct *sleeper;
- io_notify_fn callback;
- void *context;
-};
-
/*
* io contexts are only dynamically allocated for asynchronous
* io. Since async io is likely to be the majority of io we'll
@@ -247,6 +238,13 @@
return 4 * pages; /* too many ? */
}

+/* Wrapper for exporting this, as suspend2 dm helper code needs this */
+unsigned int dm_pages_to_ios(unsigned int pages)
+{
+ return pages_to_ios(pages);
+}
+EXPORT_SYMBOL(dm_pages_to_ios);
+
static int resize_pool(unsigned int new_ios)
{
int r = 0;
diff -ruN 856-suspend-dm-old/drivers/md/dm-io.h 856-suspend-dm-new/drivers/md/dm-io.h
--- 856-suspend-dm-old/drivers/md/dm-io.h 2004-11-03 21:52:50.000000000 +1100
+++ 856-suspend-dm-new/drivers/md/dm-io.h 2004-11-09 08:35:30.000000000 +1100
@@ -8,6 +8,7 @@
#define _DM_IO_H

#include "dm.h"
+#include <linux/bio.h>

/* FIXME make this configurable */
#define DM_MAX_IO_REGIONS 8
@@ -30,6 +31,18 @@
*/
typedef void (*io_notify_fn)(unsigned long error, void *context);

+/*
+ * Moved here from dm-io.c, as suspend2 dm code needs it
+ */
+/* FIXME: can we shrink this ? */
+struct io {
+ unsigned long error;
+ atomic_t count;
+ struct task_struct *sleeper;
+ io_notify_fn callback;
+ void *context;
+};
+

/*
* Before anyone uses the IO interface they should call
@@ -42,6 +55,11 @@
void dm_io_put(unsigned int num_pages);

/*
+ * The suspend2 dm helper code needs this one
+ */
+unsigned int dm_pages_to_ios(unsigned int pages);
+
+/*
* Synchronous IO.
*
* Please ensure that the rw flag in the next two functions is
diff -ruN 856-suspend-dm-old/kernel/power/suspend_dm.c 856-suspend-dm-new/kernel/power/suspend_dm.c
--- 856-suspend-dm-old/kernel/power/suspend_dm.c 1970-01-01 10:00:00.000000000 +1000
+++ 856-suspend-dm-new/kernel/power/suspend_dm.c 2004-11-11 08:16:40.000000000 +1100
@@ -0,0 +1,136 @@
+/*
+ * kernel/power/suspend_dm.c
+ *
+ * Copyright (C) 2004 Nigel Cunningham <[email protected]>
+ *
+ * This file is released under the GPLv2.
+ *
+ * This file contains support for interfacing with the device mapper
+ * to allocate memory for its work.
+ */
+
+#include <linux/suspend.h>
+#include <linux/module.h>
+
+/* For calculating how much memory it needs */
+#include "../../drivers/md/dm-io.h"
+
+#include "suspend.h"
+#include "plugins.h"
+#include "proc.h"
+
+static struct suspend_plugin_ops suspend_dm_ops;
+static int io_get_result;
+
+/* ---- Exported functions ---- */
+
+/* suspend_dm_init()
+ *
+ * Description: Allocate buffers for device mapper use.
+ * Returns: Zero on success, -ENOMEM if unable to vmalloc.
+ */
+
+static int suspend_dm_init(void)
+{
+ io_get_result = dm_io_get(max_async_ios);
+ return io_get_result;
+}
+
+/* suspend_dm_cleanup()
+ *
+ * Description: Tell DM to release the memory we allocated.
+ * Returns: Nothing (the function is void).
+ */
+
+static void suspend_dm_cleanup(void)
+{
+ if (!io_get_result)
+ dm_io_put(max_async_ios);
+}
+
+/* suspend_dm_save_config_info
+ *
+ * Description: Save information needed when reloading the image at resume time.
+ * Arguments: Buffer: Pointer to a buffer of size PAGE_SIZE.
+ * Returns: Number of bytes used for saving our data.
+ */
+
+static int suspend_dm_save_config_info(char * buffer)
+{
+ return 0;
+}
+
+/* suspend_dm_load_config_info
+ *
+ * Description: Reload information needed for decompressing the image at
+ * resume time.
+ * Arguments: Buffer: Pointer to the start of the data.
+ * Size: Number of bytes that were saved.
+ */
+
+static void suspend_dm_load_config_info(char * buffer, int size)
+{
+ BUG_ON(size);
+ return;
+}
+
+/*
+ * data for our proc entries.
+ */
+
+static struct suspend_proc_data disable_dm_support_proc_data = {
+ .filename = "disable_device_mapper_support",
+ .permissions = PROC_RW,
+ .type = SUSPEND_PROC_DATA_INTEGER,
+ .data = {
+ .integer = {
+ .variable = &suspend_dm_ops.disabled,
+ .minimum = 0,
+ .maximum = 1,
+ }
+ }
+};
+
+/*
+ * Ops structure.
+ */
+
+static struct suspend_plugin_ops suspend_dm_ops = {
+ .type = MISC_PLUGIN,
+ .name = "Device Mapper Support",
+ .initialise = suspend_dm_init,
+ .cleanup = suspend_dm_cleanup,
+ .save_config_info = suspend_dm_save_config_info,
+ .load_config_info = suspend_dm_load_config_info,
+};
+
+/* ---- Registration ---- */
+
+static __init int suspend_dm_load(void)
+{
+ int result;
+
+ if (!(result = suspend_register_plugin(&suspend_dm_ops))) {
+ printk("Software Suspend Device Mapper support registering.\n");
+ suspend_register_procfile(&disable_dm_support_proc_data);
+ }
+ return result;
+}
+
+#ifdef MODULE
+static __exit void suspend_dm_unload(void)
+{
+ printk("Software Suspend Device Mapper support unloading.\n");
+ suspend_unregister_procfile(&disable_dm_support_proc_data);
+ suspend_unregister_plugin(&suspend_dm_ops);
+}
+
+
+module_init(suspend_dm_load);
+module_exit(suspend_dm_unload);
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Nigel Cunningham");
+MODULE_DESCRIPTION("Device Mapper support for Suspend2");
+#else
+late_initcall(suspend_dm_load);
+#endif


2004-11-24 17:14:43

by Nigel Cunningham

[permalink] [raw]
Subject: Suspend 2 merge: 51/51: Notes

When I started, I thought I did have 51 patches, really! One of them
turned out to be a couple of things I intend to reverse :>

In posting all of this, I recognise of course that no one else
understands how it all fits together. I'm hoping that those who care
enough will ask questions that I'll happily answer, learn from and
through which I'll improve the code.

For now, though, I'm going to bed.

Looking forward to your feedback... and hoping not too many people fail
to appreciate the volume of patches I've sent!

Nigel

2004-11-24 17:34:22

by Nigel Cunningham

[permalink] [raw]
Subject: Suspend 2 merge: 48/51: Swapwriter

This is the swapwriter. It forms the glue between the high-level I/O
routines in io.c and the blockwriter routines in block_io.c. It is
responsible for allocating storage, translating the requests for pages
within pagesets into devices and blocks and the like. It is abstracted
from the block writer because the plan is that we'll eventually have a
generic file writer (ie not using swapspace, but a simple file, possibly
on NFS). The swapwriter can automatically swapon and swapoff swapspace
for you. Handy if you have a swapdevice you only want to be used for
suspending. It understands the swsusp signature so we can play nicely
with swsusp (note that the Makefile order is important here: swsusp will
balk at our signature, so we need to look first).
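
To make the signature point concrete, here is a minimal sketch of telling signatures apart by the last ten bytes of the first swap page. It assumes a 4096-byte page, the two standard swap signatures ("SWAP-SPACE" and "SWAPSPACE2") and swsusp's "S1SUSPEND". Suspend2's own signatures are deliberately left out, since they are not shown in this mail, and all demo_ names are hypothetical.

```c
#include <string.h>

#define DEMO_PAGE_SIZE 4096
#define DEMO_SIG_LEN 10

enum demo_sig {
	DEMO_SIG_SWAP_V1,	/* "SWAP-SPACE" */
	DEMO_SIG_SWAP_V2,	/* "SWAPSPACE2" */
	DEMO_SIG_SWSUSP,	/* "S1SUSPEND" plus its trailing NUL */
	DEMO_SIG_UNKNOWN
};

/* Classify a swap header page by the signature in its last ten bytes. */
static enum demo_sig demo_classify(const unsigned char *page)
{
	const unsigned char *sig = page + DEMO_PAGE_SIZE - DEMO_SIG_LEN;

	if (!memcmp(sig, "SWAP-SPACE", DEMO_SIG_LEN))
		return DEMO_SIG_SWAP_V1;
	if (!memcmp(sig, "SWAPSPACE2", DEMO_SIG_LEN))
		return DEMO_SIG_SWAP_V2;
	/* The literal is nine characters; the NUL makes up the tenth byte. */
	if (!memcmp(sig, "S1SUSPEND", DEMO_SIG_LEN))
		return DEMO_SIG_SWSUSP;
	return DEMO_SIG_UNKNOWN;
}
```

A reader that recognises more signatures than its neighbour has to run first, which is the ordering concern mentioned above.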

diff -ruN 854-swapwriter-old/kernel/power/suspend_swap.c 854-swapwriter-new/kernel/power/suspend_swap.c
--- 854-swapwriter-old/kernel/power/suspend_swap.c 1970-01-01 10:00:00.000000000 +1000
+++ 854-swapwriter-new/kernel/power/suspend_swap.c 2004-11-17 16:38:59.000000000 +1100
@@ -0,0 +1,1967 @@
+/*
+ * Swapwriter.c
+ *
+ * Copyright 2004 Nigel Cunningham <[email protected]>
+ *
+ * Distributed under GPLv2.
+ *
+ * This file encapsulates functions for usage of swap space as a
+ * backing store.
+ */
+
+#include <linux/suspend.h>
+#include <linux/module.h>
+#include <linux/blkdev.h>
+#include <linux/swapops.h>
+
+#include "suspend.h"
+#include "block_io.h"
+#include "proc.h"
+#include "plugins.h"
+
+#define SIGNATURE_VER 6
+
+/* --- Struct of pages stored on disk */
+
+static struct suspend_plugin_ops swapwriterops;
+
+struct swaplink {
+ char dummy[PAGE_SIZE - sizeof(swp_entry_t)];
+ swp_entry_t next;
+};
+
+union diskpage {
+ union swap_header swh; /* swh.magic is the only member used */
+ struct swaplink link;
+ struct suspend_header sh;
+};
+
+union p_diskpage {
+ union diskpage *pointer;
+ char *ptr;
+ unsigned long address;
+};
+
+#define SIGNATURE_LENGTH 10
+
+// - Manage swap signature.
+static int prepare_signature(struct submit_params * first_header_page,
+ char * current_header);
+static int parse_signature(char * signature, int restore);
+
+// Higher Level
+static int readahead_index = 0, readahead_submit_index = 0;
+static int readahead_allocs = 0, readahead_frees = 0;
+
+static char * swapwriter_buffer = NULL;
+static int swapwriter_buffer_posn = 0;
+static int swapwriter_page_index = 0;
+static unsigned long * header_link = NULL;
+#define BYTES_PER_HEADER_PAGE (PAGE_SIZE - sizeof(swp_entry_t))
+
+/*
+ * ---------------------------------------------------------------
+ *
+ * Internal Data Structures
+ *
+ * ---------------------------------------------------------------
+ */
+
+/* header_data contains data that is needed to reload pagedir1, and
+ * is therefore saved in the suspend header.
+ *
+ * Pagedir2 swap comes before pagedir1 swap (save order), and the first swap
+ * entry for pagedir1 to use is set when pagedir2 is written (when we know how
+ * much swap it used). Since this first entry is almost certainly not at the
+ * start of a range, the firstoffset variable below tells us where to start in
+ * the range. All of this means we don't have to worry about getting different
+ * compression ratios for the kernel and cache (when compressing the image).
+ * We can simply allocate one pool of swap (size determined using expected
+ * compression ratio) and use it without worrying whether one pageset
+ * compresses better and the other worse (this is what happens). As long as the
+ * user gets the expected compression right, it will work.
+ */
+
+struct {
+ /* Range chains for swap & blocks */
+ struct rangechain swapranges;
+ struct rangechain block_chain[MAX_SWAPFILES];
+
+ /* Location of start of pagedir 1 */
+ struct range * pd1start_block_range;
+ unsigned long pd1start_block_offset;
+ int pd1start_chain;
+
+ /* Devices used for swap */
+ dev_t swapdevs[MAX_SWAPFILES];
+ char blocksizes[MAX_SWAPFILES];
+
+ /* Asynchronous I/O limit */
+ int max_async_ios;
+} header_data;
+
+
+dev_t header_device = 0;
+struct block_device * header_block_device = NULL;
+struct range * this_range_page = NULL, * next_range_page = NULL;
+int headerblocksize = PAGE_SIZE;
+int headerblock;
+
+/* For swapfile automatically swapon/off'd. */
+static char swapfilename[256] = "";
+extern asmlinkage long sys_swapon(const char * specialfile, int swap_flags);
+
+int suspend_swapon_status = 0;
+
+/* Must be silent - might be called from cat /proc/suspend/debug_info
+ * Returns 0 if was off, -EBUSY if was on, error value otherwise.
+ */
+static int enable_swapfile(void)
+{
+ int activateswapresult = -EINVAL;
+
+ if (suspend_swapon_status)
+ return 0;
+
+ if (swapfilename[0]) {
+ /* Attempt to swap on with maximum priority */
+ activateswapresult = sys_swapon(swapfilename, 0xFFFF);
+ if ((activateswapresult) && (activateswapresult != -EBUSY))
+ printk(name_suspend
+ "The swapfile/partition specified by "
+ "/proc/suspend/swapfile (%s) could not"
+ " be turned on (error %d). Attempting "
+ "to continue.\n",
+ swapfilename, activateswapresult);
+ if (!activateswapresult)
+ suspend_swapon_status = 1;
+ }
+ return activateswapresult;
+}
+
+extern asmlinkage long sys_swapoff(const char * specialfile);
+/* Returns 0 if was on, -EINVAL if was off, error value otherwise */
+static int disable_swapfile(void)
+{
+ int result = -EINVAL;
+
+ if (!suspend_swapon_status)
+ return 0;
+
+ if (swapfilename[0]) {
+ result = sys_swapoff(swapfilename);
+ if (result == -EINVAL)
+ return 0; /* Wasn't on */
+ if (!result)
+ suspend_swapon_status = 0;
+ }
+
+ return result;
+}
+
+static int manage_swapfile(int enable)
+{
+ static int result;
+ mm_segment_t oldfs;
+
+ oldfs = get_fs(); set_fs(KERNEL_DS);
+ if (enable)
+ result = enable_swapfile();
+ else
+ result = disable_swapfile();
+ set_fs(oldfs);
+
+ return result;
+}
+
+/*
+ * ---------------------------------------------------------------
+ *
+ * Current state.
+ *
+ * ---------------------------------------------------------------
+ */
+
+/* Which pagedir are we saving/reloading? Needed so we can know whether to
+ * remember the last swap entry used at the end of writing pageset2, and
+ * get that location when saving or reloading pageset1.*/
+static int current_stream = 0;
+
+/* Pointer to current swap entry being loaded/saved. */
+static struct range * currentblockrange = NULL;
+static unsigned long currentblockoffset = 0;
+static int currentblockchain = 0;
+static int currentblocksperpage = 0;
+
+/* Header Page Information */
+static int header_pages_allocated = 0;
+static struct submit_params * first_header_submit_info = NULL,
+ * last_header_submit_info = NULL, * current_header_submit_info = NULL;
+
+/*
+ * ---------------------------------------------------------------
+ *
+ * User Specified Parameters
+ *
+ * ---------------------------------------------------------------
+ */
+
+static int resume_firstblock = 0;
+static int resume_firstblocksize = PAGE_SIZE;
+static dev_t resume_device = 0;
+static struct block_device * resume_block_device = NULL;
+
+/*
+ * ---------------------------------------------------------------
+ *
+ * Disk I/O routines
+ *
+ * ---------------------------------------------------------------
+ */
+extern char swapfilename[];
+
+extern int expected_compression;
+
+struct sysinfo swapinfo;
+
+#define MARK_SWAP_SUSPEND 0
+#define MARK_SWAP_RESUME 1
+
+static int swapwriter_invalidate_image(void);
+
+static int get_phys_params(swp_entry_t entry)
+{
+ int swapfilenum = swp_type(entry);
+ unsigned long offset = swp_offset(entry);
+ struct swap_info_struct * sis = get_swap_info_struct(swapfilenum);
+ sector_t sector = map_swap_page(sis, offset);
+
+ add_to_range_chain(&header_data.block_chain[swapfilenum], sector);
+ return 1;
+}
+
+static int get_header_params(struct submit_params * headerpage)
+{
+ swp_entry_t entry = headerpage->swap_address;
+ int swapfilenum = swp_type(entry);
+ unsigned long offset = swp_offset(entry);
+ struct swap_info_struct * sis = get_swap_info_struct(swapfilenum);
+ sector_t sector = map_swap_page(sis, offset);
+
+ headerpage->dev = sis->bdev;
+ headerpage->blocks[0] = sector;
+ headerpage->blocks_used = 1;
+ headerpage->readahead_index = -1;
+ return 0;
+}
+
+static inline int get_blocks_per_page(int chain)
+{
+ return 1;
+#if 0
+ int result = PAGE_SIZE /
+ suspend_bio_ops.get_block_size(swap_info[chain].bdev);
+ printk("Block size for chain %d is %d,\n",
+ chain, result);
+ return result;
+#endif
+}
+
+static int try_to_parse_resume_device(char * commandline)
+{
+ resume_device = name_to_dev_t(commandline);
+ if (!resume_device) {
+ if (test_suspend_state(SUSPEND_BOOT_TIME))
+ suspend_early_boot_message(1, "Failed to translate the device name into a device id.\n");
+ else
+ printk(name_suspend "Failed to translate \"%s\" into a device id.\n", commandline);
+ return 1;
+ }
+
+ resume_block_device = open_by_devnum(resume_device, FMODE_READ);
+
+ if (IS_ERR(resume_block_device)) {
+ printk("Open by devnum returned %p given %x.\n", resume_block_device, resume_device);
+ if (test_suspend_state(SUSPEND_BOOT_TIME))
+ suspend_early_boot_message(1, "Failed to get access to the device on which Software Suspend's header should be found.");
+ else
+ printk("Failed to get access to the device on which Software Suspend's header should be found.\n");
+ return 1;
+ }
+
+ return 0;
+}
+
+static int try_to_parse_header_device(void)
+{
+ header_block_device = open_by_devnum(header_device, FMODE_READ);
+
+ if (IS_ERR(header_block_device)) {
+ if (suspend_early_boot_message(1,
+ "Failed to get access to the "
+ "resume header device.\nYou could be "
+ "booting with a 2.6 kernel when you "
+ "suspended a 2.4 kernel."))
+ swapwriter_invalidate_image();
+
+ return -EINVAL;
+ }
+
+ if (set_blocksize(header_block_device, PAGE_SIZE) < 0) {
+ if (suspend_early_boot_message(1, "Failed to set the blocksize "
+ "for a swap device."))
+ do { } while(0);
+ swapwriter_invalidate_image();
+ return -EINVAL;
+ }
+
+ return 0;
+}
+
+static void open_other_swap_device(int i, dev_t thisdevice)
+{
+ swap_info[i].bdev = open_by_devnum(thisdevice, FMODE_READ);
+ set_blocksize(swap_info[i].bdev, PAGE_SIZE);
+}
+
+static inline char * get_path_for_swapfile(int which, char * path_page)
+{
+ return d_path( swap_info[which].swap_file->f_dentry,
+ swap_info[which].swap_file->f_vfsmnt,
+ path_page,
+ PAGE_SIZE);
+}
+
+static void swapwriter_noresume_reset(void)
+{
+ int i;
+
+ /*
+ * If we have read part of the image, we might have filled header_data with
+ * data that should be zeroed out.
+ */
+
+ memset((char *) &header_data, 0, sizeof(header_data));
+ for (i = 0; i < MAX_SWAPFILES; i++) {
+ swap_info[i].bdev = NULL;
+ }
+
+}
+
+static void swapwriter_dpm_set_devices(void)
+{
+ int i;
+
+ /* Set our device(s) as remaining on. */
+ for (i = 0; i < MAX_SWAPFILES; i++) {
+ if (!swap_info[i].bdev)
+ continue;
+
+ device_switch_trees((swap_info[i].bdev)->bd_disk->driverfs_dev,
+ suspend_device_tree);
+ }
+}
+
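+/*
+ * parse_signature
+ *
+ * Identify the signature found in a swap header.
+ *
+ * Returns 0 for "SWAP-SPACE" and 1 for "SWAPSPACE2" (plain swap), 2 for
+ * "pmdisk", 4/5 for "S1SUSP"/"S2SUSP", 6 to 13 for our own signatures
+ * ("1R"/"2R", "std"/"STD", "sd"/"SD", "z"/"Z"), and -1 if the signature
+ * is not recognised.
+ *
+ * If restore is non-zero and the signature is one of ours (type > 5), the
+ * original swap signature is written back into header: "SWAPSPACE2" for
+ * odd types, "SWAP-SPACE" for even types.
+ */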
+
+int parse_signature(char * header, int restore)
+{
+ int type = -1;
+
+ if (!memcmp("SWAP-SPACE",header,10))
+ return 0;
+ else if (!memcmp("SWAPSPACE2",header,10))
+ return 1;
+
+ else if (!memcmp("pmdisk", header,6))
+ type = 2;
+
+ else if (!memcmp("S1SUSP",header,6))
+ type = 4;
+ else if (!memcmp("S2SUSP",header,6))
+ type = 5;
+
+ else if (!memcmp("1R",header,2))
+ type = 6;
+ else if (!memcmp("2R",header,2))
+ type = 7;
+
+ else if (!memcmp("std",header,3))
+ type = 8;
+ else if (!memcmp("STD",header,3))
+ type = 9;
+
+ else if (!memcmp("sd",header,2))
+ type = 10;
+ else if (!memcmp("SD",header,2))
+ type = 11;
+
+ else if (!memcmp("z",header,1))
+ type = 12;
+ else if (!memcmp("Z",header,1))
+ type = 13;
+
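+ /*
+ * For "z"/"Z" signatures, the bytes after the signature character
+ * hold the header device (from byte 1), the header block size
+ * (byte 5) and the first header block (from byte 6).
+ */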
+ if (type > 11) {
+ dev_t * header_ptr = (dev_t *) &header[1];
+ unsigned char * headerblocksize_ptr =
+ (unsigned char *) &header[5];
+ unsigned long * headerblock_ptr = (unsigned long *) &header[6];
+ header_device = *header_ptr;
+ /*
+ * We are now using the highest bit of the char to indicate
+ * whether we have attempted to resume from this image before.
+ */
+ clear_suspend_state(SUSPEND_RESUMED_BEFORE);
+ if (((int) *headerblocksize_ptr) & 0x80)
+ set_suspend_state(SUSPEND_RESUMED_BEFORE);
+ headerblocksize = 512 * (((int) *headerblocksize_ptr) & 0xf);
+ headerblock = *headerblock_ptr;
+ }
+
+ if ((restore) && (type > 5)) {
+ /* We only reset our own signatures */
+ if (type & 1)
+ memcpy(header,"SWAPSPACE2",10);
+ else
+ memcpy(header,"SWAP-SPACE",10);
+ }
+
+ return type;
+}
+
+/*
+ * prepare_signature
+ */
+
+static int prepare_signature(struct submit_params * header_page_info,
+ char * current_header)
+{
+ int current_type = parse_signature(current_header, 0);
+ dev_t * header_ptr = (dev_t *) (&current_header[1]);
+ unsigned char * headerblocksize_ptr =
+ (unsigned char *) (&current_header[5]);
+ unsigned long * headerblock_ptr =
+ (unsigned long *) (&current_header[6]);
+
+ if ((current_type > 1) && (current_type < 6))
+ return 1;
+
+ if (current_type & 1)
+ current_header[0] = 'Z';
+ else
+ current_header[0] = 'z';
+ *header_ptr = header_page_info->dev->bd_dev;
+ *headerblocksize_ptr =
+ (unsigned char) (PAGE_SIZE / 512 /
+ header_page_info->blocks_used);
+ /* Record the first block of this header page */
+ *headerblock_ptr = (unsigned long) header_page_info->blocks[0];
+ suspend_message(SUSPEND_WRITER, SUSPEND_HIGH, 1,
+ "Saving header block size of %ld (%ld 512 "
+ "byte blocks per page).\n",
+ PAGE_SIZE / header_page_info->blocks_used,
+ PAGE_SIZE / 512 / header_page_info->blocks_used);
+ return 0;
+}
+
+extern int signature_check(char * header, int fix);
+
+static int free_swap_pages_for_header(void)
+{
+ if (!first_header_submit_info)
+ return 1;
+
+ PRINTFREEMEM("at start of free_swap_pages_for_header");
+
+ while (first_header_submit_info) {
+ struct submit_params * next = first_header_submit_info->next;
+ if (first_header_submit_info->swap_address.val)
+ swap_free(first_header_submit_info->swap_address);
+ kfree(first_header_submit_info);
+ first_header_submit_info = next;
+ }
+
+ suspend_message(SUSPEND_WRITER, SUSPEND_LOW, 1,
+ " Freed %d swap pages in free_swap_pages_for_header.\n",
+ header_pages_allocated);
+ first_header_submit_info = last_header_submit_info = NULL;
+ header_pages_allocated = 0;
+ PRINTFREEMEM("at end of free_swap_pages_for_header");
+ suspend_store_free_mem(SUSPEND_FREE_HEADER_STORAGE, 1);
+ return 0;
+}
+
+static void get_main_pool_phys_params(void)
+{
+ struct range * rangepointer = NULL;
+ unsigned long address;
+ int i;
+
+ for (i = 0; i < MAX_SWAPFILES; i++)
+ if (header_data.block_chain[i].first)
+ put_range_chain(&header_data.block_chain[i]);
+
+ range_for_each(&header_data.swapranges, rangepointer, address)
+ get_phys_params(range_val_to_swap_entry(address));
+}
+
+extern void put_range(struct range * range);
+
+static unsigned long swapwriter_storage_allocated(void)
+{
+ return (header_data.swapranges.size + header_pages_allocated);
+}
+
+static long swapwriter_storage_available(void)
+{
+ si_swapinfo(&swapinfo);
+ return (swapinfo.freeswap + (long) swapwriter_storage_allocated());
+}
+
+static int swapwriter_initialise(void)
+{
+ manage_swapfile(1);
+ return 0;
+}
+
+static void swapwriter_cleanup(void)
+{
+ manage_swapfile(0);
+}
+
+static int swapwriter_release_storage(void)
+{
+ int i = 0, swapcount = 0;
+
+#ifdef CONFIG_SOFTWARE_SUSPEND_KEEP_IMAGE
+ if ((TEST_ACTION_STATE(SUSPEND_KEEP_IMAGE)) && test_suspend_state(SUSPEND_NOW_RESUMING))
+ return 0;
+#endif
+
+ free_swap_pages_for_header();
+
+ if (header_data.swapranges.first) {
+ /* Free swap entries */
+ struct range * rangepointer;
+ unsigned long rangevalue;
+ swp_entry_t entry;
+ range_for_each(&header_data.swapranges, rangepointer,
+ rangevalue) {
+ entry = range_val_to_swap_entry(rangevalue);
+ swap_free(entry);
+
+ swapcount++;
+ check_shift_keys(0, NULL);
+ }
+ put_range_chain(&header_data.swapranges);
+
+ for (i = 0; i < MAX_SWAPFILES; i++)
+ if (header_data.block_chain[i].first)
+ put_range_chain(&header_data.block_chain[i]);
+ }
+
+ suspend_message(SUSPEND_WRITER, SUSPEND_MEDIUM, 0,
+ "Freed %d swap pages in free_swap.\n", swapcount);
+
+ return 0;
+}
+
+static long swapwriter_allocate_header_space(unsigned long space_really_requested)
+{
+ /* space_requested was going to be in bytes... not yet */
+ int i, ret = 0;
+ unsigned long space_requested;
+
+ /*
+ * Up to here in the process, we haven't taken account of the fact
+ * that we need an extra four bytes per 4092 bytes written for the
+ * link to the next page on which the header will be written. We
+ * factor that in here.
+ */
+ space_requested = ((4096 * space_really_requested + 4091) / 4092);
+ space_requested = (space_requested * 4 + 4091) / 4092;
+ space_requested += space_really_requested;
+
+ PRINTFREEMEM("at start of allocate_header_space");
+
+ for (i=(header_pages_allocated+1); i<=space_requested; i++) {
+ struct submit_params * new_submit_param;
+
+ /* Get a submit structure */
+ new_submit_param = kmalloc(sizeof(struct submit_params), GFP_ATOMIC);
+
+ if (!new_submit_param) {
+ header_pages_allocated = i - 1;
+ printk("Failed to kmalloc a struct submit param.\n");
+ ret = -ENOMEM;
+ goto out;
+ }
+
+ memset(new_submit_param, 0, sizeof(struct submit_params));
+
+ if (last_header_submit_info) {
+ last_header_submit_info->next = new_submit_param;
+ last_header_submit_info = new_submit_param;
+ } else
+ last_header_submit_info = first_header_submit_info =
+ new_submit_param;
+
+ /* Get swap entry */
+ new_submit_param->swap_address = get_swap_page();
+
+ if ((!new_submit_param->swap_address.val) &&
+ (header_data.swapranges.first)) {
+ /*
+ * Steal one from pageset swap chain. If, as a result,
+ * it is too small, more swap will be allocated or
+ * memory eaten.
+ */
+
+ new_submit_param->swap_address =
+ range_val_to_swap_entry(
+ header_data.swapranges.first->minimum);
+ if (header_data.swapranges.first->minimum <
+ header_data.swapranges.first->maximum)
+ header_data.swapranges.first->minimum++;
+ else {
+ struct range * oldfirst =
+ header_data.swapranges.first;
+ header_data.swapranges.first = oldfirst->next;
+ header_data.swapranges.frees++;
+ header_data.swapranges.prevtoprev =
+ header_data.swapranges.prevtolastaccessed =
+ header_data.swapranges.lastaccessed = NULL;
+ if (header_data.swapranges.last == oldfirst)
+ header_data.swapranges.last = NULL;
+ put_range(oldfirst);
+ }
+
+ header_data.swapranges.size--;
+
+ /*
+ * Recalculate block chains for main pool.
+ * We don't assume blocks are at start of a chain and
+ * don't know how many blocks per swap entry.
+ */
+ get_main_pool_phys_params();
+ }
+ if (!new_submit_param->swap_address.val) {
+ free_swap_pages_for_header();
+ printk("Unable to allocate swap page for header.\n");
+ ret = -ENOMEM;
+ goto out;
+ }
+ if (get_header_params(new_submit_param)) {
+ printk("Failed to get header parameters.\n");
+ ret = -EFAULT;
+ goto out;
+ }
+ suspend_message(SUSPEND_HEADER, SUSPEND_MEDIUM, 0,
+ " Got header page %d/%lu. Dev is %x. Block is %lu. "
+ "Blocksperpage is %d.\n",
+ i, space_requested,
+ new_submit_param->dev->bd_dev,
+ new_submit_param->blocks[0],
+ new_submit_param->blocks_used);
+ }
+ header_pages_allocated = space_requested;
+ suspend_message(SUSPEND_HEADER, SUSPEND_LOW, 1,
+ " Have %d swap pages in swapwriter::"
+ "allocate_header_space.\n",
+ header_pages_allocated);
+out:
+ PRINTFREEMEM("at end of swapwriter::allocate_header_space");
+ suspend_store_free_mem(SUSPEND_FREE_HEADER_STORAGE, 0);
+ return ret;
+}
+
+static int swapwriter_allocate_storage(unsigned long space_requested)
+{
+ int i, swapcount = 0, result = 0;
+ int lastsize = header_data.swapranges.size;
+ int numwanted = (int) (space_requested);
+ int pages_to_get = numwanted - header_data.swapranges.size;
+
+ if (numwanted < 1)
+ return 0;
+
+ suspend_message(SUSPEND_WRITER, SUSPEND_MEDIUM, 0,
+ "Started with swapranges.size == %d. "
+ "Seeking to allocate %d more.\n",
+ header_data.swapranges.size,
+ pages_to_get);
+
+ for(i=0; i < pages_to_get; i++) {
+ swp_entry_t entry;
+ suspend_message(SUSPEND_WRITER, SUSPEND_VERBOSE, 1, "");
+ entry = get_swap_page();
+ if (!entry.val) {
+ suspend_message(SUSPEND_WRITER, SUSPEND_MEDIUM, 0,
+ "Allocated %d/%d swap pages for main pool "
+ "in allocate_swap.\n",
+ swapcount, numwanted);
+ printk("Unable to allocate enough swap."
+ " Got %d pages of %d wanted.\n",
+ i, pages_to_get);
+ result = -ENOSPC;
+ goto out;
+ }
+ swapcount++;
+ {
+ int add_result =
+ add_to_range_chain(&header_data.swapranges,
+ swap_entry_to_range_val(entry));
+ if (add_result)
+ printk("add_to_range_chain returned %d.\n",
+ add_result);
+ }
+ if (header_data.swapranges.size != (lastsize + 1))
+ printk("swapranges.size == %d.\n",
+ header_data.swapranges.size);
+ lastsize = header_data.swapranges.size;
+ check_shift_keys(0, NULL);
+ if (TEST_RESULT_STATE(SUSPEND_ABORTED))
+ break;
+ }
+ suspend_message(SUSPEND_WRITER, SUSPEND_MEDIUM, 0,
+ " Allocated %d/%d swap pages in allocate_swap.\n",
+ swapcount, numwanted);
+
+ suspend_message(SUSPEND_WRITER, SUSPEND_MEDIUM, 0,
+ "Finished with swapranges.size == %d.\n",
+ header_data.swapranges.size);
+
+out:
+ get_main_pool_phys_params();
+
+ /* Any memory we allocate will be for range pages */
+ suspend_store_free_mem(SUSPEND_FREE_RANGE_PAGES, 0);
+ return result;
+}
+
+static int swapwriter_write_header_chunk(char * buffer, int buffer_size);
+static int header_bytes_written;
+
+static int swapwriter_write_header_init(void)
+{
+ int i;
+
+ header_bytes_written = 0;
+
+ for (i = 0; i < MAX_SWAPFILES; i++)
+ if (swap_info[i].swap_file) {
+ header_data.swapdevs[i] = swap_info[i].bdev->bd_dev;
+ header_data.blocksizes[i] =
+ block_size(swap_info[i].bdev);
+ }
+
+ header_data.max_async_ios = max_async_ios;
+
+ swapwriter_buffer = (char *) get_zeroed_page(GFP_ATOMIC);
+ header_link =
+ (unsigned long *) (swapwriter_buffer + BYTES_PER_HEADER_PAGE);
+ swapwriter_page_index = 1;
+
+ current_header_submit_info = first_header_submit_info;
+
+ /* Info needed to bootstrap goes at the start of the header.
+ * First we save the 'header_data' struct, including the number
+ * of header pages. Then we save the structs containing data needed
+ * for reading the header pages back.
+ * Note that even if header pages take more than one page, when we
+ * read back the info, we will have restored the location of the
+ * next header page by the time we go to use it.
+ */
+ swapwriter_write_header_chunk((char *) &header_data,
+ sizeof(header_data));
+
+ return 0;
+}
+
+static int swapwriter_write_header_chunk(char * buffer, int buffer_size)
+{
+ int bytes_left = buffer_size;
+
+ /*
+ * We buffer the writes until a page is full, and use the last
+ * sizeof(swp_entry_t) bytes of each page for the link to the next
+ * page. This is totally transparent to the caller.
+ *
+ * Note also that buffer_size can be > PAGE_SIZE.
+ */
+
+ header_bytes_written += buffer_size;
+
+ suspend_message(SUSPEND_HEADER, SUSPEND_HIGH, 0,
+ "\nStart of write_header_chunk loop with %d bytes to store.\n",
+ buffer_size);
+
+ while (bytes_left) {
+ char * source_start = buffer + buffer_size - bytes_left;
+ char * dest_start = swapwriter_buffer + swapwriter_buffer_posn;
+ int dest_capacity = BYTES_PER_HEADER_PAGE - swapwriter_buffer_posn;
+ swp_entry_t next_header_page;
+ if (bytes_left <= dest_capacity) {
+ suspend_message(SUSPEND_HEADER, SUSPEND_HIGH, 0,
+ "Storing %d bytes from %p-%p in page %d, %p-%p.\n",
+ bytes_left,
+ source_start, source_start + bytes_left - 1,
+ swapwriter_page_index,
+ dest_start, dest_start + bytes_left - 1);
+ memcpy(dest_start, source_start, bytes_left);
+ swapwriter_buffer_posn += bytes_left;
+ return 0;
+ }
+
+ /* A page is full */
+ suspend_message(SUSPEND_HEADER, SUSPEND_HIGH, 0,
+ "Storing %d bytes from %p-%p in page %d, %p-%p.\n",
+ dest_capacity,
+ source_start, source_start + dest_capacity - 1,
+ swapwriter_page_index,
+ dest_start, dest_start + dest_capacity - 1);
+ memcpy(dest_start, source_start, dest_capacity);
+ bytes_left -= dest_capacity;
+
+ BUG_ON(!current_header_submit_info);
+
+ if (!current_header_submit_info->next) {
+ suspend_message(SUSPEND_HEADER, SUSPEND_HIGH, 0,
+ "This submit_info is the last one. Link zeroed.\n");
+ *header_link = 0;
+ } else {
+ next_header_page =
+ swp_entry(swp_type(
+ current_header_submit_info->next->swap_address),
+ current_header_submit_info->next->blocks[0]);
+
+ *header_link = next_header_page.val;
+
+ suspend_message(SUSPEND_HEADER, SUSPEND_HIGH, 0,
+ "Header link is at %p. "
+ "Contents set to swap device #%ld, block %ld.\n",
+ header_link,
+ (long) swp_type(next_header_page),
+ swp_offset(next_header_page));
+ }
+
+ suspend_message(SUSPEND_HEADER, SUSPEND_HIGH, 0,
+ "Writing header page %d/%d. "
+ "Dev is %x. Block is %lu. Blocksperpage is %d.\n",
+ swapwriter_page_index, header_pages_allocated,
+ current_header_submit_info->dev->bd_dev,
+ current_header_submit_info->blocks[0],
+ current_header_submit_info->blocks_used);
+
+ current_header_submit_info->page =
+ virt_to_page(swapwriter_buffer);
+ check_shift_keys(0, NULL);
+ suspend_bio_ops.submit_io(WRITE, current_header_submit_info, 0);
+
+ swapwriter_buffer_posn = 0;
+ swapwriter_page_index++;
+ current_header_submit_info = current_header_submit_info->next;
+ }
+
+ return 0;
+}
+
+static int swapwriter_write_header_cleanup(void)
+{
+ /* Write any unsaved data */
+ if (swapwriter_buffer_posn) {
+ *header_link = 0;
+
+ suspend_message(SUSPEND_HEADER, SUSPEND_HIGH, 0,
+ "Writing header page %d/%d. "
+ "Dev is %x. Block is %lu. Blocksperpage is %d.\n",
+ swapwriter_page_index, header_pages_allocated,
+ current_header_submit_info->dev->bd_dev,
+ current_header_submit_info->blocks[0],
+ current_header_submit_info->blocks_used);
+
+ current_header_submit_info->page =
+ virt_to_page(swapwriter_buffer);
+ suspend_bio_ops.submit_io(WRITE,
+ current_header_submit_info, 0);
+ }
+
+ /* Adjust swap header */
+ suspend_bio_ops.bdev_page_io(READ, resume_block_device, resume_firstblock,
+ virt_to_page(swapwriter_buffer));
+
+ prepare_signature(first_header_submit_info,
+ ((union swap_header *) swapwriter_buffer)->magic.magic);
+
+ suspend_bio_ops.bdev_page_io(WRITE, resume_block_device, resume_firstblock,
+ virt_to_page(swapwriter_buffer));
+
+ free_pages((unsigned long) swapwriter_buffer, 0);
+ swapwriter_buffer = NULL;
+ header_link = NULL;
+
+ suspend_bio_ops.finish_all_io();
+
+ return 0;
+}
+
+/* ------------------------- HEADER READING ------------------------- */
+
+/*
+ * read_header_init()
+ *
+ * Description:
+ * 1. Attempt to read the device specified with resume2=.
+ * 2. Check the contents of the swap header for our signature.
+ * 3. Warn, ignore, reset and/or continue as appropriate.
+ * 4. If continuing, read the swapwriter configuration section
+ * of the header and set up block device info so we can read
+ * the rest of the header & image.
+ *
+ * Returns:
+ * May not return if the user chooses to reboot at a warning.
+ * -EINVAL if we cannot resume at this time. Booting should continue
+ * normally.
+ */
+
+static int swapwriter_read_header_init(void)
+{
+ int i;
+
+ swapwriter_page_index = 1;
+
+ swapwriter_buffer = (char *) get_zeroed_page(GFP_ATOMIC);
+
+ if (!header_device) {
+ printk("read_header_init called when we haven't "
+ "verified there is an image!\n");
+ return -EINVAL;
+ }
+
+ /*
+ * If the header is not on the resume_device, open the header device first.
+ */
+ if (header_device != resume_device) {
+ int result = try_to_parse_header_device();
+
+ if (result)
+ return result;
+ } else
+ header_block_device = resume_block_device;
+
+ /* Read swapwriter configuration */
+ suspend_bio_ops.bdev_page_io(READ, header_block_device, headerblock,
+ virt_to_page((unsigned long) swapwriter_buffer));
+ //FIXME Remember location of next page to be read.
+
+ suspend_message(SUSPEND_WRITER, SUSPEND_HIGH, 0,
+ "Retrieving %d bytes from %x:%x to page %d, %p-%p.\n",
+ (int) sizeof(header_data),
+ header_block_device->bd_dev, headerblock,
+ swapwriter_page_index,
+ swapwriter_buffer, swapwriter_buffer + sizeof(header_data) - 1);
+ memcpy(&header_data, swapwriter_buffer, sizeof(header_data));
+
+ /* Restore device info */
+ for (i = 0; i < MAX_SWAPFILES; i++) {
+ dev_t thisdevice = header_data.swapdevs[i];
+
+ suspend_message(SUSPEND_WRITER, SUSPEND_VERBOSE, 1,
+ "Swap device %d is %x.", i, thisdevice);
+
+ if (!thisdevice)
+ continue;
+
+ if (thisdevice == resume_device) {
+ suspend_message(SUSPEND_WRITER, SUSPEND_VERBOSE, 0,
+ "Resume root device %x", thisdevice);
+ swap_info[i].bdev = resume_block_device;
+ /* Mark as used so the device doesn't get suspended. */
+ swap_info[i].swap_file = (struct file *) 0xffffff;
+ continue;
+ }
+
+ if (thisdevice == header_device) {
+ suspend_message(SUSPEND_WRITER, SUSPEND_VERBOSE, 0,
+ "Resume header device %x", thisdevice);
+ swap_info[i].bdev = header_block_device;
+ /* Mark as used so the device doesn't get suspended. */
+ swap_info[i].swap_file = (struct file *) 0xffffff;
+ continue;
+ }
+
+ open_other_swap_device(i, thisdevice);
+ swap_info[i].swap_file = (struct file *) 0xffffff;
+ }
+
+ max_async_ios = header_data.max_async_ios;
+
+ swapwriter_buffer_posn = sizeof(header_data);
+
+ return 0;
+}
+
+static int swapwriter_read_header_chunk(char * buffer, int buffer_size)
+{
+ int bytes_left = buffer_size, ret = 0;
+
+ check_shift_keys(0, "");
+
+ /* Read a chunk of the header */
+ while ((bytes_left) && (!ret)) {
+ swp_entry_t next =
+ ((union p_diskpage) swapwriter_buffer).pointer->link.next;
+ struct block_device * dev = swap_info[swp_type(next)].bdev;
+ int pos = swp_offset(next);
+ char * dest_start = buffer + buffer_size - bytes_left;
+ char * source_start =
+ swapwriter_buffer + swapwriter_buffer_posn;
+ int source_capacity =
+ BYTES_PER_HEADER_PAGE - swapwriter_buffer_posn;
+
+ if (bytes_left <= source_capacity) {
+ suspend_message(SUSPEND_WRITER, SUSPEND_HIGH, 0,
+ "Retrieving %d bytes from page %d, "
+ "%p-%p into %p-%p.\n",
+ bytes_left,
+ swapwriter_page_index,
+ source_start, source_start + bytes_left - 1,
+ dest_start, dest_start + bytes_left - 1);
+ memcpy(dest_start, source_start, bytes_left);
+ swapwriter_buffer_posn += bytes_left;
+ return buffer_size;
+ }
+
+ /* Need to read the next page */
+ suspend_message(SUSPEND_WRITER, SUSPEND_HIGH, 0,
+ "Retrieving %d bytes from page %d, %p-%p to %p-%p.\n",
+ source_capacity,
+ swapwriter_page_index,
+ source_start, source_start + source_capacity - 1,
+ dest_start, dest_start + source_capacity - 1);
+ memcpy(dest_start, source_start, source_capacity);
+ bytes_left -= source_capacity;
+
+ suspend_message(SUSPEND_WRITER, SUSPEND_HIGH, 0,
+ "Header link is at %p. Contents set to %lx = "
+ "swap device #%x, block %d.\n",
+ &((union p_diskpage) swapwriter_buffer).pointer->link.next,
+ ((union p_diskpage) swapwriter_buffer).pointer->link.next.val,
+ dev->bd_dev, pos);
+
+ swapwriter_page_index++;
+
+ suspend_message(SUSPEND_WRITER, SUSPEND_HIGH, 0,
+ "Reading header page %d. Dev is %x. Block is %d.\n",
+ swapwriter_page_index, dev->bd_dev, pos);
+
+ suspend_bio_ops.bdev_page_io(READ, dev, pos, virt_to_page(swapwriter_buffer));
+
+ swapwriter_buffer_posn = 0;
+ }
+
+ return buffer_size - bytes_left;
+}
+
+static int swapwriter_read_header_cleanup(void)
+{
+ free_pages((unsigned long) swapwriter_buffer, 0);
+ return 0;
+}
+
+static int swapwriter_prepare_save_ranges(void)
+{
+ int i;
+
+ relativise_chain(&header_data.swapranges);
+
+ for (i = 0; i < MAX_SWAPFILES; i++)
+ relativise_chain(&header_data.block_chain[i]);
+
+ header_data.pd1start_block_range =
+ RANGE_RELATIVE(header_data.pd1start_block_range);
+
+ suspend_message(SUSPEND_WRITER, SUSPEND_HIGH, 0,
+ "Pagedir1 firstblockrange is %p.\n",
+ header_data.pd1start_block_range);
+
+ return 0;
+}
+
+static int swapwriter_post_load_ranges(void)
+{
+ int i;
+
+ if (get_rangepages_list())
+ return -ENOMEM;
+
+ absolutise_chain(&header_data.swapranges);
+
+ for (i = 0; i < MAX_SWAPFILES; i++)
+ absolutise_chain(&header_data.block_chain[i]);
+
+ header_data.pd1start_block_range =
+ RANGE_ABSOLUTE(header_data.pd1start_block_range);
+
+ return 0;
+}
+
+static int swapwriter_write_init(int stream_number)
+{
+ if (stream_number == 1) {
+ currentblockrange = header_data.pd1start_block_range;
+ currentblockoffset = header_data.pd1start_block_offset;
+ currentblockchain = header_data.pd1start_chain;
+ } else
+ for (currentblockchain = 0; currentblockchain < MAX_SWAPFILES;
+ currentblockchain++)
+ if (header_data.block_chain[currentblockchain].first) {
+ currentblockrange =
+ header_data.
+ block_chain[currentblockchain].first;
+ currentblockoffset = currentblockrange->minimum;
+ break;
+ }
+
+ BUG_ON(!currentblockrange);
+
+ currentblocksperpage = PAGE_SIZE /
+ suspend_bio_ops.get_block_size(swap_info[currentblockchain].bdev);
+
+ suspend_message(SUSPEND_WRITER, SUSPEND_HIGH, 0,
+ "Stream %d beginning from position: chain %d. "
+ "range %p, block %ld.\n",
+ stream_number,
+ currentblockchain, currentblockrange, currentblockoffset);
+
+ swapwriter_page_index = 1;
+ current_stream = stream_number;
+
+ suspend_bio_ops.reset_io_stats();
+
+ return 0;
+}
+
+static int swapwriter_write_chunk(struct page * buffer_page)
+{
+ int i;
+ struct submit_params submit_params;
+
+ if (TEST_ACTION_STATE(SUSPEND_TEST_FILTER_SPEED))
+ return 0;
+
+ if (currentblockchain == MAX_SWAPFILES) {
+ printk("Error! We have run out of blocks for writing data.\n");
+ for (i = 0; i < MAX_SWAPFILES; i++) {
+ if (!swap_info[i].swap_file)
+ printk("Swap slot %d is unused.\n", i);
+ else
+ printk("Swap slot %d is device %x.\n",
+ i, swap_info[i].bdev->bd_dev);
+ if (header_data.block_chain[i].size)
+ printk("Chain size for device %d is %d.\n", i,
+ header_data.block_chain[i].size);
+ }
+ return -ENOSPC;
+ }
+
+ if (!currentblockrange) {
+ do {
+ currentblockchain++;
+ } while ((currentblockchain < MAX_SWAPFILES) &&
+ (!header_data.block_chain[currentblockchain].first));
+
+ /* We can validly not have a new blockrange. We
+ * might be compressing data and the user was
+ * too optimistic in setting the compression
+ * ratio or we're just copying the pageset. */
+
+ if (currentblockchain == MAX_SWAPFILES) {
+ printk("Argh. Ran out of block chains.\n");
+ return -ENOSPC;
+ }
+
+ currentblockrange =
+ header_data.block_chain[currentblockchain].first;
+ currentblockoffset = currentblockrange->minimum;
+ currentblocksperpage = PAGE_SIZE /
+ suspend_bio_ops.get_block_size(swap_info[currentblockchain].bdev);
+ }
+
+ submit_params.readahead_index = -1;
+ submit_params.page = buffer_page;
+ submit_params.dev = swap_info[currentblockchain].bdev;
+ submit_params.blocks_used = currentblocksperpage;
+
+ /* Get the blocks */
+ for (i = 0; i < currentblocksperpage; i++) {
+ submit_params.blocks[i] = currentblockoffset;
+ GET_RANGE_NEXT(currentblockrange, currentblockoffset);
+ }
+
+ suspend_message(SUSPEND_WRITER, SUSPEND_HIGH, 0,
+ "Writing page %d. Dev is %x. Block is %lu. "
+ "Blocksperpage is %d.\n",
+ swapwriter_page_index,
+ submit_params.dev->bd_dev,
+ submit_params.blocks[0],
+ currentblocksperpage);
+
+ suspend_message(SUSPEND_PAGESETS, SUSPEND_VERBOSE, 1,
+ "page:%d. bdev:%x. blocks (%d):",
+ swapwriter_page_index,
+ submit_params.dev->bd_dev,
+ submit_params.blocks_used);
+
+ for (i = 0; i < currentblocksperpage; i++)
+ suspend_message(SUSPEND_PAGESETS, SUSPEND_VERBOSE, 0,
+ "0x%lx%s",
+ submit_params.blocks[i],
+ ((i+1) < currentblocksperpage) ? "," : "\n");
+
+ check_shift_keys(0, NULL);
+
+ suspend_bio_ops.submit_io(WRITE, &submit_params, 0);
+
+ swapwriter_page_index++;
+
+ return 0;
+}
+
+static int swapwriter_write_cleanup(void)
+{
+ if (current_stream == 2) {
+ header_data.pd1start_block_range = currentblockrange;
+ header_data.pd1start_block_offset = currentblockoffset;
+ header_data.pd1start_chain = currentblockchain;
+ }
+
+ suspend_bio_ops.finish_all_io();
+
+ suspend_bio_ops.check_io_stats();
+
+ return 0;
+}
+
+static int swapwriter_read_init(int stream_number)
+{
+ int i;
+
+ if (stream_number == 1) {
+ currentblockrange = header_data.pd1start_block_range;
+ currentblockoffset = header_data.pd1start_block_offset;
+ currentblockchain = header_data.pd1start_chain;
+ } else {
+ currentblockrange = NULL;
+ currentblockoffset = 0;
+ currentblockchain = 0;
+ for (currentblockchain = 0; currentblockchain < MAX_SWAPFILES;
+ currentblockchain++)
+ if (header_data.block_chain[currentblockchain].first) {
+ currentblockrange =
+ header_data.block_chain[currentblockchain].first;
+ currentblockoffset = currentblockrange->minimum;
+ break;
+ }
+
+ if (!currentblockrange){
+ printk("Error! Can't find any block chain data.\n");
+ for (i = 0; i < MAX_SWAPFILES; i++) {
+ if (!swap_info[i].swap_file)
+ printk("Swap slot %d is unused.\n", i);
+ else
+ printk("Swap slot %d is device %x.\n",
+ i, swap_info[i].bdev->bd_dev);
+ if (header_data.block_chain[i].size)
+ printk("Chain size for device %d"
+ " is %d.\n", i,
+ header_data.block_chain[i].size);
+ printk("First entry in chain at %p.\n",
+ header_data.block_chain[i].first);
+ }
+ BUG_ON(1);
+ }
+ }
+ suspend_message(SUSPEND_WRITER, SUSPEND_MEDIUM, 0,
+ "Stream %d beginning from position: chain %d. "
+ "range %p, block %ld.\n",
+ stream_number,
+ currentblockchain, currentblockrange, currentblockoffset);
+
+ currentblocksperpage = get_blocks_per_page(currentblockchain);
+
+ swapwriter_page_index = 1;
+
+ suspend_bio_ops.reset_io_stats();
+
+ readahead_index = readahead_submit_index = -1;
+ readahead_allocs = readahead_frees = 0;
+
+ return 0;
+}
+
+static int swapwriter_begin_read_chunk(struct page * page,
+ int readahead_index, int sync)
+{
+ int i;
+ struct submit_params submit_params;
+
+ if (currentblockchain == MAX_SWAPFILES) {
+ /* Readahead might ask us to read too many blocks */
+ printk("Currentblockchain == MAX_SWAPFILES. Begin_read_chunk returning -ENODATA.\n");
+ return -ENODATA;
+ }
+
+ if (!currentblockrange) {
+ do {
+ currentblockchain++;
+ } while ((currentblockchain < MAX_SWAPFILES) &&
+ (!header_data.block_chain[currentblockchain].first));
+
+ /* We can validly not have a new blockrange. We
+ * might have allocated exactly the right amount
+ * of swap for the image and be reading the last
+ * block now.
+ */
+
+ if (currentblockchain == MAX_SWAPFILES) {
+ prepare_status(1, 0,
+ "Currentblockchain == MAX_SWAPFILES and "
+ "more data to be read. "
+ "Begin_read_chunk returning -ENOSPC.");
+ return -ENOSPC;
+ }
+
+ currentblockrange =
+ header_data.block_chain[currentblockchain].first;
+ currentblockoffset = currentblockrange->minimum;
+ currentblocksperpage = get_blocks_per_page(currentblockchain);
+ }
+
+ submit_params.readahead_index = readahead_index;
+ submit_params.page = page;
+ submit_params.dev = swap_info[currentblockchain].bdev;
+ submit_params.blocks_used = currentblocksperpage;
+
+ /* Get the blocks. There is no chance that they span chains. */
+ for (i = 0; i < currentblocksperpage; i++) {
+ submit_params.blocks[i] = currentblockoffset;
+ GET_RANGE_NEXT(currentblockrange, currentblockoffset);
+ }
+
+ suspend_message(SUSPEND_WRITER, SUSPEND_HIGH, 0,
+ "Reading page %d. Dev is %x. Block is %lu. "
+ "Blocksperpage is %d. Page is %p(%p). Readahead index is %d.",
+ swapwriter_page_index,
+ submit_params.dev->bd_dev,
+ submit_params.blocks[0],
+ currentblocksperpage,
+ page, page_address(page),
+ readahead_index);
+
+ suspend_message(SUSPEND_PAGESETS, SUSPEND_VERBOSE, 1,
+ "page:%d. bdev:%x. blocks (%d):",
+ swapwriter_page_index,
+ submit_params.dev->bd_dev,
+ submit_params.blocks_used);
+
+ for (i = 0; i < currentblocksperpage; i++)
+ suspend_message(SUSPEND_PAGESETS, SUSPEND_VERBOSE, 0,
+ "0x%lx%s",
+ submit_params.blocks[i],
+ ((i+1) < currentblocksperpage) ? "," : "\n");
+
+ check_shift_keys(0, NULL);
+
+ if ((i = suspend_bio_ops.submit_io(READ, &submit_params, sync)))
+ return -EPERM;
+
+ swapwriter_page_index++;
+
+ check_shift_keys(0, NULL);
+
+ return 0;
+}
+
+/* Asynchronous requests are passed straight to
+ * swapwriter_begin_read_chunk. For all other requests we implement
+ * readahead, and will always wait until our readahead buffer has
+ * been read before returning.
+ */
+
+static int swapwriter_read_chunk(struct page * buffer_page, int sync)
+{
+ static int last_result;
+ unsigned long * virt;
+
+ suspend_message(SUSPEND_WRITER, SUSPEND_HIGH, 0,
+ "At entrance to swapwriter_read_chunk.\n");
+
+ if (sync == SUSPEND_ASYNC)
+ return swapwriter_begin_read_chunk(buffer_page, -1, sync);
+
+ /* Start new readahead while we wait for our page */
+ if (readahead_index == -1) {
+ last_result = 0;
+ readahead_index = readahead_submit_index = 0;
+ }
+
+ /* Start a new readahead? */
+ if (last_result) {
+ /* We failed to submit a read, and have cleaned up
+ * all the readahead previously submitted */
+ if (readahead_submit_index == readahead_index)
+ return -EPERM;
+ goto wait;
+ }
+
+ do {
+ if ((test_suspend_state(SUSPEND_USE_MEMORY_POOL)) && (suspend_memory_pool_level(1) < 50))
+ break;
+
+ if (suspend_bio_ops.prepare_readahead(readahead_submit_index))
+ break;
+
+ readahead_allocs++;
+
+ suspend_message(SUSPEND_WRITER, SUSPEND_HIGH, 0,
+ "\nBeginning new readahead %d.\n",
+ readahead_submit_index);
+
+ last_result = swapwriter_begin_read_chunk(
+ suspend_bio_ops.readahead_pages[readahead_submit_index],
+ readahead_submit_index, SUSPEND_ASYNC);
+ if (last_result) {
+ printk("Begin read chunk for page %d returned %d.\n",
+ readahead_submit_index, last_result);
+ suspend_bio_ops.cleanup_readahead(readahead_submit_index);
+ break;
+ }
+
+ readahead_submit_index++;
+
+ if (readahead_submit_index == max_async_ios)
+ readahead_submit_index = 0;
+
+ } while((!last_result) && (readahead_submit_index != readahead_index) &&
+ (!suspend_bio_ops.readahead_ready(readahead_index)));
+
+wait:
+ suspend_bio_ops.wait_on_readahead(readahead_index);
+
+ virt = kmap_atomic(buffer_page, KM_USER1);
+ suspend_message(SUSPEND_WRITER, SUSPEND_HIGH, 0,
+ "Returned result of readahead %d,"
+ " Copying data from %p to %p.\n", readahead_index,
+ page_address(suspend_bio_ops.readahead_pages[readahead_index]),
+ virt);
+
+ memcpy(virt, page_address(suspend_bio_ops.readahead_pages[readahead_index]),
+ PAGE_SIZE);
+ kunmap_atomic(virt, KM_USER1);
+
+ suspend_bio_ops.cleanup_readahead(readahead_index);
+
+ readahead_frees++;
+
+ readahead_index++;
+ if (readahead_index == max_async_ios)
+ readahead_index = 0;
+
+ return 0;
+}
+
+static int swapwriter_read_cleanup(void)
+{
+ suspend_bio_ops.finish_all_io();
+ while (readahead_index != readahead_submit_index) {
+ suspend_bio_ops.cleanup_readahead(readahead_index);
+ readahead_frees++;
+ readahead_index++;
+ if (readahead_index == max_async_ios)
+ readahead_index = 0;
+ }
+ suspend_bio_ops.check_io_stats();
+ BUG_ON(readahead_allocs != readahead_frees);
+ return 0;
+}
+
+extern unsigned int nr_suspends;
+
+/* swapwriter_invalidate_image
+ *
+ */
+static int swapwriter_invalidate_image(void)
+{
+ union p_diskpage cur;
+ int result = 0;
+ char newsig[11];
+
+ cur.address = get_zeroed_page(GFP_ATOMIC);
+ if (!cur.address) {
+ printk("Unable to allocate a page for restoring the swap signature.\n");
+ return -ENOMEM;
+ }
+
+ suspend_store_free_mem(SUSPEND_FREE_INVALIDATE_IMAGE, 0);
+
+ /*
+ * If nr_suspends == 0, we must be booting, so no swap pages
+ * will be recorded as used yet.
+ */
+
+ if (nr_suspends > 0)
+ swapwriter_release_storage();
+
+ /*
+ * We don't do a sanity check here: we want to restore the swap
+ * whatever version of kernel made the suspend image.
+ *
+ * We need to write swap, but swap may not be enabled so
+ * we write the device directly
+ */
+
+ suspend_bio_ops.bdev_page_io(READ, resume_block_device,
+ resume_firstblock, virt_to_page(cur.pointer));
+
+ result = parse_signature(cur.pointer->swh.magic.magic, 1);
+
+ if (result < 4)
+ goto out;
+
+ strncpy(newsig, cur.pointer->swh.magic.magic, 10);
+ newsig[10] = 0;
+ suspend_message(SUSPEND_ANY_SECTION, SUSPEND_VERBOSE, 0,
+ "Swap signature will be set to %s.\n", newsig);
+
+ suspend_bio_ops.bdev_page_io(WRITE, resume_block_device, resume_firstblock,
+ virt_to_page(cur.pointer));
+
+ if (!nr_suspends)
+ printk(KERN_WARNING name_suspend "Image invalidated.\n");
+out:
+ suspend_bio_ops.finish_all_io();
+ free_pages(cur.address, 0);
+ suspend_store_free_mem(SUSPEND_FREE_INVALIDATE_IMAGE, 1);
+ return 0;
+}
+
+/*
+ * swapwriter_memory_needed
+ *
+ * Description:
+ * Returns the number of bytes of RAM needed for this
+ * code to do its work. (Used when calculating whether
+ * we have enough memory to be able to suspend & resume).
+ *
+ */
+static unsigned long swapwriter_memory_needed(void)
+{
+ return 1;
+}
+
+/* Print debug info
+ *
+ * Description:
+ */
+
+static int swapwriter_print_debug_stats(char * buffer, int size)
+{
+ int len = 0;
+ struct sysinfo sysinfo;
+
+ if (active_writer != &swapwriterops) {
+ len = suspend_snprintf(buffer, size, "- Swapwriter inactive.\n");
+ return len;
+ }
+
+ len = suspend_snprintf(buffer, size, "- Swapwriter active.\n");
+ if (swapfilename[0])
+ len+= suspend_snprintf(buffer+len, size-len,
+ " Attempting to automatically swapon: %s.\n", swapfilename);
+
+ si_swapinfo(&sysinfo);
+
+ len+= suspend_snprintf(buffer+len, size-len, " Swap available for image: %ld pages.\n",
+ sysinfo.freeswap + swapwriter_storage_allocated());
+
+ return len;
+}
+
+/*
+ * Storage needed
+ *
+ * Returns amount of space in the swap header required
+ * for the swapwriter's data. This ignores the links between
+ * pages, which we factor in when allocating the space.
+ *
+ * We ensure the space is allocated, but actually save the
+ * data from write_header_init and therefore don't also define a
+ * save_config_info routine.
+ */
+static unsigned long swapwriter_storage_needed(void)
+{
+ return sizeof(header_data);
+}
+
+/*
+ * Image_exists
+ *
+ */
+
+static int swapwriter_image_exists(void)
+{
+ int signature_found;
+ union p_diskpage diskpage;
+
+ if (!resume_device) {
+ printk("Not even trying to read header because resume_device is not set.\n");
+ return 0;
+ }
+
+ //PRINTFREEMEM("at start of swapwriter_image_exists.");
+
+ diskpage.address = get_zeroed_page(GFP_ATOMIC);
+
+ /* FIXME: Make sure bdev_page_io handles wrong parameters */
+ suspend_bio_ops.bdev_page_io(READ, resume_block_device, resume_firstblock, virt_to_page(diskpage.ptr));
+ suspend_bio_ops.finish_all_io();
+ signature_found = parse_signature(diskpage.pointer->swh.magic.magic, 0);
+
+ if (signature_found == -1) {
+ printk(KERN_ERR name_suspend "Unable to find a signature. Could you have moved a swap file?\n");
+ return 0;
+ } else if (signature_found < 2) {
+ printk(KERN_ERR name_suspend "This is normal swap space.\n" );
+ return 0; /* non fatal error */
+ } else if (signature_found < 6) {
+ if ((!(test_suspend_state(SUSPEND_NORESUME_SPECIFIED)))
+ && suspend_early_boot_message(1, "Detected the signature of an alternate implementation.\n"))
+ set_suspend_state(SUSPEND_NORESUME_SPECIFIED);
+ return 0;
+ } else if ((signature_found >> 1) != SIGNATURE_VER) {
+ if ((!(test_suspend_state(SUSPEND_NORESUME_SPECIFIED))) &&
+ suspend_early_boot_message(1, "Found a different style suspend image signature."))
+ set_suspend_state(SUSPEND_NORESUME_SPECIFIED);
+ }
+
+ return 1;
+}
+
+/*
+ * Mark resume attempted.
+ *
+ * Record that we tried to resume from this image.
+ */
+
+static void swapwriter_mark_resume_attempted(void)
+{
+ union p_diskpage diskpage;
+ int signature_found;
+
+ if (!resume_device) {
+ printk("Not even trying to record attempt at resuming"
+ " because resume_device is not set.\n");
+ return;
+ }
+
+ diskpage.address = get_zeroed_page(GFP_ATOMIC);
+
+ /* FIXME: Make sure bdev_page_io handles wrong parameters */
+ suspend_bio_ops.bdev_page_io(READ, resume_block_device, resume_firstblock, virt_to_page(diskpage.ptr));
+ signature_found = parse_signature(diskpage.pointer->swh.magic.magic, 0);
+
+ switch (signature_found) {
+ case 12:
+ case 13:
+ diskpage.pointer->swh.magic.magic[5] |= 0x80;
+ break;
+ }
+
+ suspend_bio_ops.bdev_page_io(WRITE, resume_block_device, resume_firstblock,
+ virt_to_page(diskpage.ptr));
+ suspend_bio_ops.finish_all_io();
+ free_pages(diskpage.address, 0);
+ return;
+}
+
+/*
+ * Parse Image Location
+ *
+ * Attempt to parse a resume2= parameter.
+ * Swap Writer accepts:
+ * resume2=swap:DEVNAME[:FIRSTBLOCK][@BLOCKSIZE]
+ *
+ * Where:
+ * DEVNAME is convertible to a dev_t by name_to_dev_t
+ * FIRSTBLOCK is the location of the first block in the swap file
+ * (specifying for a swap partition is nonsensical but not prohibited).
+ * BLOCKSIZE is the logical blocksize of the device
+ * (a multiple of 512, >= 512 and <= PAGE_SIZE).
+ * Data is validated by attempting to read a swap header from the
+ * location given. Failure will result in swapwriter refusing to
+ * save an image, and a reboot with correct parameters will be
+ * necessary.
+ */
+
+static int swapwriter_parse_image_location(char * commandline, int only_writer)
+{
+ char *thischar, *devstart = NULL, *colon = NULL, *at_symbol = NULL;
+ union p_diskpage diskpage;
+ int signature_found;
+
+ CLEAR_RESULT_STATE(SUSPEND_ABORTED);
+
+ if (strncmp(commandline, "swap:", 5)) {
+ if (!only_writer) {
+ printk(name_suspend "Swapwriter: Image location doesn't begin with 'swap:'\n");
+ return 1;
+ }
+ } else
+ commandline += 5;
+
+ devstart = thischar = commandline;
+ while ((*thischar != ':') && ((thischar - commandline) < 250) && (*thischar))
+ thischar++;
+
+ if (*thischar == ':') {
+ colon = thischar;
+ *colon = 0;
+ thischar++;
+ }
+
+ while ((*thischar != '@') && ((thischar - commandline) < 250) && (*thischar))
+ thischar++;
+
+ if (*thischar == '@') {
+ at_symbol = thischar;
+ *at_symbol = 0;
+ }
+
+ if (colon)
+ resume_firstblock = (int) simple_strtoul(colon + 1, NULL, 0);
+ else
+ resume_firstblock = 0;
+ printk("Looking for first block of swap header at block %x.\n", resume_firstblock);
+
+ if (at_symbol) {
+ resume_firstblocksize = (int) simple_strtoul(at_symbol + 1, NULL, 0);
+ if (resume_firstblocksize & 0x1FF)
+ printk("Blocksizes are usually a multiple of 512. Don't expect this to work!\n");
+ } else
+ resume_firstblocksize = 4096;
+ printk("Setting logical block size of resume device to %d.\n", resume_firstblocksize);
+
+ if (try_to_parse_resume_device(devstart))
+ goto invalid;
+
+ if (colon)
+ *colon = ':';
+ if (at_symbol)
+ *at_symbol = '@';
+
+ if ((suspend_bio_ops.get_block_size(resume_block_device)
+ != resume_firstblocksize) &&
+ (suspend_bio_ops.set_block_size(resume_block_device, resume_firstblocksize)
+ == -EINVAL))
+ goto invalid;
+
+ diskpage.address = get_zeroed_page(GFP_ATOMIC);
+ if (suspend_bio_ops.bdev_page_io(READ, resume_block_device, resume_firstblock, virt_to_page(diskpage.ptr))) {
+ printk(KERN_ERR name_suspend "Failed to submit I/O.\n");
+ return -EINVAL;
+ }
+ suspend_bio_ops.finish_all_io();
+ signature_found = parse_signature(diskpage.pointer->swh.magic.magic, 0);
+ free_page((unsigned long) diskpage.address);
+
+ if (signature_found != -1) {
+ printk(KERN_ERR name_suspend "Swap space signature found.\n");
+ return 0;
+ }
+
+ printk(KERN_ERR name_suspend "Sorry. No swap signature found at specified location.\n");
+ return -EINVAL;
+
+invalid:
+ if (colon)
+ *colon = ':';
+ if (at_symbol)
+ *at_symbol = '@';
+ printk(KERN_ERR name_suspend "Sorry. Location looks invalid.\n");
+ return -EINVAL;
+}
+
+int header_locations_read_proc(char * page, char ** start, off_t off, int count,
+ int *eof, void *data)
+{
+ int i, printedpartitionsmessage = 0, len = 0, haveswap = 0, device_block_size;
+ struct inode *swapf = 0;
+ int zone;
+ char * path_page = (char *) __get_free_page(GFP_KERNEL);
+ char * path;
+ int path_len;
+
+ *eof = 1;
+ if (!page)
+ return 0;
+
+ for (i = 0; i < MAX_SWAPFILES; i++) {
+ if (!swap_info[i].swap_file)
+ continue;
+
+ if (S_ISBLK(swap_info[i].swap_file->f_dentry->d_inode->i_mode)) {
+ haveswap = 1;
+ if (!printedpartitionsmessage) {
+ len += sprintf(page + len,
+ "For swap partitions, simply use the format: resume2=swap:/dev/hda1.\n");
+ printedpartitionsmessage = 1;
+ }
+ } else {
+ path_len = 0;
+
+ path = get_path_for_swapfile(i, path_page);
+ path_len = sprintf(path_page, "%-31s ", path);
+
+ haveswap = 1;
+ swapf = swap_info[i].swap_file->f_dentry->d_inode;
+ device_block_size = block_size(swap_info[i].bdev);
+ if (!(zone = bmap(swapf,0))) {
+ len+= sprintf(page + len,
+ "Swapfile %-31s has been corrupted. Reuse mkswap on it and try again.\n",
+ path_page);
+ } else {
+ len+= sprintf(page + len, "For swapfile `%s`, use resume2=swap:/dev/<partition name>:0x%x@%d.\n",
+ path_page,
+ zone, device_block_size);
+ }
+
+ }
+ }
+
+ if (!haveswap)
+ len = sprintf(page, "You need to turn on swap partitions before examining this file.\n");
+
+ free_pages((unsigned long) path_page, 0);
+ return len;
+}
+
+extern int attempt_to_parse_resume_device(void);
+
+static struct suspend_proc_data swapwriter_proc_data[] = {
+ {
+ .filename = "swapfilename",
+ .permissions = PROC_RW,
+ .type = SUSPEND_PROC_DATA_STRING,
+ .data = {
+ .string = {
+ .variable = swapfilename,
+ .max_length = 255,
+ }
+ }
+ },
+
+ {
+ .filename = "headerlocations",
+ .permissions = PROC_READONLY,
+ .type = SUSPEND_PROC_DATA_CUSTOM,
+ .data = {
+ .special = {
+ .read_proc = header_locations_read_proc,
+ }
+ }
+ },
+
+ { .filename = "disable_swapwriter",
+ .permissions = PROC_RW,
+ .type = SUSPEND_PROC_DATA_INTEGER,
+ .data = {
+ .integer = {
+ .variable = &swapwriterops.disabled,
+ .minimum = 0,
+ .maximum = 1,
+ }
+ },
+ .write_proc = attempt_to_parse_resume_device,
+ }
+};
+
+static struct suspend_plugin_ops swapwriterops = {
+ .type = WRITER_PLUGIN,
+ .name = "Swap Writer",
+ .memory_needed = swapwriter_memory_needed,
+ .print_debug_info = swapwriter_print_debug_stats,
+ .storage_needed = swapwriter_storage_needed,
+ .initialise = swapwriter_initialise,
+ .cleanup = swapwriter_cleanup,
+ .dpm_set_devices = swapwriter_dpm_set_devices,
+ .ops = {
+ .writer = {
+ .write_init = swapwriter_write_init,
+ .write_chunk = swapwriter_write_chunk,
+ .write_cleanup = swapwriter_write_cleanup,
+ .read_init = swapwriter_read_init,
+ .read_chunk = swapwriter_read_chunk,
+ .read_cleanup = swapwriter_read_cleanup,
+ .noresume_reset = swapwriter_noresume_reset,
+ .storage_available = swapwriter_storage_available,
+ .storage_allocated = swapwriter_storage_allocated,
+ .release_storage = swapwriter_release_storage,
+ .allocate_header_space = swapwriter_allocate_header_space,
+ .allocate_storage = swapwriter_allocate_storage,
+ .image_exists = swapwriter_image_exists,
+ .mark_resume_attempted = swapwriter_mark_resume_attempted,
+ .write_header_init = swapwriter_write_header_init,
+ .write_header_chunk = swapwriter_write_header_chunk,
+ .write_header_cleanup = swapwriter_write_header_cleanup,
+ .read_header_init = swapwriter_read_header_init,
+ .read_header_chunk = swapwriter_read_header_chunk,
+ .read_header_cleanup = swapwriter_read_header_cleanup,
+ .prepare_save_ranges = swapwriter_prepare_save_ranges,
+ .post_load_ranges = swapwriter_post_load_ranges,
+ .invalidate_image = swapwriter_invalidate_image,
+ .parse_image_location = swapwriter_parse_image_location,
+ }
+ }
+};
+
+/* ---- Registration ---- */
+static __init int swapwriter_load(void)
+{
+ int result;
+ int i, numfiles = sizeof(swapwriter_proc_data) / sizeof(struct suspend_proc_data);
+
+ if (!(result = suspend_register_plugin(&swapwriterops))) {
+ printk("Software Suspend Swap Writer registered.\n");
+
+ for (i=0; i< numfiles; i++)
+ suspend_register_procfile(&swapwriter_proc_data[i]);
+ }
+ return result;
+}
+
+#ifdef MODULE
+static __exit void swapwriter_unload(void)
+{
+ int i, numfiles = sizeof(swapwriter_proc_data) / sizeof(struct suspend_proc_data);
+
+ printk("Software Suspend Swap Writer unloading.\n");
+
+ for (i=0; i< numfiles; i++)
+ suspend_unregister_procfile(&swapwriter_proc_data[i]);
+ suspend_unregister_plugin(&swapwriterops);
+}
+
+module_init(swapwriter_load);
+module_exit(swapwriter_unload);
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Nigel Cunningham");
+MODULE_DESCRIPTION("Suspend2 swap writer");
+#else
+late_initcall(swapwriter_load);
+#endif


2004-11-24 17:48:10

by Christoph Hellwig

Subject: Re: Suspend 2 merge: 7/51: Reboot handler hook.

> -#ifdef CONFIG_SOFTWARE_SUSPEND
> +#ifdef CONFIG_SOFTWARE_SUSPEND2
> case LINUX_REBOOT_CMD_SW_SUSPEND:
> {
> - int ret = software_suspend();
> + int ret = -EINVAL;
> + if (!(test_suspend_state(SUSPEND_DISABLED))) {
> + suspend_try_suspend();
> + ret = 0;
> + }
> unlock_kernel();

total crap. This patch breaks the existing swsusp and turns a clean
interface into a horrible one. Just implement a

int software_suspend(void)
{
if (test_suspend_state(SUSPEND_DISABLED))
return -EINVAL;
suspend_try_suspend();
return 0;
}

in your code.
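
For readers following along, the shape of that suggestion can be tried out in user space. This is only a sketch: the function names are taken from the quoted diff, but the state word, the `SUSPEND_DISABLED` value, the call counter and the `reboot_cmd_sw_suspend()` caller are placeholders invented for illustration, not the suspend2 implementation.

```c
#include <errno.h>

/* Stand-ins for suspend2 internals. The names come from the quoted
 * diff; the bodies and the flag value are placeholders (assumptions). */
#define SUSPEND_DISABLED 0x1

unsigned int suspend_state;	/* hypothetical state word */
int suspend_attempts;		/* counts calls, for demonstration only */

static int test_suspend_state(unsigned int bit)
{
	return (suspend_state & bit) != 0;
}

static void suspend_try_suspend(void)
{
	suspend_attempts++;
}

/* The wrapper: same name and signature that kernel/sys.c already calls. */
int software_suspend(void)
{
	if (test_suspend_state(SUSPEND_DISABLED))
		return -EINVAL;
	suspend_try_suspend();
	return 0;
}

/* Simplified, hypothetical caller standing in for the
 * LINUX_REBOOT_CMD_SW_SUSPEND case. It does not need to know which
 * implementation is built in. */
int reboot_cmd_sw_suspend(void)
{
	return software_suspend();
}
```

With this arrangement the caller keeps its single `software_suspend()` call and the disabled check lives next to the code that owns the state.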

2004-11-24 17:55:05

by Zwane Mwaikambo

Subject: Re: Suspend 2 merge: 22/51: Suspend2 lowlevel code.

On Wed, 24 Nov 2004, Nigel Cunningham wrote:

> + * SMP support:
> + * All SMP processors enter this routine during suspend. The one through
> + * which the suspend is initiated (which, for simplicity, is always CPU 0)
> + * sends the others here using an IPI during do_suspend2_suspend_1. They
> + * remain here until after the atomic copy of the kernel is made, to ensure
> + * that they don't mess with memory in the meantime (even just idling will
> + * do that). Once the atomic copy is made, they are free to carry on idling.
> + * Note that we must let them go, because if we're using compression, the
> + * vfree calls in the compressors will result in IPIs being called and hanging
> + * because the CPUs are still here.
> + *
> + * At resume time, we do a similar thing. CPU 0 sends the others in here using
> + * an IPI. It then copies the original kernel back, restores its own processor
> + * context and flushes local tlbs before freeing the others to do the same.
> + * They can then go back to idling while CPU 0 reloads pageset 2, cleans up
> + * and unfreezes the processes.
> + *
> + * (Remember that freezing and thawing processes also uses IPIs, as may
> + * decompressing the data. Again, therefore, we cannot leave the other processors
> + * in here).
> + *
> + * At the moment, we do nothing about APICs, even though the code is there.

Ok,
Do you see anything missing (from an implementation point of view)
for the following?

Suspend:
1) suspend all cpus, save cpu0
2) proceed with state saving on cpu0 only
3) begin suspend

Resume:
1) begin resume
2) offline all currently online cpus
3) proceed with state restoring
4) online all previously online cpus

A lot of the subsystems which have work split across cpus will now have
work migrated across to cpu0. In that regard, which have you made swsusp
savvy? It looks like the timer changes might need looking at; any others?

Thanks,
Zwane
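
The ordering in the two lists above (remember which CPUs were online, take all but CPU 0 down, do the work, bring the same set back) can be modelled in a few lines of user-space C. This is a toy sketch under stated assumptions: `online_mask`, `NR_CPUS` and both helper functions are hypothetical stand-ins, and real kernel code would go through the CPU hotplug interfaces rather than flipping bits.

```c
#define NR_CPUS 8

/* Hypothetical stand-in for the kernel's online map: bit n set means
 * CPU n is online. */
unsigned int online_mask;

/* Take every CPU except CPU 0 offline. Returns the mask that was
 * online beforehand so the same set can be restored later. */
unsigned int offline_secondary_cpus(void)
{
	unsigned int saved = online_mask;
	int cpu;

	for (cpu = 1; cpu < NR_CPUS; cpu++)
		online_mask &= ~(1u << cpu);
	return saved;
}

/* Bring back exactly the CPUs that were online before, and no others. */
void online_saved_cpus(unsigned int saved)
{
	int cpu;

	for (cpu = 1; cpu < NR_CPUS; cpu++)
		if (saved & (1u << cpu))
			online_mask |= 1u << cpu;
}
```

Saving the mask, rather than onlining every possible CPU afterwards, is what keeps CPUs that were deliberately offline before the suspend from being started on resume.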

2004-11-24 18:01:39

by Zwane Mwaikambo

Subject: Re: Suspend 2 merge: 19/51: Remove MTRR sysdev support.

On Wed, 24 Nov 2004, Nigel Cunningham wrote:

> This patch removes sysdev support for MTRRs (potential SMP hang and
> shouldn't be done with interrupts done anyway). Instead, we save and
> restore MTRRs when entering and exiting the processor freezers (ie when
> saving the registers & context for each CPU via an SMP call).

I take it this has been tested with AGP and X11 running?

Thanks,
Zwane

2004-11-24 18:13:59

by Dave Hansen

Subject: Re: Suspend 2 merge: 14/51: Disable page alloc failure message when suspending

On Wed, 2004-11-24 at 04:57, Nigel Cunningham wrote:
> While eating memory, we will potentially trigger this a lot. We
> therefore disable the message when suspending.
>
> diff -ruN 503-disable-page-alloc-warnings-while-suspending-old/mm/page_alloc.c 503-disable-page-alloc-warnings-while-suspending-new/mm/page_alloc.c
> --- 503-disable-page-alloc-warnings-while-suspending-old/mm/page_alloc.c 2004-11-06 09:24:37.231308424 +1100
> +++ 503-disable-page-alloc-warnings-while-suspending-new/mm/page_alloc.c 2004-11-06 09:24:40.844759096 +1100
> @@ -725,7 +725,10 @@
> }
>
> nopage:
> - if (!(gfp_mask & __GFP_NOWARN) && printk_ratelimit()) {
> + if ((!(gfp_mask & __GFP_NOWARN)) &&
> + (!test_suspend_state(SUSPEND_RUNNING)) &&
> + printk_ratelimit()) {
> +
> printk(KERN_WARNING "%s: page allocation failure."
> " order:%d, mode:0x%x\n",
> p->comm, order, gfp_mask);

Following Documentation/SubmittingPatches, please submit patches made
with "diff -urp":

-p --show-c-function
Show which C function each change is in.

Otherwise, it's a lot harder to figure out what you're modifying.

-- Dave

2004-11-24 17:14:38

by Nigel Cunningham

Subject: Suspend 2 merge: 38/51: Page directory support.

A pageset is a group of pages that are saved as part of the image.
Suspend uses two pagesets: pageset2 contains the LRU pages and pageset1
contains all other pages saved. 'Pagedir' is the original name for a
pageset; I now use it mainly to refer to the metadata for the pageset.

Note that all of our metadata is actually stored in extents (I called
them ranges before I knew what an extent was). The struct pbe2, together
with the *pbe* functions at the top of this file, is an abstraction of
this data, roughly equivalent to the pbes that swsusp uses (hence the name).

Here we also have the code for making our atomic copy when suspending,
allocating and freeing the metadata and working to ensure the copy of
pagedir1 loaded at resume time doesn't get overwritten by itself
('collide') as we restore the kernel.
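
As a rough illustration of what storing page numbers as extents means, here is a small user-space sketch of walking a chain of ranges one page at a time. It is not the code from the patch: the struct layouts are simplified, only `first`, `minimum` and `size` appear in the posted code, and the `maximum` and `next` members as well as both helper functions are assumptions made for the example.

```c
#include <stddef.h>

/* Simplified, hypothetical versions of the structures named in the
 * patch. 'maximum' and 'next' are assumed members. */
struct range {
	unsigned long minimum;	/* first page number in the extent */
	unsigned long maximum;	/* last page number, inclusive (assumed) */
	struct range *next;	/* next extent in the chain (assumed) */
};

struct rangechain {
	struct range *first;
	int size;		/* total pages covered by the chain */
};

/* Advance (*cur, *value) to the next page number in the chain.
 * Returns 1 if there is a next value, 0 at the end of the chain. */
int range_next(struct range **cur, unsigned long *value)
{
	if (!*cur)
		return 0;
	if (*value < (*cur)->maximum) {
		(*value)++;
		return 1;
	}
	*cur = (*cur)->next;
	if (!*cur)
		return 0;
	*value = (*cur)->minimum;
	return 1;
}

/* Count the pages a chain describes by visiting each page number. */
int count_pages(const struct rangechain *chain)
{
	struct range *cur = chain->first;
	unsigned long value;
	int n = 0;

	if (!cur)
		return 0;
	value = cur->minimum;
	do {
		n++;
	} while (range_next(&cur, &value));
	return n;
}
```

A chain holding the extents 3-5 and 10-10 describes four pages with two small structures, which is the saving over keeping one entry per page.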

diff -ruN 828-pagedir-old/kernel/power/pagedir.c 828-pagedir-new/kernel/power/pagedir.c
--- 828-pagedir-old/kernel/power/pagedir.c 1970-01-01 10:00:00.000000000 +1000
+++ 828-pagedir-new/kernel/power/pagedir.c 2004-11-17 19:05:47.000000000 +1100
@@ -0,0 +1,532 @@
+/*
+ * kernel/power/pagedir.c
+ *
+ * Copyright (C) 1998-2001 Gabor Kuti <[email protected]>
+ * Copyright (C) 1998,2001,2002 Pavel Machek <[email protected]>
+ * Copyright (C) 2002-2003 Florent Chabaud <[email protected]>
+ * Copyright (C) 2002-2004 Nigel Cunningham <[email protected]>
+ *
+ * This file is released under the GPLv2.
+ *
+ * Routines for handling pagesets.
+ * Note that pbes aren't actually stored as such. They're stored as
+ * ranges (extents is the term, I'm told).
+ */
+
+#define SUSPEND_PAGEDIR_C
+#include <linux/suspend.h>
+#include <linux/highmem.h>
+#include <linux/module.h>
+
+extern struct pagedir pagedir1, pagedir2, pagedir_resume;
+
+#include "suspend.h"
+#include "pageflags.h"
+
+/* setup_pbe_variable
+ *
+ * Description: Set up one variable in a page backup entry from the range list.
+ * Arguments: unsigned long: The variable which will contain the
+ * value.
+ * struct range**: Address of the pointer to the current
+ * range.
+ * struct rangechain*: Address of the rangechain we are
+ * traversing.
+ */
+static inline void setup_pbe_variable(unsigned long * variable, struct range ** currentrange,
+ struct rangechain * chain)
+{
+ *currentrange = chain->first;
+ if (chain->first)
+ *variable = chain->first->minimum;
+ else
+ *variable = 0;
+}
+
+/* get_first_pbe
+ *
+ * Description: Get the first page backup entry for a pagedir.
+ * Arguments: struct pbe2 *: Address of the page backup entry we're
+ * populating.
+ * struct pagedir: Pagedir providing the data.
+ */
+void get_first_pbe(struct pbe2 * pbe, struct pagedir * pagedir)
+{
+ unsigned long currentorig, currentaddress;
+
+ pbe->pagedir = pagedir;
+
+ /* Get raw initial values */
+ setup_pbe_variable((unsigned long *) &pbe->origaddress,
+ &pbe->currentorigrange, &pagedir->origranges);
+ setup_pbe_variable((unsigned long *) &pbe->address,
+ &pbe->currentdestrange, &pagedir->destranges);
+
+ /* Convert to range values */
+ currentorig = (unsigned long) pbe->origaddress;
+ currentaddress = (unsigned long) pbe->address;
+
+ pbe->origaddress = mem_map + currentorig;
+ pbe->address = mem_map + currentaddress;
+
+ if (currentaddress > max_mapnr)
+ panic("Argh! Destination range value %lu is invalid!",
+ currentaddress);
+}
+
+/* get_next_pbe
+ *
+ * Description: Get the next page backup entry in a pagedir.
+ * Arguments: struct pbe2 *: Address of the pbe we're updating.
+ */
+void get_next_pbe(struct pbe2 * pbe)
+{
+ unsigned long currentorig, currentaddress;
+
+ /* Convert to range values */
+ currentorig = (pbe->origaddress - mem_map);
+ currentaddress = (pbe->address - mem_map);
+
+ /* Update values */
+ GET_RANGE_NEXT(pbe->currentorigrange, currentorig);
+ GET_RANGE_NEXT(pbe->currentdestrange, currentaddress);
+
+ pbe->origaddress = mem_map + currentorig;
+ pbe->address = mem_map + currentaddress;
+}
+
+/*
+ * --------------------------------------------------------------------------------------
+ *
+ * Local Page Flags routines.
+ *
+ * Rather than using the rare and precious flags in struct page, we allocate
+ * our own bitmaps dynamically.
+ *
+ */
+
+/* ------------------------------------------------------------------------- */
+
+/* copy_pageset1
+ *
+ * Description: Make the atomic copy of pageset1. We can't use copy_page (as we
+ * once did) because we can't be sure what side effects it has. On
+ * my old Duron, with 3DNOW, kernel_fpu_begin increments preempt
+ * count, making our preempt count at resume time 4 instead of 3.
+ *
+ * We don't want to call kmap_atomic unconditionally because it has
+ * the side effect of incrementing the preempt count, which will
+ * leave it one too high post resume (the page containing the
+ * preempt count will be copied after it's incremented). This is
+ * essentially the same problem.
+ */
+
+void copy_pageset1(void)
+{
+ int i = 0;
+ struct pbe2 pbe;
+
+ get_first_pbe(&pbe, &pagedir1);
+
+ for (i = 0; i < pageset1_size; i++) {
+ int loop;
+ unsigned long * origpage;
+ unsigned long * copypage = page_address(pbe.address);
+
+ if (PageHighMem(pbe.origaddress))
+ origpage = kmap_atomic(pbe.origaddress, KM_USER1);
+ else
+ origpage = page_address(pbe.origaddress);
+
+ for (loop=0; loop < (PAGE_SIZE / sizeof(unsigned long)); loop++)
+ *(copypage + loop) = *(origpage + loop);
+ if (PageHighMem(pbe.origaddress))
+ kunmap_atomic(origpage, KM_USER1);
+
+ get_next_pbe(&pbe);
+ }
+}
+
+#ifdef CONFIG_DEBUG_PAGEALLOC
+void suspend_map_atomic_copy_pages(void)
+{
+ int i = 0;
+ struct pbe2 pbe;
+
+ get_first_pbe(&pbe, &pagedir1);
+
+ for (i = 0; i < pageset1_size; i++) {
+ int orig_was_mapped = 1, copy_was_mapped = 1;
+
+ if (!PageHighMem(pbe.origaddress)) {
+ orig_was_mapped = suspend_map_kernel_page(pbe.origaddress, 1);
+ if (!orig_was_mapped)
+ SetPageUnmap(pbe.origaddress);
+ }
+ copy_was_mapped = suspend_map_kernel_page(pbe.address, 1);
+ if (!copy_was_mapped)
+ SetPageUnmap(pbe.address);
+
+ get_next_pbe(&pbe);
+ }
+}
+
+void suspend_unmap_atomic_copy_pages(void)
+{
+ int i;
+ for (i = 0; i < max_mapnr; i++)
+ if (PageUnmap(mem_map + i))
+ suspend_map_kernel_page(mem_map + i, 0);
+}
+#endif
+
+/* free_pagedir
+ *
+ * Description: Free a previously allocated pagedir.
+ * Arguments: struct pagedir *: Pointer to the pagedir being freed.
+ */
+void free_pagedir(struct pagedir * p)
+{
+ PRINTFREEMEM("at start of free_pagedir");
+
+ if (p->allocdranges.first) {
+ /* Free allocated pages */
+ struct range * rangepointer;
+ unsigned long pagenumber;
+ range_for_each(&p->allocdranges, rangepointer, pagenumber) {
+ ClearPageNosave(mem_map+pagenumber);
+ free_page((unsigned long) page_address(mem_map+pagenumber));
+ }
+ }
+
+ suspend_store_free_mem(SUSPEND_FREE_EXTRA_PD1, 1);
+
+ /* For pagedir 2, destranges == origranges */
+ if (p->pagedir_num == 2)
+ p->destranges.first = NULL;
+
+ put_range_chain(&p->origranges);
+ put_range_chain(&p->destranges);
+ put_range_chain(&p->allocdranges);
+
+ PRINTFREEMEM("at end of free_pagedir");
+ suspend_message(SUSPEND_PAGESETS, SUSPEND_MEDIUM, 0,
+ "Pageset size was %d.\n", p->pageset_size);
+ p->pageset_size = 0;
+}
+
+/* PageInPagedir
+ *
+ * Description: Determine whether a page is in a pagedir.
+ * Arguments: struct pagedir * The pagedir to search.
+ * struct page * The page to look for.
+ * Result: int Bitmap of state:
+ * Bit 0: Source page
+ * Bit 1: Dest page
+ * Bit 2: Allocated
+ * (Should only result in 0, 1, 2 or 6).
+ */
+
+int PageInPagedir(struct pagedir * p, struct page * page)
+{
+ int page_sought = page_to_pfn(page);
+ int result = 0;
+
+ if (p->origranges.first) {
+ struct range * rangepointer;
+ unsigned long pagenumber;
+ range_for_each(&p->origranges, rangepointer, pagenumber) {
+ if (pagenumber == page_sought)
+ result |= 1;
+ if (pagenumber >= page_sought)
+ break;
+ }
+ }
+
+ if (p->destranges.first) {
+ /* Check destination ranges */
+ struct range * rangepointer;
+ unsigned long pagenumber;
+ range_for_each(&p->destranges, rangepointer, pagenumber) {
+ if (pagenumber == page_sought)
+ result |= 2;
+ if (pagenumber >= page_sought)
+ break;
+ }
+ }
+
+ if (p->allocdranges.first) {
+ /* Check allocated ranges */
+ struct range * rangepointer;
+ unsigned long pagenumber;
+ range_for_each(&p->allocdranges, rangepointer, pagenumber) {
+ if (pagenumber == page_sought)
+ result |= 4;
+ if (pagenumber >= page_sought)
+ break;
+ }
+ }
+
+ return result;
+}
+
+/* allocate_extra_pagedir_memory
+ *
+ * Description: Allocate memory for making the atomic copy of pagedir1 in the
+ * case where it is bigger than pagedir2.
+ * Arguments: struct pagedir *: The pagedir for which we should
+ * allocate memory.
+ * int: Size of pageset 1.
+ * int: Size of pageset 2.
+ * Result: int. Zero on success. One if unable to allocate enough memory.
+ */
+int allocate_extra_pagedir_memory(struct pagedir * p, int pageset_size,
+ int alloc_from)
+{
+ int num_to_alloc = pageset_size - alloc_from - p->allocdranges.size;
+ int j, order;
+
+ prepare_status(0, 0, "Preparing page directory.");
+
+ PRINTFREEMEM("at start of allocate_extra_pagedir_memory");
+
+ if (num_to_alloc < 1)
+ num_to_alloc = 0;
+
+ if (num_to_alloc) {
+ int num_added = 0, numnosaveallocated=0;
+ int origallocd = alloc_from + p->allocdranges.size;
+
+ PRINTFREEMEM("prior to attempt");
+
+ order = generic_fls(num_to_alloc);
+ if (order >= MAX_ORDER)
+ order = MAX_ORDER - 1;
+
+ while (num_added < num_to_alloc) {
+ struct page * newpage;
+ unsigned long virt;
+
+ while ((1 << order) > (num_to_alloc - num_added))
+ order--;
+
+ virt = get_grabbed_pages(order);
+ while ((!virt) && (order > 0)) {
+ order--;
+ virt = get_grabbed_pages(order);
+ }
+
+ if (!virt) {
+ p->pageset_size += num_added;
+ suspend_message(SUSPEND_PAGESETS, SUSPEND_VERBOSE, 1,
+ " Allocated (extra) memory for pages"
+ " from %d-%d (%d pages).\n",
+ origallocd + 1, pageset_size,
+ pageset_size - origallocd);
+ printk("Couldn't get enough yet."
+ " %d pages short.\n",
+ num_to_alloc - num_added);
+ PRINTFREEMEM("at abort of "
+ "allocate_extra_pagedir_memory");
+ suspend_store_free_mem(SUSPEND_FREE_EXTRA_PD1, 0);
+ return 1;
+ }
+
+ newpage = virt_to_page(virt);
+ suspend_store_free_mem(SUSPEND_FREE_EXTRA_PD1, 0);
+ for (j = 0; j < (1 << order); j++) {
+ SetPageNosave(newpage + j);
+ /* Pages will be freed one at a time. */
+ set_page_count(newpage + j, 1);
+ add_to_range_chain(&p->allocdranges, newpage - mem_map + j);
+ numnosaveallocated++;
+ }
+ suspend_store_free_mem(SUSPEND_FREE_RANGE_PAGES, 0);
+ num_added+= (1 << order);
+ }
+ suspend_message(SUSPEND_PAGESETS, SUSPEND_VERBOSE, 1,
+ " Allocated (extra) memory for pages "
+ "from %d-%d (%d pages).\n",
+ origallocd + 1, pageset_size,
+ pageset_size - origallocd);
+ }
+
+ p->pageset_size = pageset_size;
+
+ suspend_store_free_mem(SUSPEND_FREE_EXTRA_PD1, 0);
+ PRINTFREEMEM("at end of allocate_extra_pagedir_memory");
+ return 0;
+}
+
+/* mark_pages_for_pageset2
+ *
+ * Description: Mark unshared pages in processes not needed for suspend as
+ * being able to be written out in a separate pagedir.
+ * HighMem pages are simply marked as pageset2. They won't be
+ * needed during suspend.
+ */
+
+void mark_pages_for_pageset2(void)
+{
+ int i, numpageset2 = 0;
+ struct zone * zone;
+ unsigned long flags;
+
+ if (max_mapnr != num_physpages) {
+ abort_suspend("mapnr is not expected");
+ return;
+ }
+
+ clear_map(pageset2_map);
+
+ /*
+ * Note that we don't clear the map to begin with!
+ * This is because if we eat memory, we lose track
+ * of LRU pages that are still in use but taken off
+ * the LRU. If I can figure out how the VM keeps
+ * track of them, I might be able to tweak this a
+ * little further and decrease pageset one's size
+ * further.
+ *
+ * (Memory grabbing clears the pageset2 flag on
+ * pages that are really freed!).
+ */
+
+ /* Add LRU pages */
+ for_each_zone(zone) {
+ spin_lock_irqsave(&zone->lru_lock, flags);
+ if (zone->nr_inactive) {
+ struct page * page;
+ list_for_each_entry(page, &zone->inactive_list, lru)
+ SetPagePageset2(page);
+ }
+ if (zone->nr_active) {
+ struct page * page;
+ list_for_each_entry(page, &zone->active_list, lru)
+ SetPagePageset2(page);
+ }
+ spin_unlock_irqrestore(&zone->lru_lock, flags);
+ }
+
+
+ /* Ensure range pages are not Pageset2 */
+ if (num_range_pages) {
+ if (get_rangepages_list())
+ return;
+
+ for (i = 1; i <= num_range_pages; i++) {
+ struct page * page;
+ page = virt_to_page(get_rangepages_list_entry(i));
+ // Must be assigned by the time recalc stats is called
+ if (PagePageset2(page)) {
+ suspend_message(SUSPEND_PAGESETS, SUSPEND_ERROR, 1,
+ "Pagedir[%d] was marked as pageset2 -"
+ " unmarking.\n", i);
+ ClearPagePageset2(page);
+ numpageset2--;
+ }
+ }
+ }
+
+ /* Finally, ensure that Slab pages are not Pageset2. */
+
+ for (i = 0; i < max_mapnr; i++) {
+ if (PageSlab(mem_map+i)) {
+ if (TestAndClearPagePageset2(mem_map+i)) {
+ //suspend_message(SUSPEND_PAGESETS, SUSPEND_ERROR, 1,
+ printk(
+ "Found page %d is slab page "
+ "but marked pageset 2.\n", i);
+ numpageset2--;
+ }
+ }
+ }
+}
+
+/* warmup_collision_cache
+ *
+ * Description: Mark the pages which are used by the original kernel.
+ */
+void warmup_collision_cache(void) {
+ int i;
+ struct range * rangepointer = NULL;
+ unsigned long pagenumber;
+
+ /* Allocatemap doesn't get deallocated because it's forgotten when we
+ * copy PageDir1 back. It doesn't matter if it collides because it is
+ * not used during the copy back itself.
+ */
+ allocate_local_pageflags(&in_use_map, 0);
+ suspend_message(SUSPEND_IO, SUSPEND_VERBOSE, 1, "Setting up pagedir cache...");
+ for (i = 0; i < max_mapnr; i++)
+ ClearPageInUse(mem_map+i);
+
+ range_for_each(&pagedir_resume.origranges, rangepointer, pagenumber)
+ SetPageInUse(mem_map+pagenumber);
+}
+
+/* get_pageset1_load_addresses
+ *
+ * Description: We check here that pagedir & pages it points to won't collide
+ * with pages where we're going to restore from the loaded pages
+ * later.
+ * Returns: Zero on success, one if couldn't find enough pages (shouldn't
+ * happen).
+ */
+
+int get_pageset1_load_addresses(void)
+{
+ int i, nrdone = 0, result = 0;
+ void **eaten_memory = NULL, **this;
+ struct page * pageaddr = NULL;
+
+ /*
+ * Because we're trying to make this work when we're saving as much
+ * memory as possible we need to remember the pages we reject here
+ * and then free them when we're done.
+ */
+
+ for(i=0; i < pagedir_resume.pageset_size; i++) {
+ while ((this = (void *) get_zeroed_page(GFP_ATOMIC))) {
+ memset(this, 0, PAGE_SIZE);
+ pageaddr = virt_to_page(this);
+ if (!PageInUse(pageaddr)) {
+ break;
+ }
+ *this = eaten_memory;
+ eaten_memory = this;
+ }
+ if (!this) {
+ abort_suspend("Error: Ran out of memory seeking locations for reloading data.");
+ result = 1;
+ break;
+ }
+ add_to_range_chain(&pagedir_resume.destranges, pageaddr - mem_map);
+ nrdone++;
+ }
+
+ /* Free unwanted memory */
+ while(eaten_memory) {
+ this = eaten_memory;
+ eaten_memory = *eaten_memory;
+ free_page((unsigned long) this);
+ }
+
+ return result;
+}
+
+/* set_chain_names
+ *
+ * Description: Set the chain names for a pagedir. (For debugging).
+ * Arguments: struct pagedir: The pagedir on which we want to set the names.
+ */
+
+void set_chain_names(struct pagedir * p)
+{
+ p->origranges.name = "original addresses";
+ p->destranges.name = "destination addresses";
+ p->allocdranges.name = "allocated addresses";
+}
+
+EXPORT_SYMBOL(get_first_pbe);
+EXPORT_SYMBOL(get_next_pbe);

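A note on get_pageset1_load_addresses() above: rejected pages are kept on a singly linked list stored in the pages themselves (the first word of each rejected page points at the previous reject), so no extra bookkeeping memory is needed while memory is tight. Below is a minimal userspace sketch of that idea, not the patch's code. It assumes malloc'd blocks in place of pages, and is_unwanted(), get_acceptable_block() and free_rejected() are hypothetical names invented for illustration.

```c
#include <assert.h>
#include <stdlib.h>

#define BLOCK_SIZE 64  /* stand-in for PAGE_SIZE; must hold at least one pointer */

/* Hypothetical predicate standing in for PageInUse(): reject the first
 * `rejects_left` blocks we are offered. */
static int rejects_left;

static int is_unwanted(void *block)
{
	(void)block;
	if (rejects_left > 0) {
		rejects_left--;
		return 1;
	}
	return 0;
}

/* Allocate blocks until one is acceptable. Rejected blocks are pushed onto
 * *rejected by storing the old list head in the block's first word. */
static void *get_acceptable_block(void ***rejected)
{
	void **blk;

	while ((blk = calloc(1, BLOCK_SIZE))) {
		if (!is_unwanted(blk))
			return blk;
		*blk = *rejected;   /* link to the previous reject */
		*rejected = blk;    /* this block is the new head */
	}
	return NULL;                /* out of memory */
}

/* Walk the reject list, freeing each block; returns how many were freed. */
static int free_rejected(void **rejected)
{
	int freed = 0;

	while (rejected) {
		void **next = *rejected;  /* read the link before freeing */
		free(rejected);
		rejected = next;
		freed++;
	}
	return freed;
}
```

A caller would start with an empty list (`void **rejected = NULL;`), call get_acceptable_block(&rejected) once per block it needs, and finish with free_rejected(rejected), mirroring the "Free unwanted memory" loop in the patch.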

2004-11-24 18:36:54

by Nigel Cunningham

Subject: Suspend 2 merge: 46/51: LZF support.

This is LZF support, contributed under a dual license (see below) by
Marc Lehmann. It flies! (Those stats in the debug info in an earlier
patch were real!).
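
For reviewers reading the "compressed format" comment in lzf_c.c in this patch, here is a small sketch of how a single token header decodes. It follows the three forms listed in that comment and the arithmetic in lzf_decompress(), but struct lzf_token and lzf_parse_token() are names made up for illustration and are not part of the patch.

```c
#include <assert.h>
#include <stddef.h>

enum lzf_token_kind { LZF_LITERAL, LZF_BACKREF };

struct lzf_token {
	enum lzf_token_kind kind;
	unsigned int length;    /* bytes this token produces on output */
	unsigned int distance;  /* backref only: how far back the copy starts */
	size_t header_bytes;    /* control bytes consumed from the input */
};

/* Parse one token header at `in` (with `avail` readable bytes).
 * Returns 0 on success, -1 if the header is truncated. */
static int lzf_parse_token(const unsigned char *in, size_t avail,
			   struct lzf_token *tok)
{
	unsigned int ctrl, len;
	size_t used = 0;

	if (avail < 1)
		return -1;
	ctrl = in[used++];

	if (ctrl < 32) {                /* 000LLLLL: L+1 literal bytes follow */
		tok->kind = LZF_LITERAL;
		tok->length = ctrl + 1;
		tok->distance = 0;
		tok->header_bytes = used;
		return 0;
	}

	len = ctrl >> 5;                /* top three bits of the control byte */
	if (len == 7) {                 /* long form: an extra length byte */
		if (used >= avail)
			return -1;
		len += in[used++];
	}
	if (used >= avail)              /* the low offset byte must be present */
		return -1;

	tok->kind = LZF_BACKREF;
	tok->length = len + 2;          /* the decoder copies len + 2 bytes */
	tok->distance = (((ctrl & 0x1f) << 8) | in[used++]) + 1;
	tok->header_bytes = used;
	return 0;
}
```

For example, the two bytes 0x21 0x02 parse as a back reference of length 3 starting 259 bytes back in the output.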

diff -ruN 852-lzf-old/kernel/power/lzf/lzf_c.c 852-lzf-new/kernel/power/lzf/lzf_c.c
--- 852-lzf-old/kernel/power/lzf/lzf_c.c 1970-01-01 10:00:00.000000000 +1000
+++ 852-lzf-new/kernel/power/lzf/lzf_c.c 2004-11-04 16:27:41.000000000 +1100
@@ -0,0 +1,220 @@
+/*
+ * Copyright (c) 2000-2003 Marc Alexander Lehmann <[email protected]>
+ *
+ * Redistribution and use in source and binary forms, with or without modifica-
+ * tion, are permitted provided that the following conditions are met:
+ *
+ * 1. Redistributions of source code must retain the above copyright notice,
+ * this list of conditions and the following disclaimer.
+ *
+ * 2. Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in the
+ * documentation and/or other materials provided with the distribution.
+ *
+ * 3. The name of the author may not be used to endorse or promote products
+ * derived from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR IMPLIED
+ * WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MER-
+ * CHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO
+ * EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPE-
+ * CIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
+ * PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS;
+ * OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
+ * WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTH-
+ * ERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED
+ * OF THE POSSIBILITY OF SUCH DAMAGE.
+ *
+ * Alternatively, the contents of this file may be used under the terms of
+ * the GNU General Public License version 2 (the "GPL"), in which case the
+ * provisions of the GPL are applicable instead of the above. If you wish to
+ * allow the use of your version of this file only under the terms of the
+ * GPL and not to allow others to use your version of this file under the
+ * BSD license, indicate your decision by deleting the provisions above and
+ * replace them with the notice and other provisions required by the GPL. If
+ * you do not delete the provisions above, a recipient may use your version
+ * of this file under either the BSD or the GPL.
+ */
+
+#define HSIZE (1 << (HLOG))
+
+/*
+ * don't play with this unless you benchmark!
+ * decompression is not dependent on the hash function
+ * the hashing function might seem strange, just believe me
+ * it works ;)
+ */
+#define FRST(p) (((p[0]) << 8) + p[1])
+#define NEXT(v,p) (((v) << 8) + p[2])
+#define IDX(h) ((((h ^ (h << 5)) >> (3*8 - HLOG)) + h*3) & (HSIZE - 1))
+/*
+ * IDX works because it is very similar to a multiplicative hash, e.g.
+ * (h * 57321 >> (3*8 - HLOG))
+ * the next one is also quite good, albeit slow ;)
+ * (int)(cos(h & 0xffffff) * 1e6)
+ */
+
+#if 0
+/* original lzv-like hash function */
+# define FRST(p) (p[0] << 5) ^ p[1]
+# define NEXT(v,p) ((v) << 5) ^ p[2]
+# define IDX(h) ((h) & (HSIZE - 1))
+#endif
+
+#define MAX_LIT (1 << 5)
+#define MAX_OFF (1 << 13)
+#define MAX_REF ((1 << 8) + (1 << 3))
+
+/*
+ * compressed format
+ *
+ * 000LLLLL <L+1> ; literal
+ * LLLOOOOO oooooooo ; backref L
+ * 111OOOOO LLLLLLLL oooooooo ; backref L+7
+ *
+ */
+
+unsigned int
+lzf_compress (const void *const in_data, unsigned int in_len,
+ void *out_data, unsigned int out_len, void *hbuf)
+{
+ const u8 **htab = hbuf;
+ const u8 **hslot;
+ const u8 *ip = (const u8 *)in_data;
+ u8 *op = (u8 *)out_data;
+ const u8 *in_end = ip + in_len;
+ u8 *out_end = op + out_len;
+ const u8 *ref;
+
+ unsigned int hval = FRST (ip);
+ unsigned long off;
+ int lit = 0;
+
+#if INIT_HTAB
+# if USE_MEMCPY
+ memset (htab, 0, HSIZE * sizeof (*htab)); /* clear the whole table, not one pointer's worth */
+# else
+ for (hslot = htab; hslot < htab + HSIZE; hslot++)
+ *hslot = ip;
+# endif
+#endif
+
+ for (;;)
+ {
+ if (ip < in_end - 2)
+ {
+ hval = NEXT (hval, ip);
+ hslot = htab + IDX (hval);
+ ref = *hslot; *hslot = ip;
+
+ if (1
+#if INIT_HTAB && !USE_MEMCPY
+ && ref < ip /* the next test will actually take care of this, but this is faster */
+#endif
+ && (off = ip - ref - 1) < MAX_OFF
+ && ip + 4 < in_end
+ && ref > (u8 *)in_data
+#if STRICT_ALIGN
+ && ref[0] == ip[0]
+ && ref[1] == ip[1]
+ && ref[2] == ip[2]
+#else
+ && *(u16 *)ref == *(u16 *)ip
+ && ref[2] == ip[2]
+#endif
+ )
+ {
+ /* match found at *ref++ */
+ unsigned int len = 2;
+ unsigned int maxlen = in_end - ip - len;
+ maxlen = maxlen > MAX_REF ? MAX_REF : maxlen;
+
+ do
+ len++;
+ while (len < maxlen && ref[len] == ip[len]);
+
+ if (op + lit + 1 + 3 >= out_end)
+ return 0;
+
+ if (lit)
+ {
+ *op++ = lit - 1;
+ lit = -lit;
+ do
+ *op++ = ip[lit];
+ while (++lit);
+ }
+
+ len -= 2;
+ ip++;
+
+ if (len < 7)
+ {
+ *op++ = (off >> 8) + (len << 5);
+ }
+ else
+ {
+ *op++ = (off >> 8) + ( 7 << 5);
+ *op++ = len - 7;
+ }
+
+ *op++ = off;
+
+#if ULTRA_FAST
+ ip += len;
+ hval = FRST (ip);
+ hval = NEXT (hval, ip);
+ htab[IDX (hval)] = ip;
+ ip++;
+#else
+ do
+ {
+ hval = NEXT (hval, ip);
+ htab[IDX (hval)] = ip;
+ ip++;
+ }
+ while (len--);
+#endif
+ continue;
+ }
+ }
+ else if (ip == in_end)
+ break;
+
+ /* one more literal byte we must copy */
+ lit++;
+ ip++;
+
+ if (lit == MAX_LIT)
+ {
+ if (op + 1 + MAX_LIT >= out_end)
+ return 0;
+
+ *op++ = MAX_LIT - 1;
+#if USE_MEMCPY
+ memcpy (op, ip - MAX_LIT, MAX_LIT);
+ op += MAX_LIT;
+ lit = 0;
+#else
+ lit = -lit;
+ do
+ *op++ = ip[lit];
+ while (++lit);
+#endif
+ }
+ }
+
+ if (lit)
+ {
+ if (op + lit + 1 >= out_end)
+ return 0;
+
+ *op++ = lit - 1;
+ lit = -lit;
+ do
+ *op++ = ip[lit];
+ while (++lit);
+ }
+
+ return op - (u8 *) out_data;
+}
diff -ruN 852-lzf-old/kernel/power/lzf/lzf_d.c 852-lzf-new/kernel/power/lzf/lzf_d.c
--- 852-lzf-old/kernel/power/lzf/lzf_d.c 1970-01-01 10:00:00.000000000 +1000
+++ 852-lzf-new/kernel/power/lzf/lzf_d.c 2004-11-04 16:27:41.000000000 +1100
@@ -0,0 +1,98 @@
+/*
+ * Copyright (c) 2000-2002 Marc Alexander Lehmann <[email protected]>
+ *
+ * Redistribution and use in source and binary forms, with or without modifica-
+ * tion, are permitted provided that the following conditions are met:
+ *
+ * 1. Redistributions of source code must retain the above copyright notice,
+ * this list of conditions and the following disclaimer.
+ *
+ * 2. Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in the
+ * documentation and/or other materials provided with the distribution.
+ *
+ * 3. The name of the author may not be used to endorse or promote products
+ * derived from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR IMPLIED
+ * WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MER-
+ * CHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO
+ * EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPE-
+ * CIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
+ * PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS;
+ * OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
+ * WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTH-
+ * ERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED
+ * OF THE POSSIBILITY OF SUCH DAMAGE.
+ *
+ * Alternatively, the contents of this file may be used under the terms of
+ * the GNU General Public License version 2 (the "GPL"), in which case the
+ * provisions of the GPL are applicable instead of the above. If you wish to
+ * allow the use of your version of this file only under the terms of the
+ * GPL and not to allow others to use your version of this file under the
+ * BSD license, indicate your decision by deleting the provisions above and
+ * replace them with the notice and other provisions required by the GPL. If
+ * you do not delete the provisions above, a recipient may use your version
+ * of this file under either the BSD or the GPL.
+ */
+
+unsigned int
+lzf_decompress (const void *const in_data, unsigned int in_len,
+ void *out_data, unsigned int out_len)
+{
+ u8 const *ip = in_data;
+ u8 *op = out_data;
+ u8 const *const in_end = ip + in_len;
+ u8 *const out_end = op + out_len;
+
+ do
+ {
+ unsigned int ctrl = *ip++;
+
+ if (ctrl < (1 << 5)) /* literal run */
+ {
+ ctrl++;
+
+ if (op + ctrl > out_end)
+ return 0;
+
+#if USE_MEMCPY
+ memcpy (op, ip, ctrl);
+ op += ctrl;
+ ip += ctrl;
+#else
+ do
+ *op++ = *ip++;
+ while (--ctrl);
+#endif
+ }
+ else /* back reference */
+ {
+ unsigned int len = ctrl >> 5;
+
+ u8 *ref = op - ((ctrl & 0x1f) << 8) - 1;
+
+ if (len == 7)
+ len += *ip++;
+
+ ref -= *ip++;
+
+ if (op + len + 2 > out_end)
+ return 0;
+
+ if (ref < (u8 *)out_data)
+ return 0;
+
+ *op++ = *ref++;
+ *op++ = *ref++;
+
+ do
+ *op++ = *ref++;
+ while (--len);
+ }
+ }
+ while (op < out_end && ip < in_end);
+
+ return op - (u8 *)out_data;
+}
+
diff -ruN 852-lzf-old/kernel/power/suspend_lzf.c 852-lzf-new/kernel/power/suspend_lzf.c
--- 852-lzf-old/kernel/power/suspend_lzf.c 1970-01-01 10:00:00.000000000 +1000
+++ 852-lzf-new/kernel/power/suspend_lzf.c 2004-11-11 08:46:15.000000000 +1100
@@ -0,0 +1,554 @@
+/*
+ * kernel/power/suspend_lzf.c
+ *
+ * Copyright (C) 2003 Marc Lehmann <[email protected]>
+ * Copyright (C) 2003,2004 Nigel Cunningham <[email protected]>
+ *
+ * This file is released under the GPLv2.
+ *
+ * This file contains data compression routines for suspend,
+ * using LZF compression.
+ *
+ */
+
+#include <linux/suspend.h>
+#include <linux/module.h>
+#include <linux/highmem.h>
+#include <linux/vmalloc.h>
+
+#include "plugins.h"
+#include "proc.h"
+#include "suspend.h"
+
+static int expected_lzf_compression = 0;
+
+/*
+ * size of hashtable is (1 << HLOG) * sizeof (char *)
+ * decompression is independent of the hash table size
+ * the difference between 15 and 14 is very small
+ * for small blocks (and 14 is also faster).
+ * For a low-memory configuration, use HLOG == 13;
+ * For best compression, use 15 or 16.
+ */
+#ifndef HLOG
+# define HLOG 14
+#endif
+
+/*
+ * sacrifice some compression quality in favour of compression speed.
+ * (roughly 1-2% worse compression for large blocks and
+ * 9-10% for small, redundant, blocks and >>20% better speed in both cases)
+ * In short: enable this for binary data, disable this for text data.
+ */
+#ifndef ULTRA_FAST
+# define ULTRA_FAST 1
+#endif
+
+#define STRICT_ALIGN 0
+#define USE_MEMCPY 1
+#define INIT_HTAB 0
+
+#include "lzf/lzf_c.c"
+#include "lzf/lzf_d.c"
+
+static struct suspend_plugin_ops lzf_compression_ops;
+static struct suspend_plugin_ops * next_driver;
+
+static void *compression_workspace = NULL;
+static u8 *local_buffer = NULL;
+static struct page * local_buffer_page = NULL;
+static u8 *page_buffer = NULL;
+static struct page * page_buffer_page = NULL;
+static unsigned int bufofs;
+
+static __nosavedata unsigned long bytes_in = 0, bytes_out = 0;
+
+/* allocate_compression_space
+ *
+ * Description: Allocate space for use in [de]compressing our data.
+ * Each call must have a matching call to free_memory.
+ * Returns: Int: Zero if successful, -ENOMEM otherwise.
+ */
+
+static inline int allocate_compression_space(void)
+{
+ BUG_ON(compression_workspace);
+
+ compression_workspace = vmalloc_32((1<<HLOG)*sizeof(char *));
+ if (!compression_workspace) {
+ printk(KERN_WARNING
+ "Failed to allocate %lu bytes for lzf workspace\n",
+ (unsigned long) ((1<<HLOG)*sizeof(char *)));
+ return -ENOMEM;
+ }
+
+ return 0;
+}
+
+/* free_memory
+ *
+ * Description: Frees memory allocated by the allocation routine (above).
+ */
+
+static inline void free_memory(void)
+{
+ if (!compression_workspace)
+ return;
+
+ vfree(compression_workspace);
+ compression_workspace = NULL;
+}
+
+/* ---- Local buffer management ---- */
+
+/* allocate_local_buffer
+ *
+ * Description: Allocates a page of memory for buffering output.
+ * Returns: Int: Zero if successful, -ENOMEM otherwise.
+ */
+
+static int allocate_local_buffer(void)
+{
+ if (!local_buffer) {
+ local_buffer = (char *) get_zeroed_page(GFP_ATOMIC);
+
+ if (!local_buffer) {
+ printk(KERN_ERR
+ "Failed to allocate the local buffer for "
+ "lzf compression driver.\n");
+ return -ENOMEM;
+ }
+ local_buffer_page = virt_to_page(local_buffer);
+ }
+
+ if (!page_buffer) {
+ page_buffer = (char *) get_zeroed_page(GFP_ATOMIC);
+
+ if (!page_buffer) {
+ printk(KERN_ERR
+ "Failed to allocate the page buffer for "
+ "lzf compression driver.\n");
+ return -ENOMEM;
+ }
+ page_buffer_page = virt_to_page(page_buffer);
+ }
+
+ return 0;
+}
+
+/* free_local_buffer
+ *
+ * Description: Frees memory allocated for buffering output.
+ */
+
+static inline void free_local_buffer(void)
+{
+ if (local_buffer)
+ free_pages((unsigned long) local_buffer, 0);
+
+ local_buffer = NULL;
+ local_buffer_page = NULL;
+
+ if (page_buffer)
+ free_pages((unsigned long) page_buffer, 0);
+
+ page_buffer = NULL;
+ page_buffer_page = NULL;
+}
+
+/* ---- Exported functions ---- */
+
+/* write_init()
+ *
+ * Description: Allocate buffers and prepare to compress data.
+ * Arguments: Stream_number: Stream to be written; stats reset when it is 2.
+ * Returns: Zero on success, -ECHILD or -ENOMEM on failure.
+ */
+
+static int lzf_write_init(int stream_number)
+{
+ int result;
+
+ next_driver = get_next_filter(&lzf_compression_ops);
+
+ if (!next_driver) {
+ printk("LZF Compression Driver: Argh! No one wants my output!");
+ return -ECHILD;
+ }
+
+ if ((result = allocate_compression_space()))
+ return result;
+
+ if ((result = allocate_local_buffer()))
+ return result;
+
+ /* Only reset the stats if starting to write an image */
+ if (stream_number == 2)
+ bytes_in = bytes_out = 0;
+
+ bufofs = 0;
+
+ return 0;
+}
+
+/* lzf_write()
+ *
+ * Description: Helper function for write_chunk. Write the compressed data.
+ * Arguments: u8*: Output buffer to be written.
+ * unsigned int: Length of buffer.
+ * Return: int: Result to be passed back to caller.
+ */
+
+static int lzf_write (u8 *buffer, unsigned int len)
+{
+ int ret;
+
+ bytes_out += len;
+
+ while (len + bufofs > PAGE_SIZE) {
+ unsigned int chunk = PAGE_SIZE - bufofs;
+ memcpy (local_buffer + bufofs, buffer, chunk);
+ buffer += chunk;
+ len -= chunk;
+ bufofs = 0;
+ if ((ret = next_driver->ops.filter.write_chunk(local_buffer_page)) < 0)
+ return ret;
+ }
+ memcpy (local_buffer + bufofs, buffer, len);
+ bufofs += len;
+ return 0;
+}
+
+/* lzf_write_chunk()
+ *
+ * Description: Compress a page of data, buffering output and passing on
+ * filled pages to the next plugin in the pipeline.
+ * Arguments: Buffer_page: Pointer to a buffer of size PAGE_SIZE,
+ * containing data to be compressed.
+ * Returns: 0 on success. Otherwise the error is that returned by later
+ * plugins. A page that LZF cannot shrink is passed on
+ * uncompressed, so the compressor itself never fails here.
+ */
+
+static int lzf_write_chunk(struct page * buffer_page)
+{
+ int ret;
+ u16 len;
+ char * buffer_start = kmap(buffer_page);
+
+ bytes_in += PAGE_SIZE;
+
+ len = lzf_compress(buffer_start, PAGE_SIZE, page_buffer,
+ PAGE_SIZE - 3, compression_workspace);
+
+ if ((ret = lzf_write((u8 *)&len, 2)) >= 0) {
+ if (len) // some compression
+ ret = lzf_write(page_buffer, len);
+ else
+ ret = lzf_write(buffer_start, PAGE_SIZE);
+ }
+ kunmap(buffer_page);
+ return ret;
+}
+
+/* write_cleanup()
+ *
+ * Description: Write unflushed data and free workspace.
+ * Returns: Result of writing last page.
+ */
+
+static int lzf_write_cleanup(void)
+{
+ int ret;
+
+ ret = next_driver->ops.filter.write_chunk(local_buffer_page);
+
+ free_memory();
+ free_local_buffer();
+
+ return ret;
+}
+
+/* read_init()
+ *
+ * Description: Prepare to read a new stream of data.
+ * Arguments: int: Section of image about to be read.
+ * Returns: int: Zero on success, error number otherwise.
+ */
+
+static int lzf_read_init(int stream_number)
+{
+ int result;
+
+ next_driver = get_next_filter(&lzf_compression_ops);
+
+ if (!next_driver) {
+ printk("LZF Compression Driver: Argh! No one wants "
+ "to feed me data!");
+ return -ECHILD;
+ }
+
+ if ((result = allocate_local_buffer()))
+ return result;
+
+ bufofs = PAGE_SIZE;
+
+ return 0;
+}
+
+/* lzf_read()
+ *
+ * Description: Read data into compression buffer.
+ * Arguments: u8 *: Address of the buffer.
+ * unsigned int: Length
+ * Returns: int: Result of reading the image chunk.
+ */
+
+static int lzf_read (u8 * buffer, unsigned int len)
+{
+ int ret;
+
+ while (len + bufofs > PAGE_SIZE) {
+ unsigned int chunk = PAGE_SIZE - bufofs;
+ memcpy(buffer, local_buffer + bufofs, chunk);
+ buffer += chunk;
+ len -= chunk;
+ bufofs = 0;
+ if ((ret = next_driver->ops.filter.read_chunk(
+ local_buffer_page, SUSPEND_SYNC)) < 0) {
+ return ret;
+ }
+ }
+ memcpy (buffer, local_buffer + bufofs, len);
+ bufofs += len;
+ return 0;
+}
+
+/* lzf_read_chunk()
+ *
+ * Description: Retrieve data from later plugins and decompress it until the
+ * input buffer is filled.
+ * Arguments: Buffer_start: Pointer to a buffer of size PAGE_SIZE.
+ * Sync: Whether the previous plugin (or core) wants its
+ * data synchronously.
+ * Returns: Zero if successful. Error condition from me or from downstream
+ * on failure.
+ */
+
+static int lzf_read_chunk(struct page * buffer_page, int sync)
+{
+ int ret;
+ u16 len;
+ char * buffer_start = kmap(buffer_page);
+
+ /*
+ * All our reads must be synchronous - we can't decompress
+ * data that hasn't been read yet.
+ */
+
+ if ((ret = lzf_read ((u8 *)&len, 2)) >= 0) {
+ if (len == 0) { // uncompressed
+ ret = lzf_read(buffer_start, PAGE_SIZE);
+ } else { // compressed
+ if ((ret = lzf_read(page_buffer, len)) >= 0) {
+ ret = lzf_decompress(page_buffer, len, buffer_start, PAGE_SIZE);
+ if (ret != PAGE_SIZE)
+ ret = -EPERM; // why EPERM??
+ else
+ ret = 0;
+ }
+ }
+ }
+ kunmap(buffer_page);
+ return ret;
+}
+
+/* read_cleanup()
+ *
+ * Description: Clean up after reading part or all of a stream of data.
+ * Returns: int: Always zero. Never fails.
+ */
+
+static int lzf_read_cleanup(void)
+{
+ free_local_buffer();
+ return 0;
+}
+
+/* lzf_print_debug_stats
+ *
+ * Description: Print information to be recorded for debugging purposes into a
+ * buffer.
+ * Arguments: buffer: Pointer to a buffer into which the debug info will be
+ * printed.
+ * size: Size of the buffer.
+ * Returns: Number of characters written to the buffer.
+ */
+
+static int lzf_print_debug_stats(char * buffer, int size)
+{
+ int pages_in = bytes_in >> PAGE_SHIFT,
+ pages_out = bytes_out >> PAGE_SHIFT;
+ int len;
+
+ /* Output the compression ratio achieved. */
+ len = suspend_snprintf(buffer, size, "- LZF Compressor enabled.\n");
+ if (pages_in)
+ len+= suspend_snprintf(buffer+len, size - len,
+ " Compressed %lu bytes into %lu (%d percent compression).\n",
+ bytes_in, bytes_out, (pages_in - pages_out) * 100 / pages_in);
+ return len;
+}
+
+/* lzf_memory_needed
+ *
+ * Description: Tell the caller how much memory we need to operate during
+ * suspend/resume.
+ * Returns: Unsigned long. Maximum number of bytes of memory required for
+ * operation.
+ */
+
+static unsigned long lzf_memory_needed(void)
+{
+ return PAGE_SIZE * 2 + (1<<HLOG)*sizeof(char *);
+}
+
+static unsigned long lzf_storage_needed(void)
+{
+ return 2 * sizeof(unsigned long) + sizeof(int);
+}
+
+/* lzf_save_config_info
+ *
+ * Description: Save information needed when reloading the image at resume time.
+ * Arguments: Buffer: Pointer to a buffer of size PAGE_SIZE.
+ * Returns: Number of bytes used for saving our data.
+ */
+
+static int lzf_save_config_info(char * buffer)
+{
+ *((unsigned long *) buffer) = bytes_in;
+ *((unsigned long *) (buffer + sizeof(unsigned long))) = bytes_out;
+ *((int *) (buffer + 2 * sizeof(unsigned long))) = expected_lzf_compression;
+ return 2 * sizeof(unsigned long) + sizeof(int);
+}
+
+/* lzf_load_config_info
+ *
+ * Description: Reload information needed for decompressing the image at
+ * resume time.
+ * Arguments: Buffer: Pointer to the start of the data.
+ * Size: Number of bytes that were saved.
+ */
+
+static void lzf_load_config_info(char * buffer, int size)
+{
+ if(size == 2 * sizeof(unsigned long) + sizeof(int)) {
+ bytes_in = *((unsigned long *) buffer);
+ bytes_out = *((unsigned long *) (buffer + sizeof(unsigned long)));
+ expected_lzf_compression = *((int *) (buffer + 2 * sizeof(unsigned long)));
+ } else
+ printk("Suspend LZF config info size mismatch: settings ignored.\n");
+ return;
+}
+
+/* lzf_get_expected_compression
+ *
+ * Description: Returns the expected ratio between data passed into this plugin
+ * and the amount of data output when writing.
+ * Returns: 100 minus the expected compression percentage set by the
+ * user via our proc entry (so 100 when that is left at zero).
+ */
+
+static int lzf_get_expected_compression(void)
+{
+ return 100 - expected_lzf_compression;
+}
+
+/*
+ * data for our proc entries.
+ */
+
+static struct suspend_proc_data expected_compression_proc_data = {
+ .filename = "expected_lzf_compression",
+ .permissions = PROC_RW,
+ .type = SUSPEND_PROC_DATA_INTEGER,
+ .data = {
+ .integer = {
+ .variable = &expected_lzf_compression,
+ .minimum = 0,
+ .maximum = 99,
+ }
+ }
+};
+
+static struct suspend_proc_data disable_compression_proc_data = {
+ .filename = "disable_lzf_compression",
+ .permissions = PROC_RW,
+ .type = SUSPEND_PROC_DATA_INTEGER,
+ .data = {
+ .integer = {
+ .variable = &lzf_compression_ops.disabled,
+ .minimum = 0,
+ .maximum = 1,
+ }
+ }
+};
+
+/*
+ * Ops structure.
+ */
+
+static struct suspend_plugin_ops lzf_compression_ops = {
+ .type = FILTER_PLUGIN,
+ .name = "LZF Page Compressor",
+ .memory_needed = lzf_memory_needed,
+ .print_debug_info = lzf_print_debug_stats,
+ .save_config_info = lzf_save_config_info,
+ .load_config_info = lzf_load_config_info,
+ .storage_needed = lzf_storage_needed,
+ .ops = {
+ .filter = {
+ .write_init = lzf_write_init,
+ .write_chunk = lzf_write_chunk,
+ .write_cleanup = lzf_write_cleanup,
+ .read_init = lzf_read_init,
+ .read_chunk = lzf_read_chunk,
+ .read_cleanup = lzf_read_cleanup,
+ .expected_compression = lzf_get_expected_compression,
+ }
+ }
+};
+
+/* ---- Registration ---- */
+
+static __init int lzf_load(void)
+{
+ int result;
+
+ if (!(result = suspend_register_plugin(&lzf_compression_ops))) {
+ printk("Software Suspend LZF Compression Driver registered.\n");
+ suspend_register_procfile(&expected_compression_proc_data);
+ suspend_register_procfile(&disable_compression_proc_data);
+ }
+ return result;
+}
+
+#ifdef MODULE
+static __exit void lzf_unload(void)
+{
+ printk("Software Suspend LZF Compression Driver unloading.\n");
+ suspend_unregister_procfile(&expected_compression_proc_data);
+ suspend_unregister_procfile(&disable_compression_proc_data);
+ suspend_unregister_plugin(&lzf_compression_ops);
+}
+
+
+module_init(lzf_load);
+module_exit(lzf_unload);
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Marc Lehmann");
+MODULE_DESCRIPTION("LZF Compression support for Suspend2");
+#else
+late_initcall(lzf_load);
+#endif

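The on-disk framing used by lzf_write_chunk() and lzf_read_chunk() above is simple enough to sketch: each page becomes a 2-byte length followed by that many compressed bytes, and a length of zero means the page follows uncompressed. The following is a userspace illustration only, assuming a 4096-byte page; frame_page() and peek_record() are hypothetical helpers, and the compressor itself is left out.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096u  /* assumed page size for this sketch */

/* Append one record to `out`. `clen` is the compressor's return value:
 * 0 means the compressed data did not fit, so the raw page is stored.
 * Returns the number of bytes appended. */
static size_t frame_page(uint8_t *out, const uint8_t *page,
			 const uint8_t *compressed, uint16_t clen)
{
	memcpy(out, &clen, sizeof(clen));   /* native byte order, as in the patch */
	if (clen) {
		memcpy(out + sizeof(clen), compressed, clen);
		return sizeof(clen) + clen;
	}
	memcpy(out + sizeof(clen), page, SKETCH_PAGE_SIZE);
	return sizeof(clen) + SKETCH_PAGE_SIZE;
}

/* Inspect the record at `in`: point *payload at its data and set *is_raw
 * when that data is an uncompressed page. Returns the record's total size. */
static size_t peek_record(const uint8_t *in, const uint8_t **payload,
			  int *is_raw)
{
	uint16_t clen;

	memcpy(&clen, in, sizeof(clen));
	*payload = in + sizeof(clen);
	*is_raw = (clen == 0);
	return sizeof(clen) + (clen ? clen : SKETCH_PAGE_SIZE);
}
```

Since the patch gives lzf_compress() only PAGE_SIZE - 3 bytes of output room, a nonzero length should always describe a record shorter than a raw page, which is what makes zero safe to use as the "stored uncompressed" marker.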

2004-11-24 18:39:48

by Nigel Cunningham

Subject: Suspend 2 merge: 44/51: Text UI plugin.

Here's our plugin that is used to display text output on the console.
When the console loglevel is 0 or 1, the user gets a nice progress bar.
Higher levels display increasing levels of debugging output.
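
To give a feel for the progress bar layout, here is a small sketch of the geometry the patch below computes: the bar sits in the middle half of the console, with a bracket at each end. The column arithmetic matches text_prepare_status(); the struct, the function names, and the 64-bit fill calculation are assumptions made for illustration (the kernel code deliberately avoids 64-bit math and scales differently).

```c
#include <assert.h>

struct bar_layout {
	int left_bracket;   /* column of '[' */
	int first_cell;     /* first column inside the bar */
	int right_bracket;  /* column of ']' */
	int width;          /* number of cells between the brackets */
};

/* Compute where the bar goes on a console `columns` wide. */
static struct bar_layout bar_layout_for(int columns)
{
	struct bar_layout b;
	int margin = columns / 4;   /* a quarter of the screen on each side */

	b.left_bracket = margin;
	b.first_cell = margin + 1;
	b.right_bracket = columns - margin - 1;
	b.width = columns - 2 * margin - 2;
	return b;
}

/* How many cells should be filled for value/maximum progress.
 * Simplification: uses 64-bit arithmetic, unlike the patch. */
static int bar_cells_filled(const struct bar_layout *b,
			    unsigned long value, unsigned long maximum)
{
	if (!maximum)
		return 0;
	if (value > maximum)
		value = maximum;
	return (int)((unsigned long long)value * b->width / maximum);
}
```

On an 80-column console this gives brackets at columns 20 and 59 with 38 cells between them.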

diff -ruN 850-text-ui-old/kernel/power/suspend_text.c 850-text-ui-new/kernel/power/suspend_text.c
--- 850-text-ui-old/kernel/power/suspend_text.c 1970-01-01 10:00:00.000000000 +1000
+++ 850-text-ui-new/kernel/power/suspend_text.c 2004-11-15 09:41:00.000000000 +1100
@@ -0,0 +1,629 @@
+/*
+ * kernel/power/suspend_text.c
+ *
+ * Copyright (C) 2002-2004 Nigel Cunningham <[email protected]>
+ *
+ * This file is released under the GPLv2.
+ *
+ * Routines for Software Suspend's user interface.
+ *
+ * The user interface includes support for a text mode 'nice display'.
+ *
+ * The 'nice display' is text based and implements a progress bar and
+ * (optional) textual progress, as well as an overall description of
+ * the current action and the display of a header and the code version.
+ *
+ * It uses /dev/console, and thus also works on a serial console.
+ */
+#define SUSPEND_TEXT_MODE_C
+
+//#define __KERNEL_SYSCALLS__
+
+#include <linux/suspend.h>
+#include <linux/console.h>
+#include <linux/selection.h>
+#include <linux/tty.h>
+#include <linux/vt_kern.h>
+
+#include "plugins.h"
+#include "proc.h"
+#include "suspend.h"
+
+/*
+ * The original macros use currcons, which we don't have access to.
+ */
+#undef video_num_columns
+#define video_num_columns (vc_cons[fg_console].d->vc_cols)
+#undef video_num_lines
+#define video_num_lines (vc_cons[fg_console].d->vc_rows)
+
+static int barwidth = 0, barposn = -1, newbarposn = 0;
+static int draw_progress_bar = 1;
+static char print_buf[1024]; /* Same as printk - should be safe */
+
+/* We remember the last header that was (or could have been) displayed for
+ * use during log level switches */
+static char lastheader[512];
+static int lastheader_message_len = 0;
+
+static void hide_cursor(void);
+static void unblank_screen_via_file(void);
+
+static int suspend_console_fd = -1;
+static struct termios termios;
+static int lastloglevel = -1;
+
+#ifdef CONFIG_DEVFS_FS
+static int mounted_devfs = 0;
+#endif
+
+#define cond_console_print(chars) \
+ if (suspend_console_fd > -1) { \
+ int count = strlen(chars); \
+ sys_write(suspend_console_fd, chars, count); \
+ hide_cursor(); \
+ unblank_screen_via_file(); \
+ }
+
+static void move_cursor_to(unsigned char * xy)
+{
+ char buf[10];
+
+ snprintf(buf, 10, "\233%d;%dH", xy[1], xy[0]);
+ cond_console_print(buf);
+}
+
+static void clear_display(void)
+{
+ char buf[4] = "\2332J";
+ unsigned char home[2] = { 0, 0 };
+
+ cond_console_print(buf);
+ move_cursor_to(home);
+}
+
+static void hide_cursor(void)
+{
+ char buf[6] = "\033[?1c";
+ if (suspend_console_fd > -1)
+ sys_write(suspend_console_fd, buf, 5);
+}
+
+static void restore_cursor(void)
+{
+ char buf[6] = "\033[?0c";
+ if (suspend_console_fd > -1)
+ sys_write(suspend_console_fd, buf, 5);
+}
+
+static void unblank_screen_via_file(void)
+{
+ char buf[6] = "\033[13]";
+ if (suspend_console_fd > -1)
+ sys_write(suspend_console_fd, buf, 5);
+}
+
+/* text_prepare_status
+ * Description: Prepare the 'nice display', drawing the header and version,
+ * along with the current action and perhaps also resetting the
+ * progress bar.
+ * Arguments: int printalways: Whether to print the action when debugging
+ * is on.
+ * int clearbar: Whether to reset the progress bar.
+ * const char *fmt, ...: The action to be displayed.
+ */
+static void text_prepare_status(int printalways, int clearbar, const char *fmt, va_list args)
+{
+ unsigned char posn[2];
+
+ if (fmt)
+ lastheader_message_len = vsnprintf(lastheader, 512, fmt, args);
+
+ if (console_loglevel >= SUSPEND_ERROR) {
+
+ if (printalways)
+ printk("\n** %s\n", lastheader);
+ return;
+ }
+
+ barwidth = (video_num_columns - 2 * (video_num_columns / 4) - 2);
+
+ /* Print version */
+ posn[0] = (unsigned char) (0);
+ posn[1] = (unsigned char) (video_num_lines);
+ move_cursor_to(posn);
+ cond_console_print(SUSPEND_CORE_VERSION);
+
+ /* Print header */
+ posn[0] = (unsigned char) ((video_num_columns - 31) / 2);
+ posn[1] = (unsigned char) ((video_num_lines / 3) - 3);
+ move_cursor_to(posn);
+
+ cond_console_print("S O F T W A R E S U S P E N D");
+
+ /* Print action */
+ posn[1] = (unsigned char) (video_num_lines / 3);
+ posn[0] = (unsigned char) 0;
+ move_cursor_to(posn);
+
+ /* Clear old message */
+ for (barposn = 0; barposn < video_num_columns; barposn++)
+ cond_console_print(" ");
+
+ posn[0] = (unsigned char)
+ ((video_num_columns - lastheader_message_len) / 2);
+ move_cursor_to(posn);
+ cond_console_print(lastheader);
+
+ if (draw_progress_bar) {
+ /* Draw left bracket of progress bar. */
+ posn[0] = (unsigned char) (video_num_columns / 4);
+ posn[1]++;
+ move_cursor_to(posn);
+ cond_console_print("[");
+
+ /* Draw right bracket of progress bar. */
+ posn[0] = (unsigned char)
+ (video_num_columns - (video_num_columns / 4) - 1);
+ move_cursor_to(posn);
+ cond_console_print("]");
+
+ if (clearbar) {
+ /* Position at start of progress */
+ posn[0] = (unsigned char) (video_num_columns / 4 + 1);
+ move_cursor_to(posn);
+
+ /* Clear bar */
+ for (barposn = 0; barposn < barwidth; barposn++)
+ cond_console_print(" ");
+ move_cursor_to(posn);
+ }
+ }
+
+ hide_cursor();
+
+ barposn = 0;
+}
+
+/* text_loglevel_change
+ *
+ * Description: Update the display when the user changes the log level.
+ * Returns: Nothing.
+ */
+
+static void text_loglevel_change(void)
+{
+ /* Calculate progress bar width. Note that whether the
+ * splash screen is on might have changed (this might be
+ * the first call in a new cycle), so we can't take it
+ * for granted that the width is the same as last time
+ * we came in here */
+ barwidth = (video_num_columns - 2 * (video_num_columns / 4) - 2);
+ barposn = 0;
+
+ /* Only reset the display if we're switching between nice display
+ * and displaying debugging output */
+
+ if (console_loglevel >= SUSPEND_ERROR) {
+ char message[35];
+ if (lastloglevel < SUSPEND_ERROR)
+ clear_display();
+
+ snprintf(message, 35,
+ "Switched to console loglevel %d.\n",
+ console_loglevel);
+ cond_console_print(message);
+
+ if (lastloglevel < SUSPEND_ERROR) {
+ cond_console_print(lastheader);
+ cond_console_print("\n");
+ }
+
+ } else if (lastloglevel >= SUSPEND_ERROR) {
+ clear_display();
+
+ /* Get the nice display or last action [re]drawn */
+ text_prepare_status(1, 0, NULL, NULL);
+ }
+
+ lastloglevel = console_loglevel;
+}
+/* text_update_progress
+ *
+ * Description: Update the progress bar and (if on) in-bar message.
+ * Arguments: UL value, maximum: Current progress percentage (value/max).
+ * const char *fmt, ...: Message to be displayed in the middle
+ * of the progress bar.
+ * Note that a NULL message does not mean that any previous
+ * message is erased! For that, you need prepare_status with
+ * clearbar on.
+ * Returns: Unsigned long: The next value where status needs to be updated.
+ * This is to reduce unnecessary calls to text_update_progress.
+ */
+unsigned long text_update_progress(unsigned long value, unsigned long maximum,
+ const char *fmt, va_list args)
+{
+ unsigned long next_update = 0;
+ int bitshift = generic_fls(maximum) - 16;
+ unsigned char posn[2];
+ int message_len = 0;
+
+ if (!barwidth)
+ barwidth = (video_num_columns - 2 * (video_num_columns / 4) - 2);
+
+ if (!maximum)
+ return maximum;
+
+ if (value > maximum)
+ value = maximum;
+
+ /* Try to avoid math problems - we can't do 64 bit math here
+ * (and shouldn't need it - anyone got screen resolution
+ * of 65536 pixels or more?) */
+ if (bitshift > 0) {
+ unsigned long temp_maximum = maximum >> bitshift;
+ unsigned long temp_value = value >> bitshift;
+ newbarposn = (int) (temp_value * barwidth / temp_maximum);
+ } else
+ newbarposn = (int) (value * barwidth / maximum);
+
+ if (newbarposn < barposn)
+ barposn = 0;
+
+ next_update = ((newbarposn + 1) * maximum / barwidth) + 1;
+
+ if ((console_loglevel >= SUSPEND_ERROR) || (!draw_progress_bar))
+ return next_update;
+
+ /* Update bar */
+ if (draw_progress_bar) {
+ posn[1] = (unsigned char) ((video_num_lines / 3) + 1);
+
+ /* Clear bar if at start */
+ if (!barposn) {
+ posn[0] = (unsigned char) (video_num_columns / 4 + 1);
+ move_cursor_to(posn);
+ for (; barposn < barwidth; barposn++)
+ cond_console_print(" ");
+ barposn = 0;
+ }
+ posn[0] = (unsigned char) (video_num_columns / 4 + 1 + barposn);
+ move_cursor_to(posn);
+
+ for (; barposn < newbarposn; barposn++)
+ cond_console_print("-");
+ }
+
+ /* Print string in progress bar on loglevel 1 */
+ if ((fmt) && (console_loglevel)) {
+ message_len = snprintf(print_buf, sizeof(print_buf), " ");
+ message_len += vsnprintf(print_buf + message_len,
+ sizeof(print_buf) - message_len, fmt, args);
+ message_len += snprintf(print_buf + message_len,
+ sizeof(print_buf) - message_len, " ");
+
+ if (message_len) {
+ posn[0] = (unsigned char)
+ ((video_num_columns - message_len) / 2);
+ posn[1] = (unsigned char)
+ ((video_num_lines / 3) + 1);
+ move_cursor_to(posn);
+ cond_console_print(print_buf);
+ }
+ }
+
+ barposn = newbarposn;
+ hide_cursor();
+
+ return next_update;
+}
+
+extern asmlinkage long sys_ioctl(unsigned int fd, unsigned int cmd,
+ unsigned long arg);
+
+static void text_message(unsigned long section, unsigned long level,
+ int normally_logged,
+ const char *fmt, va_list args)
+{
+ int printed_len = 0;
+
+ if ((section) && (!TEST_DEBUG_STATE(section)))
+ return;
+
+ if (level == SUSPEND_STATUS) {
+ text_prepare_status(1, 0, fmt, args);
+ return;
+ }
+
+ if (level > console_loglevel)
+ return;
+
+ printed_len = vsnprintf(print_buf + printed_len,
+ sizeof(print_buf) - printed_len, fmt, args);
+
+
+ if ((TEST_ACTION_STATE(SUSPEND_LOGALL)) ||
+ (normally_logged)) {
+ /* If we didn't print anything, don't do the \n anyway! */
+ if (!printed_len)
+ return;
+ printk("%s", print_buf);
+ } else
+ cond_console_print(print_buf);
+}
+/*
+ *
+ */
+
+static void suspend_get_dev_console(void)
+{
+ if (suspend_console_fd > -1)
+ return;
+
+ suspend_console_fd = sys_open("/dev/console", O_RDWR | O_NONBLOCK, 0);
+ if (suspend_console_fd < 0) {
+ sys_mkdir("/dev", 0700);
+#ifdef CONFIG_DEVFS_FS
+ sys_mount("devfs", "/dev", "devfs", 0, NULL);
+ mounted_devfs = 1;
+#endif
+ suspend_console_fd = sys_open("/dev/console", O_RDWR | O_NONBLOCK, 0);
+ }
+ if (suspend_console_fd < 0) {
+ printk("Can't open /dev/console. Error value was %d.\n",
+ suspend_console_fd);
+ suspend_console_fd = -1;
+ return;
+ }
+
+ sys_ioctl(suspend_console_fd, TCGETS, (long)&termios);
+ termios.c_lflag &= ~ICANON;
+ sys_ioctl(suspend_console_fd, TCSETSF, (long)&termios);
+}
+
+/* prepare_console
+ *
+ */
+static void text_prepare_console(void)
+{
+ suspend_get_dev_console();
+
+ if (console_loglevel < 2)
+ clear_display();
+
+ lastloglevel = console_loglevel;
+}
+
+/* cleanup_console
+ *
+ * Description: Close our handle on /dev/console. Must be done
+ * earlier than pm_restore_console to avoid problems with other
+ * processes trying to grab it when thawed.
+ */
+
+static void cleanup_console(void)
+{
+ if (console_loglevel < 2)
+ clear_display();
+ restore_cursor();
+ termios.c_lflag |= ICANON;
+ sys_ioctl(suspend_console_fd, TCSETSF, (long)&termios);
+ sys_close(suspend_console_fd);
+ suspend_console_fd = -1;
+
+#ifdef CONFIG_DEVFS_FS
+ if (mounted_devfs)
+ sys_umount("/dev", 0);
+#endif
+
+ lastloglevel = -1;
+ return;
+}
+
+static void text_redraw(void)
+{
+ sys_ioctl(suspend_console_fd, TIOCL_BLANKSCREEN, (long)&termios);
+ sys_ioctl(suspend_console_fd, TIOCL_UNBLANKSCREEN, (long)&termios);
+}
+
+static int text_keypress(unsigned int key)
+{
+ switch (key) {
+ case 48:
+ console_loglevel = 0;
+ break;
+ case 49:
+ console_loglevel = 1;
+ break;
+#ifdef CONFIG_SOFTWARE_SUSPEND_DEBUG
+ case 122:
+ /* `: Toggle slow */
+ suspend_action ^= (1 << SUSPEND_SLOW);
+ suspend2_core_ops->schedule_message(7);
+ break;
+ case 1:
+ /* F1: Toggle any section debugging. */
+ suspend_debug_state ^= (1 << SUSPEND_ANY_SECTION);
+ suspend2_core_ops->schedule_message(20);
+ break;
+ case 2:
+ /* F2: Freeze. */
+ suspend_debug_state ^= (1 << SUSPEND_FREEZER);
+ suspend2_core_ops->schedule_message(21);
+ break;
+ case 3:
+ /* F3: Eat Memory */
+ suspend_debug_state ^= (1 << SUSPEND_EAT_MEMORY);
+ suspend2_core_ops->schedule_message(22);
+ break;
+ case 4:
+ /* F4: Pagesets. */
+ suspend_debug_state ^= (1 << SUSPEND_PAGESETS);
+ suspend2_core_ops->schedule_message(23);
+ break;
+ case 5:
+ /* F5: IO. */
+ suspend_debug_state ^= (1 << SUSPEND_IO);
+ suspend2_core_ops->schedule_message(24);
+ break;
+ case 6:
+ /* F6: Bmapping of pages */
+ suspend_debug_state ^= (1 << SUSPEND_BMAP);
+ suspend2_core_ops->schedule_message(25);
+ break;
+ case 7:
+ /* F7: Writer */
+ suspend_debug_state ^= (1 << SUSPEND_WRITER);
+ suspend2_core_ops->schedule_message(26);
+ break;
+ case 8:
+ /* F8: Memory */
+ suspend_debug_state ^= (1 << SUSPEND_MEMORY);
+ suspend2_core_ops->schedule_message(27);
+ break;
+ case 9:
+ /* F9: Ranges */
+ suspend_debug_state ^= (1 << SUSPEND_RANGES);
+ suspend2_core_ops->schedule_message(28);
+ break;
+ case 10:
+ /* F10: Memory Pool */
+ suspend_debug_state ^= (1 << SUSPEND_MEM_POOL);
+ suspend2_core_ops->schedule_message(29);
+ break;
+ case 11:
+ /* F11: Nosave */
+ suspend_debug_state ^= (1 << SUSPEND_NOSAVE);
+ suspend2_core_ops->schedule_message(30);
+ break;
+ case 12:
+ /* F12: Integrity */
+ suspend_debug_state ^= (1 << SUSPEND_INTEGRITY);
+ suspend2_core_ops->schedule_message(31);
+ break;
+ case 112:
+ /* During suspend, toggle pausing with P */
+ suspend_action ^= (1 << SUSPEND_PAUSE);
+ suspend2_core_ops->schedule_message(1);
+ break;
+ case 115:
+ /* Otherwise, if S pressed, toggle single step */
+ suspend_action ^= (1 << SUSPEND_SINGLESTEP);
+ suspend2_core_ops->schedule_message(3);
+ break;
+ case 108:
+ /* Otherwise, if L pressed, toggle logging everything */
+ suspend_action ^= (1 << SUSPEND_LOGALL);
+ suspend2_core_ops->schedule_message(4);
+ break;
+ case 116:
+ /* T: Toggle freezing timers */
+ clear_suspend_state(SUSPEND_TIMER_FREEZER_ON);
+ suspend2_core_ops->schedule_message(99);
+ break;
+ case 50:
+ case 51:
+ case 52:
+ case 53:
+ case 54:
+ case 55:
+ case 56:
+ case 57:
+ console_loglevel = ((key - 48));
+ break;
+#endif
+ default:
+ return 0;
+ }
+ return 1;
+}
+
+/*
+ * User interface specific /proc/suspend entries.
+ */
+
+static struct suspend_plugin_ops text_mode_ops;
+
+static struct suspend_proc_data proc_params[] = {
+ { .filename = "text_mode_progress_bar",
+ .permissions = PROC_RW,
+ .type = SUSPEND_PROC_DATA_INTEGER,
+ .data = {
+ .integer = {
+ .variable = &draw_progress_bar,
+ .minimum = 0,
+ .maximum = 1,
+
+ }
+ }
+ },
+
+ { .filename = "disable_textmode_support",
+ .permissions = PROC_RW,
+ .type = SUSPEND_PROC_DATA_INTEGER,
+ .data = {
+ .integer = {
+ .variable = &text_mode_ops.disabled,
+ .minimum = 0,
+ .maximum = 1,
+ }
+ }
+ }
+};
+
+static struct suspend_plugin_ops text_mode_ops = {
+ .type = UI_PLUGIN,
+ .name = "Text Mode Support",
+ .ops = {
+ .ui = {
+ .prepare = text_prepare_console,
+ .log_level_change = text_loglevel_change,
+ .message = text_message,
+ .update_progress = text_update_progress,
+ .cleanup = cleanup_console,
+ .keypress = text_keypress,
+ .post_kernel_restore_redraw =
+ text_redraw,
+ }
+ }
+};
+
+/* ---- Registration ---- */
+
+static __init int text_mode_load(void)
+{
+ int i, numfiles = sizeof(proc_params) / sizeof(struct suspend_proc_data);
+ int result;
+
+ if (!(result = suspend_register_plugin(&text_mode_ops))) {
+ printk("Software Suspend text mode support loaded.\n");
+ for (i=0; i< numfiles; i++)
+ suspend_register_procfile(&proc_params[i]);
+ }
+ return result;
+}
+
+#ifdef MODULE
+static __exit void text_mode_unload(void)
+{
+ int i, numfiles = sizeof(proc_params) / sizeof(struct suspend_proc_data);
+
+ printk("Software Suspend text mode support unloading.\n");
+
+ for (i=0; i< numfiles; i++)
+ suspend_unregister_procfile(&proc_params[i]);
+
+ suspend_unregister_plugin(&text_mode_ops);
+}
+
+module_init(text_mode_load);
+module_exit(text_mode_unload);
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Nigel Cunningham");
+MODULE_DESCRIPTION("Suspend2 Text Mode support");
+#else
+late_initcall(text_mode_load);
+#endif
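
The scaling arithmetic in text_update_progress() above can be sketched in plain user-space C. This is only an illustration, not part of the patch: fls_ul() is a stand-in for the kernel's generic_fls(), and bar_width_for() and bar_position() are hypothetical names for expressions the patch writes inline.

```c
#include <assert.h>

/* Stand-in for the kernel's generic_fls(): 1-based index of the highest
 * set bit, or 0 if no bit is set. */
int fls_ul(unsigned long x)
{
	int bit = 0;

	while (x) {
		bit++;
		x >>= 1;
	}
	return bit;
}

/* Width of the bar interior: the bar spans the middle half of the
 * screen, minus one column each for the '[' and ']' brackets. */
int bar_width_for(int columns)
{
	return columns - 2 * (columns / 4) - 2;
}

/* Scale value/maximum to a column in [0, barwidth]. Large inputs are
 * shifted down to at most 16 significant bits first, so the product
 * value * barwidth stays small even with a 32-bit unsigned long. */
int bar_position(unsigned long value, unsigned long maximum, int barwidth)
{
	int bitshift;

	assert(barwidth >= 0);
	if (!maximum)
		return 0;
	if (value > maximum)
		value = maximum;

	bitshift = fls_ul(maximum) - 16;
	if (bitshift > 0) {
		value >>= bitshift;
		maximum >>= bitshift;
	}
	return (int)(value * (unsigned long)barwidth / maximum);
}
```

On an 80-column console the interior is 38 columns wide, so a task that is half done draws 19 dashes.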


2004-11-24 18:55:48

by Yaroslav Rastrigin

[permalink] [raw]
Subject: Re: Suspend 2 merge: 24/51: Keyboard and serial console hooks.

Hi, Christoph,
On 24 November 2004 16:29, Christoph Hellwig wrote:
> On Wed, Nov 24, 2004 at 11:59:02PM +1100, Nigel Cunningham wrote:
> > Here we add simple hooks so that the user can interact with suspend
> > while it is running. (Hmm. The serial console condition could be
> > simplified :>). The hooks allow you to do such things as:
> >
> > - cancel suspending
> > - change the amount of detail of debugging info shown
> > - change what debugging info is shown
> > - pause the process
> > - single step
> > - toggle rebooting instead of powering down
>
> And why would we want this? If the user calls the suspend call
> he surely wants to suspend, right?
Probably wrong. For example, you might suspend a running server remotely to
resolve hotswap issues, then reboot into an already prepared environment
without bothering tech support in the middle of the night.
>
> After all we don't have inkernel hooks to allow a user to read instead
> write after calling sys_write.
Wrong analogy here, sorry.

--
Managing your Territory since the dawn of times ...

2004-11-24 20:22:14

by Nigel Cunningham

[permalink] [raw]
Subject: Re: Suspend 2 merge: 34/51: Includes

Hi.

On Thu, 2004-11-25 at 00:25, Christoph Hellwig wrote:
> please submit header changes together with the matching code changes.

I can split them a little, but most of these suspend2 specific includes
are used by multiple files. Ranges, for example, are used everywhere.

> And all this plugin thingies in here look like overengineering.

I can see that it might look that way, but it's actually fundamental to
the support for building as modules (which is required for LVM &
encryption), and has been really helpful in creating clear distinctions
between the different parts of suspend. It also provides a clear method
for someone to add support for their new whiz-bang storage method or
compressor.

Regards,

Nigel
--
Nigel Cunningham
Pastoral Worker
Christian Reformed Church of Tuggeranong
PO Box 1004, Tuggeranong, ACT 2901

You see, at just the right time, when we were still powerless, Christ
died for the ungodly. -- Romans 5:6

2004-11-24 20:24:30

by Nigel Cunningham

[permalink] [raw]
Subject: Re: Suspend 2 merge: 19/51: Remove MTRR sysdev support.

Hi.

On Thu, 2004-11-25 at 03:27, Zwane Mwaikambo wrote:
> On Wed, 24 Nov 2004, Nigel Cunningham wrote:
>
> > This patch removes sysdev support for MTRRs (potential SMP hang and
> > shouldn't be done with interrupts done anyway). Instead, we save and
> > restore MTRRs when entering and exiting the processor freezers (ie when
> > saving the registers & context for each CPU via an SMP call).
>
> I take it this has been tested with AGP and X11 running?

Absolutely. It is used all the time. (The machine I'm typing on now has
HT support, and I normally suspend from X with the Radeon driver; I just
double-checked that agpgart is loaded.)

Regards,

Nigel
--
Nigel Cunningham
Pastoral Worker
Christian Reformed Church of Tuggeranong
PO Box 1004, Tuggeranong, ACT 2901

You see, at just the right time, when we were still powerless, Christ
died for the ungodly. -- Romans 5:6

2004-11-24 20:33:56

by Nigel Cunningham

[permalink] [raw]
Subject: Re: Suspend 2 merge: 18/51: Debug page_alloc support.

Hi.

On Thu, 2004-11-25 at 03:02, Dave Hansen wrote:
> On Wed, 2004-11-24 at 04:58, Nigel Cunningham wrote:
> > +#ifdef CONFIG_HIGHMEM
> > + if (page >= highmem_start_page)
> > + return 0;
> > +#endif
>
> There's a patch pending in -mm to kill highmem_start_page. Please use
> PageHighMem().

That's not out-of-line, is it? (We use it while resuming too, IIRC).
I'll take a look.

Regards,

Nigel
--
Nigel Cunningham
Pastoral Worker
Christian Reformed Church of Tuggeranong
PO Box 1004, Tuggeranong, ACT 2901

You see, at just the right time, when we were still powerless, Christ
died for the ungodly. -- Romans 5:6

2004-11-24 20:58:54

by Nigel Cunningham

[permalink] [raw]
Subject: Re: Suspend 2 merge: 14/51: Disable page alloc failure message when suspending

Hi.

On Thu, 2004-11-25 at 01:15, Christoph Hellwig wrote:
> On Wed, Nov 24, 2004 at 11:57:55PM +1100, Nigel Cunningham wrote:
> > While eating memory, we will potentially trigger this a lot. We
> > therefore disable the message when suspending.
>
> So call the allocator with __GFP_NOWARN

Everywhere?
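
The trade-off behind "Everywhere?" can be sketched as follows, as a user-space model rather than kernel code. With the per-call flag, every allocation site reached while eating memory has to pass the flag; with the state test from patch 14/51, one check in the allocator covers them all. SKETCH_GFP_NOWARN and the function names are hypothetical, and the flag value is made up for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag value for this sketch; the real __GFP_NOWARN is
 * defined by the kernel and may differ. */
#define SKETCH_GFP_NOWARN 0x200u

/* Per-call approach: each caller opts out of the warning itself. */
bool warn_per_call(unsigned int gfp_mask)
{
	return !(gfp_mask & SKETCH_GFP_NOWARN);
}

/* Global approach: any allocation that fails while a suspend cycle is
 * running stays quiet, whoever the caller is. */
bool warn_global(unsigned int gfp_mask, bool suspend_running)
{
	return !(gfp_mask & SKETCH_GFP_NOWARN) && !suspend_running;
}

/* Count how many of n failed allocations would print a warning under
 * the global approach. */
unsigned int count_warnings(const unsigned int *masks, unsigned int n,
			    bool suspend_running)
{
	unsigned int i, warnings = 0;

	assert(masks != NULL || n == 0);
	for (i = 0; i < n; i++)
		if (warn_global(masks[i], suspend_running))
			warnings++;
	return warnings;
}
```

The cost of the global test is that it also silences failures from unrelated callers for the duration of the suspend cycle, which is the objection the per-call flag avoids.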

Nigel
--
Nigel Cunningham
Pastoral Worker
Christian Reformed Church of Tuggeranong
PO Box 1004, Tuggeranong, ACT 2901

You see, at just the right time, when we were still powerless, Christ
died for the ungodly. -- Romans 5:6

2004-11-24 21:32:56

by Nigel Cunningham

[permalink] [raw]
Subject: Re: Suspend 2 merge: 22/51: Suspend2 lowlevel code.

Hi.

On Thu, 2004-11-25 at 03:42, Zwane Mwaikambo wrote:
> Ok,
> Do you see anything missing (from an implementation point of view)
> for the following?
>
> Suspend:
> 1) suspend all cpus, save cpu0
> 2) proceed with state saving on cpu0 only
> 3) begin suspend
>
> Resume:
> 1) begin resume
> 2) offline all currently online cpus
> 3) proceed with state restoring
> 4) online all previously online cpus

That's roughly what we're doing now, apart from the offlining/onlining.
I had considered trying to take better advantage of SMP support (perhaps
run a decompression thread on one CPU and the writer on the other, eg),
so we might want to apply this just to the region immediately around the
atomic copy/restore. That makes me wonder, though, what the advantage is
to switching to using the hotplug functionality - is it x86 only, or
more cross platform? (If more cross platform, that might possibly be an
advantage over the current code).

> A lot of the subsystems which have work split across cpus will now have
> work migrated across to cpu0, in that regard, which have you made swsusp
> savvy? It looks like the timer changes might need looking at any others?

All of the other threads, including the migration threads, are frozen,
so I don't believe that anything gets migrated to CPU0. I'll double
check when I next suspend.

As to the timers, I fully agree. Thawing them needs a mechanism for
keeping the per-cpu timers staggered.

Regards,

Nigel
--
Nigel Cunningham
Pastoral Worker
Christian Reformed Church of Tuggeranong
PO Box 1004, Tuggeranong, ACT 2901

You see, at just the right time, when we were still powerless, Christ
died for the ungodly. -- Romans 5:6

2004-11-24 21:37:09

by Nigel Cunningham

[permalink] [raw]
Subject: Re: Suspend 2 merge: 42/51: Suspend.c

Hi.

On Thu, 2004-11-25 at 03:52, Zwane Mwaikambo wrote:
> On Thu, 25 Nov 2004, Nigel Cunningham wrote:
>
> > Here's the heart of the core :> (No, that's not a typo).
> >
> > - Device suspend/resume calls
> > - Power down
> > - Highest level routine
> > - all_settings proc entry handling
>
> This isn't the only patch (the utility.c file is another one) which
> introduces functions/helpers which are subsystem specific (like ACPI) but
> somehow land up in the same file with a suspend_ prefix. I understand that
> it'll be more work but can you get them integrated with the subsystem in
> question?

Okee doke. I've thought about separating some of those debugging
specific functions out into their own file too.

Regards,

Nigel
--
Nigel Cunningham
Pastoral Worker
Christian Reformed Church of Tuggeranong
PO Box 1004, Tuggeranong, ACT 2901

You see, at just the right time, when we were still powerless, Christ
died for the ungodly. -- Romans 5:6

2004-11-24 21:45:06

by Nigel Cunningham

[permalink] [raw]
Subject: Re: Suspend 2 merge: 24/51: Keyboard and serial console hooks.

Hi.

On Thu, 2004-11-25 at 00:29, Christoph Hellwig wrote:
> On Wed, Nov 24, 2004 at 11:59:02PM +1100, Nigel Cunningham wrote:
> > Here we add simple hooks so that the user can interact with suspend
> > while it is running. (Hmm. The serial console condition could be
> > simplified :>). The hooks allow you to do such things as:
> >
> > - cancel suspending
> > - change the amount of detail of debugging info shown
> > - change what debugging info is shown
> > - pause the process
> > - single step
> > - toggle rebooting instead of powering down
>
> And why would we want this? If the user calls the suspend call
> he surely wants to suspend, right?

Have you ever pressed control-alt-delete/init 0 and then gone "Oh. I
forgot, I wanted to..."? That's why you'd want to be able to cancel
suspending.

The ability to toggle rebooting is helpful because you don't have to
edit a config file/proc entry. You can use one key press to initiate the
suspend, and press 'R' if you want to reboot (e.g. for dual booting)
instead of powering down.

The other options are really helpful when testing and debugging, and can
be turned off at compile time.

By the way, thanks for all the feedback.

Regards,

Nigel
--
Nigel Cunningham
Pastoral Worker
Christian Reformed Church of Tuggeranong
PO Box 1004, Tuggeranong, ACT 2901

You see, at just the right time, when we were still powerless, Christ
died for the ungodly. -- Romans 5:6

2004-11-24 22:11:12

by Nigel Cunningham

[permalink] [raw]
Subject: Re: Suspend 2 merge: 14/51: Disable page alloc failure message when suspending

Hi.

On Thu, 2004-11-25 at 03:00, Dave Hansen wrote:
> On Wed, 2004-11-24 at 04:57, Nigel Cunningham wrote:
> > While eating memory, we will potentially trigger this a lot. We
> > therefore disable the message when suspending.
> >
> > diff -ruN 503-disable-page-alloc-warnings-while-suspending-old/mm/page_alloc.c 503-disable-page-alloc-warnings-while-suspending-new/mm/page_alloc.c
> > --- 503-disable-page-alloc-warnings-while-suspending-old/mm/page_alloc.c 2004-11-06 09:24:37.231308424 +1100
> > +++ 503-disable-page-alloc-warnings-while-suspending-new/mm/page_alloc.c 2004-11-06 09:24:40.844759096 +1100
> > @@ -725,7 +725,10 @@
> > }
> >
> > nopage:
> > - if (!(gfp_mask & __GFP_NOWARN) && printk_ratelimit()) {
> > + if ((!(gfp_mask & __GFP_NOWARN)) &&
> > + (!test_suspend_state(SUSPEND_RUNNING)) &&
> > + printk_ratelimit()) {
> > +
> > printk(KERN_WARNING "%s: page allocation failure."
> > " order:%d, mode:0x%x\n",
> > p->comm, order, gfp_mask);
>
> Following Documentation/SubmittingPatches, please submit patches made
> with "diff -urp":
>
> -p --show-c-function
> Show which C function each change is in.
>
> Otherwise, it's a lot harder to figure out what you're modifying.

Okay; thanks. I won't redo all of the patches now, but are there
specific ones you'd like to see?

Regards,

Nigel
--
Nigel Cunningham
Pastoral Worker
Christian Reformed Church of Tuggeranong
PO Box 1004, Tuggeranong, ACT 2901

You see, at just the right time, when we were still powerless, Christ
died for the ungodly. -- Romans 5:6

2004-11-24 22:14:25

by Nigel Cunningham

[permalink] [raw]
Subject: Re: Suspend 2 merge: 7/51: Reboot handler hook.

Hi.

On Thu, 2004-11-25 at 00:07, Christoph Hellwig wrote:
> > -#ifdef CONFIG_SOFTWARE_SUSPEND
> > +#ifdef CONFIG_SOFTWARE_SUSPEND2
> > case LINUX_REBOOT_CMD_SW_SUSPEND:
> > {
> > - int ret = software_suspend();
> > + int ret = -EINVAL;
> > + if (!(test_suspend_state(SUSPEND_DISABLED))) {
> > + suspend_try_suspend();
> > + ret = 0;
> > + }
> > unlock_kernel();
>
> total crap. This patch breaks the existing swsusp and turns a clean
> interface into a horrible one. Just implement a

It doesn't break the existing suspend; rather it overrides the action. I
will however tidy it up. Sorry about that; I reversed the test a while
ago and didn't notice that it could be tidier. I note though that your
solution fails to unlock_kernel().

Regards,

Nigel

> int software_suspend(void)
> {
> if (test_suspend_state(SUSPEND_DISABLED))
> return -EINVAL;
> suspend_try_suspend();
> return 0;
> }
>
> in your code.
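
For comparison, here is a user-space sketch of how the suggested wrapper would sit under the existing call site, assuming the unlock_kernel() call stays in the reboot handler as in the unpatched kernel/sys.c hunk. The sketch_ counters, the mock lock functions and the reboot_cmd_sw_suspend() name are hypothetical, not kernel code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Mock state standing in for kernel internals (all hypothetical). */
bool sketch_suspend_disabled;
int sketch_lock_depth;
int sketch_suspend_attempts;

static void lock_kernel(void)
{
	sketch_lock_depth++;
}

static void unlock_kernel(void)
{
	assert(sketch_lock_depth > 0);
	sketch_lock_depth--;
}

static void suspend_try_suspend(void)
{
	sketch_suspend_attempts++;
}

/* Wrapper in the shape quoted above. */
int software_suspend(void)
{
	if (sketch_suspend_disabled)
		return -EINVAL;
	suspend_try_suspend();
	return 0;
}

/* Unchanged call site: the lock is dropped on both paths because the
 * unlock lives in the caller, not in the wrapper. */
int reboot_cmd_sw_suspend(void)
{
	int ret;

	lock_kernel();
	ret = software_suspend();
	unlock_kernel();
	return ret;
}
```

Under that assumption the lock is released whether or not suspend is disabled; the concern about a missing unlock_kernel() only applies if the wrapper replaces the whole case body.
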
--
Nigel Cunningham
Pastoral Worker
Christian Reformed Church of Tuggeranong
PO Box 1004, Tuggeranong, ACT 2901

You see, at just the right time, when we were still powerless, Christ
died for the ungodly. -- Romans 5:6

2004-11-24 22:30:11

by Dave Hansen

[permalink] [raw]
Subject: Re: Suspend 2 merge: 18/51: Debug page_alloc support.

On Wed, 2004-11-24 at 12:17, Nigel Cunningham wrote:
> On Thu, 2004-11-25 at 03:02, Dave Hansen wrote:
> > On Wed, 2004-11-24 at 04:58, Nigel Cunningham wrote:
> > > +#ifdef CONFIG_HIGHMEM
> > > + if (page >= highmem_start_page)
> > > + return 0;
> > > +#endif
> >
> > There's a patch pending in -mm to kill highmem_start_page. Please use
> > PageHighMem().
>
> That's not out-of-line, is it? (We use it while resuming too, IIRC).
> I'll take a look.

Nope. That's a simple single-bit page->flags check.
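
As a rough illustration of why the flag test needs no out-of-line call, here is a user-space sketch. The struct, the bit number and the function names are all hypothetical; the real PageHighMem() and its flag bit are defined in the kernel headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_PG_HIGHMEM 8	/* hypothetical bit number */

struct sketch_page {
	unsigned long flags;
};

/* Pointer comparison, as in the quoted hunk: depends on a global
 * boundary such as highmem_start_page. */
bool is_highmem_by_address(const struct sketch_page *page,
			   const struct sketch_page *highmem_start)
{
	assert(page != NULL && highmem_start != NULL);
	return page >= highmem_start;
}

/* Flag test, in the spirit of PageHighMem(): one bit in page->flags,
 * with no global state involved. */
bool is_highmem_by_flag(const struct sketch_page *page)
{
	assert(page != NULL);
	return (page->flags >> SKETCH_PG_HIGHMEM) & 1UL;
}
```

Because the flag version reads only the page itself, it keeps working if highmem_start_page is removed.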

-- Dave

2004-11-24 22:36:41

by Nigel Cunningham

[permalink] [raw]
Subject: Re: Suspend 2 merge

Hi.

On Thu, 2004-11-25 at 00:28, Christoph Hellwig wrote:
> Your way of merging looks rather wrong. Please submit changes against the
> current swsusp code that introduce one feature after another to bring it
> to the level you want. You'll surely have to rework it a lot until all
> reviewers are happy.

I realise that it needs further cleanup; that's why I'm submitting it
now for comment and not asking 'please apply'. As to patching against
swsusp, I'm purposely not doing that. The reason is that suspend2 isn't
a bunch of incremental changes to swsusp. It has been redesigned from
the ground up and I'd have to pull swsusp to pieces and put it back
together to do the same things.

I'm thus seeking to simply merge the existing code, let Pavel and others
get to the point where they're ready to say "Okay, we're satisfied that
suspend2 does everything swsusp does and more and better." Then we can
remove swsusp. This is the plan that was discussed with Pavel and Andrew
ages ago. I've just been slow to get there because I'm doing this as a
part-time volunteer.

> And most importantly for each patch explain exactly what feature it
> implements and why, etc.. "swsusp2" tells exactly nothing about the
> changes you make.

Okay. The changes include:

- Almost no BUG() statements. Wherever possible, if something goes
wrong, we back out and give the user a perfectly usable system back
- Speed: All I/O is asynchronous where possible and readahead used where
not. Routines everywhere optimised to get things done as fast as possible.
(Think low battery).
- Flexible: You can tune performance to your system in a number of ways.
You can use/not use bootsplash, text output, compression drivers as you
choose. You can change your swap configuration without having to reboot
just to change the resume2= parameter. You can cancel a suspend if you
want, or disable the possibility of doing so.
- Reliability. I haven't run the tests for a while, but Michael Frank
produced a suite that was used to stress test the software (under 2.4)
while running 100s (1000s at least once) of cycles. There have been some
significant changes since then, but the software is essentially the
same.
- Test bed: Around 10,000 downloads of the 1.0 patch, 2730 to date of
the 2.1.5 version I released 2 weeks ago.
- Swap file support
- Support for LVM/dm-crypt and siblings
- Support for having device drivers as modules (resume from an
initrd/initramfs)
- Almost all memory allocations are order 0, making suspend more
reliable under load.
- Designed to save as much of memory as possible rather than as little
(making the system more responsive post-resume).
- Support for SMP
- Support for preempt
- Support for 4GB highmem (hope to do 64GB soonish)
- Support for suspending/resuming over a network possible but not yet
implemented (hope to do so soon)

I realise it's only some, but I think it gives you the gist :>

Nigel
--
Nigel Cunningham
Pastoral Worker
Christian Reformed Church of Tuggeranong
PO Box 1004, Tuggeranong, ACT 2901

You see, at just the right time, when we were still powerless, Christ
died for the ungodly. -- Romans 5:6

2004-11-24 22:38:00

by Nigel Cunningham

[permalink] [raw]
Subject: Re: Suspend 2 merge: 10/51: Exports for suspend built as modules.

Hi.

On Thu, 2004-11-25 at 01:44, Ingo Molnar wrote:
> * Nigel Cunningham wrote:
>
> > New exports for suspend. I've cut them down some as a result of the
> > last review, but could perhaps do more? Would people prefer to see a
> > single struct wrapping exported functions?
>
> > --- 400-exports-old/kernel/sched.c 2004-11-06 09:23:53.364977120 +1100
> > +++ 400-exports-new/kernel/sched.c 2004-11-06 09:23:56.627481144 +1100
> > @@ -3798,6 +3798,7 @@
> >
> > read_unlock(&tasklist_lock);
> > }
> > +EXPORT_SYMBOL(show_state);
>
> this one is ok i think, but make it EXPORT_SYMBOL_GPL() please.

Okay. Will do.

Nigel
--
Nigel Cunningham
Pastoral Worker
Christian Reformed Church of Tuggeranong
PO Box 1004, Tuggeranong, ACT 2901

You see, at just the right time, when we were still powerless, Christ
died for the ungodly. -- Romans 5:6

2004-11-24 22:56:43

by Jan Rychter

[permalink] [raw]
Subject: Re: Suspend 2 merge: 24/51: Keyboard and serial console hooks.

>>>>> "Christoph" == Christoph Hellwig writes:
Christoph> On Wed, Nov 24, 2004 at 11:59:02PM +1100, Nigel Cunningham
Christoph> wrote:
>> Here we add simple hooks so that the user can interact with suspend
>> while it is running. (Hmm. The serial console condition could be
>> simplified :>). The hooks allow you to do such things as:
>>
>> - cancel suspending
>> - change the amount of detail of debugging info shown
>> - change what debugging info is shown
>> - pause the process
>> - single step
>> - toggle rebooting instead of powering down

Christoph> And why would we want this? If the user calls the suspend
Christoph> call he surely wants to suspend, right?

Obviously you have never actually tried to use software suspend in real
life.

I would kindly suggest that you try to use it on your laptop for at
least several weeks in various circumstances. These features are a
result of years of user experience.

--J.

Subject: Re: Suspend 2 merge: 46/51: LZF support.

Hi Nigel!

Shouldn't LZF code go to lib/ ?

On Thu, 25 Nov 2004 00:02:09 +1100, Nigel Cunningham wrote:
> This is LZF support, contributed under a dual license (see below) by
> Marc Lehmann. It flies! (Those stats in the debug info in an earlier
> patch were real!).

2004-11-24 23:09:57

by Christoph Hellwig

[permalink] [raw]
Subject: Re: Suspend 2 merge: 24/51: Keyboard and serial console hooks.

On Wed, Nov 24, 2004 at 01:57:58PM -0800, Jan Rychter wrote:
> Obviously you have never actually tried to use software suspend in real
> life.
>
> I would kindly suggest that you try to use it on your laptop for at
> least several weeks in various circumstances. These features are a
> result of years of user experience.

I tend to buy laptops that just suspend when closing the lid, and no, I never
had the strange desire to immediately reverse my choice. Neither do I want
to stop the shutdown that I just initiated.

But for those people who do, shutdown has a nice option to delay the actual
shutdown/reboot - I'm pretty sure the same can be done for swsusp without
sprinkling hooks all over the kernel.

2004-11-24 23:42:42

by Roman Zippel

[permalink] [raw]
Subject: Re: Suspend 2 merge: 26/51: Kconfig and makefile.

Hi,

On Thu, 25 Nov 2004, Nigel Cunningham wrote:

> I'm not sure exactly what 'such indentations' means. Could you please
> give me a pointer to how it should look (I was blindly following what I
> thought was the pattern to follow and will happily follow something else
> :>).

What did you look at? Where else did you find such indentations?

bye, Roman

2004-11-24 23:49:21

by Nigel Cunningham

[permalink] [raw]
Subject: Re: Suspend 2 merge: 10/51: Exports for suspend built as modules.

Hi.

On Thu, 2004-11-25 at 00:12, Christoph Hellwig wrote:
> > /*
> > * Platforms implementing 32 bit compatibility ioctl handlers in
> > - * modules need this exported
> > + * modules need this exported. So does Suspend2 (when made as
> > + * modules), so the export_symbol is now unconditional.
> > */
> > -#ifdef CONFIG_COMPAT
> > EXPORT_SYMBOL(sys_ioctl);
> > -#endif
>
> This is definitely the wrong interface for whatever you want to do.

Do you know what I want to do?

Frankly, I actually agree. I'd rather use vt_console_print and gotoxy
directly to do the display of information that doesn't need to clutter
the logs, but when I first submitted code for doing that, people
suggested using /dev/console instead. That makes using these syscalls
necessary.

> > +EXPORT_SYMBOL(proc_match);
>
> Also not something anything outside of procfs internals should do.

This was because, following the "use files" methodology above, I have to
be able to find the file I want to use (/proc/splash) even when /proc
isn't mounted yet. I'll happily just call splash_write_proc.

> > unsigned long avenrun[3];
> > +EXPORT_SYMBOL(avenrun);
>
> Nothing you should poke into.

Mmm. Commented on this elsewhere. Perhaps rather than saving and
restoring the values, I should inhibit them being updated? If the BIOS
was doing the suspending/resuming, we wouldn't object to them not being
updated. Does that sound like a better solution? (Remember the aim was
to avoid making sendmail etc. refuse to work for a while because the load
average is too high).
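
As a rough illustration of the "inhibit updates" idea, here is a minimal
userspace sketch, not the suspend2 code. The fixed-point constants mirror
the kernel's CALC_LOAD arithmetic for the 1-minute average; the
loadavg_frozen flag and the calc_load_tick() helper are hypothetical names
introduced only for this example.

```c
#include <assert.h>

/* Fixed-point constants as used by the kernel's CALC_LOAD macro. */
#define FSHIFT   11                 /* bits of fractional precision */
#define FIXED_1  (1UL << FSHIFT)    /* 1.0 in fixed point */
#define EXP_1    1884UL             /* 1/exp(5sec/1min) in fixed point */

/* Hypothetical flag: set while a suspend/resume cycle is in progress. */
static int loadavg_frozen;

/*
 * One 5-second tick of the 1-minute average. 'load' is the current
 * fixed-point average, 'active' the number of runnable tasks as a plain
 * integer. While frozen, the previous value is returned unchanged.
 */
static unsigned long calc_load_tick(unsigned long load, unsigned long active)
{
	/* Bounds keep the arithmetic below within a 32-bit unsigned long. */
	assert(load <= (1UL << 20));
	assert(active < 1000);

	if (loadavg_frozen)
		return load;

	load *= EXP_1;
	load += active * FIXED_1 * (FIXED_1 - EXP_1);
	return load >> FSHIFT;
}
```

With the flag set, the burst of activity during suspend never reaches the
average, so nothing needs to be saved and restored afterwards.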

> > +/* Exported for Software Suspend 2 */
> > +EXPORT_SYMBOL(nr_free_highpages);
> > +EXPORT_SYMBOL(pgdat_list);
>
> Ditto.

Used for preparing the image.

> > +EXPORT_SYMBOL(swap_free);
> > +EXPORT_SYMBOL(swap_info);
> > +EXPORT_SYMBOL(sys_swapoff);
> > +EXPORT_SYMBOL(sys_swapon);
> > +EXPORT_SYMBOL(si_swapinfo);
> > +EXPORT_SYMBOL(map_swap_page);
> > +EXPORT_SYMBOL(get_swap_page);
> > +EXPORT_SYMBOL(get_swap_info_struct);
>
> Ditto. Lowlevel swapdevice access isn't something modules should poke
> into.

It is if they're writing to swap.

> Nigel, why do I have this strange feeling that exactly the same patch
> was rejected already but you resubmitted it again?

Not exactly the same but yes, substantially. I listened then to the
comments and applied changes. I'm listening now. Unfortunately, though,
you just reject things outright without even discussing why they're
there or whether there's a better way I might not have thought of. I
freely admit that I'm not the world's greatest Linux guru; that's part
of why I'm submitting these patches now for review. Could you please try
to be more helpful? I'll promise to be receptive!

> If you want anything merged drop the modular swsusp bits, I doubt it'll
> ever be merged.

Aside from other advantages (quicker development etc), the modular bits
allow you to free about 150k (debugging compiled in) when you're not
suspending. I thought that was well worth the little bit of extra code.

Regards,

Nigel

2004-11-25 00:02:51

by Zwane Mwaikambo

Subject: Re: Suspend 2 merge: 22/51: Suspend2 lowlevel code.

On Thu, 25 Nov 2004, Nigel Cunningham wrote:

> That's roughly what we're doing now, apart from the offlining/onlining.
> I had considered trying to take better advantage of SMP support (perhaps
> run a decompression thread on one CPU and the writer on the other, eg),
> so we might want to apply this just to the region immediately around the
> atomic copy/restore. That makes me wonder, though, what the advantage is
> to switching to using the hotplug functionality - is it x86 only, or
> more cross platform? (If more cross platform, that might possibly be an
> advantage over the current code).

It's cross platform and removes the requirement for patches like;

Subject: Suspend 2 merge: 13/51: Disable highmem tlb flush for copyback.

2004-11-25 00:48:45

by Nigel Cunningham

Subject: Re: Suspend 2 merge: 26/51: Kconfig and makefile.

Hi.

On Thu, 2004-11-25 at 08:46, Roman Zippel wrote:
> Hi,
>
> On Thu, 25 Nov 2004, Nigel Cunningham wrote:
>
> > I'm not sure exactly what 'such indentations' means. Could you please
> > give me a pointer to how it should look (I was blindly following what I
> > thought was the pattern to follow and will happily follow something else
> > :>).
>
> What did you look at? Where else did you find such indentations?

I'm guessing now, but I don't think I've done anything inconsistent with
the rest of the file. Assuming you mean the spaces before the help text,
that is there in the help for CONFIG_PM, for example.

Regards,

Nigel

2004-11-25 01:10:29

by Nigel Cunningham

Subject: Re: Suspend 2 merge: 31/51: Export tlb flushing

Hi.

On Thu, 2004-11-25 at 02:32, Martin J. Bligh wrote:
> --Nigel Cunningham <[email protected]> wrote (on Wednesday, November 24, 2004 23:59:50 +1100):
>
> > This patch adds a do_flush_tlb_all function that does the
> > SMP-appropriate thing for suspend after the image is restored.
>
> Is software suspend only designed for i386, or is that the only arch that
> didn't have such a function already? Seems like too low a level to be
> exporting to me.

There's lowlevel code for x86 and ppc at the moment, more arch specific
code can be added. This function is used from the x86 restoration of the
original kernel (arch/i386/power/suspend2.c).

Regards,

Nigel

> M.
>
> > diff -ruN 818-tlb-flushing-functions-old/arch/i386/kernel/smp.c 818-tlb-flushing-functions-new/arch/i386/kernel/smp.c
> > --- 818-tlb-flushing-functions-old/arch/i386/kernel/smp.c 2004-11-06 09:27:19.225681536 +1100
> > +++ 818-tlb-flushing-functions-new/arch/i386/kernel/smp.c 2004-11-04 16:27:41.000000000 +1100
> > @@ -476,7 +476,7 @@
> > preempt_enable();
> > }
> >
> > -static void do_flush_tlb_all(void* info)
> > +void do_flush_tlb_all(void* info)
> > {
> > unsigned long cpu = smp_processor_id();
> >
> > diff -ruN 818-tlb-flushing-functions-old/include/asm-i386/tlbflush.h 818-tlb-flushing-functions-new/include/asm-i386/tlbflush.h
> > --- 818-tlb-flushing-functions-old/include/asm-i386/tlbflush.h 2004-11-03 21:55:01.000000000 +1100
> > +++ 818-tlb-flushing-functions-new/include/asm-i386/tlbflush.h 2004-11-04 16:27:41.000000000 +1100
> > @@ -82,6 +82,7 @@
> > #define flush_tlb() __flush_tlb()
> > #define flush_tlb_all() __flush_tlb_all()
> > #define local_flush_tlb() __flush_tlb()
> > +#define local_flush_tlb_all() __flush_tlb_all();
> >
> > static inline void flush_tlb_mm(struct mm_struct *mm)
> > {
> > @@ -114,6 +115,10 @@
> > extern void flush_tlb_current_task(void);
> > extern void flush_tlb_mm(struct mm_struct *);
> > extern void flush_tlb_page(struct vm_area_struct *, unsigned long);
> > +extern void do_flush_tlb_all(void * info);
> > +
> > +#define local_flush_tlb_all() \
> > + do_flush_tlb_all(NULL);
> >
> > #define flush_tlb() flush_tlb_current_task()
> >

2004-11-25 01:12:53

by Dave Hansen

Subject: Re: Suspend 2 merge: 14/51: Disable page alloc failure message when suspending

On Wed, 2004-11-24 at 13:06, Nigel Cunningham wrote:
> On Thu, 2004-11-25 at 03:00, Dave Hansen wrote:
> > Following Documentation/SubmittingPatches, please submit patches made
> > with "diff -urp":
> >
> > -p --show-c-function
> > Show which C function each change is in.
> >
> > Otherwise, it's a lot harder to figure out what you're modifying.
>
> Okay; thanks. I won't go redoing all of the patches now, but are there
> specific ones you'd like to see?

I'd just add it to whatever scripts you use to publish patches and do it
that way from now on for all of them.

-- Dave

2004-11-25 01:18:08

by Jan Rychter

Subject: Re: Suspend 2 merge: 24/51: Keyboard and serial console hooks.

>>>>> "Christoph" == Christoph Hellwig <[email protected]> writes:
Christoph> On Wed, Nov 24, 2004 at 01:57:58PM -0800, Jan Rychter wrote:
>> Obviously you have never actually tried to use software suspend in
>> real life.
>>
>> I would kindly suggest that you try to use it on your laptop for at
>> least several weeks in various circumstances. These features are a
>> result of years of user experience.

Christoph> I tend to buy laptops that just suspend when closing the
Christoph> lid, and no, I never had the strange desire to immediately
Christoph> reverse my choice. Neither do I want to stop the shutdown
Christoph> that I just initiated.

Christoph> But for those people who do, shutdown has a nice option to
Christoph> delay the actual shutdown/reboot - I'm pretty sure the same
Christoph> can be done for swsusp without sprinkling hooks all over the
Christoph> kernel.

Please accept that there are people who requested these features and
there are people who find them useful.

I really hope you understand that delaying a suspend is very different
from allowing the user to interrupt it.

Also, many people suspend Linux only to reboot into Windows. In this
case, the ability to tell the machine to reboot instead of powering down
is very useful. Obviously, a perfect user would plan ahead and suspend
"appropriately", depending on what he wants to do
afterwards. Unfortunately, I have found that I'm not a perfect user and
that I really tend to use and like these features.

As for me, I would much rather have a useful kernel than a beautiful
kernel, because I rather tend to use it than watch it, but I can
understand that other people may have different priorities. I would
suggest we find a compromise.

--J.

2004-11-25 01:18:09

by Nigel Cunningham

Subject: Re: Suspend 2 merge: 26/51: Kconfig and makefile.

Hi.

On Thu, 2004-11-25 at 03:34, Roman Zippel wrote:
> Hi,
>
> On Wed, 24 Nov 2004, Nigel Cunningham wrote:
>
> > +menu "Software Suspend 2"
> > +
> > +config SOFTWARE_SUSPEND2_CORE
> > + tristate "Software Suspend 2"
> > + depends on PM
> > + select SOFTWARE_SUSPEND2
> > + ---help---
> > + Software Suspend 2 is the 'new and improved' suspend support. You
> > + can now build it as modules, but be aware that this requires
> > + initrd support (the modules you use in saving the image have to
> > + be loaded in order for you to be able to resume!)
> > +
> > + See the Software Suspend home page (softwaresuspend.berlios.de)
> > + for FAQs, HOWTOs and other documentation.
> > +
> > + config SOFTWARE_SUSPEND2
> > + bool
> > +
> > + if SOFTWARE_SUSPEND2
> > + config SOFTWARE_SUSPEND2_WRITER
> > + bool
> > +
>
> Please don't use such indentations.

I'm not sure exactly what 'such indentations' means. Could you please
give me a pointer to how it should look (I was blindly following what I
thought was the pattern to follow and will happily follow something else
:>).

> There is no need to use select here either. If you really want to make
> it modular (and you can convince Christoph), you want to do something like
> this:

Okay. I've struggled a bit with the config language, and again looked to
other places to see how to achieve things. I've obviously missed better
code. Will give this a try.

Thanks very much!

Nigel

> config SOFTWARE_SUSPEND2
> tristate "Software Suspend 2"
> depends on PM
>
> config SOFTWARE_SUSPEND2_BUILTIN
> def_bool SOFTWARE_SUSPEND2
>
> and let everything else depend on SOFTWARE_SUSPEND2.
>
> > + config SOFTWARE_SUSPEND_SWAPWRITER
> > + tristate ' Swap Writer'
> > + depends on SWAP && SOFTWARE_SUSPEND2_CORE
> > + select SOFTWARE_SUSPEND2_WRITER
>
> This select is also bogus.
>
> > +
> > +ifeq ($(CONFIG_SOFTWARE_SUSPEND2),y)
> > +obj-y += suspend_builtin.o proc.o
> > +endif
>
> Use SOFTWARE_SUSPEND2_BUILTIN here without the ifeq.
>
> bye, Roman

2004-11-25 02:42:40

by Nigel Cunningham

Subject: Re: Suspend 2 merge: 26/51: Kconfig and makefile.

Hi.

On Thu, 2004-11-25 at 03:34, Roman Zippel wrote:
> Please don't use such indentations.
> There is no need to use select here either. If you really want to make
> it modular (and you can convince Christoph), you want to do something like
> this:

I think I've caught on to what you're meaning. Is this better?

menu "Software Suspend 2"

config SOFTWARE_SUSPEND2
tristate "Software Suspend 2"
depends on PM
---help---
Software Suspend 2 is the 'new and improved' suspend support. You
can now build it as modules, but be aware that this requires
initrd support (the modules you use in saving the image have to
be loaded in order for you to be able to resume!)

See the Software Suspend home page (softwaresuspend.berlios.de)
for FAQs, HOWTOs and other documentation.

config SOFTWARE_SUSPEND2_CORE
def_bool SOFTWARE_SUSPEND2

if SOFTWARE_SUSPEND2
comment 'Image Storage (you need at least one writer)'
depends on SOFTWARE_SUSPEND2_CORE

config SOFTWARE_SUSPEND_SWAPWRITER
tristate ' Swap Writer'
depends on SWAP && SOFTWARE_SUSPEND2
---help---
This option enables support for storing an image in your
swap space. Swap partitions are supported. Swap file
support is currently broken (16 April 2004).

comment 'Page Transformers'

config SOFTWARE_SUSPEND_LZF_COMPRESSION
tristate ' LZF image compression (Preferred)'
---help---
This option enables compression of pages stored during suspending
to disk, using LZF compression. LZF compression is fast and
still achieves a good compression ratio.

You probably want to say 'Y'.

config SOFTWARE_SUSPEND_GZIP_COMPRESSION
tristate ' GZIP image Compression (Slow)'
select ZLIB_DEFLATE
select ZLIB_INFLATE
---help---
This option enables compression of pages stored during Software Suspend
process. Pages are compressed using the zlib library, with a default
setting (in code) of fastest compression (still VERY slow!). If your swap
device is painfully slow compared to your CPU, you might possibly want
this. Then again, you might just want to upgrade your storage (if you
can).

Just in case you haven't gotten the hint yet, this option should be off
for most people. It will make your computer take a minute to suspend
when it could take seconds.

config SOFTWARE_SUSPEND_DEVICE_MAPPER
tristate ' Device Mapper support'
depends on BLK_DEV_DM
---help---
This option creates a module which allows Suspend to tell the
device mapper code to allocate enough memory for its work while
suspending. It doesn't do anything else, but without it, dm-crypt
won't work properly.

This option should be off for most people.

comment 'User Interface Options'

config SOFTWARE_SUSPEND_BOOTSPLASH
tristate ' Bootsplash support'
depends on BOOTSPLASH
---help---
This option enables support for Bootsplash (bootsplash.org). Suspend
can set the progress bar value and switch between silent and verbose
modes. (Silent mode is used when the debug level is 0 or 1).

config SOFTWARE_SUSPEND_TEXT_MODE
tristate ' Text mode console support'
depends on VT
---help---
This option enables support for a text mode 'nice display'. If you don't
have/want bootsplash support, you probably want this.

comment 'General Options'

config SOFTWARE_SUSPEND_DEFAULT_RESUME2
string ' Default resume device name'
---help---
You normally need to add a resume2= parameter to your lilo.conf or
equivalent. With this option properly set, the kernel has a value
to default to. No damage will be done if the value is invalid.

config SOFTWARE_SUSPEND_KEEP_IMAGE
bool ' Allow Keep Image Mode'
---help---
This option allows you to keep an image and reuse it. It is intended
__ONLY__ for use with systems where all filesystems are mounted read-
only (kiosks, for example). To use it, compile this option in and boot
normally. Set the KEEP_IMAGE flag in /proc/software_suspend and suspend.
When you resume, the image will not be removed. You will be unable to turn
off swap partitions (assuming you are using the swap writer), but future
suspends simply do a power-down. The image can be updated using the
kernel command line parameter suspend_act= to turn off the keep image
bit. Keep image mode is a little less user friendly on purpose - it
should not be used without thought!

comment 'Debugging'

config SOFTWARE_SUSPEND_DEBUG
bool ' Compile in debugging output'
---help---
This option enables the inclusion of debugging info in the software
suspend code. Turning it off will reduce the kernel size but make
debugging suspend & resume issues harder to do.

For normal usage, this option can be turned off.

config SOFTWARE_SUSPEND_CHECKSUMS
tristate ' Compile checksum module'
---help---
This option enables compilation of a checksumming module, which can
be used to verify the correct operation of suspend.

For normal usage, this option can be turned off.
endif

endmenu


2004-11-25 02:45:22

by Nigel Cunningham

Subject: Re: Suspend 2 merge: 7/51: Reboot handler hook.

Sorry.

Misunderstood you the first time. Got it right now.

Nigel

2004-11-25 02:55:26

by Nigel Cunningham

Subject: Re: Suspend 2 merge: 34/51: Includes

Hi.

On Thu, 2004-11-25 at 10:19, Matthew Garrett wrote:
> Nigel Cunningham <[email protected]> wrote:
>
> > I can see that it might look that way, but it's actually fundamental to
> > the support for building as modules (which is required for LVM &
> > encryption), and has been really helpful in creating clear distinctions
> > between the different parts of suspend. It also provides a clear method
> > for someone to add support for their new wizz-bang storage method or
> > compressor.
>
> I'm not entirely clear on this. Surely all that's needed for LVM and
> encryption support is for that to be set up in userspace and then allow
> userspace to trigger a second attempt at resume? I have a hacky patch
> for swsusp that allows that (at the moment it just adds a "resume"
> method to /sys/power/state), which gives you the functionality without
> the module pain.

Yes, sorry. I'm confusing initrd/ramfs support with modules. You can
resume from an initrd/ramfs without building as modules.

Regardless, building support as modules does have the other advantages
noted above, and I haven't found adding support for building as modules
to be a pain at all.

Sorry again for confusing the issue.

Nigel

2004-11-25 03:10:20

by Nigel Cunningham

Subject: Re: Suspend 2 merge: 46/51: LZF support.

Hi.

On Thu, 2004-11-25 at 10:01, Bartlomiej Zolnierkiewicz wrote:
> Hi Nigel!
>
> Shouldn't LZF code go to lib/ ?

I suppose it could. Will do.

Regards,

Nigel

2004-11-25 06:37:10

by Hu Gang

Subject: Re: Suspend 2 merge: 46/51: LZF support.

On Thu, Nov 25, 2004 at 01:38:44PM +1100, Nigel Cunningham wrote:
> Hi.
>
> On Thu, 2004-11-25 at 10:01, Bartlomiej Zolnierkiewicz wrote:
> > Hi Nigel!
> >
> > Shouldn't LZF code go to lib/ ?
>
> I suppose it could. Will do.
>
> Regards,

=== include/linux/lzf.h
==================================================================
--- include/linux/lzf.h (revision 24482)
+++ include/linux/lzf.h (revision 24483)
@@ -0,0 +1,7 @@
+#ifndef _LZF_H_
+#define _LZF_H_
+
+unsigned int lzf_decompress (const void *const in_data, unsigned int in_len, void *out_data, unsigned int out_len);
+unsigned int lzf_compress (const void *const in_data, unsigned int in_len, void *out_data, unsigned int out_len, void *hbuf);
+
+#endif
=== lib/lzf_d.c
==================================================================
--- lib/lzf_d.c (revision 24482)
+++ lib/lzf_d.c (revision 24483)
@@ -0,0 +1,98 @@
+/*
+ * Copyright (c) 2000-2002 Marc Alexander Lehmann <[email protected]>
+ *
+ * Redistribution and use in source and binary forms, with or without modifica-
+ * tion, are permitted provided that the following conditions are met:
+ *
+ * 1. Redistributions of source code must retain the above copyright notice,
+ * this list of conditions and the following disclaimer.
+ *
+ * 2. Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in the
+ * documentation and/or other materials provided with the distribution.
+ *
+ * 3. The name of the author may not be used to endorse or promote products
+ * derived from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR IMPLIED
+ * WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MER-
+ * CHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO
+ * EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPE-
+ * CIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
+ * PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS;
+ * OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
+ * WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTH-
+ * ERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED
+ * OF THE POSSIBILITY OF SUCH DAMAGE.
+ *
+ * Alternatively, the contents of this file may be used under the terms of
+ * the GNU General Public License version 2 (the "GPL"), in which case the
+ * provisions of the GPL are applicable instead of the above. If you wish to
+ * allow the use of your version of this file only under the terms of the
+ * GPL and not to allow others to use your version of this file under the
+ * BSD license, indicate your decision by deleting the provisions above and
+ * replace them with the notice and other provisions required by the GPL. If
+ * you do not delete the provisions above, a recipient may use your version
+ * of this file under either the BSD or the GPL.
+ */
+
+unsigned int
+lzf_decompress (const void *const in_data, unsigned int in_len,
+ void *out_data, unsigned int out_len)
+{
+ u8 const *ip = in_data;
+ u8 *op = out_data;
+ u8 const *const in_end = ip + in_len;
+ u8 *const out_end = op + out_len;
+
+ do
+ {
+ unsigned int ctrl = *ip++;
+
+ if (ctrl < (1 << 5)) /* literal run */
+ {
+ ctrl++;
+
+ if (op + ctrl > out_end)
+ return 0;
+
+#if USE_MEMCPY
+ memcpy (op, ip, ctrl);
+ op += ctrl;
+ ip += ctrl;
+#else
+ do
+ *op++ = *ip++;
+ while (--ctrl);
+#endif
+ }
+ else /* back reference */
+ {
+ unsigned int len = ctrl >> 5;
+
+ u8 *ref = op - ((ctrl & 0x1f) << 8) - 1;
+
+ if (len == 7)
+ len += *ip++;
+
+ ref -= *ip++;
+
+ if (op + len + 2 > out_end)
+ return 0;
+
+ if (ref < (u8 *)out_data)
+ return 0;
+
+ *op++ = *ref++;
+ *op++ = *ref++;
+
+ do
+ *op++ = *ref++;
+ while (--len);
+ }
+ }
+ while (op < out_end && ip < in_end);
+
+ return op - (u8 *)out_data;
+}
+
=== lib/Kconfig
==================================================================
--- lib/Kconfig (revision 24482)
+++ lib/Kconfig (revision 24483)
@@ -30,6 +30,9 @@
require M here. See Castagnoli93.
Module will be libcrc32c.

+config LZF
+ tristate "LZF Compress/Decompress Support"
+
#
# compression support is select'ed if needed
#
=== lib/lzf.c
==================================================================
--- lib/lzf.c (revision 24482)
+++ lib/lzf.c (revision 24483)
@@ -0,0 +1,46 @@
+/*
+ * lib/lzf.c
+ *
+ * Copyright (C) 2003 Marc Lehmann <[email protected]>
+ * Copyright (C) 2003,2004 Nigel Cunningham <[email protected]>
+ *
+ * This file is released under the GPLv2.
+ *
+ * This file contains LZF data compression and decompression for the kernel.
+ */
+
+#include <linux/suspend.h>
+#include <linux/module.h>
+
+/*
+ * size of hashtable is (1 << HLOG) * sizeof (char *)
+ * decompression is independent of the hash table size
+ * the difference between 15 and 14 is very small
+ * for small blocks (and 14 is also faster).
+ * For a low-memory configuration, use HLOG == 13;
+ * For best compression, use 15 or 16.
+ */
+#ifndef HLOG
+# define HLOG 14
+#endif
+
+/*
+ * sacrifice some compression quality in favour of compression speed.
+ * (roughly 1-2% worse compression for large blocks and
+ * 9-10% for small, redundant, blocks and >>20% better speed in both cases)
+ * In short: enable this for binary data, disable this for text data.
+ */
+#ifndef ULTRA_FAST
+# define ULTRA_FAST 1
+#endif
+
+#define STRICT_ALIGN 0
+#define USE_MEMCPY 1
+#define INIT_HTAB 0
+
+#include "lzf_c.c"
+#include "lzf_d.c"
+
+EXPORT_SYMBOL_GPL(lzf_compress);
+EXPORT_SYMBOL_GPL(lzf_decompress);
+
=== lib/Makefile
==================================================================
--- lib/Makefile (revision 24482)
+++ lib/Makefile (revision 24483)
@@ -15,6 +15,7 @@
lib-y += dec_and_lock.o
endif

+
obj-$(CONFIG_CRC_CCITT) += crc-ccitt.o
obj-$(CONFIG_CRC32) += crc32.o
obj-$(CONFIG_LIBCRC32C) += libcrc32c.o
@@ -23,6 +24,8 @@
obj-$(CONFIG_ZLIB_INFLATE) += zlib_inflate/
obj-$(CONFIG_ZLIB_DEFLATE) += zlib_deflate/

+obj-$(CONFIG_LZF) += lzf.o
+
hostprogs-y := gen_crc32table
clean-files := crc32table.h

=== lib/lzf_c.c
==================================================================
--- lib/lzf_c.c (revision 24482)
+++ lib/lzf_c.c (revision 24483)
@@ -0,0 +1,220 @@
+/*
+ * Copyright (c) 2000-2003 Marc Alexander Lehmann <[email protected]>
+ *
+ * Redistribution and use in source and binary forms, with or without modifica-
+ * tion, are permitted provided that the following conditions are met:
+ *
+ * 1. Redistributions of source code must retain the above copyright notice,
+ * this list of conditions and the following disclaimer.
+ *
+ * 2. Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in the
+ * documentation and/or other materials provided with the distribution.
+ *
+ * 3. The name of the author may not be used to endorse or promote products
+ * derived from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR IMPLIED
+ * WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MER-
+ * CHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO
+ * EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPE-
+ * CIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
+ * PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS;
+ * OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
+ * WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTH-
+ * ERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED
+ * OF THE POSSIBILITY OF SUCH DAMAGE.
+ *
+ * Alternatively, the contents of this file may be used under the terms of
+ * the GNU General Public License version 2 (the "GPL"), in which case the
+ * provisions of the GPL are applicable instead of the above. If you wish to
+ * allow the use of your version of this file only under the terms of the
+ * GPL and not to allow others to use your version of this file under the
+ * BSD license, indicate your decision by deleting the provisions above and
+ * replace them with the notice and other provisions required by the GPL. If
+ * you do not delete the provisions above, a recipient may use your version
+ * of this file under either the BSD or the GPL.
+ */
+
+#define HSIZE (1 << (HLOG))
+
+/*
+ * don't play with this unless you benchmark!
+ * decompression is not dependent on the hash function
+ * the hashing function might seem strange, just believe me
+ * it works ;)
+ */
+#define FRST(p) (((p[0]) << 8) + p[1])
+#define NEXT(v,p) (((v) << 8) + p[2])
+#define IDX(h) ((((h ^ (h << 5)) >> (3*8 - HLOG)) + h*3) & (HSIZE - 1))
+/*
+ * IDX works because it is very similar to a multiplicative hash, e.g.
+ * (h * 57321 >> (3*8 - HLOG))
+ * the next one is also quite good, albeit slow ;)
+ * (int)(cos(h & 0xffffff) * 1e6)
+ */
+
+#if 0
+/* original lzv-like hash function */
+# define FRST(p) (p[0] << 5) ^ p[1]
+# define NEXT(v,p) ((v) << 5) ^ p[2]
+# define IDX(h) ((h) & (HSIZE - 1))
+#endif
+
+#define MAX_LIT (1 << 5)
+#define MAX_OFF (1 << 13)
+#define MAX_REF ((1 << 8) + (1 << 3))
+
+/*
+ * compressed format
+ *
+ * 000LLLLL <L+1> ; literal
+ * LLLOOOOO oooooooo ; backref L
+ * 111OOOOO LLLLLLLL oooooooo ; backref L+7
+ *
+ */
+
+unsigned int
+lzf_compress (const void *const in_data, unsigned int in_len,
+ void *out_data, unsigned int out_len, void *hbuf)
+{
+ const u8 **htab = hbuf;
+ const u8 **hslot;
+ const u8 *ip = (const u8 *)in_data;
+ u8 *op = (u8 *)out_data;
+ const u8 *in_end = ip + in_len;
+ u8 *out_end = op + out_len;
+ const u8 *ref;
+
+ unsigned int hval = FRST (ip);
+ unsigned long off;
+ int lit = 0;
+
+#if INIT_HTAB
+# if USE_MEMCPY
+ memset (htab, 0, sizeof (htab));
+# else
+ for (hslot = htab; hslot < htab + HSIZE; hslot++)
+ *hslot++ = ip;
+# endif
+#endif
+
+ for (;;)
+ {
+ if (ip < in_end - 2)
+ {
+ hval = NEXT (hval, ip);
+ hslot = htab + IDX (hval);
+ ref = *hslot; *hslot = ip;
+
+ if (1
+#if INIT_HTAB && !USE_MEMCPY
+ && ref < ip /* the next test will actually take care of this, but this is faster */
+#endif
+ && (off = ip - ref - 1) < MAX_OFF
+ && ip + 4 < in_end
+ && ref > (u8 *)in_data
+#if STRICT_ALIGN
+ && ref[0] == ip[0]
+ && ref[1] == ip[1]
+ && ref[2] == ip[2]
+#else
+ && *(u16 *)ref == *(u16 *)ip
+ && ref[2] == ip[2]
+#endif
+ )
+ {
+ /* match found at *ref++ */
+ unsigned int len = 2;
+ unsigned int maxlen = in_end - ip - len;
+ maxlen = maxlen > MAX_REF ? MAX_REF : maxlen;
+
+ do
+ len++;
+ while (len < maxlen && ref[len] == ip[len]);
+
+ if (op + lit + 1 + 3 >= out_end)
+ return 0;
+
+ if (lit)
+ {
+ *op++ = lit - 1;
+ lit = -lit;
+ do
+ *op++ = ip[lit];
+ while (++lit);
+ }
+
+ len -= 2;
+ ip++;
+
+ if (len < 7)
+ {
+ *op++ = (off >> 8) + (len << 5);
+ }
+ else
+ {
+ *op++ = (off >> 8) + ( 7 << 5);
+ *op++ = len - 7;
+ }
+
+ *op++ = off;
+
+#if ULTRA_FAST
+ ip += len;
+ hval = FRST (ip);
+ hval = NEXT (hval, ip);
+ htab[IDX (hval)] = ip;
+ ip++;
+#else
+ do
+ {
+ hval = NEXT (hval, ip);
+ htab[IDX (hval)] = ip;
+ ip++;
+ }
+ while (len--);
+#endif
+ continue;
+ }
+ }
+ else if (ip == in_end)
+ break;
+
+ /* one more literal byte we must copy */
+ lit++;
+ ip++;
+
+ if (lit == MAX_LIT)
+ {
+ if (op + 1 + MAX_LIT >= out_end)
+ return 0;
+
+ *op++ = MAX_LIT - 1;
+#if USE_MEMCPY
+ memcpy (op, ip - MAX_LIT, MAX_LIT);
+ op += MAX_LIT;
+ lit = 0;
+#else
+ lit = -lit;
+ do
+ *op++ = ip[lit];
+ while (++lit);
+#endif
+ }
+ }
+
+ if (lit)
+ {
+ if (op + lit + 1 >= out_end)
+ return 0;
+
+ *op++ = lit - 1;
+ lit = -lit;
+ do
+ *op++ = ip[lit];
+ while (++lit);
+ }
+
+ return op - (u8 *) out_data;
+}

--
--
Hu Gang / Steve
Linux Registered User 204016
GPG Public Key: http://soulinfo.com/~hugang/hugang.asc
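
The token layout described in the "compressed format" comment of lzf_c.c
can be exercised with a small userspace sketch. This is an illustration
only, not the code from the patch: the function name lzf_sketch_decode()
is made up for this example, it uses plain unsigned char instead of u8,
and the input bounds checks are an addition that the posted lzf_d.c does
not have.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Decode an LZF stream following the three token forms in the format
 * comment. Returns the number of bytes written, or 0 on malformed input
 * or insufficient output space.
 */
static size_t lzf_sketch_decode(const unsigned char *in, size_t in_len,
				unsigned char *out, size_t out_len)
{
	size_t ip = 0, op = 0;

	assert(in != NULL && out != NULL);

	while (ip < in_len) {
		unsigned int ctrl = in[ip++];

		if (ctrl < 32) {
			/* 000LLLLL: copy the next ctrl + 1 bytes verbatim */
			size_t run = ctrl + 1;

			if (run > in_len - ip || run > out_len - op)
				return 0;
			while (run--)
				out[op++] = in[ip++];
		} else {
			/* top 3 bits: length, low 5 bits: offset high part */
			size_t len = ctrl >> 5;
			size_t off = (size_t)(ctrl & 0x1f) << 8;

			if (len == 7) {	/* 111OOOOO: extra length byte */
				if (ip >= in_len)
					return 0;
				len += in[ip++];
			}
			if (ip >= in_len)
				return 0;
			off += in[ip++];	/* low 8 bits of the offset */
			off += 1;		/* stored offset is distance - 1 */
			len += 2;		/* stored length is count - 2 */

			if (off > op || len > out_len - op)
				return 0;
			/* byte-wise copy: source and destination may overlap */
			while (len--) {
				out[op] = out[op - off];
				op++;
			}
		}
	}
	return op;
}
```

For example, the six input bytes 0x02 'a' 'b' 'c' 0x20 0x02 are one
literal run of three bytes followed by one back reference (length 3,
distance 3), and decode to "abcabc".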

2004-11-25 07:00:49

by Dmitry Torokhov

Subject: Re: Suspend 2 merge: 46/51: LZF support.

Hi,

On Thursday 25 November 2004 01:32 am, [email protected] wrote:
....
> +
> + if (lit)
> + {
> + if (op + lit + 1 >= out_end)
> + return 0;
> +
> + *op++ = lit - 1;
> + lit = -lit;
> + do
> + *op++ = ip[lit];
> + while (++lit);
> + }
> +
> + return op - (u8 *) out_data;
> +}
>

Since this is a completely new file (as far as kernel tree is concerned)
could you convert it to proper coding style (braces placement, indentation)?

--
Dmitry

2004-11-25 07:03:30

by Nigel Cunningham

Subject: Re: Suspend 2 merge: 22/51: Suspend2 lowlevel code.

Hi.

On Thu, 2004-11-25 at 08:55, Zwane Mwaikambo wrote:
> On Thu, 25 Nov 2004, Nigel Cunningham wrote:
>
> > That's roughly what we're doing now, apart from the offlining/onlining.
> > I had considered trying to take better advantage of SMP support (perhaps
> > run a decompression thread on one CPU and the writer on the other, eg),
> > so we might want to apply this just to the region immediately around the
> > atomic copy/restore. That makes me wonder, though, what the advantage is
> > to switching to using the hotplug functionality - is it x86 only, or
> > more cross platform? (If more cross platform, that might possibly be an
> > advantage over the current code).
>
> It's cross platform and removes the requirement for patches like;
>
> Subject: Suspend 2 merge: 13/51: Disable highmem tlb flush for copyback.

Good point. I didn't see that.

Regards,

Nigel
--
Nigel Cunningham
Pastoral Worker
Christian Reformed Church of Tuggeranong
PO Box 1004, Tuggeranong, ACT 2901

You see, at just the right time, when we were still powerless, Christ
died for the ungodly. -- Romans 5:6

2004-11-25 07:35:28

by Hu Gang

Subject: Re: Suspend 2 merge: 46/51: LZF support.

On Thu, Nov 25, 2004 at 01:52:33AM -0500, Dmitry Torokhov wrote:
> Hi,
>
> On Thursday 25 November 2004 01:32 am, [email protected] wrote:
> ....

> Since this is a completely new file (as far as kernel tree is concerned)
> could you convert it to proper coding style (braces placement, indentation)?

Lindent lib/lzf*.c include/linux/lzf*.h

=== include/linux/lzf.h
==================================================================
--- include/linux/lzf.h (revision 24480)
+++ include/linux/lzf.h (revision 24489)
@@ -0,0 +1,10 @@
+#ifndef _LZF_H_
+#define _LZF_H_
+
+unsigned int lzf_decompress (const void *const in_data, unsigned int in_len, void *out_data, unsigned int out_len);
+unsigned int lzf_compress (const void *const in_data, unsigned int in_len, void *out_data, unsigned int out_len, void *hbuf);
+
+char *lzf_new(void);
+void lzf_free(char *);
+
+#endif
=== lib/lzf_d.c
==================================================================
--- lib/lzf_d.c (revision 24480)
+++ lib/lzf_d.c (revision 24489)
@@ -0,0 +1,94 @@
+/*
+ * Copyright (c) 2000-2002 Marc Alexander Lehmann <[email protected]>
+ *
+ * Redistribution and use in source and binary forms, with or without modifica-
+ * tion, are permitted provided that the following conditions are met:
+ *
+ * 1. Redistributions of source code must retain the above copyright notice,
+ * this list of conditions and the following disclaimer.
+ *
+ * 2. Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in the
+ * documentation and/or other materials provided with the distribution.
+ *
+ * 3. The name of the author may not be used to endorse or promote products
+ * derived from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR IMPLIED
+ * WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MER-
+ * CHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO
+ * EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPE-
+ * CIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
+ * PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS;
+ * OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
+ * WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTH-
+ * ERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED
+ * OF THE POSSIBILITY OF SUCH DAMAGE.
+ *
+ * Alternatively, the contents of this file may be used under the terms of
+ * the GNU General Public License version 2 (the "GPL"), in which case the
+ * provisions of the GPL are applicable instead of the above. If you wish to
+ * allow the use of your version of this file only under the terms of the
+ * GPL and not to allow others to use your version of this file under the
+ * BSD license, indicate your decision by deleting the provisions above and
+ * replace them with the notice and other provisions required by the GPL. If
+ * you do not delete the provisions above, a recipient may use your version
+ * of this file under either the BSD or the GPL.
+ */
+
+unsigned int
+lzf_decompress(const void *const in_data, unsigned int in_len,
+ void *out_data, unsigned int out_len)
+{
+ u8 const *ip = in_data;
+ u8 *op = out_data;
+ u8 const *const in_end = ip + in_len;
+ u8 *const out_end = op + out_len;
+
+ do {
+ unsigned int ctrl = *ip++;
+
+ if (ctrl < (1 << 5)) { /* literal run */
+ ctrl++;
+
+ if (op + ctrl > out_end)
+ return 0;
+
+#if USE_MEMCPY
+ memcpy(op, ip, ctrl);
+ op += ctrl;
+ ip += ctrl;
+#else
+ do
+ *op++ = *ip++;
+ while (--ctrl);
+#endif
+ } else { /* back reference */
+
+ unsigned int len = ctrl >> 5;
+
+ u8 *ref = op - ((ctrl & 0x1f) << 8) - 1;
+
+ if (len == 7)
+ len += *ip++;
+
+ ref -= *ip++;
+
+ if (op + len + 2 > out_end)
+ return 0;
+
+ if (ref < (u8 *) out_data)
+ return 0;
+
+ *op++ = *ref++;
+ *op++ = *ref++;
+
+ do
+ *op++ = *ref++;
+ while (--len);
+ }
+ }
+ while (op < out_end && ip < in_end);
+
+ return op - (u8 *) out_data;
+}
=== lib/Kconfig
==================================================================
--- lib/Kconfig (revision 24480)
+++ lib/Kconfig (revision 24489)
@@ -30,6 +30,9 @@
require M here. See Castagnoli93.
Module will be libcrc32c.

+config LZF
+ tristate "LZF Compress/Decompress Support"
+
#
# compression support is select'ed if needed
#
=== lib/lzf.c
==================================================================
--- lib/lzf.c (revision 24480)
+++ lib/lzf.c (revision 24489)
@@ -0,0 +1,61 @@
+/*
+ * lib/lzf.c
+ *
+ * Copyright (C) 2003 Marc Lehmann <[email protected]>
+ * Copyright (C) 2003,2004 Nigel Cunningham <[email protected]>
+ *
+ * This file is released under the GPLv2.
+ *
+ * This file contains LZF data compression and decompression for the kernel.
+ */
+
+#include <linux/suspend.h>
+#include <linux/module.h>
+#include <linux/vmalloc.h>
+#include <linux/lzf.h>
+
+/*
+ * size of hashtable is (1 << HLOG) * sizeof (char *)
+ * decompression is independent of the hash table size
+ * the difference between 15 and 14 is very small
+ * for small blocks (and 14 is also faster).
+ * For a low-memory configuration, use HLOG == 13;
+ * For best compression, use 15 or 16.
+ */
+#ifndef HLOG
+# define HLOG 14
+#endif
+
+/*
+ * sacrifice some compression quality in favour of compression speed.
+ * (roughly 1-2% worse compression for large blocks and
+ * 9-10% for small, redundant, blocks and >>20% better speed in both cases)
+ * In short: enable this for binary data, disable this for text data.
+ */
+#ifndef ULTRA_FAST
+# define ULTRA_FAST 1
+#endif
+
+#define STRICT_ALIGN 0
+#define USE_MEMCPY 1
+#define INIT_HTAB 0
+
+#include "lzf_c.c"
+#include "lzf_d.c"
+
+EXPORT_SYMBOL_GPL(lzf_compress);
+EXPORT_SYMBOL_GPL(lzf_decompress);
+
+char *lzf_new(void)
+{
+ char *wk = vmalloc_32((1 << HLOG) * sizeof(char *));
+ return (wk);
+}
+
+void lzf_free(char *wk)
+{
+ vfree(wk);
+}
+
+EXPORT_SYMBOL_GPL(lzf_new);
+EXPORT_SYMBOL_GPL(lzf_free);
=== lib/Makefile
==================================================================
--- lib/Makefile (revision 24480)
+++ lib/Makefile (revision 24489)
@@ -15,6 +15,7 @@
lib-y += dec_and_lock.o
endif

+
obj-$(CONFIG_CRC_CCITT) += crc-ccitt.o
obj-$(CONFIG_CRC32) += crc32.o
obj-$(CONFIG_LIBCRC32C) += libcrc32c.o
@@ -23,6 +24,8 @@
obj-$(CONFIG_ZLIB_INFLATE) += zlib_inflate/
obj-$(CONFIG_ZLIB_DEFLATE) += zlib_deflate/

+obj-$(CONFIG_LZF) += lzf.o
+
hostprogs-y := gen_crc32table
clean-files := crc32table.h

=== lib/lzf_c.c
==================================================================
--- lib/lzf_c.c (revision 24480)
+++ lib/lzf_c.c (revision 24489)
@@ -0,0 +1,209 @@
+/*
+ * Copyright (c) 2000-2003 Marc Alexander Lehmann <[email protected]>
+ *
+ * Redistribution and use in source and binary forms, with or without modifica-
+ * tion, are permitted provided that the following conditions are met:
+ *
+ * 1. Redistributions of source code must retain the above copyright notice,
+ * this list of conditions and the following disclaimer.
+ *
+ * 2. Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in the
+ * documentation and/or other materials provided with the distribution.
+ *
+ * 3. The name of the author may not be used to endorse or promote products
+ * derived from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR IMPLIED
+ * WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MER-
+ * CHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO
+ * EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPE-
+ * CIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
+ * PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS;
+ * OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
+ * WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTH-
+ * ERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED
+ * OF THE POSSIBILITY OF SUCH DAMAGE.
+ *
+ * Alternatively, the contents of this file may be used under the terms of
+ * the GNU General Public License version 2 (the "GPL"), in which case the
+ * provisions of the GPL are applicable instead of the above. If you wish to
+ * allow the use of your version of this file only under the terms of the
+ * GPL and not to allow others to use your version of this file under the
+ * BSD license, indicate your decision by deleting the provisions above and
+ * replace them with the notice and other provisions required by the GPL. If
+ * you do not delete the provisions above, a recipient may use your version
+ * of this file under either the BSD or the GPL.
+ */
+
+#define HSIZE (1 << (HLOG))
+
+/*
+ * don't play with this unless you benchmark!
+ * decompression is not dependent on the hash function
+ * the hashing function might seem strange, just believe me
+ * it works ;)
+ */
+#define FRST(p) (((p[0]) << 8) + p[1])
+#define NEXT(v,p) (((v) << 8) + p[2])
+#define IDX(h) ((((h ^ (h << 5)) >> (3*8 - HLOG)) + h*3) & (HSIZE - 1))
+/*
+ * IDX works because it is very similar to a multiplicative hash, e.g.
+ * (h * 57321 >> (3*8 - HLOG))
+ * the next one is also quite good, albeit slow ;)
+ * (int)(cos(h & 0xffffff) * 1e6)
+ */
+
+#if 0
+/* original lzv-like hash function */
+# define FRST(p) (p[0] << 5) ^ p[1]
+# define NEXT(v,p) ((v) << 5) ^ p[2]
+# define IDX(h) ((h) & (HSIZE - 1))
+#endif
+
+#define MAX_LIT (1 << 5)
+#define MAX_OFF (1 << 13)
+#define MAX_REF ((1 << 8) + (1 << 3))
+
+/*
+ * compressed format
+ *
+ * 000LLLLL <L+1> ; literal
+ * LLLOOOOO oooooooo ; backref L
+ * 111OOOOO LLLLLLLL oooooooo ; backref L+7
+ *
+ */
+
+unsigned int
+lzf_compress(const void *const in_data, unsigned int in_len,
+ void *out_data, unsigned int out_len, void *hbuf)
+{
+ const u8 **htab = hbuf;
+ const u8 **hslot;
+ const u8 *ip = (const u8 *)in_data;
+ u8 *op = (u8 *) out_data;
+ const u8 *in_end = ip + in_len;
+ u8 *out_end = op + out_len;
+ const u8 *ref;
+
+ unsigned int hval = FRST(ip);
+ unsigned long off;
+ int lit = 0;
+
+#if INIT_HTAB
+# if USE_MEMCPY
+ memset(htab, 0, HSIZE * sizeof(*htab));
+# else
+ for (hslot = htab; hslot < htab + HSIZE; hslot++)
+ *hslot = ip;
+# endif
+#endif
+
+ for (;;) {
+ if (ip < in_end - 2) {
+ hval = NEXT(hval, ip);
+ hslot = htab + IDX(hval);
+ ref = *hslot;
+ *hslot = ip;
+
+ if (1
+#if INIT_HTAB && !USE_MEMCPY
+ && ref < ip /* the next test will actually take
+ care of this, but this is faster */
+#endif
+ && (off = ip - ref - 1) < MAX_OFF
+ && ip + 4 < in_end && ref > (u8 *) in_data
+#if STRICT_ALIGN
+ && ref[0] == ip[0]
+ && ref[1] == ip[1]
+ && ref[2] == ip[2]
+#else
+ && *(u16 *) ref == *(u16 *) ip && ref[2] == ip[2]
+#endif
+ ) {
+ /* match found at *ref++ */
+ unsigned int len = 2;
+ unsigned int maxlen = in_end - ip - len;
+ maxlen = maxlen > MAX_REF ? MAX_REF : maxlen;
+
+ do
+ len++;
+ while (len < maxlen && ref[len] == ip[len]);
+
+ if (op + lit + 1 + 3 >= out_end)
+ return 0;
+
+ if (lit) {
+ *op++ = lit - 1;
+ lit = -lit;
+ do
+ *op++ = ip[lit];
+ while (++lit);
+ }
+
+ len -= 2;
+ ip++;
+
+ if (len < 7) {
+ *op++ = (off >> 8) + (len << 5);
+ } else {
+ *op++ = (off >> 8) + (7 << 5);
+ *op++ = len - 7;
+ }
+
+ *op++ = off;
+
+#if ULTRA_FAST
+ ip += len;
+ hval = FRST(ip);
+ hval = NEXT(hval, ip);
+ htab[IDX(hval)] = ip;
+ ip++;
+#else
+ do {
+ hval = NEXT(hval, ip);
+ htab[IDX(hval)] = ip;
+ ip++;
+ }
+ while (len--);
+#endif
+ continue;
+ }
+ } else if (ip == in_end)
+ break;
+
+ /* one more literal byte we must copy */
+ lit++;
+ ip++;
+
+ if (lit == MAX_LIT) {
+ if (op + 1 + MAX_LIT >= out_end)
+ return 0;
+
+ *op++ = MAX_LIT - 1;
+#if USE_MEMCPY
+ memcpy(op, ip - MAX_LIT, MAX_LIT);
+ op += MAX_LIT;
+ lit = 0;
+#else
+ lit = -lit;
+ do
+ *op++ = ip[lit];
+ while (++lit);
+#endif
+ }
+ }
+
+ if (lit) {
+ if (op + lit + 1 >= out_end)
+ return 0;
+
+ *op++ = lit - 1;
+ lit = -lit;
+ do
+ *op++ = ip[lit];
+ while (++lit);
+ }
+
+ return op - (u8 *) out_data;
+}
--
Hu Gang / Steve
Linux Registered User 204016
GPG Public Key: http://soulinfo.com/~hugang/hugang.asc
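
For readers following the "compressed format" comment in lib/lzf_c.c, here is a minimal userspace sketch of a decoder for that token layout. It is not the code from the patch: the names lzf_sketch_decode and lzf_sketch_selftest are hypothetical, plain C types stand in for u8, and it adds bounds checks on the input pointer that the posted lzf_decompress() does not perform.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Token layout, as described in lzf_c.c:
 *   000LLLLL <L+1 bytes>          literal run
 *   LLLOOOOO oooooooo             back reference, copies L+2 bytes
 *   111OOOOO LLLLLLLL oooooooo    back reference, copies 7+L+2 bytes
 * Returns the number of bytes written, or 0 on truncated, malformed or
 * oversized input.
 */
static size_t lzf_sketch_decode(const unsigned char *in, size_t in_len,
				unsigned char *out, size_t out_len)
{
	const unsigned char *ip = in, *in_end = in + in_len;
	unsigned char *op = out, *out_end = out + out_len;

	while (ip < in_end) {
		size_t ctrl = *ip++;

		if (ctrl < 32) {		/* literal run */
			size_t run = ctrl + 1;

			if ((size_t)(in_end - ip) < run ||
			    (size_t)(out_end - op) < run)
				return 0;
			memcpy(op, ip, run);
			op += run;
			ip += run;
		} else {			/* back reference */
			size_t len = ctrl >> 5;
			size_t off = (ctrl & 0x1f) << 8;

			if (len == 7) {		/* long form: extra length byte */
				if (ip == in_end)
					return 0;
				len += *ip++;
			}
			if (ip == in_end)
				return 0;
			off += *ip++;
			off += 1;		/* distance is stored minus one */
			len += 2;		/* length is stored minus two */

			if (off > (size_t)(op - out) ||
			    len > (size_t)(out_end - op))
				return 0;
			/* byte-wise copy: source and destination may overlap */
			while (len--) {
				*op = *(op - off);
				op++;
			}
		}
	}
	return (size_t)(op - out);
}

/* Hand-built stream: literal "abc", then "copy 3 bytes from 3 back". */
static void lzf_sketch_selftest(void)
{
	static const unsigned char stream[] = { 0x02, 'a', 'b', 'c', 0x20, 0x02 };
	unsigned char out[16];
	size_t n = lzf_sketch_decode(stream, sizeof(stream), out, sizeof(out));

	assert(n == 6);
	assert(memcmp(out, "abcabc", 6) == 0);
}
```

Calling lzf_sketch_selftest() from a small main() exercises both token types; a stream that ends in the middle of a literal run is rejected with a return value of 0 instead of reading past the input.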

2004-11-26 19:03:50

by Nigel Cunningham

Subject: Re: Suspend 2 merge: 51/51: Notes

Hi.

On Fri, 2004-11-26 at 11:01, Pavel Machek wrote:
> Hi!
>
> > When I started, I thought I did have 51 patches, really! One of them
> > turned out to be a couple of things I intend to reverse :>
>
> :-))))
>
> > In posting all of this, I recognise of course that no one else
> > understands how it all fits together. I'm hoping that those who care
> > enough will ask questions that I'll happily answer, learn from and
> > through which I'll improve the code.
> >
> > For now, though, I'm going to bed.
>
> I still had not fallen asleep at keyboard, and that is pretty
> amazing...

:>

> It is just too big. suspend2 is small operating system on its own, and
> that is not good thing :-(.

I feel like you overstate your case a lot. I can see what you mean in
some ways, though.

Nigel
--
Nigel Cunningham
Pastoral Worker
Christian Reformed Church of Tuggeranong
PO Box 1004, Tuggeranong, ACT 2901

You see, at just the right time, when we were still powerless, Christ
died for the ungodly. -- Romans 5:6

2004-11-26 19:07:16

by Christoph Hellwig

Subject: Re: Suspend 2 merge: 24/51: Keyboard and serial console hooks.

On Wed, Nov 24, 2004 at 05:22:27PM -0800, Jan Rychter wrote:
> Please accept that there are people who requested these features and
> there are people who find them useful.

There are millions of features useful to some people. If these features
can be implemented without affecting the existing codebase it's usually
a no go. If OTOH they do nasty things with kernel internals, and the feature
isn't exactly the most important in the world, judgment is different.

2004-11-26 19:44:21

by Pavel Machek

Subject: Re: Suspend2 merge: 1/51: Device trees

Hi!

> > > I thought I wrote - perhaps I'm wrong here - that I understand that your
> > > new work in this area might make this unnecessary. I really only want to
> > > do it this way because I don't know what other drivers might be doing
> > > while we're writing the LRU pages. I'm not worried about them touching
> > > LRU. What I am worried about is them allocating memory and starving
> > > suspend so that we get hangs due to being oom. If they're suspended, we
> > > have more certainty as to how memory is being used. I don't remember
> > > what prompted me to do this in the first place, but I'm pretty sure it
> > > would have been a real observed issue.
> >
> > Uh... It seems like quite a lot of work. Would not reserving few more
> > pages help here? Or perhaps right solution is to fix "broken" drivers
> > that need too much memory...
>
> I'd agree, except that I don't know how many to allocate. It makes
> getting a reliable suspend the result of guess work and favourable
> circumstances. Fixing 'broken' drivers by really suspending them seems
> to me to be the right solution. Make their memory requirements perfectly
> predictable.

Except for the few drivers that are between suspend device and
root. So you still have the same problem, and still need to
guess. Plus you get complex changes to driver model.

Pavel
--
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!

2004-11-26 19:53:28

by Pavel Machek

Subject: Re: Suspend 2 merge: 12/51: Disable OOM killer when suspending.

Hi!

> > > When preparing the image, suspend eats all the memory in sight, both to
> > > reduce the image size and to improve the reliability of our stats (We've
> > > worked hard to make it work reliably under heavy load - 100+). Of course
> > > this can result in the OOM killer being triggered, so this simple test
> > > stops that happening.
> >
> > andrew's shrink_all_memory should enable you to free memory without
> > hacking OOM killer, no?
>
> I do use shrink_all_memory, but I also then allocate those pages that
> were freed. We added that when seeking to get Suspend to work well and
> reliably under heavy load. IIRC, the issue was that pages that were
> freed were immediately getting allocated by other programs. Having said
> this, it is a while since I looked at the code for preparing the image.
> I can take a look and confirm my thinking.

How is it possible that other programs steal memory when they are
frozen? That just should not happen.

> > Hmm, yes, something like this might be useful for BUG_ONs etc...
> > For consistency, right name is probably in_suspend(void).
>
> There is a difference; there are sections of time where we're in_suspend
> (test_suspend_state(SUSPEND_RUNNING)) but the freezer isn't on (initial
> set up and cleanup). As far as the OOM killer goes, it probably doesn't
> matter which is used, but I thought it important to point out that
> freezer being on !== in_suspend(). (Freezer could also be on for S3?..
> 'spose you don't care if OOM killer runs then, though). Would you like
> to see in_freezer()?

There was discussion on linux-pm that something like in_freezer()
would be nice for sanity-checks, but don't introduce it just because
of that.
Pavel
--
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!

2004-11-26 19:55:08

by Pavel Machek

Subject: Re: Suspend 2 merge: 18/51: Debug page_alloc support.

Hi!

> This patch provides support for making suspend work when DEBUG_PAGEALLOC
> is enabled.

Is swsusp1 broken in this config?
Pavel
--
64 bytes from 195.113.31.123: icmp_seq=28 ttl=51 time=448769.1 ms

2004-11-26 19:55:10

by Pavel Machek

Subject: Re: Suspend 2 merge: 14/51: Disable page alloc failure message when suspending

Hi!

> > > While eating memory, we will potentially trigger this a lot. We
> > > therefore disable the message when suspending.
> >
> > You should only trigger this while eating memory, so *one* GFP_NOWARN should be
> > enough. And shrink_all_memory should fix it anyway.
>
> Agreed. I wasn't seriously suggesting changing everywhere to be
> GFP_NOWARN. Perhaps I should be more explicit in what I'm saying here.
> The problem isn't just suspend trying to allocate memory. It's
> _ANYTHING_ that might be running trying to allocate memory while we're
> eating memory. (Remember that we don't just call shrink_all_memory, but
> also allocate that memory so other processes don't grab it and stop us
> making forward progress). As a result, they're going to scream when they
> can't allocate a page.

Hmm, that does not look too healthy. That means that userland programs
will see all kinds of weird error conditions that normally
"almost-can't-happen" during normal usage.
Pavel
--
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!

2004-11-26 20:07:26

by Pavel Machek

Subject: Re: Suspend 2 merge

Hi!

> > Your way of merging looks rather wrong. Please submit changes against the
> > current swsusp code that introduce one feature after another to bring it
> > at the level you want. You'll surely have to rework it a lot until all
> > reviewers are happy.
>
> I realise that it needs further cleanup; that's why I'm submitting it
> now for comment and not asking 'please apply'. As to patching against
> swsusp, I'm purposely not doing that. The reason is that suspend2 isn't
> a bunch of incremental changes to swsusp. It has been redesigned from
> the ground up and I'd have to pull swsusp to pieces and put it back
> together to do the same things.
>
> I'm thus seeking to simply merge the existing code, let Pavel and others
> get to the point where they're ready to say "Okay, we're satisfied that
> suspend2 does everything swsusp does and more and better." Then we can
> remove swsusp. This is the plan that was discussed with Pavel and Andrew
> ages ago. I've just been slow to get there because I'm doing this
> part-time voluntary.

hugang seems to show that it indeed is possible to incrementally turn
swsusp into suspend2. I do not think Andrew really wanted it that way,
and I thought of that as of necessary evil.

[Okay, at this point I'll understand when you'll put my picture as a
texture to some doom3 monster and shoot me thousand times... Lot of
work went into suspend2, but in the meantime lot of work went into
swsusp1, too...]

> > And most importantly for each patch explain exactly what feature it
> > implements and why, etc.. "swsusp2" tells exactly nothing about the
> > changes you make.
>
> Okay. The changes include:
>
> - Almost no BUG() statements. Wherever possible, if something goes
> wrong, we back out and give the user a perfectly usable system back

Patrick did a lot of work in this area, and there are 10 BUGs() in
swsusp just now. [And I do not think "no BUGs()" is a feature -- look
at my comments, at one point you just ignored "can not happen
condition". That's bad, it can hide real bugs.] I have no reports of
swsusp1 going BUG() for users, and that's what counts.

> - Speed: All I/O is asynchronous where possible and readahead used where
> not. Routines everywhere optimised to get things done as fast as poss.
> (Think low battery).

I fixed O(n^2) behaviour in swsusp1 (not yet in). I do not think that
asynchronous I/O makes that much difference.

> - Reliability. I haven't run the tests for a while, but Michael Frank
> produced a suite that was used to stress test the software (under 2.4)
> while running 100s (1000s at least once) of cycles. There have been some
> significant changes since then, but the software is essentially the
> same.

Well, swsusp1 is getting a lot of testing too. Is the test-suite
somewhere easily available?

> - Test bed: Around 10,000 downloads of the 1.0 patch, 2730 to date of
> the 2.1.5 version I released 2 weeks ago.

Hmm, look at number of downloads of 2.6.9 kernel, I think I win here
;-)))). SuSE9.2 is actually shipping swsusp1 and advertising it as a
feature. And it seems to work for people...

> - Swap file support
> - Support for LVM/dm-crypt and siblings
> - Support for having device drivers as modules (resume from an
> initrd/initramfs)

Okay, you win these.

> - Almost all memory allocations are order 0, making suspend more
> reliable under load.

I'll have to fix this. Fortunately hugang already has a patch.
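
For context on "order 0": the kernel page allocator hands out blocks of 2^order physically contiguous pages, so an order-0 request is a single page and is the easiest kind to satisfy when memory is fragmented or tight. The following userspace sketch shows how a request size maps to an order. The function name order_for_bytes is hypothetical (the kernel has its own get_order() helper), and the 4096-byte page size in the usage note is an assumption.

```c
#include <stddef.h>

/*
 * Smallest order such that (1 << order) pages cover `bytes`.
 * Userspace illustration only; not the kernel's get_order().
 */
static unsigned int order_for_bytes(size_t bytes, size_t page_size)
{
	size_t pages = (bytes + page_size - 1) / page_size;
	unsigned int order = 0;

	while (((size_t)1 << order) < pages)
		order++;
	return order;
}
```

With 4096-byte pages, a 4096-byte request is order 0, a 4097-byte request already needs order 1 (two contiguous pages), and a five-page request needs order 3 (eight contiguous pages).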

> - Designed to save as much of memory as possible rather than as little
> (making the system more responsive post-resume).

hugang already has a patch, but I'm not 100% sure if I want it
in. Yes, people seem to like this feature, but it complicates
*design*, quite a lot.

> - Support for SMP
> - Support for preempt
> - Support for 4GB highmem (hope to do 64GB soonish)

This works in swsusp1, too. Parts of SMP support need to be rewritten
to assembly, but same is probably true for suspend2.

I'm not sure if there are still problems with swsusp1 refrigerator, if
so add

- Suspend2 actually works under load

to the list.
Pavel
--
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!

2004-11-26 20:20:51

by Nigel Cunningham

Subject: Re: Suspend 2 merge: 18/51: Debug page_alloc support.

Hi.

On Fri, 2004-11-26 at 05:21, Pavel Machek wrote:
> Hi!
>
> > This patch provides support for making suspend work when DEBUG_PAGEALLOC
> > is enabled.
>
> Is swsusp1 broken in this config?

I'd be surprised if it wasn't. You need to map pages in (and unmap them)
to do the copy/restore.

Regards,

Nigel
--
Nigel Cunningham
Pastoral Worker
Christian Reformed Church of Tuggeranong
PO Box 1004, Tuggeranong, ACT 2901

You see, at just the right time, when we were still powerless, Christ
died for the ungodly. -- Romans 5:6

2004-11-26 20:20:49

by Nigel Cunningham

Subject: Re: Suspend 2 merge: 17/51: Disable MCE checking during suspend.

Hi.

On Fri, 2004-11-26 at 05:19, Pavel Machek wrote:
> Hi!
>
> > Avoid a potential SMP deadlock here.
>
> ..and lose MCE report.

Deadlock or get an MCE report and do a printk when we're shutting down
anyway?
--
Nigel Cunningham
Pastoral Worker
Christian Reformed Church of Tuggeranong
PO Box 1004, Tuggeranong, ACT 2901

You see, at just the right time, when we were still powerless, Christ
died for the ungodly. -- Romans 5:6

2004-11-26 20:25:36

by Nigel Cunningham

Subject: Re: Suspend 2 merge: 23/51: PPC support.

Hi.

On Fri, 2004-11-26 at 05:40, Pavel Machek wrote:
> Hi!
>
> > Not updated for a while, so I'm not sure if it still works. If not, it
> > shouldn't take much to get it going again.
>
> It should have a lot in common with hugang's swsusp1/ppc support, right?
> Can you coordinate with him and get that in?

He submitted it in the first place, so I'm already relying on him to
send updates.

Regards,

Nigel
--
Nigel Cunningham
Pastoral Worker
Christian Reformed Church of Tuggeranong
PO Box 1004, Tuggeranong, ACT 2901

You see, at just the right time, when we were still powerless, Christ
died for the ungodly. -- Romans 5:6

2004-11-26 20:29:54

by Nigel Cunningham

Subject: Re: Suspend2 merge: 1/51: Device trees

Hi.

On Fri, 2004-11-26 at 05:53, Pavel Machek wrote:
> Hi!
>
> > This patch allows the device tree to be split up into multiple trees. I
> > don't really expect it to be merged, but it is an important part of
> > suspend at the moment, and I certainly want to see something like it
> > that will allow us to suspend some parts of the device tree and not
> > others. Suspend2 uses it to keep alive the hard drive (or equivalent)
> > that we're writing the image to while suspending other devices, thus
> > improving the consistency of the image written.
> >
> > I remember from last time this was posted that someone commented on
> > exporting the default device tree; I haven't changed that yet.
>
> Q: I do not understand why you have such strong objections to idea of
> selective suspend.
>
> A: Do selective suspend during runtime power management, that's okay.
> But it's useless for suspend-to-disk. (And I do not see how you could use
> it for suspend-to-ram; I hope you do not want that).
>
> Let's see, so you suggest to
>
> * SUSPEND all but swap device and parents
> * Snapshot
> * Write image to disk
> * SUSPEND swap device and parents
> * Powerdown
>
> Oh no, that does not work, if swap device or its parents uses DMA,
> you've corrupted data. You'd have to do
>
> * SUSPEND all but swap device and parents
> * FREEZE swap device and parents
> * Snapshot
> * UNFREEZE swap device and parents
> * Write
> * SUSPEND swap device and parents
>
> Which means that you still need that FREEZE state, and you get more
> complicated code. (And I have not yet introduced details like system
> devices).

There's obviously a misunderstanding here. What I do is:

SUSPEND all but swap device and parents
WRITE LRU pages
SUSPEND swap device and parents (+sysdev)
Snapshot
RESUME swap device and parents (+sysdev)
WRITE snapshot
SUSPEND swap device and parents
POWERDOWN everything

I thought I wrote - perhaps I'm wrong here - that I understand that your
new work in this area might make this unnecessary. I really only want to
do it this way because I don't know what other drivers might be doing
while we're writing the LRU pages. I'm not worried about them touching
LRU. What I am worried about is them allocating memory and starving
suspend so that we get hangs due to being oom. If they're suspended, we
have more certainty as to how memory is being used. I don't remember
what prompted me to do this in the first place, but I'm pretty sure it
would have been a real observed issue.
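
The ordering listed above can be modelled as a small state check. This is only a sketch under stated assumptions, not suspend2 code: devices are reduced to two groups ("swap device and parents" and "everything else"), the snapshot is taken to require both groups suspended, and any write is taken to require the swap group running with the rest suspended. All names (struct model, apply_step, first_violation, posted_order) are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

enum dev_state { DEV_RUNNING, DEV_SUSPENDED, DEV_OFF };

enum step {
	SUSPEND_OTHERS,		/* all but swap device and parents */
	SUSPEND_SWAP,		/* swap device and parents (+sysdev) */
	RESUME_SWAP,
	WRITE_PAGES,		/* LRU pages, or the snapshot itself */
	SNAPSHOT,		/* atomic copy */
	POWERDOWN
};

struct model {
	enum dev_state swap_dev;
	enum dev_state others;
};

/* Apply one step; return false if the model's assumptions are violated. */
static bool apply_step(struct model *m, enum step s)
{
	switch (s) {
	case SUSPEND_OTHERS:
		m->others = DEV_SUSPENDED;
		return true;
	case SUSPEND_SWAP:
		m->swap_dev = DEV_SUSPENDED;
		return true;
	case RESUME_SWAP:
		m->swap_dev = DEV_RUNNING;
		return true;
	case WRITE_PAGES:
		return m->swap_dev == DEV_RUNNING &&
		       m->others == DEV_SUSPENDED;
	case SNAPSHOT:
		return m->swap_dev == DEV_SUSPENDED &&
		       m->others == DEV_SUSPENDED;
	case POWERDOWN:
		if (m->swap_dev != DEV_SUSPENDED ||
		    m->others != DEV_SUSPENDED)
			return false;
		m->swap_dev = m->others = DEV_OFF;
		return true;
	}
	return false;
}

/* Index of the first step that violates the model, or n if none does. */
static size_t first_violation(const enum step *steps, size_t n)
{
	struct model m = { DEV_RUNNING, DEV_RUNNING };
	size_t i;

	for (i = 0; i < n; i++)
		if (!apply_step(&m, steps[i]))
			return i;
	return n;
}

/* The eight steps in the order given in this mail. */
static const enum step posted_order[] = {
	SUSPEND_OTHERS, WRITE_PAGES, SUSPEND_SWAP, SNAPSHOT,
	RESUME_SWAP, WRITE_PAGES, SUSPEND_SWAP, POWERDOWN
};
```

Under these assumptions, first_violation(posted_order, 8) reports no violation, while the shorter five-step sequence from the quoted text (suspend others, snapshot, write, suspend swap device, power down) fails at the snapshot step because the swap device is still running.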

Regards,

Nigel
--
Nigel Cunningham
Pastoral Worker
Christian Reformed Church of Tuggeranong
PO Box 1004, Tuggeranong, ACT 2901

You see, at just the right time, when we were still powerless, Christ
died for the ungodly. -- Romans 5:6

2004-11-26 20:34:03

by Nigel Cunningham

Subject: Re: Suspend 2 merge

Hi.

On Fri, 2004-11-26 at 06:20, Pavel Machek wrote:
> Hi!
>
> > > Your way of merging looks rather wrong. Please submit changes against the
> > > current swsusp code that introduce one feature after another to bring it
> > > at the level you want. You'll surely have to rework it a lot until all
> > > reviewers are happy.
> >
> > I realise that it needs further cleanup; that's why I'm submitting it
> > now for comment and not asking 'please apply'. As to patching against
> > swsusp, I'm purposely not doing that. The reason is that suspend2 isn't
> > a bunch of incremental changes to swsusp. It has been redesigned from
> > the ground up and I'd have to pull swsusp to pieces and put it back
> > together to do the same things.
> >
> > I'm thus seeking to simply merge the existing code, let Pavel and others
> > get to the point where they're ready to say "Okay, we're satisfied that
> > suspend2 does everything swsusp does and more and better." Then we can
> > remove swsusp. This is the plan that was discussed with Pavel and Andrew
> > ages ago. I've just been slow to get there because I'm doing this
> > part-time voluntary.
>
> hugang seems to show that it indeed is possible to incrementally turn
> swsusp into suspend2. I do not think Andrew really wanted it that way,
> and I thought of that as of necessary evil.

With some changes, yes. But when you come to using extents or
abstracting the method of storage and implementing plugins, it will be
ground-up redesign. Of course you might not want to go that far.

> [Okay, at this point I'll understand when you'll put my picture as a
> texture to some doom3 monster and shoot me thousand times... Lot of
> work went into suspend2, but in the meantime lot of work went into
> swsusp1, too...]

Not at all. Perhaps I'm overstating the case or not spending enough time
looking at your code, but I don't actually think swsusp has changed a
lot in the two years since I started working on this. (Want my picture
now? :>)

> > > And most importantly for each patch explain exactly what feature it
> > > implements and why, etc.. "swsusp2" tells exactly nothing about the
> > > changes you make.
> >
> > Okay. The changes include:
> >
> > - Almost no BUG() statements. Wherever possible, if something goes
> > wrong, we back out and give the user a perfectly usable system back
>
> Patrick did a lot of work in this area, and there are 10 BUGs() in
> swsusp just now. [And I do not think "no BUGs()" is a feature -- look
> at my comments, at one point you just ignored "can not happen
> condition". That's bad, it can hide real bugs.] I have no reports of
> swsusp1 going BUG() for users, and that's what counts.

Not sure what you're talking about with the 'can not happen condition'.
Regarding real reports, I agree.

> > - Speed: All I/O is asynchronous where possible and readahead used where
> > not. Routines everywhere optimised to get things done as fast as poss.
> > (Think low battery).
>
> I fixed O(n^2) behaviour in swsusp1 (not yet in). I do not think that
> asynchronous I/O makes that much difference.

Oh, it makes a huge difference once you're not eating all the memory you
can. If I submit I/O one at a time, I do 1 or 2 MB/s. With asynchronous
I/O, I can write 70MB/s and read 110MB/s with compression, and 58MB/s
each way without compression (that's the maximum throughput of the drive
I'm using at the moment). If I can streamline things further, I should be
able to lift that write rate, too.
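
To put those rates in perspective, here is a rough calculation of how
long the image write would take at each of them. It is only a sketch:
the 512 MB image size is an assumption chosen for illustration, the
helper names are made up, and the 70 MB/s figure is treated as the
effective rate for uncompressed image data.

```c
#include <assert.h>
#include <stdio.h>

/* Seconds needed to write image_mb megabytes at rate_mb_s MB/s. */
static double write_seconds(double image_mb, double rate_mb_s)
{
    assert(rate_mb_s > 0.0);
    return image_mb / rate_mb_s;
}

/* Print the write time for each rate mentioned above. */
static void print_write_times(double image_mb)
{
    /* 1 and 2: one request at a time; 58: async, no compression;
     * 70: async with compression. */
    const double rates[] = { 1.0, 2.0, 58.0, 70.0 };
    size_t i;

    for (i = 0; i < sizeof(rates) / sizeof(rates[0]); i++)
        printf("%5.1f MB/s -> %6.1f s\n", rates[i],
               write_seconds(image_mb, rates[i]));
}
```

Calling print_write_times(512.0) shows the gap: minutes at 1 or 2 MB/s
against well under ten seconds at 58 or 70 MB/s.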

> > - Reliability. I haven't run the tests for a while, but Michael Frank
> > produced a suite that was used to stress test the software (under 2.4)
> > while running 100s (1000s at least once) of cycles. There have been some
> > significant changes since then, but the software is essentially the
> > same.
>
> Well, swsusp1 is getting a lot of testing too. Is the test-suite
> somewhere easily available?

I believe Michael is preparing a new version. I assume he'll put it on
https://developer.berlios.de/projects/lstress/ when he's done.

> > - Test bed: Around 10,000 downloads of the 1.0 patch, 2730 to date of
> > the 2.1.5 version I released 2 weeks ago.
>
> Hmm, look at number of downloads of 2.6.9 kernel, I think I win here
> ;-)))). SuSE9.2 is actually shipping swsusp1 and advertising it as a
> feature. And it seems to work for people...

:> But not everyone who uses 2.6.9 uses swsusp. :>

> > - Swap file support
> > - Support for LVM/dm-crypt and siblings
> > - Support for having device drivers as modules (resume from an
> > initrd/initramfs)
>
> Okay, you win these.

I don't want to have a competition, really. I just want to convince you
that I've done some worthwhile work :>

> > - Almost all memory allocations are order 0, making suspend more
> > reliable under load.
>
> I'll have to fix this. Fortunately hugang already has a patch.
>
> > - Designed to save as much of memory as possible rather than as little
> > (making the system more responsive post-resume).
>
> hugang already has a patch, but I'm not 100% sure if I want it
> in. Yes, people seem to like this feature, but it complicates
> *design*, quite a lot.

It does. But if there were fundamental flaws in the approach, we would
have found them by now. Since you're using bio calls and not swap's own
read/write functions, you shouldn't have any problems.

> > - Support for SMP
> > - Support for preempt
> > - Support for 4GB highmem (hope to do 64GB soonish)
>
> This works in swsusp1, too. Parts of SMP support need to be rewritten
> to assembly, but same is probably true for suspend2.

See separate comments - I think it can all stay as C.

> I'm not sure if there are still problems with swsusp1 refrigerator, if
> so add
>
> - Suspend2 actually works under load
>

Hopefully, we'll merge the refrigerator changes soon; then you can say
that for swsusp1 too.

Regards,

Nigel
--
Nigel Cunningham
Pastoral Worker
Christian Reformed Church of Tuggeranong
PO Box 1004, Tuggeranong, ACT 2901

You see, at just the right time, when we were still powerless, Christ
died for the ungodly. -- Romans 5:6

2004-11-26 20:38:59

by Nigel Cunningham

Subject: Re: Suspend 2 merge: 21/51: Refrigerator upgrade.

Hi.

On Fri, 2004-11-26 at 05:33, Pavel Machek wrote:
> Hi!
>
> > Included in this patch is a new try_to_freeze() macro Andrew M suggested
> > a while back. The refrigerator declarations are put in sched.h to save
> > extra includes of suspend.h.
>
> try_to_freeze looks nice. Could we get it in after 2.6.10 opens?

I'm hoping to get the whole thing in mm once all these replies are dealt
with. Does that sound unrealistic?

try_to_freeze() should certainly be possible, because it was Andrew's
idea to start with.

> > +++ 582-refrigerator-new/drivers/pnp/pnpbios/core.c 2004-11-24 17:58:33.769748640 +1100
> > @@ -179,6 +179,10 @@
> > * Poll every 2 seconds
> > */
> > msleep_interruptible(2000);
> > +
> > + if(current->flags & PF_FREEZE)
> > + refrigerator(PF_FREEZE);
> > +
> > if(signal_pending(current))
> > break;
> >
>
> Use new interface here?

Could do. Will change.
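
For readers who have not seen the patch, the change being asked for
can be sketched in userspace C as below. This is a mock-up under
stated assumptions: struct task, the PF_FREEZE value and the
refrigerator() signature are stand-ins for illustration, not the
kernel's definitions, and try_to_freeze() is written as a function
taking a task rather than as the macro from the patch.

```c
#include <assert.h>
#include <stdbool.h>

#define PF_FREEZE 0x1UL            /* illustrative value only */

struct task {                      /* hypothetical stand-in for current */
    unsigned long flags;
    int times_frozen;
};

/* Stand-in for refrigerator(): record the visit and clear the flag. */
static void refrigerator(struct task *t, unsigned long flag)
{
    assert(t->flags & flag);
    t->times_frozen++;
    t->flags &= ~flag;
}

/* The open-coded test-and-call pair folded into one helper.
 * Returns true if the task went through the refrigerator. */
static bool try_to_freeze(struct task *t)
{
    if (t->flags & PF_FREEZE) {
        refrigerator(t, PF_FREEZE);
        return true;
    }
    return false;
}

/* Shape of one pass through a polling loop such as the pnpbios one:
 * sleep, give the freezer a chance, then check for signals.
 * Returns false when the loop should exit. */
static bool poll_once(struct task *t, bool signal_pending)
{
    /* msleep_interruptible(2000) would go here */
    try_to_freeze(t);
    return !signal_pending;
}
```

The point of the helper is that each call site shrinks to a single
line and no longer needs to know which flag to test.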

> > */
> > int fsync_super(struct super_block *sb)
> > {
> > + int ret;
> > +
> > + /* A safety net. During suspend, we might overwrite
> > + * memory containing filesystem info. We don't then
> > + * want to sync it to disk. */
> > + if (unlikely(test_suspend_state(SUSPEND_DISABLE_SYNCING)))
> > + return 0;
> > +
>
> If it is safety net, do BUG_ON().

Could get triggered by user pressing SysRq. (Or via a panic?). I don't
think the SysRq should result in a panic; nor should a panic result in a
recursive call to panic (although I'm wondering here, wasn't the call to
syncing in panic taken out?).

Regards,

Nigel

2004-11-26 20:49:52

by Nigel Cunningham

Subject: Re: Suspend 2 merge: 21/51: Refrigerator upgrade.

Hi.

On Fri, 2004-11-26 at 09:36, Pavel Machek wrote:
> Hi!
>
> > > > Included in this patch is a new try_to_freeze() macro Andrew M suggested
> > > > a while back. The refrigerator declarations are put in sched.h to save
> > > > extra includes of suspend.h.
> > >
> > > try_to_freeze looks nice. Could we get it in after 2.6.10 opens?
> >
> > I'm hoping to get the whole thing in mm once all these replies are dealt
> > with. Does that sound unrealistic?
>
> Yes, a little ;-).

I'm not talking about talking about problems and then doing nothing :>
I'm writing a list of changes as I look at each of these responses.
Assuming they're all addressed (or not changed for good reasons), and
the code is actually useful, why shouldn't it go into mm?

> > > > */
> > > > int fsync_super(struct super_block *sb)
> > > > {
> > > > + int ret;
> > > > +
> > > > + /* A safety net. During suspend, we might overwrite
> > > > + * memory containing filesystem info. We don't then
> > > > + * want to sync it to disk. */
> > > > + if (unlikely(test_suspend_state(SUSPEND_DISABLE_SYNCING)))
> > > > + return 0;
> > > > +
> > >
> > > If it is safety net, do BUG_ON().
> >
> > Could get triggered by user pressing SysRq. (Or via a panic?). I don't
> > think the SysRq should result in a panic; nor should a panic result in a
> > recursive call to panic (although I'm wondering here, wasn't the call to
> > syncing in panic taken out?).
>
> Silently doing nothing when user asked for sync is not nice,
> either. BUG() is better solution than that.

I don't think we should BUG because the user presses Sys-Rq S while
suspending. I'll make it BUG_ON() and make the Sys_Rq printk & ignore
when suspending. Sound reasonable?
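
A userspace sketch of that arrangement, under stated assumptions: the
boolean stands in for test_suspend_state(SUSPEND_DISABLE_SYNCING),
assert() plays the role of BUG_ON(), and both function names are
hypothetical rather than taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Stand-in for test_suspend_state(SUSPEND_DISABLE_SYNCING). */
static bool syncing_disabled;
static int syncs_done;

/* The sync path treats a sync during suspend as a real bug. */
static int fsync_super_sketch(void)
{
    assert(!syncing_disabled);     /* plays the role of BUG_ON() */
    syncs_done++;
    return 0;
}

/* The SysRq-S handler filters the request first, so a keypress
 * while suspending is reported and ignored instead of hitting the
 * assertion. Returns true if a sync was actually performed. */
static bool sysrq_sync_sketch(void)
{
    if (syncing_disabled) {
        printf("SysRq: sync ignored while suspending\n");
        return false;
    }
    fsync_super_sketch();
    return true;
}
```

The user-triggered path is the only one that gets the soft treatment;
any other caller that syncs while the flag is set still trips the
check.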

Regards,

Nigel

2004-11-26 20:48:10

by Nigel Cunningham

Subject: Re: Suspend2 merge: 1/51: Device trees

Hi.

On Fri, 2004-11-26 at 09:41, Pavel Machek wrote:
> > I thought I wrote - perhaps I'm wrong here - that I understand that your
> > new work in this area might make this unnecessary. I really only want to
> > do it this way because I don't know what other drivers might be doing
> > while we're writing the LRU pages. I'm not worried about them touching
> > LRU. What I am worried about is them allocating memory and starving
> > suspend so that we get hangs due to being oom. If they're suspended, we
> > have more certainty as to how memory is being used. I don't remember
> > what prompted me to do this in the first place, but I'm pretty sure it
> > would have been a real observed issue.
>
> Uh... It seems like quite a lot of work. Would not reserving few more
> pages help here? Or perhaps right solution is to fix "broken" drivers
> that need too much memory...

I'd agree, except that I don't know how many to allocate. It makes
getting a reliable suspend the result of guess work and favourable
circumstances. Fixing 'broken' drivers by really suspending them seems
to me to be the right solution. Make their memory requirements perfectly
predictable.

Regards,

Nigel

2004-11-26 21:02:15

by Nigel Cunningham

Subject: Re: Suspend 2 merge: 14/51: Disable page alloc failure message when suspending

Hi.

On Fri, 2004-11-26 at 08:56, Pavel Machek wrote:
> Hi!
>
> > > > While eating memory, we will potentially trigger this a lot. We
> > > > therefore disable the message when suspending.
> > >
> > > You should only trigger this while eating memory, so *one* GFP_NOWARN should be
> > > enough. And shrink_all_memory should fix it anyway.
> >
> > Agreed. I wasn't seriously suggesting changing everywhere to be
> > GFP_NOWARN. Perhaps I should be more explicit in what I'm saying here.
> > The problem isn't just suspend trying to allocate memory. It's
> > _ANYTHING_ that might be running trying to allocate memory while we're
> > eating memory. (Remember that we don't just call shrink_all_memory, but
> > also allocate that memory so other processes don't grab it and stop us
> > making forward progress). As a result, they're going to scream when they
> > can't allocate a page.
>
> Hmm, that does not look too healthy. That means that userland programs
> will see all kinds of weird error conditions that normally
> "almost-can't-happen" during normal usage.

Failure to allocate memory should be something any caller to get_*_page
deals with, so if they don't, are we to be blamed?

Regards,

Nigel

2004-11-26 21:05:36

by Christoph Hellwig

Subject: Re: Suspend 2 merge: 21/51: Refrigerator upgrade.

> I'm hoping to get the whole thing in mm once all these replies are dealt
> with. Does that sound unrealistic?

Yes. Absolutely.

2004-11-26 21:14:15

by Nigel Cunningham

Subject: Re: Suspend 2 merge: 17/51: Disable MCE checking during suspend.

Hi.

On Fri, 2004-11-26 at 09:31, Pavel Machek wrote:
> Hi!
>
> > > > Avoid a potential SMP deadlock here.
> > >
> > > ..and lose MCE report.
> >
> > Deadlock or get an MCE report and do a printk when we're shutting down
> > anyway?
>
> If MCE happens, I'd like user to report it. Losing it is wrong,
> deadlocking may be actually better because at least you get the
> report. I'd BUG().
>
> MCEs are hardware problem, right? They should not be common.

It's not them occurring that's the problem, it's checking for them that
involves an SMP call :<

> static void mce_work_fn(void *data)
> {
> if (!test_suspend_state(SUSPEND_RUNNING))
> on_each_cpu(mce_checkregs, NULL, 1, 1);
> schedule_delayed_work(&mce_work, MCE_RATE);
> }

Nigel

2004-11-26 20:07:25

by Pavel Machek

Subject: Re: Suspend 2 merge: 24/51: Keyboard and serial console hooks.


> > Here we add simple hooks so that the user can interact with suspend
> > while it is running. (Hmm. The serial console condition could be
> > simplified :>). The hooks allow you to do such things as:
> >
> > - cancel suspending

I can understand that you want this one. I do not think the ugliness is
worth it, though.

> > - change the amount of detail of debugging info shown

Use sysrq-X as you do during runtime.

> > - change what debugging info is shown
> > - pause the process
> > - single step

Useful for developing swsusp but not for using it. Should live as a
separate patch.

> > - toggle rebooting instead of powering down

This is pretty much nonsensical. You can do echo reboot > disk. If you
forget to do it, all you have to do is press power after it powers
down. That's about as much work as pressing 'R' while you are
suspending, right?

Please drop this one.

> And why would we want this? If the user calls the suspend call
> he surely wants to suspend, right?
>
> After all we don't have in-kernel hooks to allow a user to read instead
> of write after calling sys_write.

:-))))))))).
Pavel
--
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!

2004-11-26 21:29:35

by Nigel Cunningham

Subject: Re: Suspend 2 merge

Hi.

On Fri, 2004-11-26 at 11:39, Pavel Machek wrote:
> Hi!
>
> > > I'd prefer not to get plugins and abstract storage. I'm not sure about
> > > extents, but as soon as I can get rid of order-8 allocations, things
> > > should be ok.
> >
> > Don't you need more than that? Unless things have changed, you still
> > spend most of your time eating memory so that you only have a few megs
> > (well, maybe a little more) to write to disk. If you do reduce the
> > amount of memory you eat, then you need to work to make the I/O
> > faster.
>
> I'm not *that* concerned about speed. Getting rid of order-8 is
> for preventing "sorry, not enough RAM to suspend to disk".

Priority wise, I agree. But given that the order 8 issue is dealt with,
speed is important. Particularly when your power just went out and your
UPS battery is running down.

> > > Okay, 58MB/sec is better than 1MB/sec. I do not think I want the
> > > complexity necessary to get me 70MB/sec.
> >
> > Fair enough for you, but not everyone can say that. At the end of the
> > day, I'm not writing this code just for me to use, though. Many of the
> > features I've added have been added for the benefit of other people; I
> > assume you'd do the same. Most laptops can't do 58MB/s, so the
> > difference is much bigger. (My original laptop hard drive did 17MB/s; with
> > compression I think it achieved something close to double that,
> > depending on the data being compressed, of course).
>
> I do have too fast machines around me. But notice that compression
> only does factor-2 speedup. If we wanted to make whole kernel uglier,
> we could probably achieve factor-2 speedup for any benchmark... just
> it would be bad idea.

Again, when you're running on limited time, twice as fast is still twice
as fast.

> > > In some ways, suspend2 is two years ahead of rest of kernel:
> > > * you have interactive debugging
> > > * file compression
> > > * nice splash screen
> > > * plugin interface for transparent network support
> > >
> > > Unfortunately, we do not want compression done like that. It would
> > > make sense to do compressed-LVM or something like that (that way
> > > everyone would get the benefit), but it does not make sense to have it
> > > just for suspend2. And we do not want the rest of features, too,
> > > unless they work for the rest of kernel.
> >
> > The cryptoapi provides support for both compression and encryption. I'd
> > happily make use of that, but we still need a way for the user to choose
> > what compression/encryption they want and configure it. I'm not at all
> > averse to the idea of shifting the lzf compression support into being a
> > cryptoapi plugin. That shouldn't be hard to do precisely because I have
> > the plugin system :>.
>
> Actually I'd like to see lzf done at LVM level; that way it is useful
> for people not doing suspend, too, and we should not need plugin
> infrastructure in suspend2 (LVM provides us with that service).

That ignores that the vast majority of people don't use LVM at the
moment. Perhaps you could argue that they should. The other thing is,
I'm trying not to make assumptions about how we're writing the image,
either. If you want to pipe your image over a network to some server,
you should be able to, and not have to implement compression again in
the writer for that.
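
The separation being argued for can be sketched as two independent
stages joined by function pointers. Everything here is illustrative:
the type and function names are hypothetical and are not the suspend2
plugin interface, and a toy run-length encoder stands in for LZF so
that the example stays self-contained.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* A transform stage (e.g. compression). Returns the number of bytes
 * placed in out, or 0 if out is too small or the input is empty. */
typedef size_t (*transform_fn)(const unsigned char *in, size_t in_len,
                               unsigned char *out, size_t out_cap);

/* A writer stage (swap, file, network...). Returns bytes accepted. */
typedef size_t (*writer_fn)(const unsigned char *buf, size_t len,
                            void *ctx);

/* Toy run-length encoder standing in for LZF: (count, byte) pairs. */
static size_t rle_compress(const unsigned char *in, size_t in_len,
                           unsigned char *out, size_t out_cap)
{
    size_t i = 0, o = 0;

    while (i < in_len) {
        unsigned char b = in[i];
        size_t run = 1;

        while (i + run < in_len && in[i + run] == b && run < 255)
            run++;
        if (o + 2 > out_cap)
            return 0;
        out[o++] = (unsigned char)run;
        out[o++] = b;
        i += run;
    }
    return o;
}

/* A writer that appends to a memory buffer; a network or swap
 * writer would have the same signature. */
struct mem_sink {
    unsigned char data[256];
    size_t used;
};

static size_t mem_writer(const unsigned char *buf, size_t len, void *ctx)
{
    struct mem_sink *s = ctx;

    assert(s->used + len <= sizeof(s->data));
    memcpy(s->data + s->used, buf, len);
    s->used += len;
    return len;
}

/* The core knows neither which transform nor which writer it has. */
static size_t save_chunk(const unsigned char *page, size_t len,
                         transform_fn transform, writer_fn emit,
                         void *ctx)
{
    unsigned char tmp[512];
    size_t n = transform(page, len, tmp, sizeof(tmp));

    if (n == 0)
        return 0;
    return emit(tmp, n, ctx);
}
```

Swapping mem_writer for a different writer leaves rle_compress
untouched, which is the property at issue: the compressor does not
have to be reimplemented for each way of storing the image.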

> > > You did wonderful work -- you have shown what is possible with
> > > suspend2. Now we just need to scale it back to what is practical. It
> > > needs not only to work, it also needs to be nice, simple, and easy to
> > > maintain.
> >
> > I think it is practical. Apart from the bootsplash support, I don't
> > think I have added any feature because I thought "Hey, this looks like a
> > fun thing to try.". Every feature has been added because it makes
> > suspend faster, more reliable, more user friendly, more versatile or the
> > like. If we want Linux to get adopted by desktop users, it needs to
>
> I believe you need to say "no" way more often. One user is not enough
> to justify feature in mainline kernel, and any number of users should
> not be enough to make GZIP compression supported by suspend2.

Okay. Let's say I drop GZIP. I've just asked on the suspend list for
good reasons not to do it. I'll be surprised if I get any :> (And I'll
ask for proof that they get a higher throughput with GZIP than with
LZF!).

I still think the plugin system is useful. It made adding LZF
compression and DM support really easy, and also means work can be done
on a generic file writer without needing to pull out all of the
swapwriter code. It also made making suspend modular far easier, which
in turn means you don't have to have the memory in use all the time,
when you only really want the functionality ready to go when you want
to power down.

> > > I believe it has at least one pretty bad flaw: it has hooks all over
> > > the place and will be a nightmare to maintain. Putting suspend hooks into
> > > memory allocation is not nice.
> >
> > That's a big statement.
> >
> > "Hooks all over the place" was a phrase first used to refer to the
> > attempts at making freezing more reliable. That's irrelevant now with
> > the simplified three stages to freezing. The hooks are the same as for
> > swsusp1 there.
> >
> > The hooks you've seen in the rest of the kernel are generally only
> > further supplements to the freezing, so that swsusp should probably have
> > them too.
> >
> > The hooks in the memory management are minimal and wrapped in
> > unlikely(), so they shouldn't really be a problem in the normal flow
> > of
>
> Yes, they are unlikely(); but still they are hooks into memory
> management. They are at least ugly as hell. And no, swsusp1 does not
> have this particular set of hooks, and does not need to patch sysrq-S.

How is ugly defined here? Can you give me an example that does the same
thing, but which you consider less ugly?

    if (unlikely(test_suspend_state(SUSPEND_USE_MEMORY_POOL))) {
        suspend2_free_pool_pages(page, order);
        return;
    }

I nearly launched into a flame war, but I'll try to be more gracious
than that.

> > things. While suspend is running, they serve a good and necessary
> > purpose. Using high level routines, we can't guarantee that new slab
>
> They are necessary because of two-stages LRU saving... I'm trying to
> argue "two-stages LRU saving is wrong"...

I know you are. What I'm not sure about is whether you believe that the
user should never have the option of saving a full image of their
memory, or whether you think there's a better way to do it.

> > > swsusp1 is pretty self-contained. As long as drivers stop the DMA and
> > > NMI does nothing wrong, atomic snapshot will indeed be atomic.
> > >
> > > Can you list conditions necessary for suspend2 to work?
> >
> > Not really sure what you mean. At the moment, the main hindrance to it
> > working properly is driver model support (USB, DRI, as said previously).
> > That forces us to have a userspace script to compile as modules and
> > stop/unload support around kernel call. Given that these things are
> > done, and that suspend is able to get enough memory to do its work
> > (almost never a problem), suspend should always work.
>
> No, assume driver problems are solved.

Okay.

> For swsusp2, you need drivers to stop the DMA, NMI not interfering,
> sync may not happen after you have saved LRU, memory may not be
> allocated from slab after you have saved LRU. (something else? This
> needs to be written down somewhere, and all kernel hackers will need
> to be careful not to break these rules. Do you see why it worries me?)
>
> swsusp1 is more self-contained. As long as drivers stop the DMA and
> NMI does nothing wrong, atomic snapshot will indeed be atomic.

Syncing may not happen after we've done the atomic copy, but since it is
already done when freezing processes, there shouldn't be any dirty data
to sync anyway... except for the syslog data from our printks.

The LRU pages can't change, but this shouldn't be a problem because all
userspace threads and most of kernel space, including kswapd, kjournald
and so on is stopped. The only guys who need to worry about this are the
MM guys, and so long as the scanning continues to run via a process, the
buddy allocator or a timer (interrupts aren't an option, are they?),
that activity will be paused during suspend without them having to add a
single line of code.

If bio page I/O was changed to interact with the LRU, we might be in
trouble.

Slab _can_ be allocated while we're saving the LRU, but the allocations
need to come from pages that we know will be included in the atomic
copy. This happens transparently (page allocator), so other kernel
hackers don't need to worry about any of these issues. If we use the
memory pool idea, everything else that needs to run can run just like
normal, without any suspend specific changes. (You might be being
confused here by those printks in the slab get-a-new-page code; I guess
I forgot to write at the top of that patch that the printks are only
there while I'm seeking to determine whether suspend is the cause of
some occasional slab corruption I've been seeing. I'm trying to
determine whether the pages I see the oops at are ones allocated while
writing the image, or not. Unfortunately, it happens so infrequently
that it is taking a long time to find out.)
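
The memory pool idea can be modelled with a pair of counters. This is
a simplified sketch, not the suspend2 allocator: the boolean stands in
for test_suspend_state(SUSPEND_USE_MEMORY_POOL), the pool and general
sizes are arbitrary, and the function names are hypothetical. The free
path mirrors the hook quoted earlier, where a free made while the flag
is set goes back to the pool.

```c
#include <assert.h>
#include <stdbool.h>

#define POOL_PAGES 4               /* arbitrary size for illustration */

/* Stand-in for test_suspend_state(SUSPEND_USE_MEMORY_POOL). */
static bool use_memory_pool;
static int pool_free = POOL_PAGES; /* pages known to be in the atomic copy */
static int general_free = 100;     /* everything else */

enum page_source { FROM_GENERAL, FROM_POOL, ALLOC_FAILED };

/* While the flag is set, every allocation is served from the pool,
 * so callers such as slab need no suspend-specific code. */
static enum page_source alloc_page_sketch(void)
{
    assert(pool_free >= 0 && general_free >= 0);

    if (use_memory_pool) {
        if (pool_free == 0)
            return ALLOC_FAILED;
        pool_free--;
        return FROM_POOL;
    }
    if (general_free == 0)
        return ALLOC_FAILED;
    general_free--;
    return FROM_GENERAL;
}

/* Frees made while the flag is set return the page to the pool. */
static void free_page_sketch(void)
{
    if (use_memory_pool) {
        pool_free++;
        return;
    }
    general_free++;
}
```

The caller sees the same interface in both modes; only the allocator
knows that suspend is in progress.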

DMA only needs to be stopped when the drivers are quiesced, which is not
a suspend2 specific requirement.

The NMI watchdog turned out not to be a problem at all.

In short, there are no rules that "all kernel hackers" will need to be
careful not to break. The main constraint added is that we need to
be able to stop all changes to the LRU.

Regards,

Nigel

2004-11-26 21:29:50

by Nigel Cunningham

Subject: Re: Suspend 2 merge

Hi.

On Fri, 2004-11-26 at 15:32, [email protected] wrote:
> > For swsusp2, you need drivers to stop the DMA, NMI not interfering,
> > sync may not happen after you have saved LRU, memory may not be
> > allocated from slab after you have saved LRU. (something else? This
> > needs to be written down somewhere, and all kernel hackers will need
> > to be careful not to break these rules. Do you see why it worries me?)
> Ok, I got it. I think making the LRU safe means making sure that:
> 1: The LRU can't change after it is saved.
> 2: LRU memory can't change after it is saved.
> The first one is done; the second we can't be sure of in the current
> design. Can we use COW to do it?

2 is simple: LRU doesn't change because everything that would change it
is frozen, and the memory pool hooks ensure that scanning of the list
doesn't happen while suspending either.

I don't see the point to saving LRU pages separately when you're still
eating all the memory you can. You'll have the same number of pages to
save, just fewer to copy (and copying takes far less time than saving).

> Pagecache saving is still in, but disabled by default and activated
> using sysctl. I'd rather not merge it right now; I hope the other
> changes can be merged. :)

Pavel's going to think you are trying to turn swsusp into suspend2!!

Nigel

2004-11-26 21:37:28

by Nigel Cunningham

Subject: Re: Suspend 2 merge: 22/51: Suspend2 lowlevel code.

Hi.

On Fri, 2004-11-26 at 05:39, Pavel Machek wrote:
> Hi!
>
> > +#include "../../../kernel/power/suspend.h"
>
> Ouch.
>
> > +#define loaddebug(thread,register) \
> > + __asm__("movl %0,%%db" #register \
> > + : /* no output */ \
> > + :"r" ((thread)->debugreg[register]))
>
> This should be already defined somewhere...

Will look for it.

> > + * Note that the context and timing of this function is pretty critical.
> > + * With a minimal amount of things going on in the caller and in here, gcc
> > + * does a good job of being just a dumb compiler. Watch the assembly output
> > + * if anything changes, though, and make sure everything is going in the right
> > + * place.
>
> You should include assembly source (unless you can test all the compilers...). Feel free
> to include C version, too, but #ifdef it out.

I'm thinking I should actually be removing the comment. The C is simple,
clear, fast and easy to maintain and we haven't actually had any
problems at all with compilers. All my tweaking in here has turned out
to be irrelevant to the real cause of problems (I recently found a bug
where work queues were wrongly inheriting freezer flags; since fixing
that, all the symptoms in this area have gone away).

Regards,

Nigel

2004-11-26 19:55:08

by Pavel Machek

Subject: Re: Suspend 2 merge: 9/51: init/* changes.

Hi!

> 1) Make name_to_dev_t non init. Why should you need to reboot if all you
> want to do is change the device you're using to suspend? That's M_'s way

Well, if you change it using /proc and forget to change kernel cmd line, you'll have
a problem. Do you really change this so often?

And if you really want to make it changeable, pass major:minor from userland; once
userland is running getting them is easy.

> 2) Hooks for resuming. Suspend2 functionality can be compiled as modules
> or built in. Resuming can be activated via an initrd. These hooks allow
> for all of the combinations of the above. Allowing resuming from within
> an initrd is important because then you can set up LVM volumes
> (including encrypted devices), compile drivers for your resume device as
> modules and so on.

Hmm, this will need a lot of testing and a lot of care... You, for example,
may not write to your fs's before activating it. And if you use this feature,
the kernel no longer has a chance to kill the suspend signature on normal boot,
making "shoot(self, foot)" easier.

But for encrypted stuff it is probably only way to go, so... Just
make sure people are not using it unless they *have* to.
Pavel

--
64 bytes from 195.113.31.123: icmp_seq=28 ttl=51 time=448769.1 ms

2004-11-26 21:42:32

by Nigel Cunningham

Subject: Re: Suspend 2 merge: 21/51: Refrigerator upgrade.

Hi.

On Fri, 2004-11-26 at 10:25, Pavel Machek wrote:
> Hi!
>
> > > > > > Included in this patch is a new try_to_freeze() macro Andrew M suggested
> > > > > > a while back. The refrigerator declarations are put in sched.h to save
> > > > > > extra includes of suspend.h.
> > > > >
> > > > > try_to_freeze looks nice. Could we get it in after 2.6.10 opens?
> > > >
> > > > I'm hoping to get the whole thing in mm once all these replies are dealt
> > > > with. Does that sound unrealistic?
> > >
> > > Yes, a little ;-).
> >
> > I'm not talking about talking about problems and then doing nothing :>
> > I'm writing a list of changes as I look at each of these responses.
> > Assuming they're all addressed (or not changed for good reasons), and
> > the code is actually useful, why shouldn't it go into mm?
>
> It has chance to go into mm, but I do not think all 51 patches will go
> at once. And I expect few more rounds of patches / comments. (And then
> some patch / "it is too big" flamewar, too :-).

Didn't see any flamewar over the size of Reiser4. :>

> > > Silently doing nothing when user asked for sync is not nice,
> > > either. BUG() is better solution than that.
> >
> > I don't think we should BUG because the user presses Sys-Rq S while
> > suspending. I'll make it BUG_ON() and make the Sys_Rq printk & ignore
> > when suspending. Sound reasonable?
>
> Yes, that's better. ... only that it means just another hook somewhere
> :-(.

:<. But we're only talking two or three lines. Let's keep it in
perspective.

2004-11-26 21:46:45

by Nigel Cunningham

Subject: Re: Suspend2 merge: 1/51: Device trees

Hi.

On Fri, 2004-11-26 at 10:26, Pavel Machek wrote:
> > > > I thought I wrote - perhaps I'm wrong here - that I understand that your
> > > > new work in this area might make this unnecessary. I really only want to
> > > > do it this way because I don't know what other drivers might be doing
> > > > while we're writing the LRU pages. I'm not worried about them touching
> > > > LRU. What I am worried about is them allocating memory and starving
> > > > suspend so that we get hangs due to being oom. If they're suspended, we
> > > > have more certainty as to how memory is being used. I don't remember
> > > > what prompted me to do this in the first place, but I'm pretty sure it
> > > > would have been a real observed issue.
> > >
> > > Uh... It seems like quite a lot of work. Would not reserving few more
> > > pages help here? Or perhaps right solution is to fix "broken" drivers
> > > that need too much memory...
> >
> > I'd agree, except that I don't know how many to allocate. It makes
> > getting a reliable suspend the result of guess work and favourable
> > circumstances. Fixing 'broken' drivers by really suspending them seems
> > to me to be the right solution. Make their memory requirements perfectly
> > predictable.
>
> Except for the few drivers that are between suspend device and
> root. So you still have the same problem, and still need to
> guess. Plus you get complex changes to driver model.

I think you're overstating your case. All we're talking about doing is
quiescing the same drivers that would be quiesced later, in the same
way, but earlier in the process. Apart from the code I already have in
that patch, nothing else is needed. The changes aren't that complex,
either.

2004-11-26 21:41:51

by Nigel Cunningham

Subject: Re: Suspend 2 merge

Hi.

On Fri, 2004-11-26 at 10:22, Pavel Machek wrote:
> I'd prefer not to get plugins and abstract storage. I'm not sure about
> extents, but as soon as I can get rid of order-8 allocations, things
> should be ok.

Don't you need more than that? Unless things have changed, you still
spend most of your time eating memory so that you only have a few megs
(well, maybe a little more) to write to disk. If you do reduce the
amount of memory you eat, then you need to work to make the I/O faster.

> > > [Okay, at this point I'll understand when you'll put my picture as a
> > > texture to some doom3 monster and shoot me thousand times... Lot of
> > > work went into suspend2, but in the meantime lot of work went into
> > > swsusp1, too...]
> >
> > Not at all. Perhaps I'm overstating the case or not spending enough time
> > looking at your code, but I don't actually think swsusp has changed a
> > lot in the two years since I started working on this. (Want my picture
> > now? :>)
>
> Well, it was rewritten by Patrick so it actually looks okay, and it
> started to work for users...

:> Okay.

> > > > - Speed: All I/O is asynchronous where possible and readahead used where
> > > > not. Routines everywhere optimised to get things done as fast as poss.
> > > > (Think low battery).
> > >
> > > I fixed O(n^2) behaviour in swsusp1 (not yet in). I do not think that
> > > asynchronous I/O makes that much difference.
> >
> > Oh, it makes a huge difference once you're not eating all the memory you
> > can. If I submit I/O one at a time, I do 1 or 2 MB/s. With asynchronous
> > I/O, I can write 70MB/s and read 110MB/s with compression, 58|58 without
> > compression (that's the maximum throughput of the drive I'm using at the
> > moment). If I can streamline things further, I should be able to lift
> > that write rate further, too.
>
> Okay, 58MB/sec is better than 1MB/sec. I do not think I want the
> complexity necessary to get me 70MB/sec.

Fair enough for you, but not everyone can say that. At the end of the
day, I'm not writing this code just for me to use, though. Many of the
features I've added have been added for the benefit of other people; I
assume you'd do the same. Most laptops can't do 58MB/s, so the
difference is much bigger. (My original laptop hard drive did 17/s; with
compression I think it achieved something close to double that,
depending on the data being compressed, of course).

> In some ways, suspend2 is two years ahead of rest of kernel:
> * you have interactive debugging
> * file compression
> * nice splash screen
> * plugin interface for transparent network support
>
> Unfortunately, we do not want compression done like that. It would
> make sense to do compressed-LVM or something like that (that way
> everyone would get the benefit), but it does not make sense to have it
> just for suspend2. And we do not want the rest of features, too,
> unless they work for the rest of kernel.

The cryptoapi provides support for both compression and encryption. I'd
happily make use of that, but we still need a way for the user to choose
what compression/encryption they want and configure it. I'm not at all
averse to the idea of shifting the lzf compression support into being a
cryptoapi plugin. That shouldn't be hard to do precisely because I have
the plugin system :>.

> > > > - Test bed: Around 10,000 downloads of the 1.0 patch, 2730 to date of
> > > > the 2.1.5 version I released 2 weeks ago.
> > >
> > > Hmm, look at number of downloads of 2.6.9 kernel, I think I win here
> > > ;-)))). SuSE9.2 is actually shipping swsusp1 and advertising it as a
> > > feature. And it seems to work for people...
> >
> > :> But not everyone who uses 2.6.9 uses swsusp. :>
>
> But they should ;-).

I'll beg to differ there :>

> > > > - Swap file support
> > > > - Support for LVM/dm-crypt and siblings
> > > > - Support for having device drivers as modules (resume from an
> > > > initrd/initramfs)
> > >
> > > Okay, you win these.
> >
> > I don't want to have a competition, really. I just want to convince you
> > that I've done some worthwhile work :>
>
> You did wonderful work -- you have shown what is possible with
> suspend2. Now we just need to scale it back to what is practical. It
> needs not only to work, it also needs to be nice, simple, and easy to
> maintain.

I think it is practical. Apart from the bootsplash support, I don't
think I have added any feature because I thought "Hey, this looks like a
fun thing to try.". Every feature has been added because it makes
suspend faster, more reliable, more user friendly, more versatile or the
like. If we want Linux to get adopted by desktop users, it needs to have
these features. Making it harder to use by forcing people to reboot to
change a parameter or forcing them to do an ls in /dev with obscure
parameters (to get the major and minor numbers) when they already know
they want /dev/sda1 isn't user friendly. Obviously user friendliness is
more important to me than to you. That's fine, but let's agree to differ
and let the software be more helpful rather than less.

> > > > - Designed to save as much of memory as possible rather than as little
> > > > (making the system more responsive post-resume).
> > >
> > > hugang already has a patch, but I'm not 100% sure if I want it
> > > in. Yes, people seem to like this feature, but it complicates
> > > *design*, quite a lot.
> >
> > It does. But if there were fundamental flaws in the approach, we would
> > have found them by now. Since you're using bio calls and not swap's own
> > read/write functions, you shouldn't have any problems.
>
> I believe it has at least one pretty bad flaw: it has hooks all over
> the place and will be a nightmare to maintain. Putting suspend hooks into
> memory allocation is not nice.

That's a big statement.

"Hooks all over the place" was a phrase first used to refer to the
attempts at making freezing more reliable. That's irrelevant now with
the simplified three stages to freezing. The hooks are the same as for
swsusp1 there.

The hooks you've seen in the rest of the kernel are generally only
further supplements to the freezing, so that swsusp should probably have
them too.

The hooks in the memory management are minimal and wrapped in
unlikely(), so they shouldn't really be a problem in the normal flow of
things. While suspend is running, they serve a good and necessary
purpose. Using high level routines, we can't guarantee that new slab
pages, for example, aren't allocated while we're writing the LRU pages.
If you can't be sure of that and don't want to satisfy allocations from
a memory pool, you'll need to recheck that you're saving all the pages
after writing LRU. And if you do that, you might need to allocate more
memory for the metadata, which might mean freeing an LRU page or two,
which will make your image inconsistent. A memory pool is a simple and
effective solution to that issue: we know exactly what pages may be
allocated when we do our atomic copy, and we copy them without having to
figure out which ones were actually used.
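
A rough userspace sketch of the memory pool idea described above. It is not the suspend2 code: the pool size, the page size and every name in it (use_pool, sketch_alloc_page, sketch_free_page) are assumptions made for illustration. While the flag is set, every allocation comes from a fixed array, so the set of pages that could possibly be touched is known in advance and can be copied wholesale.

```c
#include <stddef.h>
#include <stdlib.h>

#define POOL_PAGES 4      /* assumed pool size, for illustration only */
#define PAGE_BYTES 4096   /* assumed page size */

/* The reserved pool: the only pages that may be handed out while use_pool is set. */
static unsigned char pool[POOL_PAGES][PAGE_BYTES];
static int pool_used[POOL_PAGES];

/* Hypothetical stand-in for test_suspend_state(SUSPEND_USE_MEMORY_POOL). */
static int use_pool;

/* Return the pool slot a page belongs to, or -1 if it is not a pool page. */
static int pool_index(const void *page)
{
    int i;

    for (i = 0; i < POOL_PAGES; i++)
        if (page == (const void *)pool[i])
            return i;
    return -1;
}

/* Allocate one page: from the pool while "suspending", from malloc() otherwise. */
static void *sketch_alloc_page(void)
{
    int i;

    if (!use_pool)
        return malloc(PAGE_BYTES);

    for (i = 0; i < POOL_PAGES; i++) {
        if (!pool_used[i]) {
            pool_used[i] = 1;
            return pool[i];
        }
    }
    return NULL; /* pool exhausted */
}

/* Free one page, returning pool pages to the pool and the rest to free(). */
static void sketch_free_page(void *page)
{
    int i = pool_index(page);

    if (i >= 0) {
        pool_used[i] = 0;
        return;
    }
    free(page);
}
```

The sketch decides by pool membership on the free path, whereas the hook quoted later in this thread tests the suspend state flag; either way the point is the same: the snapshot code only has to copy the pool[] array, without working out which of its pages were actually used.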

> swsusp1 is pretty self-contained. As long as drivers stop the DMA and
> NMI does nothing wrong, atomic snapshot will indeed be atomic.
>
> Can you list the conditions necessary for suspend2 to work?

Not really sure what you mean. At the moment, the main hindrance to it
working properly is driver model support (USB, DRI, as said previously).
That forces us to compile those drivers as modules and use a userspace
script to stop/unload them around the kernel call. Given that these
things are done, and that suspend is able to get enough memory to do its
work (almost never a problem), suspend should always work.

Regards,

Nigel
--
Nigel Cunningham
Pastoral Worker
Christian Reformed Church of Tuggeranong
PO Box 1004, Tuggeranong, ACT 2901

You see, at just the right time, when we were still powerless, Christ
died for the ungodly. -- Romans 5:6

2004-11-26 21:50:36

by Pavel Machek

Subject: Re: Suspend 2 merge: 4/51: Get module list.

Hi!

> This provides access to the list of loaded modules for suspend's
> debugging output. When a cycle finishes, suspend outputs something like the
> following:
>
> > Please include the following information in bug reports:
> > - SUSPEND core : 2.1.5.7
> > - Kernel Version : 2.6.9
> > - Compiler vers. : 3.3
> > - Modules loaded : tuner bttv videodev snd_seq_oss snd_seq_midi_event
> > snd_seq snd_pcm_oss snd_mixer_oss snd_intel8x0 snd_ac97_codec snd_pcm
> > snd_timer snd_page_alloc snd_mpu401_uart snd_rawmidi snd_seq_device
> > snd soundcore visor usbserial usblp joydev evdev usbmouse usbhid
> > uhci_hcd usbcore ppp_deflate zlib_deflate zlib_inflate bsd_comp
> > ipt_LOG ipt_state ipt_MASQUERADE iptable_nat ip_conntrack
> > ipt_multiport ipt_REJECT iptable_filter ip_tables ppp_async
> > ppp_generic slhc crc_ccitt video_buf v4l2_common btcx_risc Win4Lin
> > mki_adapter radeon agpgart parport_pc lp parport sg ide_cd sr_mod
> > cdrom floppy af_packet e1000 loop dm_mod tsdev suspend_bootsplash
> > suspend_text suspend_swap suspend_block_io suspend_lzf suspend_core
> > - Attempt number : 9
> > - Parameters : 0 2304 32768 1 0 4096 5
> > - Limits : 261680 pages RAM. Initial boot: 252677.
> > - Overall expected compression percentage: 0.
> > - LZF Compressor enabled.
> > Compressed 922112000 bytes into 437892038 (52 percent compression).
> > - Swapwriter active.
> > Swap available for image: 294868 pages.
> > - Debugging compiled in.
> > - Preemptive kernel.
> > - SMP kernel.
> > - Highmem Support.
> > - I/O speed: Write 72 MB/s, Read 119 MB/s.
>
> Including the modules loaded is very helpful for debugging problems.

It might be useful as an add-on patch when people are actually debugging it,
but I do not think it is needed for mainline. You can just do lsmod before suspend...
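
As a side note on the quoted report, the "52 percent compression" figure is consistent with the byte counts printed next to it. A small sketch of the arithmetic, assuming the percentage is the space saved, rounded down, and assuming 4096-byte pages; neither assumption is confirmed by the patch, and percent_saved and bytes_to_pages are hypothetical names:

```c
#include <stdint.h>

/* Percentage of space saved by compression, rounded down (assumed convention). */
static unsigned int percent_saved(uint64_t original, uint64_t compressed)
{
    if (original == 0 || compressed >= original)
        return 0;
    return (unsigned int)(((original - compressed) * 100) / original);
}

/* Number of pages needed to hold the given number of bytes, rounded up. */
static uint64_t bytes_to_pages(uint64_t bytes, uint64_t page_size)
{
    return (bytes + page_size - 1) / page_size;
}
```

With the numbers from the report, percent_saved(922112000, 437892038) gives 52, and 922112000 bytes is exactly 225125 pages of 4096 bytes.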
Pavel
--
64 bytes from 195.113.31.123: icmp_seq=28 ttl=51 time=448769.1 ms

2004-11-26 21:56:20

by Nigel Cunningham

Subject: Re: Suspend 2 merge: 49/51: Checksumming

Hi.

On Fri, 2004-11-26 at 10:56, Pavel Machek wrote:
> Hi!
>
> > A plugin for verifying the consistency of an image. Working with kdb, it
> > can look up the locations of variations. There will always be some
> > variations shown, simply because we're touching memory before we get
> > here and as we check the image.
>
> Debugging code, can live as external patch pretty well.

Doesn't most of the kernel have debugging code in it? Maybe not as much,
but most of the kernel isn't doing the same thing. Remember we have the
option of compiling it out. If it lives as a separate patch, we're just
making more work for me (I have to maintain the debug version and then
make some transformation on it to get the mainline version).

By the way, I'm really appreciating your interaction over all these
points. I was getting worried that I wasn't getting enough. I should say
now, too, that I'm away all weekend, so you won't get replies tomorrow
and the day after.

Thanks and regards,

Nigel
--
Nigel Cunningham
Pastoral Worker
Christian Reformed Church of Tuggeranong
PO Box 1004, Tuggeranong, ACT 2901

You see, at just the right time, when we were still powerless, Christ
died for the ungodly. -- Romans 5:6

2004-11-26 21:59:51

by Nigel Cunningham

Subject: Re: Suspend 2 merge: 43/51: Utility functions.

Hi.

On Fri, 2004-11-26 at 10:46, Pavel Machek wrote:
> Hi!
>
> > These are the routines that I think could possibly be useful elsewhere
> > too.
> >
> > - A snprintf routine that returns the number of bytes actually put into
> > the buffer, not the number that would have been put in if the buffer was
> > big enough.
> > - Routine for finding a proc dir entry (we use it to find /proc/splash
> > when)
> > - Support routines for dynamically allocated pageflags. Save those
> > precious bits!
>
> How many bits do you need? Two? I'd rather use those two bits than have
> yet another abstraction. Also note that it is doing big order
> allocation.

Three if checksumming is enabled IIRC. I'll happily use normal page
flags, but we only need them when suspending, and I understood they were
rarer than hen's teeth :>

MM guys copied so they can tell me I'm wrong :>
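
For readers unfamiliar with the first utility in the quoted list, here is a minimal userspace sketch of an snprintf variant that returns the number of bytes actually stored rather than the number that would have been needed. The name bounded_snprintf is made up for this example and the body is not taken from the patch:

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: format into buf and return the number of
 * characters actually stored, excluding the terminating NUL.
 * Standard snprintf() instead returns the length the complete
 * output would have had.
 */
static int bounded_snprintf(char *buf, size_t size, const char *fmt, ...)
{
    va_list args;
    int wanted;

    if (size == 0)
        return 0;

    va_start(args, fmt);
    wanted = vsnprintf(buf, size, fmt, args);
    va_end(args);

    if (wanted < 0) {
        buf[0] = '\0';          /* formatting error: store an empty string */
        return 0;
    }
    if ((size_t)wanted >= size)
        return (int)(size - 1); /* output was truncated to fit */
    return wanted;
}
```

The return value can be added to a running offset without ever stepping past the end of the buffer, which is the usual reason for wanting this convention. The kernel's scnprintf() follows the same convention.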

Nigel
--
Nigel Cunningham
Pastoral Worker
Christian Reformed Church of Tuggeranong
PO Box 1004, Tuggeranong, ACT 2901

You see, at just the right time, when we were still powerless, Christ
died for the ungodly. -- Romans 5:6

2004-11-26 21:56:26

by Nigel Cunningham

Subject: Re: Suspend 2 merge: 9/51: init/* changes.

Hi.

On Fri, 2004-11-26 at 08:58, Pavel Machek wrote:
> Hi!
>
> > > > > And if you really want to make it changeable, pass major:minor from userland; once
> > > > > userland is running getting them is easy.
> > > >
> > > > Yes, but that's also far uglier, and who thinks in terms of major and
> > > > minor numbers anyway? I think of my harddrive as /dev/sda, not 08:xx.
> > > > The parsing accepts majors and minors, of course, but shouldn't we make
> > > > these things easier to do, not harder? (Would we insist on using majors
> > > > and minors for root=?).
> > >
> > > Kernel interface is not supposed to be "easy". root= has exception,
> > > that's init code, and you can't easily ls -al /dev at that point. If
> > > you want easy interface, create userland program that looks up
> > > minor/major in /dev/ and uses them.
> >
> > That's a fair possibility, but is it really worth it when all we need to
> > do is make two routines not be init? We would still have to duplicate
> > some of this code elsewhere anyway, because we need to parse the major
> > and minor numbers.
>
> Parsing major/minor should be as simple as sscanf("%d %d"). And you'll
> have one less modification to generic code. Yes I think it is worth
> it.

In that case, we shouldn't access names at boot time either; the
interface should be consistent, shouldn't it? I really would prefer to
keep things as they are; is it worth all this fuss?
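
For reference, the major/minor parsing Pavel describes could look roughly like the following in plain C. This is only a sketch: parse_major_minor is a hypothetical name, and accepting both "8:1" and "8 1" is an assumption, not something either patch set specifies.

```c
#include <stdio.h>

/*
 * Hypothetical helper: parse "major:minor" or "major minor".
 * Returns 0 and fills in *major and *minor on success, -1 otherwise.
 */
static int parse_major_minor(const char *text, unsigned int *major,
                             unsigned int *minor)
{
    int maj, min;
    char trailing;

    /* The extra %c only matches if junk follows the two numbers. */
    if (sscanf(text, "%d:%d %c", &maj, &min, &trailing) != 2 &&
        sscanf(text, "%d %d %c", &maj, &min, &trailing) != 2)
        return -1;

    if (maj < 0 || min < 0)
        return -1;

    *major = (unsigned int)maj;
    *minor = (unsigned int)min;
    return 0;
}
```

A device name such as "/dev/sda1" is rejected by this parser, which is exactly the usability point being argued: turning the name into numbers needs a lookup somewhere, either in the kernel or in a userland helper.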

Nigel
--
Nigel Cunningham
Pastoral Worker
Christian Reformed Church of Tuggeranong
PO Box 1004, Tuggeranong, ACT 2901

You see, at just the right time, when we were still powerless, Christ
died for the ungodly. -- Romans 5:6

2004-11-26 21:56:25

by Pavel Machek

Subject: Re: Suspend 2 merge

Hi!

> > I'd prefer not to get plugins and abstract storage. I'm not sure about
> > extents, but as soon as I can get rid of order-8 allocations, things
> > should be ok.
>
> Don't you need more than that? Unless things have changed, you still
> spend most of your time eating memory so that you only have a few megs
> (well, maybe a little more) to write to disk. If you do reduce the
> amount of memory you eat, then you need to work to make the I/O
> faster.

I'm not *that* concerned about speed. Getting rid of order-8 is
for preventing "sorry, not enough RAM to suspend to disk".

> > Okay, 58MB/sec is better than 1MB/sec. I do not think I want the
> > complexity necessary to get me 70MB/sec.
>
> Fair enough for you, but not everyone can say that. At the end of the
> day, I'm not writing this code just for me to use, though. Many of the
> features I've added have been added for the benefit of other people; I
> assume you'd do the same. Most laptops can't do 58MB/s, so the
> difference is much bigger. (My original laptop hard drive did 17/s; with
> compression I think it achieved something close to double that,
> depending on the data being compressed, of course).

I do have machines that are too fast around me. But notice that
compression only gives a factor-2 speedup. If we were willing to make
the whole kernel uglier, we could probably achieve a factor-2 speedup
for any benchmark... it would just be a bad idea.


> > In some ways, suspend2 is two years ahead of rest of kernel:
> > * you have interactive debugging
> > * file compression
> > * nice splash screen
> > * plugin interface for transparent network support
> >
> > Unfortunately, we do not want compression done like that. It would
> > make sense to do compressed-LVM or something like that (that way
> > everyone would get the benefit), but it does not make sense to have it
> > just for suspend2. And we do not want the rest of features, too,
> > unless they work for the rest of kernel.
>
> The cryptoapi provides support for both compression and encryption. I'd
> happily make use of that, but we still need a way for the user to choose
> what compression/encryption they want and configure it. I'm not at all
> averse to the idea of shifting the lzf compression support into being a
> cryptoapi plugin. That shouldn't be hard to do precisely because I have
> the plugin system :>.

Actually I'd like to see lzf done at LVM level; that way it is useful
for people not doing suspend, too, and we should not need plugin
infrastructure in suspend2 (LVM provides us with that service).

> > You did wonderful work -- you have shown what is possible with
> > suspend2. Now we just need to scale it back to what is practical. It
> > needs not only to work, it also needs to be nice, simple, and easy to
> > maintain.
>
> I think it is practical. Apart from the bootsplash support, I don't
> think I have added any feature because I thought "Hey, this looks like a
> fun thing to try.". Every feature has been added because it makes
> suspend faster, more reliable, more user friendly, more versatile or the
> like. If we want Linux to get adopted by desktop users, it needs to

I believe you need to say "no" way more often. One user is not enough
to justify feature in mainline kernel, and any number of users should
not be enough to make GZIP compression supported by suspend2.

> have
> these features. Making it harder to use by forcing people to reboot to
> change a parameter or forcing them to do an ls in /dev with obscure
> parameters (to get the major and minor numbers) when they already know
> they want /dev/sda1 isn't user friendly. Obviously user friendliness is
> more important to me than to you. That's fine, but let's agree to differ
> and let the software be more helpful rather than less.

Yes, I care about linux being developer-friendly. If it is not
user-friendly, distributions will solve it. If it is not
developer-friendly, it is dead.

> > > It does. But if there were fundamental flaws in the approach, we would
> > > have found them by now. Since you're using bio calls and not swap's own
> > > read/write functions, you shouldn't have any problems.
> >
> > I believe it has at least one pretty bad flaw: it has hooks all over
> > the place and will be a nightmare to maintain. Putting suspend hooks into
> > memory allocation is not nice.
>
> That's a big statement.
>
> "Hooks all over the place" was a phrase first used to refer to the
> attempts at making freezing more reliable. That's irrelevant now with
> the simplified three stages to freezing. The hooks are the same as for
> swsusp1 there.
>
> The hooks you've seen in the rest of the kernel are generally only
> further supplements to the freezing, so that swsusp should probably have
> them too.
>
> The hooks in the memory management are minimal and wrapped in
> unlikely(), so they shouldn't really be a problem in the normal flow
> of

Yes, they are unlikely(); but still they are hooks into memory
management. They are at least ugly as hell. And no, swsusp1 does not
have this particular set of hooks, and does not need to patch sysrq-S.

> things. While suspend is running, they serve a good and necessary
> purpose. Using high level routines, we can't guarantee that new slab

They are necessary because of two-stage LRU saving... I'm trying to
argue "two-stage LRU saving is wrong"...

> > swsusp1 is pretty self-contained. As long as drivers stop the DMA and
> > NMI does nothing wrong, atomic snapshot will indeed be atomic.
> >
> > Can you list the conditions necessary for suspend2 to work?
>
> Not really sure what you mean. At the moment, the main hindrance to it
> working properly is driver model support (USB, DRI, as said previously).
> That forces us to compile those drivers as modules and use a userspace
> script to stop/unload them around the kernel call. Given that these
> things are done, and that suspend is able to get enough memory to do its
> work (almost never a problem), suspend should always work.

No, assume driver problems are solved.

For swsusp2, you need drivers to stop the DMA, NMI not interfering,
sync may not happen after you have saved LRU, memory may not be
allocated from slab after you have saved LRU. (something else? This
needs to be written down somewhere, and all kernel hackers will need
to be careful not to break these rules. Do you see why it worries me?)

swsusp1 is more self-contained. As long as drivers stop the DMA and
NMI does nothing wrong, atomic snapshot will indeed be atomic.

Pavel
--
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!

2004-11-26 22:10:07

by Nigel Cunningham

Subject: Re: Suspend 2 merge: 16/51: Disable cache reaping during suspend.

Hi.

On Fri, 2004-11-26 at 05:18, Pavel Machek wrote:
> Hi!
> > I have to admit to being a little unsure as to why this is needed, but
> > suspend's reliability is helped a lot by disabling cache reaping while
> > suspending. Perhaps one of the mm guys will be able to enlighten me
> > here. Might be SMP related.
>
> It would be good to understand it. Rather than slowing common code... why
> not down(&cache_chain_sem) in suspend2?

Didn't consider it, to be honest.

That said, if/when we start to use cpu hotplug for SMP, we'll deadlock
in cpuup_callback if we've got the sem.

> > {
> > struct list_head *walk;
> >
> > - if (down_trylock(&cache_chain_sem)) {
> > + if ((unlikely(test_suspend_state(SUSPEND_RUNNING))) ||
> > + (down_trylock(&cache_chain_sem)))
> > + {
> > /* Give up. Setup the next iteration. */
> > schedule_delayed_work(&__get_cpu_var(reap_work), REAPTIMEOUT_CPUC + smp_processor_id());
> > return;
> >
--
Nigel Cunningham
Pastoral Worker
Christian Reformed Church of Tuggeranong
PO Box 1004, Tuggeranong, ACT 2901

You see, at just the right time, when we were still powerless, Christ
died for the ungodly. -- Romans 5:6

2004-11-26 22:25:01

by Nigel Cunningham

Subject: Re: Suspend 2 merge: 50/51: Device mapper support.

Hi.

On Fri, 2004-11-26 at 10:58, Pavel Machek wrote:
> Hi!
>
> > This is the device mapper support plugin. Its sole purpose is to ensure
> > that the device mapper allocates enough memory to process all of the I/O
> > we want to throw at it.
>
> This needs to go through dm people....

Yes. I'll look for contact details.

> > +static struct suspend_proc_data disable_dm_support_proc_data = {
> > + .filename = "disable_device_mapper_support",
> > + .permissions = PROC_RW,
> > + .type = SUSPEND_PROC_DATA_INTEGER,
> > + .data = {
> > + .integer = {
> > + .variable = &suspend_dm_ops.disabled,
> > + .minimum = 0,
> > + .maximum = 1,
> > + }
> > + }
> > +};
>
> What is this good for? Debugging switch?

Nod. If built as modules, you can of course just rmmod.

Nigel
--
Nigel Cunningham
Pastoral Worker
Christian Reformed Church of Tuggeranong
PO Box 1004, Tuggeranong, ACT 2901

You see, at just the right time, when we were still powerless, Christ
died for the ungodly. -- Romans 5:6

2004-11-26 21:56:23

by Nigel Cunningham

Subject: Re: Suspend 2 merge: 35/51: Code always built in to the kernel.

Hi.

On Fri, 2004-11-26 at 10:32, Pavel Machek wrote:
> You have your own abstraction on the top of /proc? That's no-no.

You'd prefer the same code repeated 20 times?

> ...aha, you do that to enable plugin system. Take it as another reason
> why plugins have to go.

No, it's just useful. The /proc abstraction predated the plugins. What I
really want to do is switch to kobjects for the plugins. But that can
wait a little longer.

> And your own keyboard driver :-(.

It's not really a keyboard driver. We're just making serial console and
local keypresses into the same key codes and letting them be handled by
the relevant plugin.

> > + say("BIG FAT WARNING!! %s\n\n", suspend_print_buf);
> > + if (can_erase_image) {
> > + say("If you want to use the current suspend image, reboot and try\n");
> > + say("again with the same kernel that you suspended from. If you want\n");
> > + say("to forget that image, continue and the image will be erased.\n");
> > + } else {
> > + say("If you continue booting, note that any image WILL NOT BE REMOVED.\n");
> > + say("Suspend is unable to do so because the appropriate modules aren't\n");
> > + say("loaded. You should manually remove the image to avoid any\n");
> > + say("possibility of corrupting your filesystem(s) later.\n");
> > + }
> > + say("Press SPACE to reboot or C to continue booting with this kernel\n");
>
> Plus kernel now actually expects user interaction to solve problems
> during boot. No, no.

You want to have your cake and eat it too? :> We want to warn the user
before they shoot themselves in the foot, but not so loudly that they
can't help but notice and choose to do something before the damage is
done?

Regards,

Nigel
--
Nigel Cunningham
Pastoral Worker
Christian Reformed Church of Tuggeranong
PO Box 1004, Tuggeranong, ACT 2901

You see, at just the right time, when we were still powerless, Christ
died for the ungodly. -- Romans 5:6

2004-11-26 22:36:28

by Nigel Cunningham

Subject: Re: Suspend 2 merge: 48/51: Swapwriter

Hi.

On Fri, 2004-11-26 at 10:55, Pavel Machek wrote:
> Hi!
>
> > This is the swapwriter. It forms the glue between the high-level I/O
> > routines in io.c and the blockwriter routines in block_io.c. It is
> > responsible for allocating storage, translating the requests for pages
> > within pagesets into devices and blocks and the like. It is abstracted
> > from the block writer because the plan is that we'll eventually have a
> > generic file writer (ie not using swapspace, but a simple file,
> > possibly
>
> This file alone is bigger than whole swsusp1. That strongly suggests
> you have too many layers of abstraction in there. Planning for future
> is nice, but not at this cost.

Not necessarily too many layers of abstraction. It includes swapfile
support and there's a bit in there for readahead (plus the debugging you
mentioned).

Regards,

Nigel
--
Nigel Cunningham
Pastoral Worker
Christian Reformed Church of Tuggeranong
PO Box 1004, Tuggeranong, ACT 2901

You see, at just the right time, when we were still powerless, Christ
died for the ungodly. -- Romans 5:6

2004-11-26 21:56:22

by Pavel Machek

Subject: Re: Suspend 2 merge

Hi!

> > > Don't you need more than that? Unless things have changed, you still
> > > spend most of your time eating memory so that you only have a few megs
> > > (well, maybe a little more) to write to disk. If you do reduce the
> > > amount of memory you eat, then you need to work to make the I/O
> > > faster.
> >
> > I'm not *that* concerned about speed. Getting rid of order-8 is
> > for preventing "sorry, not enough RAM to suspend to disk".
>
> Priority wise, I agree. But given that the order 8 issue is dealt with,
> speed is important. Particularly when your power just went out and your
> UPS battery is running down.

....

> Again, when you're running on limited time, twice as fast is still twice
> as fast.

My machine suspends in 7 seconds, and that's swsusp1. According to
your numbers, suspend2 should suspend it in 1 second and LZF
compressed should be .5 second.

I'd say "who cares". 7 seconds seems like fast enough for me. And I'm
*not* going to add 2000 lines of code for 500msec speedup during
suspend.

> > Actually I'd like to see lzf done at LVM level; that way it is useful
> > for people not doing suspend, too, and we should not need plugin
> > infrastructure in suspend2 (LVM provides us with that service).
>
> That ignores that the vast majority of people don't use LVM at the
> moment. Perhaps you could argue that they should. The other thing is,
> I'm trying not to make assumptions about how we're writing the image,
> either. If you want to pipe your image over a network to some server,
> you should be able to, and not have to implement compression again in
> the writer for that.

Suspend-over-network is an obscure enough feature.
Compressed-suspend-over-network is even worse.

BTW my feeling is that if you want to do suspend-over-network, you
should just modify nbd to work with suspend2 and stop adding
special-purpose code to suspend.

> > I believe you need to say "no" way more often. One user is not enough
> > to justify feature in mainline kernel, and any number of users should
> > not be enough to make GZIP compression supported by suspend2.
>
> Okay. Let's say I drop GZIP. I've just asked on the suspend list for
> good reasons not to do it. I'll be surprised if I get any :> (And I'll
> ask for proof that they get a higher throughput with GZIP than with
> LZF!).
>
> I still think the plugin system is useful. It made adding LZF
> compression and DM support really easy, and also means work can be done
> on a generic file writer without needing to pull out all of the
> swapwriter code. It also made making suspend modular far easier, which
> in turn means you don't have to have the memory in use all the time,
> when you only really want the functionality ready to go when you want
> to power down.


> > Yes, they are unlikely(); but still they are hooks into memory
> > management. They are at least ugly as hell. And no, swsusp1 does not
> > have this particular set of hooks, and does not need to patch sysrq-S.
>
> How is ugly defined here? Can you give me an example that does the same
> thing, but which you consider less ugly?
>
> if (unlikely(test_suspend_state(SUSPEND_USE_MEMORY_POOL))) {
>     suspend2_free_pool_pages(page, order);
>     return;
> }
>
> I nearly launched into a flame war, but I'll try to be more gracious
> than that.

Memory management should have no knowledge of suspend2... Imagine every
subsystem sprinkling hooks such as this one...

We'd have

if (unlikely(scsi_is_recovering_from_problem()))
    scsi_free_pool_pages();
else if (usb_is_unhealthy())
    usb_recover();
else if (unlikely(test_suspend_state(SUSPEND_USE_MEMORY_POOL))) {
    suspend2_free_pool_pages(page, order);
    return;
}

and everyone hacking memory management would have to know about scsi,
usb, and suspend2. That's not a reasonable way to go.

> > > things. While suspend is running, they serve a good and necessary
> > > purpose. Using high level routines, we can't guarantee that new slab
> >
> > They are necessary because of two-stage LRU saving... I'm trying to
> > argue "two-stage LRU saving is wrong"...
>
> I know you are. What I'm not sure about is whether you believe that the
> user should never have the option of saving a full image of their
> memory, or whether you think there's a better way to do it.

If LRU saving can be done in 300 lines of code with no impact on
generic code... that's okay. In the current form it is way too complex
to merge.

> > For swsusp2, you need drivers to stop the DMA, NMI not interfering,
> > sync may not happen after you have saved LRU, memory may not be
> > allocated from slab after you have saved LRU. (something else? This
> > needs to be written down somewhere, and all kernel hackers will need
> > to be careful not to break these rules. Do you see why it worries me?)
> >
> > swsusp1 is more self-contained. As long as drivers stop the DMA and
> > NMI does nothing wrong, atomic snapshot will indeed be atomic.
>
> Syncing may not happen after we've done the atomic copy, but since it is
> already done when freezing processes, there shouldn't be any dirty data
> to sync anyway... except for the syslog data from our printks.

??? syslogd is stopped, it can't write anything.

> The LRU pages can't change, but this shouldn't be a problem because all
> userspace threads and most of kernel space, including kswapd, kjournald
> and so on is stopped. The only guys who need to worry about this are the
> MM guys, and so long as the scanning continues to run via a process, the
> buddy allocator or a timer (interrupts aren't an option, are they?),
> that activity will be paused during suspend without them having to add a
> single line of code.

Actually swsusp1 does not need to stop timers, so you have few lines
of code added.

> If bio page I/O was changed to interact with the LRU, we might be in
> trouble.
>
> Slab _can_ be allocated while we're saving the LRU, but the allocations
> need to come from pages that we know will be included in the atomic
> copy. This happens transparently (page allocator), so other kernel
> hackers don't need to worry about any of these issues. If we use the
> memory pool idea, everything else that needs to run can run just like
> normal, without any suspend specific changes. (You might be being

Why do you need to allocate from special pool? After LRU is saved, you
should write all used kernel pages. Slab are kernel pages, so I do not
see why you need to modify it.

> In short, there are no rules that "all kernel hackers" will need to be
> careful not to break. The main constraint added is that we need to
> be able to stop all changes to the LRU.

Ok, so the "all kernel hackers" rule is "do not change LRU while
suspend2 is going on".
Pavel

--
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!

2004-11-26 22:44:10

by Pavel Machek

Subject: Re: Suspend 2 merge

Hi!

> > > Again, when you're running on limited time, twice as fast is still twice
> > > as fast.
> >
> > My machine suspends in 7 seconds, and that's swsusp1. According to
> > your numbers, suspend2 should suspend it in 1 second and LZF
> > compressed should be .5 second.
> >
> > I'd say "who cares". 7 seconds seems like fast enough for me. And I'm
> > *not* going to add 2000 lines of code for 500msec speedup during
> > suspend.
>
> Yupp. Premature optimization is the root of all evil. swsusp is
>
> a) an absolute slowpath compared to any normal kernel operation,
> and called extremely seldom
> b) only useful for a small subset of all linux instances
>
> hacking core code (fastpaths) for speedups there is a really bad idea.
> If you can speed it up without being intrusive all power to you.

I have to agree here. Swsusp is not really performance critical,
almost every other part of kernel is more important.

Pavel
--
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!

2004-11-26 19:50:26

by Pavel Machek

Subject: Re: Suspend 2 merge

Hi!

> I don't see the point to saving LRU pages separately when you're still
> eating all the memory you can. You'll have the same number of pages to
> save, just fewer to copy (and copying takes far less time than saving).
>
> > Pagecache saving is still in, but disabled by default and activated
> > using sysctl. I'd rather not merge it right now; I hope the other
> > changes can be merged. :)
>
> Pavel's going to think you are trying to turn swsusp into suspend2!!

Pavel knows that already, but at least hugang is producing small
patches ;-).
Pavel
--
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!

2004-11-26 22:48:06

by Pavel Machek

Subject: Re: Suspend 2 merge

Hi!

> > For swsusp2, you need drivers to stop the DMA, NMI not interfering,
> > sync may not happen after you have saved LRU, memory may not be
> > allocated from slab after you have saved LRU. (something else? This
> > needs to be written down somewhere, and all kernel hackers will need
> > to be careful not to break these rules. Do you see why it worries me?)
> >
> > swsusp1 is more self-contained. As long as drivers stop the DMA and
> > NMI does nothing wrong, atomic snapshot will indeed be atomic.
>
> Here is a grabbed-memory allocation patch from suspend2, useful for
> shrinking memory on a system using high memory.

Sorry, I do not understand. What problem is this solving?
Pavel
--
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!

2004-11-26 22:56:01

by Pavel Machek

Subject: Re: Suspend2 merge: 1/51: Device trees

Hi!

> > > This patch allows the device tree to be split up into multiple trees. I
> > > don't really expect it to be merged, but it is an important part of
> > > suspend at the moment, and I certainly want to see something like it
> > > that will allow us to suspend some parts of the device tree and not
> > > others. Suspend2 uses it to keep alive the hard drive (or equivalent)
> > > that we're writing the image to while suspending other devices, thus
> > > improving the consistency of the image written.
> > >
> > > I remember from last time this was posted that someone commented on
> > > exporting the default device tree; I haven't changed that yet.
> >
> > Q: I do not understand why you have such strong objections to idea of
> > selective suspend.
> >
> > A: Do selective suspend during runtime power management, that's
> > okay. But it's useless for suspend-to-disk. (And I do not see how
> > you could use it for suspend-to-ram, I hope you do not want that).
> >
> > Let's see, so you suggest to
> >
> > * SUSPEND all but swap device and parents
> > * Snapshot
> > * Write image to disk
> > * SUSPEND swap device and parents
> > * Powerdown
> >
> > Oh no, that does not work, if swap device or its parents uses DMA,
> > you've corrupted data. You'd have to do
> >
> > * SUSPEND all but swap device and parents
> > * FREEZE swap device and parents
> > * Snapshot
> > * UNFREEZE swap device and parents
> > * Write
> > * SUSPEND swap device and parents
> >
> > Which means that you still need that FREEZE state, and you get more
> > complicated code. (And I have not yet introduced details like system
> > devices).
>
> There's obviously a misunderstanding here. What I do is:

Ok, sorry, this was from another flamewar ;).

> SUSPEND all but swap device and parents
> WRITE LRU pages
> SUSPEND swap device and parents (+sysdev)
> Snapshot
> RESUME swap device and parents (+sysdev)
> WRITE snapshot
> SUSPEND swap device and parents
> POWERDOWN everything
>
> I thought I wrote - perhaps I'm wrong here - that I understand that your
> new work in this area might make this unnecessary. I really only want to
> do it this way because I don't know what other drivers might be doing
> while we're writing the LRU pages. I'm not worried about them touching
> LRU. What I am worried about is them allocating memory and starving
> suspend so that we get hangs due to being oom. If they're suspended, we
> have more certainty as to how memory is being used. I don't remember
> what prompted me to do this in the first place, but I'm pretty sure it
> would have been a real observed issue.

Uh... It seems like quite a lot of work. Would not reserving a few more
pages help here? Or perhaps the right solution is to fix "broken" drivers
that need too much memory...

...because you lose anyway if that "broken" driver is between swap
device and root.
Pavel
--
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!

2004-11-26 23:03:51

by Pavel Machek

Subject: Re: Suspend 2 merge: 9/51: init/* changes.

Hi!

> > > > And if you really want to make it changeable, pass major:minor from userland; once
> > > > userland is running getting them is easy.
> > >
> > > Yes, but that's also far uglier, and who thinks in terms of major and
> > > minor numbers anyway? I think of my harddrive as /dev/sda, not 08:xx.
> > > The parsing accepts majors and minors, of course, but shouldn't we make
> > > these things easier to do, not harder? (Would we insist on using majors
> > > and minors for root=?).
> >
> > Kernel interface is not supposed to be "easy". root= has exception,
> > that's init code, and you can't easily ls -al /dev at that point. If
> > you want easy interface, create userland program that looks up
> > minor/major in /dev/ and uses them.
>
> That's a fair possibility, but is it really worth it when all we need to
> do is make two routines not be init? We would still have to duplicate
> some of this code elsewhere anyway, because we need to parse the major
> and minor numbers.

Parsing major/minor should be as simple as sscanf("%d %d"). And you'll
have one less modification to generic code. Yes I think it is worth
it.
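
As a rough illustration of the userland-passes-numbers approach, here is a
minimal sketch of parsing a "major:minor" string with sscanf. The helper
name parse_dev_numbers and the ':' separator are assumptions made for this
example (the format string above uses a space); it is not code from either
patch set.

```c
#include <stdio.h>

/*
 * Parse a device given as "major:minor" (for example "8:1").
 * Hypothetical helper for illustration only.
 * Returns 0 on success, -1 if the string is not exactly two numbers.
 */
int parse_dev_numbers(const char *s, unsigned int *major, unsigned int *minor)
{
    char trailing;

    /*
     * A clean parse converts exactly two items. The extra %c catches
     * trailing garbage (including a newline): if it converts, sscanf
     * returns 3 and the input is rejected.
     */
    if (sscanf(s, "%u:%u%c", major, minor, &trailing) != 2)
        return -1;
    return 0;
}
```

A device name such as "/dev/sda" is rejected by this sketch, so the lookup
from name to numbers would have to happen in the calling program.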

Pavel
--
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!

2004-11-26 23:11:20

by Pavel Machek

Subject: Re: Suspend 2 merge: 17/51: Disable MCE checking during suspend.

Hi!

> > > > > Avoid a potential SMP deadlock here.
> > > >
> > > > ..and lose MCE report.
> > >
> > > Deadlock or get an MCE report and do a printk when we're shutting down
> > > anyway?
> >
> > If MCE happens, I'd like user to report it. Losing it is wrong,
> > deadlocking may be actually better because at least you get the
> > report. I'd BUG().
> >
> > MCEs are hardware problem, right? They should not be common.
>
> It's not them occurring that's the problem, it's checking for them that
> involves an SMP call :<

Oops, that bad... and checking is done periodically? That's bad. Okay,
your solution is right here.
Pavel
--
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!

2004-11-26 23:19:23

by Pavel Machek

Subject: Re: Suspend 2 merge

Hi!

> Ok, I got it. I think making LRU saving safe requires ensuring that
> 1: The LRU can't change after it is saved.
> 2: LRU memory can't change after it is saved.
> The first one is done; the second we can't be sure of in the current
> design. Can we do it using COW?

Userspace processes should be stopped at that point, and you really
can't do COW to kernel users.

> > swsusp1 is more self-contained. As long as drivers stop the DMA and
> > NMI does nothing wrong, atomic snapshot will indeed be atomic.
> Here is my current patch, still relative to your bit diff; only the core
> part is here.
> 1: add a collision bitmap to speed up the collision check. I can't be
> sure four pages is enough; Pavel, please check.
> 2: switch to list_for_xxx style
> 3: correct calc_nums.

Heh, can you try this after resume?

cat `cat [0-9]*/maps | grep / | sed 's:.* /:/:' | sort -u` > /dev/null

It should have very similar effect to saving LRU, just in one line of
code ;-).
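
For readers unpicking the one-liner (which presumably has to be run from
/proc): the sed expression keeps only the text after the last " /" on each
maps line, which is the mapped file's path. A small C sketch of that one
step, with maps_line_path as a hypothetical name chosen for this example:

```c
#include <string.h>

/*
 * Given one line of /proc/<pid>/maps, return a pointer to the pathname
 * part, or NULL if the line names no file (anonymous mapping).
 * Mirrors sed 's:.* /:/:' by taking the text after the LAST " /".
 * Any trailing newline in the input is left in place.
 */
const char *maps_line_path(const char *line)
{
    const char *last = NULL;
    const char *p = line;

    while ((p = strstr(p, " /")) != NULL) {
        last = p + 1;   /* points at the '/' that starts the path */
        p += 2;         /* keep searching after this match */
    }
    return last;
}
```

Reading each returned file once pulls its pages back into the page cache,
which is the effect the shell command relies on.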

Pavel
--
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!

2004-11-26 23:23:45

by Pavel Machek

Subject: Re: Suspend 2 merge: 42/51: Suspend.c

Hi!

> Here's the heart of the core :> (No, that's not a typo).
>
> - Device suspend/resume calls
> - Power down
> - Highest level routine
> - all_settings proc entry handling

Can we get rid of all the debugging? It makes it hard to see real code
between all the debugging stuff.

Pavel
--
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!

2004-11-26 23:27:07

by Pavel Machek

Subject: Re: Suspend 2 merge: 49/51: Checksumming

Hi!

> A plugin for verifying the consistency of an image. Working with kdb, it
> can look up the locations of variations. There will always be some
> variations shown, simply because we're touching memory before we get
> here and as we check the image.

Debugging code, can live as external patch pretty well.
Pavel
--
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!

2004-11-26 23:27:07

by Pavel Machek

Subject: Re: Suspend 2 merge: 47/51: GZIP support.

Hi!

> The original compressor. Slow. I've tried to drop it, but for reasons I
> simply don't understand, some users still want it.

Okay, IGNORE THOSE @#*$&!)!& USERS!

You need to say no. 500 lines of code, when superior code is available,
is a bad idea. You know gzip is the wrong thing. If some user wants it,
he can maintain the patch himself. Simple.

Pavel
--
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!

2004-11-26 19:47:15

by Pavel Machek

Subject: Re: Suspend 2 merge: 37/51: Memory pool support.

Hi!

> This is the memory pool support. It handles all pages freed and
> allocated between the preparation of the image and the completion of
> resuming, except prior to restoring the original kernel at resume time.
> It is designed for speed and to match the fact that suspend2 just about
> exclusively uses order 0 allocations. ("Just about" is why a couple of
> order one and two allocations are also available).

You really should use generic routines. Having your own malloc of
course allows you to be slightly faster; but it also means that code
is much bigger and much uglier.
Pavel
--
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!

2004-11-26 23:34:49

by Nigel Cunningham

Subject: Re: Suspend 2 merge: 9/51: init/* changes.

Hi.

On Fri, 2004-11-26 at 04:07, Pavel Machek wrote:
> Hi!
>
> > 1) Make name_to_dev_t non init. Why should you need to reboot if all you
> > want to do is change the device you're using to suspend? That's M_'s way
>
> Well, if you change it using /proc and forget to change kernel cmd line, you'll have
> a problem. Do you really change this so often?

It's mostly used when setting up a new installation, and when attempting
LVM support. It's particularly useful in the latter case because you can
swap off and so on, prepare your new LVM volume, prepare swap on it, do
the echo > resume2 and then check dmesg, and be assured that you've got
the right setup, all without needing to reboot. In the case of a new
config, you still need to reboot once, of course, into the new kernel,
but if you get the resume2= parameter wrong, you don't have to waste
time rebooting just because of a typo. You can just fix your lilo.conf
(or equiv) and use echo > resume2 to fix the mistake.

> And if you really want to make it changeable, pass major:minor from userland; once
> userland is running getting them is easy.

Yes, but that's also far uglier, and who thinks in terms of major and
minor numbers anyway? I think of my harddrive as /dev/sda, not 08:xx.
The parsing accepts majors and minors, of course, but shouldn't we make
these things easier to do, not harder? (Would we insist on using majors
and minors for root=?).

> > 2) Hooks for resuming. Suspend2 functionality can be compiled as modules
> > or built in. Resuming can be activated via an initrd. These hooks allow
> > for all of the combinations of the above. Allowing resuming from within
> > an initrd is important because then you can set up LVM volumes
> > (including encrypted devices), compile drivers for your resume device as
> > modules and so on.
>
> Hmm, this will need a lot of testing and a lot of care... You for example
> may not write to your fs's before activating it. And if you use this feature,
> kernel no longer has chance to kill suspend signature on normal boot,
> making "shoot(self, foot)" easier.

It has had it. We do rely on the user to make a sensible linuxrc/init,
but there are examples and warnings given in the docs on Berlios, and
plenty of checking done. We have to take the risk anyway: without it, we
can't support Debian (ide support built as modules), LVM (needs to set
up mappings in an initrd) or encrypted storage (ditto).

> But for encrypted stuff it is probably only way to go, so... Just
> make sure people are not using it unless they *have* to.

See above.

Regards,

Nigel
--
Nigel Cunningham
Pastoral Worker
Christian Reformed Church of Tuggeranong
PO Box 1004, Tuggeranong, ACT 2901

You see, at just the right time, when we were still powerless, Christ
died for the ungodly. -- Romans 5:6

2004-11-26 19:47:15

by Pavel Machek

Subject: Re: Suspend 2 merge: 28/51: Suspend memory pool hooks.

Hi!

> We save the image in two pagesets (LRU and the rest). In order to maintain
> a consistent image, we satisfy all page allocations from our own memory
> pool while saving the image and reloading the LRU. This allows us to
> safely use high level routines which might allocate slab etc and not
> free it again by the time we do our atomic copy. We simply save all of
> the pages in the pool when making our atomic copy of the non-LRU pages,
> without having to worry about exactly how they were or weren't used.

Now you know why two pagesets scare me...
Pavel
--
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!

2004-11-26 23:46:43

by Pavel Machek

Subject: Re: Suspend 2 merge: 35/51: Code always built in to the kernel.

Hi!

> +/*
> + * generic_read_proc
> + *
> + * Generic handling for reading the contents of bits, integers,
> + * unsigned longs and strings.
> + */
> +static int generic_read_proc(char * page, char ** start, off_t off, int count,
> + int *eof, void *data)
> +{
> + int len = 0;
> + struct suspend_proc_data * proc_data = (struct suspend_proc_data *) data;
> +
> + switch (proc_data->type) {
> + case SUSPEND_PROC_DATA_CUSTOM:
> + printk("Error! /proc/suspend/%s marked as having custom"
> + " routines, but the generic read routine has"
> + " been invoked.\n",
> + proc_data->filename);
> + break;
> + case SUSPEND_PROC_DATA_BIT:
> + len = sprintf(page, "%d\n",
> + -test_bit(proc_data->data.bit.bit,
> + proc_data->data.bit.bit_vector));
> + break;

You have your own abstraction on the top of /proc? That's no-no.

> +/*
> + * Non-plugin proc entries.
> + *
> + * This array contains entries that are automatically registered at
> + * boot. Plugins and the console code register their own entries separately.
> + */
> +

...aha, you do that to enable plugin system. Take it as another reason
why plugins have to go.

> +/*
> + * Basic keypress handler for suspend. This is extensible
> + * via the user interface modules.
> + */
> +
> +/* For simplicity, we convert keyboard key codes to ascii,
> + * except in the case of function keys, which are mapped
> + * to 1-12. We can then use the same case statement for
> + * serial keyboards (and from a serial keyboard, you can
> + * press Control-A..L to toggle sections.
> + */
> +static unsigned int kbd_keytable[] = {
> + 0, 27, 49, 50, 51, 52, 53, 54, 55, 56,
> + 57, 48, 0, 0, 0, 0, 0, 0, 0, 114,
> + 116, 0, 0, 0, 0, 112, 0, 0, 0, 0,
> + 0, 115, 0, 0, 0, 0, 0, 0, 108, 0,
> + 0, 122, 0, 0, 0, 0, 99, 0, 0, 0,
> + 0, 0, 0, 0, 0, 0, 0, 32, 0, 1,
> + 2, 3, 4, 5, 6, 7, 8, 9, 10, 0,
> + 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
> + 0, 0, 0, 0, 0, 0, 0, 11, 12, 0,
> +};
> +
> +/*
> + * keycode_to_action
> + *
> + * Convert a keycode (serial or keyboard) into our
> + * internal code (ascii, except for function keys).
> + */
> +static unsigned int keycode_to_action(unsigned int keycode, int source)
> +{
> + if (source == SUSPEND_KEY_SERIAL) {
> + if (keycode > 64)
> + return (keycode | 32);
> + else
> + return keycode;
> + }

And your own keyboard driver :-(.
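
For reference, the serial branch quoted above is an ASCII case fold: OR-ing
with 32 sets bit 5, which maps 'A'..'Z' onto 'a'..'z' and leaves digits,
space and control codes alone. A standalone sketch of just that branch
(fold_serial_key is a name invented here; the keyboard-table branch is not
visible in the quote and is not modelled):

```c
/*
 * Fold a byte received from a serial console the way the quoted
 * keycode_to_action() branch does. Hypothetical standalone copy.
 */
unsigned int fold_serial_key(unsigned int keycode)
{
    if (keycode > 64)
        return keycode | 32;    /* 'C' (67) becomes 'c' (99) */
    return keycode;             /* digits, space, Control-A..L unchanged */
}
```

So typing either "C" or "c" on a serial line yields 99, the same value the
quoted kbd_keytable holds at index 46.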

> + say("BIG FAT WARNING!! %s\n\n", suspend_print_buf);
> + if (can_erase_image) {
> + say("If you want to use the current suspend image, reboot and try\n");
> + say("again with the same kernel that you suspended from. If you want\n");
> + say("to forget that image, continue and the image will be erased.\n");
> + } else {
> + say("If you continue booting, note that any image WILL NOT BE REMOVED.\n");
> + say("Suspend is unable to do so because the appropriate modules aren't\n");
> + say("loaded. You should manually remove the image to avoid any\n");
> + say("possibility of corrupting your filesystem(s) later.\n");
> + }
> + say("Press SPACE to reboot or C to continue booting with this kernel\n");

Plus kernel now actually expects user interaction to solve problems
during boot. No, no.
Pavel
--
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!

2004-11-26 23:49:50

by Pavel Machek

Subject: Re: Suspend 2 merge: 14/51: Disable page alloc failure message when suspending

Hi!

> > > Agreed. I wasn't seriously suggesting changing everywhere to be
> > > GFP_NOWARN. Perhaps I should be more explicit in what I'm saying here.
> > > The problem isn't just suspend trying to allocate memory. It's
> > > _ANYTHING_ that might be running trying to allocate memory while we're
> > > eating memory. (Remember that we don't just call shrink_all_memory, but
> > > also allocate that memory so other processes don't grab it and stop us
> > > making forward progress). As a result, they're going to scream when they
> > > can't allocate a page.
> >
> > Hmm, that does not look too healthy. That means that userland programs
> > will see all kinds of weird error conditions that normally
> > "almost-can't-happen" during normal usage.
>
> Failure to allocate memory should be something any caller to get_*_page
> deals with, so if they don't, are we to be blamed?

Well, you'll have things like select() returning -ENOMEM. Applications
will not be too happy. We can probably live with that, but it is not
nice.

Pavel
--
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!

2004-11-26 23:53:58

by Pavel Machek

Subject: Re: Suspend 2 merge

Hi!

> > > I'm thus seeking to simply merge the existing code, let Pavel and others
> > > get to the point where they're ready to say "Okay, we're satisfied that
> > > suspend2 does everything swsusp does and more and better." Then we can
> > > remove swsusp. This is the plan that was discussed with Pavel and Andrew
> > > ages ago. I've just been slow to get there because I'm doing this
> > > part-time voluntary.
> >
> > hugang seems to show that it indeed is possible to incrementally turn
> > swsusp into suspend2. I do not think Andrew really wanted it that way,
> > and I thought of that as of necessary evil.
>
> With some changes, yes. But when you come to using extents or
> abstracting the method of storage and implementing plugins, it will be
> ground-up redesign. Of course you might not want to go that far.

I'd prefer not to get plugins and abstract storage. I'm not sure about
extents, but as soon as I can get rid of order-8 allocations, things
should be ok.

> > [Okay, at this point I'll understand when you'll put my picture as a
> > texture to some doom3 monster and shoot me thousand times... Lot of
> > work went into suspend2, but in the meantime lot of work went into
> > swsusp1, too...]
>
> Not at all. Perhaps I'm overstating the case or not spending enough time
> looking at your code, but I don't actually think swsusp has changed a
> lot in the two years since I started working on this. (Want my picture
> now? :>)

Well, it was rewritten by Patrick so it actually looks okay, and it
started to work for users...

> > > - Speed: All I/O is asynchronous where possible and readahead used where
> > > not. Routines everywhere optimised to get things done as fast as poss.
> > > (Think low battery).
> >
> > I fixed O(n^2) behaviour in swsusp1 (not yet in). I do not think that
> > asynchronous I/O makes that much difference.
>
> Oh, it makes a huge difference once you're not eating all the memory you
> can. If I submit I/O one at a time, I do 1 or 2 MB/s. With asynchronous
> I/O, I can write 70MB/s and read 110MB/s with compression, and 58MB/s
> each way without compression (that's the maximum throughput of the drive
> I'm using at the moment). If I can streamline things a bit further, I
> should be able to lift that write rate further, too.

Okay, 58MB/sec is better than 1MB/sec. I do not think I want the
complexity necessary to get me 70MB/sec.

In some ways, suspend2 is two years ahead of rest of kernel:
* you have interactive debugging
* file compression
* nice splash screen
* plugin interface for transparent network support

Unfortunately, we do not want compression done like that. It would
make sense to do compressed-LVM or something like that (that way
everyone would get the benefit), but it does not make sense to have it
just for suspend2. And we do not want the rest of features, too,
unless they work for the rest of kernel.

> > > - Test bed: Around 10,000 downloads of the 1.0 patch, 2730 to date of
> > > the 2.1.5 version I released 2 weeks ago.
> >
> > Hmm, look at number of downloads of 2.6.9 kernel, I think I win here
> > ;-)))). SuSE9.2 is actually shipping swsusp1 and advertising it as a
> > feature. And it seems to work for people...
>
> :> But not everyone who uses 2.6.9 uses swsusp. :>

But they should ;-).

> > > - Swap file support
> > > - Support for LVM/dm-crypt and siblings
> > > - Support for having device drivers as modules (resume from an
> > > initrd/initramfs)
> >
> > Okay, you win these.
>
> I don't want to have a competition, really. I just want to convince you
> that I've done some worthwhile work :>

You did wonderful work -- you've shown what is possible with
suspend2. Now we just need to scale it back to what is practical. It
needs not only to work, it also needs to be nice, simple, and easy to
maintain.

> > > - Designed to save as much of memory as possible rather than as little
> > > (making the system more responsive post-resume).
> >
> > hugang already has a patch, but I'm not 100% sure if I want it
> > in. Yes, people seem to like this feature, but it complicates
> > *design*, quite a lot.
>
> It does. But if there were fundamental flaws in the approach, we would
> have found them by now. Since you're using bio calls and not swap's own
> read/write functions, you shouldn't have any problems.

I believe it has at least one pretty bad flaw: it has hooks all over
the place and will be a nightmare to maintain. Putting suspend hooks into
memory allocation is not nice.

swsusp1 is pretty self-contained. As long as drivers stop the DMA and
NMI does nothing wrong, atomic snapshot will indeed be atomic.

Can you list the conditions necessary for suspend2 to work?

Pavel
--
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!

2004-11-26 23:53:39

by Pavel Machek

Subject: Re: Suspend 2 merge: 10/51: Exports for suspend built as modules.

Hi!

> > > The sys_ functions are exported because a while ago, people suggested I
> > > use /dev/console to output text that doesn't need to be logged, and I
> > > also use /dev/splash for the bootsplash support. These functions were
> >
> > Well, we don't do ascii-art on kernel boot and I do not see why we should do it
> > on suspend. Distributions will love bootsplash integration, but it should stay as a separate
> > patch.
>
> It's modular, so no problem there.

That *is* a problem. Mainline kernel is not expected to carry stuff like
that. We do not have bash as a kernel module. Not even gzip. And we
should not have ascii-art, too.

> > See swsusp1... it has percentage printing, and I think it should
> > be possible to make it look good enough.
>
> We can always make a text_mode_for_Pavel plugin :>

Yes, and then kill all the other plugins from mainline patches and
submit that. That would work. Ouch and that means no ugly hooks all
over the place in whatever goes into mainline.... and preferably no
plugin interface, too.

> > Why do you need sys_mkdir?
>
> The text mode plugin is using it to make /dev (if it doesn't exist) so
> it can try to mount devfs (if necessary) and open /dev/console to do the
> output. I'd love to just use vt_console_print, but those who know better
> than me said to use /dev/console...

Ugh, ouch. So if /dev/ is not there you just go and walk over user's
filesystem? Please, let's forget the asciiart.

Pavel
--
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!

2004-11-26 23:57:46

by Nigel Cunningham

Subject: Re: Suspend 2 merge: 9/51: init/* changes.

Hi.

On Fri, 2004-11-26 at 08:45, Pavel Machek wrote:
> Hi!
>
> > > And if you really want to make it changeable, pass major:minor from userland; once
> > > userland is running getting them is easy.
> >
> > Yes, but that's also far uglier, and who thinks in terms of major and
> > minor numbers anyway? I think of my harddrive as /dev/sda, not 08:xx.
> > The parsing accepts majors and minors, of course, but shouldn't we make
> > these things easier to do, not harder? (Would we insist on using majors
> > and minors for root=?).
>
> Kernel interface is not supposed to be "easy". root= has exception,
> that's init code, and you can't easily ls -al /dev at that point. If
> you want easy interface, create userland program that looks up
> minor/major in /dev/ and uses them.

That's a fair possibility, but is it really worth it when all we need to
do is make two routines not be init? We would still have to duplicate
some of this code elsewhere anyway, because we need to parse the major
and minor numbers.

Nigel
--
Nigel Cunningham
Pastoral Worker
Christian Reformed Church of Tuggeranong
PO Box 1004, Tuggeranong, ACT 2901

You see, at just the right time, when we were still powerless, Christ
died for the ungodly. -- Romans 5:6

2004-11-26 23:54:04

by Pavel Machek

Subject: Re: Suspend 2 merge: 43/51: Utility functions.

Hi!

> These are the routines that I think could possibly be useful elsewhere
> too.
>
> - A snprintf routine that returns the number of bytes actually put into
> the buffer, not the number that would have been put in if the buffer was
> big enough.
> - Routine for finding a proc dir entry (we use it to find /proc/splash
> when)
> - Support routines for dynamically allocated pageflags. Save those
> precious bits!
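
The first item describes a bounded formatter whose return value is the
number of bytes actually stored rather than the number that would have been
needed. The patch body for suspend_snprintf is not quoted here, so the
following is only a user-space sketch of that behaviour, with the name
bounded_snprintf chosen for this example:

```c
#include <stdarg.h>
#include <stdio.h>

/*
 * Like snprintf(), but returns the number of characters actually placed
 * in buf (excluding the terminating NUL), never more than size - 1.
 * Hypothetical stand-in for the behaviour described above.
 */
int bounded_snprintf(char *buf, size_t size, const char *fmt, ...)
{
    va_list args;
    int n;

    if (size == 0)
        return 0;               /* nothing can be stored */

    va_start(args, fmt);
    n = vsnprintf(buf, size, fmt, args);
    va_end(args);

    if (n < 0) {                /* formatting error: report empty output */
        buf[0] = '\0';
        return 0;
    }
    if ((size_t)n >= size)      /* output was truncated */
        return (int)(size - 1);
    return n;
}
```

With this return convention a caller can safely advance a write pointer by
the result without running past the end of the buffer.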

How many bits do you need? Two? I'd rather use those two bits than have
yet another abstraction. Also note that it is doing big order
allocation.
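
To put rough numbers on the allocation order, here is a user-space model of
the BITMAP_ORDER macro quoted in this message. It assumes 4 KiB pages and
that get_bitmask_order(n) behaves like fls(n) (position of the highest set
bit), as in 2.6-era kernels; both are assumptions of this sketch, and the
function names are invented for it.

```c
#define MODEL_PAGE_SIZE      4096UL                  /* assumed page size */
#define MODEL_BITS_PER_PAGE  (MODEL_PAGE_SIZE * 8)   /* one bit per frame */

/* Position of the highest set bit, counting from 1; 0 when x is 0. */
int model_fls(unsigned long x)
{
    int r = 0;

    while (x) {
        r++;
        x >>= 1;
    }
    return r;
}

/* Allocation order needed for a one-bit-per-page bitmap of max_mapnr frames. */
int bitmap_order(unsigned long max_mapnr)
{
    unsigned long pages =
        (max_mapnr + MODEL_BITS_PER_PAGE - 1) / MODEL_BITS_PER_PAGE;

    if (pages == 0)
        return 0;
    return model_fls(pages - 1);
}
```

Under these assumptions 1 GiB of RAM (262144 frames) needs 8 bitmap pages,
an order-3 allocation, and 4 GiB (1048576 frames) needs 32 pages, an order-5
allocation of physically contiguous memory made with GFP_ATOMIC.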


Pavel
> +#define BITS_PER_PAGE (PAGE_SIZE * 8)
> +#define PAGES_PER_BITMAP ((max_mapnr + BITS_PER_PAGE - 1) / BITS_PER_PAGE)
> +#define BITMAP_ORDER (get_bitmask_order((PAGES_PER_BITMAP) - 1))
> +
> +/* clear_map
> + *
> + * Description: Clear an array used to store local page flags.
> + * Arguments: unsigned long *: The pagemap to be cleared.
> + */
> +
> +void clear_map(unsigned long * pagemap)
> +{
> + int size = (1 << BITMAP_ORDER) * PAGE_SIZE;
> +
> + memset(pagemap, 0, size);
> +}
> +
> +/* allocate_local_pageflags
> + *
> + * Description: Allocate a bitmap for local page flags.
> + * Arguments: unsigned long **: Pointer to the bitmap.
> + * int: Whether to set nosave flags for the
> + * newly allocated pages.
> + * Note: This looks suboptimal, but remember that we might be allocating
> + * the Nosave bitmap here.
> + */
> +int allocate_local_pageflags(unsigned long ** pagemap, int setnosave)
> +{
> + unsigned long * check;
> + int i;
> + if (*pagemap) {
> + printk("Error. Local pageflags map already allocated.\n");
> + clear_map(*pagemap);
> + } else {
> + check = (unsigned long *) __get_free_pages(GFP_ATOMIC,
> + BITMAP_ORDER);
> + if (!check) {
> + printk("Error. Unable to allocate memory for local page flags.");
> + return 1;
> + }
> + clear_map(check);
> + *pagemap = check;
> + if (setnosave) {
> + struct page * firstpage =
> + virt_to_page((unsigned long) check);
> + for (i = 0; i < (1 << BITMAP_ORDER); i++)
> + SetPageNosave(firstpage + i);
> + }
> + }
> + return 0;
> +}
> +
> +/* freemap
> + *
> + * Description: Free a local pageflags bitmap.
> + * Arguments: unsigned long **: Pointer to the bitmap being freed.
> + * Note: Map being freed might be Nosave.
> + */
> +int free_local_pageflags(unsigned long ** pagemap)
> +{
> + int i;
> + if (!*pagemap)
> + return 1;
> + else {
> + struct page * firstpage =
> + virt_to_page((unsigned long) *pagemap);
> + for (i = 0; i < (1 << BITMAP_ORDER); i++)
> + ClearPageNosave(firstpage + i);
> + free_pages((unsigned long) *pagemap, BITMAP_ORDER);
> + *pagemap = NULL;
> + return 0;
> + }
> +}
> +
> +EXPORT_SYMBOL(suspend_snprintf);
> +EXPORT_SYMBOL(allocate_local_pageflags);
> +EXPORT_SYMBOL(free_local_pageflags);
> +EXPORT_SYMBOL(find_proc_dir_entry);
>
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
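
The sizing arithmetic behind PAGES_PER_BITMAP and BITMAP_ORDER in the quoted hunk can be checked in user space. The following is only a sketch under stated assumptions: 4 KiB pages, BITS_PER_PAGE equal to eight bits per byte of a page (its definition is not in the quoted hunk), and get_bitmask_order() behaving like fls(). All MODEL_ and _model names are invented for the illustration.

```c
#include <assert.h>

/* Assumptions: 4 KiB pages, one bit per page frame in the bitmap. */
#define MODEL_PAGE_SIZE      4096UL
#define MODEL_BITS_PER_PAGE  (MODEL_PAGE_SIZE * 8)

/* Model of fls(): position of the highest set bit, counted from 1; 0 for x == 0. */
unsigned int fls_model(unsigned long x)
{
	unsigned int bit = 0;

	while (x) {
		bit++;
		x >>= 1;
	}
	return bit;
}

/* PAGES_PER_BITMAP: bitmap pages needed to hold one bit per page frame. */
unsigned long pages_per_bitmap(unsigned long max_mapnr)
{
	return (max_mapnr + MODEL_BITS_PER_PAGE - 1) / MODEL_BITS_PER_PAGE;
}

/* BITMAP_ORDER: allocation order, i.e. the page count rounded up to a power of two. */
unsigned int bitmap_order(unsigned long max_mapnr)
{
	unsigned long pages = pages_per_bitmap(max_mapnr);

	assert(pages > 0);	/* a machine with no page frames makes no sense here */
	return fls_model(pages - 1);
}

/* Size in bytes that clear_map() would memset for this order. */
unsigned long bitmap_bytes(unsigned long max_mapnr)
{
	return (1UL << bitmap_order(max_mapnr)) * MODEL_PAGE_SIZE;
}
```

With 261680 page frames this gives 8 bitmap pages, order 3, and a 32768-byte map. Three bitmap pages would round up to order 2, so one of the four allocated pages goes unused.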

--
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!

2004-11-26 23:54:03

by Pavel Machek

[permalink] [raw]
Subject: Re: Suspend 2 merge: 21/51: Refrigerator upgrade.

Hi!

> > > > > Included in this patch is a new try_to_freeze() macro Andrew M suggested
> > > > > a while back. The refrigerator declarations are put in sched.h to save
> > > > > extra includes of suspend.h.
> > > >
> > > > try_to_freeze looks nice. Could we get it in after 2.6.10 opens?
> > >
> > > I'm hoping to get the whole thing in mm once all these replies are dealt
> > > with. Does that sound unrealistic?
> >
> > Yes, a little ;-).
>
> I'm not talking about talking about problems and then doing nothing :>
> I'm writing a list of changes as I look at each of these responses.
> Assuming they're all addressed (or not changed for good reasons), and
> the code is actually useful, why shouldn't it go into mm?

It has a chance to go into mm, but I do not think all 51 patches will go in
at once. And I expect a few more rounds of patches / comments. (And then
some patch / "it is too big" flamewar, too :-).

> > Silently doing nothing when user asked for sync is not nice,
> > either. BUG() is better solution than that.
>
> I don't think we should BUG because the user presses Sys-Rq S while
> suspending. I'll make it BUG_ON() and make the Sys_Rq printk & ignore
> when suspending. Sound reasonable?

Yes, that's better. ... only that it means just another hook somewhere
:-(.
Pavel
--
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!

2004-11-26 23:54:02

by Pavel Machek

[permalink] [raw]
Subject: Re: Suspend 2 merge: 48/51: Swapwriter

Hi!

> This is the swapwriter. It forms the glue between the highlevel I/O
> routines in io.c and the blockwriter routines in block_io.c. It is
> responsible for allocating storage, translating the requests for pages
> within pagesets into devices and blocks and the like. It is abstracted
> from the block writer because the plan is that we'll eventually have a
> generic file writer (ie not using swapspace, but a simple file,
> possibly

This file alone is bigger than the whole of swsusp1. That strongly suggests
you have too many layers of abstraction in there. Planning for the future
is nice, but not at this cost.

Pavel
--
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!

2004-11-26 23:54:01

by Pavel Machek

[permalink] [raw]
Subject: Re: Suspend 2 merge: 17/51: Disable MCE checking during suspend.

Hi!

> > > Avoid a potential SMP deadlock here.
> >
> > ..and lose MCE report.
>
> Deadlock or get an MCE report and do a printk when we're shutting down
> anyway?

If an MCE happens, I'd like the user to report it. Losing it is wrong;
deadlocking may actually be better because at least you get the
report. I'd BUG().

MCEs are a hardware problem, right? They should not be common.
Pavel

--
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!

2004-11-26 23:53:59

by Pavel Machek

[permalink] [raw]
Subject: Re: Suspend 2 merge: 36/51: Highlevel I/O routines.

Hi!

> +extern volatile int suspend_io_time[2][2];

Why volatile?

> +
> + PRINTFREEMEM("after initialising page transformers");
> +
> + /* Initialise writer */
> + active_writer->ops.filter.write_init(whichtowrite);
> + PRINTFREEMEM("after initialising writer");
> +
> + get_first_pbe(&pbe, pagedir);
> +
> + /* Write the data */
> + for (i=0; i<size; i++) {
> + int was_mapped = 0;
> + /* Status update */
> + if (!(i&0x1FF))
> + suspend_message(SUSPEND_IO, SUSPEND_LOW, 1, ".");
> + if (((i+base) >= nextupdate) ||
> + (!(i%(1 << (20 - PAGE_SHIFT)))))
> + nextupdate = update_status(i + base, barmax,
> + " %d/%d MB ", MB(base+i+1), MB(barmax));
> + if ((i == (size - 5)) &&
> + TEST_ACTION_STATE(SUSPEND_PAUSE_NEAR_PAGESET_END))
> + check_shift_keys(1, "Five more pages to write.");
> + suspend_message(SUSPEND_IO, SUSPEND_VERBOSE, 1,
> + "Submitting page %d/%d.\n", i, size);
> +
> + /* Write */
> + was_mapped = suspend_map_kernel_page(pbe.address, 1);
> + if (TEST_ACTION_STATE(SUSPEND_TEST_FILTER_SPEED))
> + ret = first_filter->ops.filter.write_chunk(pbe.origaddress);
> + else
> + ret = first_filter->ops.filter.write_chunk(pbe.address);
> + if (!was_mapped)
> + suspend_map_kernel_page(pbe.address, 0);
> +
> + if (ret) {
> + printk("Write chunk returned %d.\n", ret);
> + abort_suspend("Failed to write a chunk of the "
> + "image.");
> + error = -1;
> + goto write_pageset_free_buffers;
> + }

Half of this code seems to be pretty-prints, and performance
metering. That should be gone before mainline merge.

Pavel
--
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!

2004-11-26 23:54:00

by Pavel Machek

[permalink] [raw]
Subject: Re: Suspend 2 merge: 30/51: Enable slab alloc fallback to suspend memory pool

Hi!

> When we are preparing the image and have eaten all available memory, but
> before page allocations have been switched over to the memory pool, we
> sometimes need to allocate memory from slab for the image metadata (swap
> header information). This code allows the slab allocator to fall back to
> the memory pool in such circumstances. There is some extra debugging
> code there at the moment while I seek to diagnose intermittent slab
> corruption (not sure if it's suspend related).

More reasons to dislike two pagesets. Also you probably should not
printk() with two !!s in it (but without severity). Is it a bug or not?


> diff -ruN 817-enable-slab-alloc-fallback-to-suspend-memory-pool-old/mm/slab.c 817-enable-slab-alloc-fallback-to-suspend-memory-pool-new/mm/slab.c
> --- 817-enable-slab-alloc-fallback-to-suspend-memory-pool-old/mm/slab.c 2004-11-24 15:48:55.066733152 +1100
> +++ 817-enable-slab-alloc-fallback-to-suspend-memory-pool-new/mm/slab.c 2004-11-23 07:11:42.000000000 +1100
> @@ -874,14 +874,30 @@
> flags |= cachep->gfpflags;
> if (likely(nodeid == -1)) {
> addr = (void*)__get_free_pages(flags, cachep->gfporder);
> + if (unlikely((!addr) && (current->pid == suspend_task) &&
> + test_suspend_state(SUSPEND_SLAB_ALLOC_FALLBACK))) {
> + addr = (void *) suspend2_get_grabbed_pages(0);
> + printk("!! Slab addition satisfied via fallback code.\n");
> + }
> if (!addr)
> return NULL;
> + if (unlikely(test_suspend_state(SUSPEND_RUNNING)))
> + printk("Order %d allocation %p added to slab %p.\n",
> + cachep->gfporder, addr, cachep);
> page = virt_to_page(addr);
> } else {

--
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!

2004-11-27 01:28:15

by Nigel Cunningham

[permalink] [raw]
Subject: Re: Suspend 2 merge: 14/51: Disable page alloc failure message when suspending

Hi.

On Fri, 2004-11-26 at 05:15, Pavel Machek wrote:
> Hi!
>
> > While eating memory, we will potentially trigger this a lot. We
> > therefore disable the message when suspending.
>
> You should only trigger this while eating memory, so *one* GFP_NOWARN should be
> enough. And shrink_all_memory should fix it anyway.

Agreed. I wasn't seriously suggesting changing everywhere to be
GFP_NOWARN. Perhaps I should be more explicit in what I'm saying here.
The problem isn't just suspend trying to allocate memory. It's
_ANYTHING_ that might be running trying to allocate memory while we're
eating memory. (Remember that we don't just call shrink_all_memory, but
also allocate that memory so other processes don't grab it and stop us
making forward progress). As a result, they're going to scream when they
can't allocate a page.

Regards,

Nigel
--
Nigel Cunningham
Pastoral Worker
Christian Reformed Church of Tuggeranong
PO Box 1004, Tuggeranong, ACT 2901

You see, at just the right time, when we were still powerless, Christ
died for the ungodly. -- Romans 5:6

2004-11-27 01:36:00

by Pavel Machek

[permalink] [raw]
Subject: Re: Suspend 2 merge: 22/51: Suspend2 lowlevel code.

Hi!

> > > + * Note that the context and timing of this function is pretty critical.
> > > + * With a minimal amount of things going on in the caller and in here, gcc
> > > + * does a good job of being just a dumb compiler. Watch the assembly output
> > > + * if anything changes, though, and make sure everything is going in the right
> > > + * place.
> >
> > You should include assembly source (unless you can test all the compilers...). Feel free
> > to include C version, too, but #ifdef it out.
>
> I'm thinking I should actually be removing the comment. The C is simple,
> clear, fast and easy to maintain and we haven't actually had any
> problems at all with compilers. All my tweaking in here has turned out
> to be irrelevant to the real cause of problems (I recently found a bug
> where work queues were wrongly inheriting freezer flags; since fixing
> that, all the symptoms in this area have gone away).

See the flames I got when I did just that. No, it needs to be in
assembly, because (by standard) C compiler is allowed to misoptimize
it.
Pavel
--
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!

2004-11-27 01:44:06

by Matthew Garrett

[permalink] [raw]
Subject: Re: Suspend 2 merge: 34/51: Includes

Nigel Cunningham <[email protected]> wrote:

> I can see that it might look that way, but it's actually fundamental to
> the support for building as modules (which is required for LVM &
> encryption), and has been really helpful in creating clear distinctions
> between the different parts of suspend. It also provides a clear method
> for someone to add support for their new wizz-bang storage method or
> compressor.

I'm not entirely clear on this. Surely all that's needed for LVM and
encryption support is for that to be set up in userspace and then allow
userspace to trigger a second attempt at resume? I have a hacky patch
for swsusp that allows that (at the moment it just adds a "resume"
method to /sys/power/state), which gives you the functionality without
the module pain.

--
Matthew Garrett | [email protected]

2004-11-27 01:44:00

by Tomas Carnecky

[permalink] [raw]
Subject: Re: Suspend 2 merge: 36/51: Highlevel I/O routines.

Pavel Machek wrote:
> Hi!
>
>
>>+extern volatile int suspend_io_time[2][2];
>
>
> Why volatile?

I think Linus doesn't like this keyword very much. And I
also think he said it should not be used.

tom

2004-11-27 02:19:39

by Matthew Garrett

[permalink] [raw]
Subject: Re: Suspend 2 merge: 9/51: init/* changes.

Pavel Machek <[email protected]> wrote:
> Hi!
>
>> 1) Make name_to_dev_t non init. Why should you need to reboot if all you
>> want to do is change the device you're using to suspend? That's M_'s way
>
> Well, if you change it using /proc and forget to change kernel cmd line, you'll have
> a problem. Do you really change this so often?

name_to_dev_t needs to be non-init in order to make it possible to
trigger a resume when the block device driver isn't static. Pavel, would
you be willing to consider a patch to make it possible to trigger swsusp
resume from userspace? That gets things working with initrd kernels.
I've been using something along these lines for a few weeks now, and it
hasn't eaten my filesystem yet.

> Hmm, this will need a lot of testing and a lot of care... You for example
> may not write to your fs's before activating it. And if you use this feature,
> kernel no longer has chance to kill suspend signature on normal boot,
> making "shoot(self, foot)" easier.

Yes, I was thinking of this mostly from a distribution perspective.
There's always potential for data-loss if users resume after touching
the root filesystem. On the other hand, it's currently possible for them
to do that anyway (think booting a different kernel without swsusp
support, then rebooting back into the swsusp one)

--
Matthew Garrett | [email protected]

2004-11-27 02:23:28

by Matthew Garrett

[permalink] [raw]
Subject: Re: Suspend 2 merge: 35/51: Code always built in to the kernel.

Nigel Cunningham <[email protected]> wrote:

> You want your cake and to eat it too? :> We don't want to warn the user
> before they shoot themselves in the foot, but not loudly enough that
> they can't help notice and choose to do something before the damage is
> done?

We have userspace to do this, surely? Make the standard method of
triggering resume involve an initrd, and have a small application that
does sanity checks before the resume. In case of failure, have it prompt
the user. As long as it doesn't do bad things to the filesystem,
there's no danger. There's no reason to do this in the kernel.

--
Matthew Garrett | [email protected]

2004-11-27 02:27:17

by Nigel Cunningham

[permalink] [raw]
Subject: Re: Suspend 2 merge:L 12/51: Disable OOM killer when suspending.

Hi.

On Fri, 2004-11-26 at 05:12, Pavel Machek wrote:
> Hi!
>
> > When preparing the image, suspend eats all the memory in sight, both to
> > reduce the image size and to improve the reliability of our stats (We've
> > worked hard to make it work reliably under heavy load - 100+). Of course
> > this can result in the OOM killer being triggered, so this simple test
> > stops that happening.
>
> andrew's shrink_all_memory should enable you to free memory without
> hacking OOM killer, no?

I do use shrink_all_memory, but I also then allocate those pages that
were freed. We added that when seeking to get Suspend to work well and
reliably under heavy load. IIRC, the issue was that pages that were
freed were immediately getting allocated by other programs. Having said
this, it is a while since I looked at the code for preparing the image.
I can take a look and confirm my thinking.

> If shrink_all_memory is broken... fix it.

Agree.

> > + if (test_suspend_state(SUSPEND_FREEZER_ON))
> > + return;
> > +
>
> Hmm, yes, something like this might be useful for BUG_ONs etc...
> For consistency, right name is probably in_suspend(void).

There is a difference; there are sections of time where we're in_suspend
(test_suspend_state(SUSPEND_RUNNING)) but the freezer isn't on (initial
set up and cleanup). As far as the OOM killer goes, it probably doesn't
matter which is used, but I thought it important to point out that
freezer being on != in_suspend(). (Freezer could also be on for S3?..
'spose you don't care if OOM killer runs then, though). Would you like
to see in_freezer()?
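
To make the distinction concrete, here is a minimal user-space sketch. It is only a model: the bit values, set_suspend_state() and clear_suspend_state() are made up for illustration, in_suspend() and in_freezer() are the names being floated in this exchange rather than existing helpers, and the real test_suspend_state() may well take a bit number instead of a mask.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag values; the real ones live in the suspend2 headers. */
#define SUSPEND_RUNNING     (1u << 0)
#define SUSPEND_FREEZER_ON  (1u << 1)
#define SUSPEND_ALL_FLAGS   (SUSPEND_RUNNING | SUSPEND_FREEZER_ON)

/* Stand-in for the kernel-wide suspend state word. */
unsigned int suspend_state;

bool test_suspend_state(unsigned int flag)
{
	return (suspend_state & flag) != 0;
}

void set_suspend_state(unsigned int flag)
{
	assert((flag & ~SUSPEND_ALL_FLAGS) == 0);	/* only flags this model knows */
	suspend_state |= flag;
}

void clear_suspend_state(unsigned int flag)
{
	assert((flag & ~SUSPEND_ALL_FLAGS) == 0);
	suspend_state &= ~flag;
}

/* True for the whole cycle, including initial set up and cleanup. */
bool in_suspend(void)
{
	return test_suspend_state(SUSPEND_RUNNING);
}

/* True only while processes are frozen; could also hold outside a suspend2 cycle. */
bool in_freezer(void)
{
	return test_suspend_state(SUSPEND_FREEZER_ON);
}
```

During set up, in_suspend() is already true while in_freezer() is still false, which is the window described above.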

Regards,

Nigel
--
Nigel Cunningham
Pastoral Worker
Christian Reformed Church of Tuggeranong
PO Box 1004, Tuggeranong, ACT 2901

You see, at just the right time, when we were still powerless, Christ
died for the ungodly. -- Romans 5:6

2004-11-27 02:37:57

by Christoph Hellwig

[permalink] [raw]
Subject: Re: Suspend 2 merge: 46/51: LZF support.

> Since this is a completely new file (as far as kernel tree is concerned)
> could you convert it to proper coding style (braces placement, indentation)?

While I'm normally a big advocate of sane indentation it looks like these
two files are taken unmodified from some external library and are unlikely to
be modified. Maybe keep them as is to ease a possible future resync (and
comparisons with upstream)?

2004-11-27 02:02:49

by Pavel Machek

[permalink] [raw]
Subject: Re: Suspend 2 merge: 51/51: Notes

Hi!

> > It is just too big. suspend2 is small operating system on its own, and
> > that is not good thing :-(.
>
> I feel like you overstate your case a lot. I can see what you mean in
> some ways, though.

I just wish I could write Al-Viro-style replies ;-). Or even better
have Al Viro saying "no" for me.
Pavel
--
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!

2004-11-27 03:52:27

by Pavel Machek

[permalink] [raw]
Subject: Re: Suspend 2 merge: 35/51: Code always built in to the kernel.

Hi!

> > You have your own abstraction on the top of /proc? That's no-no.
>
> You'd prefer the same code repeated 20 times?

Rest of kernel is pretty happy with /proc as-is. Why can't suspend2
just play along?

How many options are really necessary? activate is not. selecting of
reboot can already be done in /sys. Some percentages? There really
should not be 20 things to configure.

> > > + say("BIG FAT WARNING!! %s\n\n", suspend_print_buf);
> > > + if (can_erase_image) {
> > > + say("If you want to use the current suspend image, reboot and try\n");
> > > + say("again with the same kernel that you suspended from. If you want\n");
> > > + say("to forget that image, continue and the image will be erased.\n");
> > > + } else {
> > > + say("If you continue booting, note that any image WILL NOT BE REMOVED.\n");
> > > + say("Suspend is unable to do so because the appropriate modules aren't\n");
> > > + say("loaded. You should manually remove the image to avoid any\n");
> > > + say("possibility of corrupting your filesystem(s) later.\n");
> > > + }
> > > + say("Press SPACE to reboot or C to continue booting with this kernel\n");
> >
> > Plus kernel now actually expects user interaction to solve problems
> > during boot. No, no.
>
> You want your cake and to eat it too? :> We don't want to warn the user
> before they shoot themselves in the foot, but not loudly enough that
> they can't help notice and choose to do something before the damage is
> done?

Kernel boot is not expected to be interactive. I'd do

if (can_erase_image)
printk("Incorrect kernel version, image killed\n");
else
panic("Can't kill suspended image");

Pavel
--
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!

2004-11-27 04:17:49

by Pavel Machek

[permalink] [raw]
Subject: Re: Suspend 2 merge: 9/51: init/* changes.

Hi!

> > And if you really want to make it changeable, pass major:minor from userland; once
> > userland is running getting them is easy.
>
> Yes, but that's also far uglier, and who thinks in terms of major and
> minor numbers anyway? I think of my harddrive as /dev/sda, not 08:xx.
> The parsing accepts majors and minors, of course, but shouldn't we make
> these things easier to do, not harder? (Would we insist on using majors
> and minors for root=?).

Kernel interface is not supposed to be "easy". root= has exception,
that's init code, and you can't easily ls -al /dev at that point. If
you want easy interface, create userland program that looks up
minor/major in /dev/ and uses them.
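
A rough sketch of such a userland helper, assuming a Linux libc that provides major() and minor() in <sys/sysmacros.h>. dev_numbers() is a hypothetical name, and how the numbers would then be handed to the kernel is left open, since the thread does not settle on an interface.

```c
#include <assert.h>
#include <sys/stat.h>
#include <sys/sysmacros.h>
#include <sys/types.h>

/*
 * Look up the device numbers behind a node such as /dev/sda.
 * Returns 0 and fills *maj and *min for a block device, or -1 if the
 * path cannot be stat()ed or is not a block device. The outputs are
 * left untouched on failure.
 */
int dev_numbers(const char *path, unsigned int *maj, unsigned int *min)
{
	struct stat st;

	assert(path && maj && min);

	if (stat(path, &st) != 0)
		return -1;		/* no such node, or not accessible */
	if (!S_ISBLK(st.st_mode))
		return -1;		/* regular file, directory, char device, ... */

	*maj = major(st.st_rdev);	/* st_rdev is the device the node refers to */
	*min = minor(st.st_rdev);
	return 0;
}
```

A wrapper tool could call this on the name the user typed and pass only the resulting numbers on, which keeps the name lookup out of the kernel.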
Pavel
--
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!

2004-11-27 04:17:48

by Nigel Cunningham

[permalink] [raw]
Subject: Re: Suspend 2 merge: 4/51: Get module list.

Hi.

On Fri, 2004-11-26 at 03:56, Pavel Machek wrote:
> Hi!
>
> > This provides access to the list of loaded modules for suspend's
> > debugging output. When a cycle finishes, suspend outputs something like the
> > following:
> >
> > > Please include the following information in bug reports:
> > > - SUSPEND core : 2.1.5.7
> > > - Kernel Version : 2.6.9
> > > - Compiler vers. : 3.3
> > > - Modules loaded : tuner bttv videodev snd_seq_oss snd_seq_midi_event
> > > snd_seq snd_pcm_oss snd_mixer_oss snd_intel8x0 snd_ac97_codec snd_pcm
> > > snd_timer snd_page_alloc snd_mpu401_uart snd_rawmidi snd_seq_device
> > > snd soundcore visor usbserial usblp joydev evdev usbmouse usbhid
> > > uhci_hcd usbcore ppp_deflate zlib_deflate zlib_inflate bsd_comp
> > > ipt_LOG ipt_state ipt_MASQUERADE iptable_nat ip_conntrack
> > > ipt_multiport ipt_REJECT iptable_filter ip_tables ppp_async
> > > ppp_generic slhc crc_ccitt video_buf v4l2_common btcx_risc Win4Lin
> > > mki_adapter radeon agpgart parport_pc lp parport sg ide_cd sr_mod
> > > cdrom floppy af_packet e1000 loop dm_mod tsdev suspend_bootsplash
> > > suspend_text suspend_swap suspend_block_io suspend_lzf suspend_core
> > > - Attempt number : 9
> > > - Parameters : 0 2304 32768 1 0 4096 5
> > > - Limits : 261680 pages RAM. Initial boot: 252677.
> > > - Overall expected compression percentage: 0.
> > > - LZF Compressor enabled.
> > > Compressed 922112000 bytes into 437892038 (52 percent compression).
> > > - Swapwriter active.
> > > Swap available for image: 294868 pages.
> > > - Debugging compiled in.
> > > - Preemptive kernel.
> > > - SMP kernel.
> > > - Highmem Support.
> > > - I/O speed: Write 72 MB/s, Read 119 MB/s.
> >
> > Including the modules loaded is very helpful for debugging problems.
>
> It might be useful as an add-on patch when people are actually debugging it,
> but I do not think it is needed for mainline. You can just do lsmod before suspend...

Yes. It's still pretty helpful, but not as much as it has been in the
past. We're almost at the point where we can automatically say "Have you
got USB compiled in? Compile it as modules. Have you got USB modules
loaded? Unload them and reload after suspending. Have you got DRI
support loaded? Depending on chipset, do X, Y, or Z." That deals with
the vast majority of freezes.

Regards,

Nigel
--
Nigel Cunningham
Pastoral Worker
Christian Reformed Church of Tuggeranong
PO Box 1004, Tuggeranong, ACT 2901

You see, at just the right time, when we were still powerless, Christ
died for the ungodly. -- Romans 5:6

2004-11-27 04:18:45

by Christoph Hellwig

[permalink] [raw]
Subject: Re: Suspend 2 merge

On Fri, Nov 26, 2004 at 01:38:48PM +0100, Pavel Machek wrote:
> > Again, when you're running on limited time, twice as fast is still twice
> > as fast.
>
> My machine suspends in 7 seconds, and that's swsusp1. According to
> your numbers, suspend2 should suspend it in 1 second and LZF
> compressed should be .5 second.
>
> I'd say "who cares". 7 seconds seems like fast enough for me. And I'm
> *not* going to add 2000 lines of code for 500msec speedup during
> suspend.

Yupp. Premature optimization is the root of all evil. swsusp is

a) an absolute slowpath compared to any normal kernel operation,
and called extremely seldom
b) only useful for a small subset of all linux instances

hacking core code (fastpaths) for speedups there is a really bad idea.
If you can speed it up without being intrusive all power to you.

> > I'm trying not to make assumptions about how we're writing the image,
> > either. If you want to pipe your image over a network to some server,
> > you should be able to, and not have to implement compression again in
> > the writer for that.
>
> Suspend-over-network is obscure-enough
> feature. Compressed-suspend-over-network is even worse.
>
> BTW my feeling is that if you want to do suspend-over-network, you
> should just modify nbd to work with suspend2 and stop adding
> special-purpose code to suspend.

Honestly I think it's a feature so obscure that we wouldn't ever want to
merge it unless it just happens to work as a fallout of a more important
feature.

2004-11-27 04:18:45

by Pavel Machek

[permalink] [raw]
Subject: Re: Suspend 2 merge: 3/51: e820 table support

Hi!

> The first of the 'real' candidates for merging.
>
> This adds support for setting and clearing the Nosave status of pages
> based on the contents of the e820 table, and clearing Nosave for __init
> pages when they're freed.

I'd say that page that is both nosave and __init would be a bug.
But strategic BUG_ON() would be welcome...

--
64 bytes from 195.113.31.123: icmp_seq=28 ttl=51 time=448769.1 ms

2004-11-27 04:18:44

by Pavel Machek

[permalink] [raw]
Subject: Re: Suspend 2 merge: 5/51: Workthread freezer support.

Hi!

> This thread adds freezer support for workthreads.
>
> A new parameter in the create_ functions allows the thread to be marked
> as PF_NOFREEZE. This should only be used for threads that may need to
> run during writing the image.

Ok.
Pavel
--
64 bytes from 195.113.31.123: icmp_seq=28 ttl=51 time=448769.1 ms

2004-11-27 04:18:42

by Pavel Machek

[permalink] [raw]
Subject: Re: Suspend 2 merge: 23/51: PPC support.

Hi!

> Not updated for a while, so I'm not sure if it still works. If not, it
> shouldn't take much to get it going again.

It should have a lot in common with hugang's swsusp1/ppc support, right?
Can you coordinate with him and get that in?

Pavel

--
64 bytes from 195.113.31.123: icmp_seq=28 ttl=51 time=448769.1 ms

2004-11-27 04:18:41

by Pavel Machek

[permalink] [raw]
Subject: Re: Suspend 2 merge: 17/51: Disable MCE checking during suspend.

Hi!

> Avoid a potential SMP deadlock here.

..and lose MCE report.
Pavel

> @@ -57,7 +58,8 @@
>
> static void mce_work_fn(void *data)
> {
> - on_each_cpu(mce_checkregs, NULL, 1, 1);
> + if (!test_suspend_state(SUSPEND_RUNNING))
> + on_each_cpu(mce_checkregs, NULL, 1, 1);
> schedule_delayed_work(&mce_work, MCE_RATE);
> }
>
--
64 bytes from 195.113.31.123: icmp_seq=28 ttl=51 time=448769.1 ms

2004-11-27 04:18:40

by Pavel Machek

[permalink] [raw]
Subject: Re: Suspend 2 merge: 19/51: Remove MTRR sysdev support.

Hi!

> This patch removes sysdev support for MTRRs (potential SMP hang and
> shouldn't be done with interrupts done anyway). Instead, we save and
> restore MTRRs when entering and exiting the processor freezers (ie when
> saving the registers & context for each CPU via an SMP call).

This will break acpi s3...
Pavel
--
64 bytes from 195.113.31.123: icmp_seq=28 ttl=51 time=448769.1 ms

2004-11-27 04:18:38

by Pavel Machek

[permalink] [raw]
Subject: Re: Suspend 2 merge: 10/51: Exports for suspend built as modules.

Hi!

> The sys_ functions are exported because a while ago, people suggested I
> use /dev/console to output text that doesn't need to be logged, and I
> also use /dev/splash for the bootsplash support. These functions were

Well, we don't do ascii-art on kernel boot and I do not see why we should do it
on suspend. Distributions will love bootsplash integration, but it should stay as a separate
patch.

See swsusp1... it has percentage printing, and I think it should
be possible to make it look good enough.

Why do you need sys_mkdir?

Pavel

--
64 bytes from 195.113.31.123: icmp_seq=28 ttl=51 time=448769.1 ms

2004-11-27 04:18:37

by Pavel Machek

[permalink] [raw]
Subject: Re: Suspend 2 merge: 13/51: Disable highmem tlb flush for copyback.

Hi!

> When we're making/restoring our atomic copy of the image, secondary
> processors are frozen. Trying an SMP call at that time could thus lead
> to deadlock. Secondary processors have
Yes, and that's a reason not to do SMP calls, not to hack SMP calling!
Pavel
--
64 bytes from 195.113.31.123: icmp_seq=28 ttl=51 time=448769.1 ms

2004-11-27 04:18:36

by Pavel Machek

[permalink] [raw]
Subject: Re: Suspend 2 merge: 21/51: Refrigerator upgrade.

Hi!

> Included in this patch is a new try_to_freeze() macro Andrew M suggested
> a while back. The refrigerator declarations are put in sched.h to save
> extra includes of suspend.h.

try_to_freeze looks nice. Could we get it in after 2.6.10 opens?

> +++ 582-refrigerator-new/drivers/pnp/pnpbios/core.c 2004-11-24 17:58:33.769748640 +1100
> @@ -179,6 +179,10 @@
> * Poll every 2 seconds
> */
> msleep_interruptible(2000);
> +
> + if(current->flags & PF_FREEZE)
> + refrigerator(PF_FREEZE);
> +
> if(signal_pending(current))
> break;
>

Use new interface here?
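
For comparison, a user-space model of what the open-coded test collapses into. This is a sketch only: it assumes try_to_freeze() does no more than wrap the flag test and the refrigerator() call, which the patch text quoted here does not show. The task structure, the PF_FREEZE value and the times_frozen counter are invented for the illustration.

```c
#include <assert.h>

#define PF_FREEZE 0x1u		/* hypothetical value for this model */

struct task_model {
	unsigned int flags;
	int times_frozen;	/* bookkeeping for the model only */
};

static struct task_model current_task;
#define current (&current_task)

/* The real refrigerator() sleeps until thawed; the model just records the visit. */
void refrigerator(unsigned int flag)
{
	assert(current->flags & flag);	/* only entered when asked to freeze */
	current->times_frozen++;
	current->flags &= ~flag;
}

/* Returns 1 if the task went through the refrigerator, 0 otherwise. */
int try_to_freeze(void)
{
	if (current->flags & PF_FREEZE) {
		refrigerator(PF_FREEZE);
		return 1;
	}
	return 0;
}
```

Under that assumption the pnpbios polling loop would need a single try_to_freeze() call after msleep_interruptible() instead of the two added lines.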

> */
> int fsync_super(struct super_block *sb)
> {
> + int ret;
> +
> + /* A safety net. During suspend, we might overwrite
> + * memory containing filesystem info. We don't then
> + * want to sync it to disk. */
> + if (unlikely(test_suspend_state(SUSPEND_DISABLE_SYNCING)))
> + return 0;
> +

If it is safety net, do BUG_ON().
Pavel
--
64 bytes from 195.113.31.123: icmp_seq=28 ttl=51 time=448769.1 ms

2004-11-27 04:18:36

by Pavel Machek

[permalink] [raw]
Subject: Re: Suspend 2 merge: 14/51: Disable page alloc failure message when suspending

Hi!

> While eating memory, we will potentially trigger this a lot. We
> therefore disable the message when suspending.

You should only trigger this while eating memory, so *one* GFP_NOWARN should be
enough. And shrink_all_memory should fix it anyway.
Pavel

--
64 bytes from 195.113.31.123: icmp_seq=28 ttl=51 time=448769.1 ms

2004-11-27 04:17:52

by Pavel Machek

[permalink] [raw]
Subject: Re: Suspend 2 merge:L 12/51: Disable OOM killer when suspending.

Hi!

> When preparing the image, suspend eats all the memory in sight, both to
> reduce the image size and to improve the reliability of our stats (We've
> worked hard to make it work reliably under heavy load - 100+). Of course
> this can result in the OOM killer being triggered, so this simple test
> stops that happening.

andrew's shrink_all_memory should enable you to free memory without
hacking OOM killer, no?

If shrink_all_memory is broken... fix it.
Pavel

> + if (test_suspend_state(SUSPEND_FREEZER_ON))
> + return;
> +

Hmm, yes, something like this might be useful for BUG_ONs etc...
For consistency, right name is probably in_suspend(void).

Pavel
--
64 bytes from 195.113.31.123: icmp_seq=28 ttl=51 time=448769.1 ms

2004-11-27 04:17:53

by Nigel Cunningham

[permalink] [raw]
Subject: Re: Suspend 2 merge: 10/51: Exports for suspend built as modules.

Hi.

On Fri, 2004-11-26 at 05:07, Pavel Machek wrote:
> Hi!
>
> > The sys_ functions are exported because a while ago, people suggested I
> > use /dev/console to output text that doesn't need to be logged, and I
> > also use /dev/splash for the bootsplash support. These functions were
>
> Well, we don't do ascii-art on kernel boot and I do not see why we should do it
> on suspend. Distributions will love bootsplash integration, but it should stay as a separate
> patch.

It's modular, so no problem there.

> See swsusp1... it has percentage printing, and I think it should
> be possible to make it look good enough.

We can always make a text_mode_for_Pavel plugin :>

> Why do you need sys_mkdir?

The text mode plugin is using it to make /dev (if it doesn't exist) so
it can try to mount devfs (if necessary) and open /dev/console to do the
output. I'd love to just use vt_console_print, but those who know better
than me said to use /dev/console...

Regards,

Nigel
--
Nigel Cunningham
Pastoral Worker
Christian Reformed Church of Tuggeranong
PO Box 1004, Tuggeranong, ACT 2901

You see, at just the right time, when we were still powerless, Christ
died for the ungodly. -- Romans 5:6

2004-11-27 05:09:29

by Pavel Machek

[permalink] [raw]
Subject: Re: Suspend 2 merge: 16/51: Disable cache reaping during suspend.

Hi!
> I have to admit to being a little unsure as to why this is needed, but
> suspend's reliability is helped a lot by disabling cache reaping while
> suspending. Perhaps one of the mm guys will be able to enlighten me
> here. Might be SMP related.

It would be good to understand it. Rather than slowing common code... why
not down(&cache_chain_sem) in suspend2?
Pavel

> {
> struct list_head *walk;
>
> - if (down_trylock(&cache_chain_sem)) {
> + if ((unlikely(test_suspend_state(SUSPEND_RUNNING))) ||
> + (down_trylock(&cache_chain_sem)))
> + {
> /* Give up. Setup the next iteration. */
> schedule_delayed_work(&__get_cpu_var(reap_work), REAPTIMEOUT_CPUC + smp_processor_id());
> return;
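
The alternative suggested above (take cache_chain_sem from the suspend side so the reaper's existing down_trylock() fails) can be modelled in user space. This is a single-threaded sketch with invented _model names: the semaphore is reduced to a C11 atomic_flag, and where the kernel's down() would sleep until the semaphore is free, the model asserts that it is free.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Stand-in for cache_chain_sem: a binary semaphore modelled as a flag. */
static atomic_flag cache_chain_sem_model = ATOMIC_FLAG_INIT;

/* Kernel convention: down_trylock() returns 0 when the semaphore was taken. */
int down_trylock_model(void)
{
	return atomic_flag_test_and_set(&cache_chain_sem_model) ? 1 : 0;
}

void up_model(void)
{
	atomic_flag_clear(&cache_chain_sem_model);
}

/* Unmodified reaper logic: give up when the semaphore is busy. */
bool cache_reap_model(void)
{
	if (down_trylock_model())
		return false;	/* real code would reschedule the work and return */
	/* ... walk the cache chain here ... */
	up_model();
	return true;
}

/* Suspend side: hold the semaphore for the duration of the cycle. */
void suspend_begin_model(void)
{
	int busy = down_trylock_model();

	assert(!busy);	/* real down() would block here instead */
	(void)busy;
}

void suspend_end_model(void)
{
	up_model();
}
```

The reaper's fast path stays untouched in this arrangement; the cost moves to the suspend code, which has to release the semaphore on every exit path.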

--
64 bytes from 195.113.31.123: icmp_seq=28 ttl=51 time=448769.1 ms

2004-11-27 04:17:47

by Pavel Machek

[permalink] [raw]
Subject: Re: Suspend 2 merge: 22/51: Suspend2 lowlevel code.

Hi!

> +#include "../../../kernel/power/suspend.h"

Ouch.

> +#define loaddebug(thread,register) \
> +	__asm__("movl %0,%%db" #register \
> +		: /* no output */ \
> +		:"r" ((thread)->debugreg[register]))

This should be already defined somewhere...

> + * Note that the context and timing of this function is pretty critical.
> + * With a minimal amount of things going on in the caller and in here, gcc
> + * does a good job of being just a dumb compiler. Watch the assembly output
> + * if anything changes, though, and make sure everything is going in the right
> + * place.

You should include assembly source (unless you can test all the compilers...). Feel free
to include C version, too, but #ifdef it out.

Pavel

--
64 bytes from 195.113.31.123: icmp_seq=28 ttl=51 time=448769.1 ms

2004-11-27 05:09:30

by Pavel Machek

Subject: Re: Suspend 2 merge: 4/51: Get module list.

Hi!

> > It might be useful as an add-on patch when people are actually debugging it,
> > but I do not think it is needed for mainline. You can just do lsmod before suspend...
>
> Yes. It's still pretty helpful, but not as much as it has been in the
> past. We're almost at the point where we can automatically say "Have you
> got USB compiled in? Compile it as modules. Have you got USB modules
> loaded? Unload them and reload after suspending. Have you got DRI
> support loaded? Depending on chipset, do X, Y, or Z." That deals with
> the vast majority of freezes.

Yes, that's a big problem. We need to get driver support right.... Hmm,
perhaps we should at least make the "known-problematic" cases block
suspend with a helpful message?

It works for me (tm) with usb with recent kernels.

Pavel
--
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!

2004-11-27 05:17:35

by Pavel Machek

Subject: Re: Suspend 2 merge: 9/51: init/* changes.

Hi!

> > Parsing major/minor should be as simple as sscanf("%d %d"). And you'll
> > have one less modification to generic code. Yes I think it is worth
> > it.
>
> In that case, we shouldn't access names at boot time either; the
> interface should be consistent, shouldn't it? I really would prefer to
> keep things as they are; is it worth all this fuss?

/proc file is really different from kernel commandline. Consistent
interface is nice but run-time allocated kernel memory is nicer, and
hooks to common code are not nice. Simply drop that /proc file and we
are done with that fuss.
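For the sscanf approach quoted above, a rough userspace sketch follows. It assumes the input is two decimal numbers separated by whitespace. `parse_devnum()` is a hypothetical helper, and `makedev()` from glibc's `<sys/sysmacros.h>` stands in for the kernel's `MKDEV()`.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/sysmacros.h>
#include <sys/types.h>

/*
 * Parse "MAJOR MINOR" into a device number (hypothetical helper).
 * Returns 0 on success, -1 if the input is malformed or has trailing junk.
 */
static int parse_devnum(const char *s, dev_t *out)
{
	unsigned int maj, min;
	char trailing;

	assert(s != NULL && out != NULL);

	/*
	 * The extra %c only converts if something other than whitespace
	 * follows the two numbers, so exactly 2 conversions means a clean
	 * "MAJOR MINOR" (an optional trailing newline is accepted).
	 */
	if (sscanf(s, "%u %u %c", &maj, &min, &trailing) != 2)
		return -1;

	*out = makedev(maj, min);
	return 0;
}
```

With numbers only, no name lookup is needed at run time, which is the point being argued: the name-to-number translation can stay in init-only code.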

Pavel
--
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!

2004-11-27 05:17:34

by Pavel Machek

Subject: Re: Suspend 2 merge: 51/51: Notes

Hi!

> When I started, I thought I did have 51 patches, really! One of them
> turned out to be a couple of things I intend to reverse :>

:-))))

> In posting all of this, I recognise of course that no one else
> understands how it all fits together. I'm hoping that those who care
> enough will ask questions that I'll happily answer, learn from and
> through which I'll improve the code.
>
> For now, though, I'm going to bed.

I still had not fallen asleep at keyboard, and that is pretty
amazing...

It is just too big. suspend2 is small operating system on its own, and
that is not good thing :-(.

Pavel
--
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!

2004-11-27 05:17:33

by Pavel Machek

Subject: Re: Suspend 2 merge: 50/51: Device mapper support.

Hi!

> This is the device mapper support plugin. Its sole purpose is to ensure
> that the device mapper allocates enough memory to process all of the I/O
> we want to throw at it.

This needs to go through dm people....

> +static struct suspend_proc_data disable_dm_support_proc_data = {
> + .filename = "disable_device_mapper_support",
> + .permissions = PROC_RW,
> + .type = SUSPEND_PROC_DATA_INTEGER,
> + .data = {
> + .integer = {
> + .variable = &suspend_dm_ops.disabled,
> + .minimum = 0,
> + .maximum = 1,
> + }
> + }
> +};

What is this good for? Debugging switch?
Pavel
--
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!

2004-11-27 05:28:57

by Pavel Machek

Subject: Re: Suspend 2 merge: 21/51: Refrigerator upgrade.

Hi!

> > > > Silently doing nothing when user asked for sync is not nice,
> > > > either. BUG() is better solution than that.
> > >
> > > I don't think we should BUG because the user presses Sys-Rq S while
> > > suspending. I'll make it BUG_ON() and make the Sys_Rq printk & ignore
> > > when suspending. Sound reasonable?
> >
> > Yes, that's better. ... only that it means just another hook somewhere
> > :-(.
>
> :<. But we're only talking two or three lines. Let's keep it in
> perspective.

I think even three lines are bad. It means that swsusp is no longer
self-contained subsystem, but that it has its hooks all over the
place. And those hooks need to be maintained, too.
Pavel
--
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!

2004-11-27 05:28:59

by Pavel Machek

Subject: Re: Suspend 2 merge: 35/51: Code always built in to the kernel.

Hi!

> > Kernel boot is not expected to be interactive. I'd do
> >
> > if (can_erase_image)
> > printk("Incorrect kernel version, image killed\n");
> > else
> > panic("Can't kill suspended image");
> >
>
> Comes down, again, to user friendliness. Just because I can erase the
> image, doesn't mean I should. It may be that the user just pressed the
> down arrow one too few times in lilo, and they really do have the right
> kernel, but started the wrong one. Or it may be that they're still
> setting up their initrd, didn't get it quite right, know that no damage
> will be done and want to continue booting. We should let the user think
> about what they want to do and then do it.

User friendliness is nice, but I think "boot is not interactive" is
a stronger requirement than that.

> I need to get on with the work I planned on doing today, so I'm going to
> hang up after sending this. That's not at all to say that I want you to
> stop sending email; just that I won't be replying for a while.

No problem, I need some sleep.
Pavel
--
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!

2004-11-27 05:28:58

by Pavel Machek

Subject: Re: Suspend2 merge: 1/51: Device trees

Hi!

> > > I'd agree, except that I don't know how many to allocate. It makes
> > > getting a reliable suspend the result of guess work and favourable
> > > circumstances. Fixing 'broken' drivers by really suspending them seems
> > > to me to be the right solution. Make their memory requirements perfectly
> > > predictable.
> >
> > Except for the few drivers that are between suspend device and
> > root. So you still have the same problem, and still need to
> > guess. Plus you get complex changes to driver model.
>
> I think you're overstating your case. All we're talking about doing is
> quiescing the same drivers that would be quiesced later, in the same
> way, but earlier in the process. Apart from the code I already have in
> that patch, nothing else is needed. The changes aren't that complex,
> either.

Driver model now needs to know how to handle tree where some parts are
suspended and some are not, and I think that's quite a big change.

Pavel
--
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!

2004-11-27 05:36:07

by Pavel Machek

Subject: Re: Suspend 2 merge: 21/51: Refrigerator upgrade.

Hi!

> > > Included in this patch is a new try_to_freeze() macro Andrew M suggested
> > > a while back. The refrigerator declarations are put in sched.h to save
> > > extra includes of suspend.h.
> >
> > try_to_freeze looks nice. Could we get it in after 2.6.10 opens?
>
> I'm hoping to get the whole thing in mm once all these replies are dealt
> with. Does that sound unrealistic?

Yes, a little ;-).

> > > */
> > > int fsync_super(struct super_block *sb)
> > > {
> > > + int ret;
> > > +
> > > + /* A safety net. During suspend, we might overwrite
> > > + * memory containing filesystem info. We don't then
> > > + * want to sync it to disk. */
> > > + if (unlikely(test_suspend_state(SUSPEND_DISABLE_SYNCING)))
> > > + return 0;
> > > +
> >
> > If it is safety net, do BUG_ON().
>
> Could get triggered by user pressing SysRq. (Or via a panic?). I don't
> think the SysRq should result in a panic; nor should a panic result in a
> recursive call to panic (although I'm wondering here, wasn't the call to
> syncing in panic taken out?).

Silently doing nothing when user asked for sync is not nice,
either. BUG() is better solution than that.
Pavel
--
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!

2004-11-27 05:36:10

by Pavel Machek

Subject: Re: Suspend2 merge: 1/51: Device trees

Hi!

> This patch allows the device tree to be split up into multiple trees. I
> don't really expect it to be merged, but it is an important part of
> suspend at the moment, and I certainly want to see something like it
> that will allow us to suspend some parts of the device tree and not
> others. Suspend2 uses it to keep alive the hard drive (or equivalent)
> that we're writing the image to while suspending other devices, thus
> improving the consistency of the image written.
>
> I remember from last time this was posted that someone commented on
> exporting the default device tree; I haven't changed that yet.

Q: I do not understand why you have such strong objections to idea of
selective suspend.

A: Do selective suspend during runtime power management, that's
okay. But it's useless for suspend-to-disk. (And I do not see how you
could use it for suspend-to-ram, I hope you do not want that).

Let's see, so you suggest to

* SUSPEND all but swap device and parents
* Snapshot
* Write image to disk
* SUSPEND swap device and parents
* Powerdown

Oh no, that does not work, if swap device or its parents uses DMA,
you've corrupted data. You'd have to do

* SUSPEND all but swap device and parents
* FREEZE swap device and parents
* Snapshot
* UNFREEZE swap device and parents
* Write
* SUSPEND swap device and parents

Which means that you still need that FREEZE state, and you get more
complicated code. (And I have not yet introduced details like system
devices).
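The two sequences above can be compared with a toy model. This is only a sketch under simplifying assumptions: the machine is reduced to two groups of devices, a snapshot counts as safe only when nothing is running (so no DMA can be in flight), and the image can only be written while the swap path is running. All names are hypothetical and none of this is driver-model code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum dev_state { DEV_RUNNING, DEV_FROZEN, DEV_SUSPENDED };

struct machine {
	enum dev_state swap_path;	/* swap device and its parents */
	enum dev_state others;		/* every other device */
};

/* A snapshot is consistent only if no device can still touch memory. */
static bool snapshot_safe(const struct machine *m)
{
	assert(m != NULL);
	return m->swap_path != DEV_RUNNING && m->others != DEV_RUNNING;
}

/* Writing the image needs a working swap path. */
static bool can_write(const struct machine *m)
{
	assert(m != NULL);
	return m->swap_path == DEV_RUNNING;
}

/* First sequence: snapshot taken while the swap path is still running. */
static bool plan_without_freeze(void)
{
	struct machine m = { DEV_RUNNING, DEV_RUNNING };
	bool ok;

	m.others = DEV_SUSPENDED;	/* SUSPEND all but swap path */
	ok = snapshot_safe(&m);		/* Snapshot */
	ok = ok && can_write(&m);	/* Write image to disk */
	m.swap_path = DEV_SUSPENDED;	/* SUSPEND swap path */
	return ok;
}

/* Second sequence: swap path is frozen around the snapshot. */
static bool plan_with_freeze(void)
{
	struct machine m = { DEV_RUNNING, DEV_RUNNING };
	bool ok;

	m.others = DEV_SUSPENDED;	/* SUSPEND all but swap path */
	m.swap_path = DEV_FROZEN;	/* FREEZE swap path */
	ok = snapshot_safe(&m);		/* Snapshot */
	m.swap_path = DEV_RUNNING;	/* UNFREEZE swap path */
	ok = ok && can_write(&m);	/* Write */
	m.swap_path = DEV_SUSPENDED;	/* SUSPEND swap path */
	return ok;
}
```

In this model `plan_without_freeze()` fails at the snapshot step and `plan_with_freeze()` succeeds, which matches the argument that the FREEZE state is needed either way.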

Pavel
--
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!

2004-11-27 03:53:00

by Pavel Machek

Subject: Re: Suspend 2 merge: 49/51: Checksumming

Hi!

> > > A plugin for verifying the consistency of an image. Working with kdb, it
> > > can look up the locations of variations. There will always be some
> > > variations shown, simply because we're touching memory before we get
> > > here and as we check the image.
> >
> > Debugging code, can live as external patch pretty well.
>
> Doesn't most of the kernel have debugging code in it? Maybe not as much,
> but most of the kernel isn't doing the same thing. Remember we have the
> option of compiling it out. If it lives as a separate patch, we're just
> making more work for me (I have to maintain the debug version and then
> make some transformation on it to get the mainline version).

There's so much debugging code that it is unreadable. When it is
separate file it is not so bad, but you'll have 1000 people going over
it "What is this", trying to keep it up-to-date with sparse
annotations etc...

Is not it possible to just debug suspend2 and then drop the debugging
code, not maintaining it any longer?

> By the way, I'm really appreciating your interaction over all these
> points. I was getting worried that I wasn't getting enough. I should say
> now, too, that I'm away all weekend, so you won't get replies tomorrow
> and the day after.

Well, I worry that now you'll be getting way too much
interaction... Anyway have a nice weekend,
Pavel
--
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!

2004-11-27 05:47:21

by Pavel Machek

Subject: Re: Suspend 2 merge: 21/51: Refrigerator upgrade.

Hi!

> > > > > > Silently doing nothing when user asked for sync is not nice,
> > > > > > either. BUG() is better solution than that.
> > > > >
> > > > > I don't think we should BUG because the user presses Sys-Rq S while
> > > > > suspending. I'll make it BUG_ON() and make the Sys_Rq printk & ignore
> > > > > when suspending. Sound reasonable?
> > > >
> > > > Yes, that's better. ... only that it means just another hook somewhere
> > > > :-(.
> > >
> > > :<. But we're only talking two or three lines. Let's keep it in
> > > perspective.
> >
> > I think even three lines are bad. It means that swsusp is no longer
> > self-contained subsystem, but that it has its hooks all over the
> > place. And those hooks need to be maintained, too.
>
> Yes, but suspending can't practically be a self contained system. We can
> try to convince ourselves that we're making it self contained by hiding
> behind the driver model, but in reality, the driver model is just a nice
> name for our sticky little fingers in all the other drivers, ensuring
> they do the right thing when we want to go to sleep. Hooks in other code
> is just the equivalent, but without the nice name. Perhaps I should
> invent one. How about the "quiescing subsystem"? :>

I know it can't be self-contained, but it should be as self-contained
as possible.

And driver-model means that interaction between swsusp and rest of the
code is pretty well defined and driver authors do not need to
understand the swsusp to properly support it.

Plus driver-model is useful for suspend-to-ram, too.
Pavel
--
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!

2004-11-27 07:02:01

by Nigel Cunningham

Subject: Re: Suspend 2 merge: 21/51: Refrigerator upgrade.

Hi.

On Fri, 2004-11-26 at 11:05, Pavel Machek wrote:
> Hi!
>
> > > > > Silently doing nothing when user asked for sync is not nice,
> > > > > either. BUG() is better solution than that.
> > > >
> > > > I don't think we should BUG because the user presses Sys-Rq S while
> > > > suspending. I'll make it BUG_ON() and make the Sys_Rq printk & ignore
> > > > when suspending. Sound reasonable?
> > >
> > > Yes, that's better. ... only that it means just another hook somewhere
> > > :-(.
> >
> > :<. But we're only talking two or three lines. Let's keep it in
> > perspective.
>
> I think even three lines are bad. It means that swsusp is no longer
> self-contained subsystem, but that it has its hooks all over the
> place. And those hooks need to be maintained, too.

Yes, but suspending can't practically be a self contained system. We can
try to convince ourselves that we're making it self contained by hiding
behind the driver model, but in reality, the driver model is just a nice
name for our sticky little fingers in all the other drivers, ensuring
they do the right thing when we want to go to sleep. Hooks in other code
is just the equivalent, but without the nice name. Perhaps I should
invent one. How about the "quiescing subsystem"? :>
--
Nigel Cunningham
Pastoral Worker
Christian Reformed Church of Tuggeranong
PO Box 1004, Tuggeranong, ACT 2901

You see, at just the right time, when we were still powerless, Christ
died for the ungodly. -- Romans 5:6

2004-11-27 07:06:13

by Nigel Cunningham

Subject: Re: Suspend 2 merge: 35/51: Code always built in to the kernel.

Hi again.

On Fri, 2004-11-26 at 11:08, Pavel Machek wrote:
> Kernel boot is not expected to be interactive. I'd do
>
> if (can_erase_image)
> printk("Incorrect kernel version, image killed\n");
> else
> panic("Can't kill suspended image");
>

Comes down, again, to user friendliness. Just because I can erase the
image, doesn't mean I should. It may be that the user just pressed the
down arrow one too few times in lilo, and they really do have the right
kernel, but started the wrong one. Or it may be that they're still
setting up their initrd, didn't get it quite right, know that no damage
will be done and want to continue booting. We should let the user think
about what they want to do and then do it.

I need to get on with the work I planned on doing today, so I'm going to
hang up after sending this. That's not at all to say that I want you to
stop sending email; just that I won't be replying for a while.

Once again, thanks very much for your effort. It is good to be made to
defend design decisions and to see where you could do things better or
took things for granted that ain't necessarily so.

Regards,

Nigel
--
Nigel Cunningham
Pastoral Worker
Christian Reformed Church of Tuggeranong
PO Box 1004, Tuggeranong, ACT 2901

You see, at just the right time, when we were still powerless, Christ
died for the ungodly. -- Romans 5:6

2004-11-27 08:02:03

by Jan Rychter

Subject: Re: Suspend 2 merge: 35/51: Code always built in to the kernel.

>>>>> "Nigel" == Nigel Cunningham <[email protected]>:
Nigel> Hi.
Nigel> On Fri, 2004-11-26 at 10:32, Pavel Machek wrote:
[...]
>> Plus kernel now actually expects user interaction to solve problems
>> during boot. No, no.

Nigel> You want your cake and to eat it too? :> We don't want to warn
Nigel> the user before they shoot themselves in the foot, but not
Nigel> loudly enough that they can't help notice and choose to do
Nigel> something before the damage is done?

You're forgetting that Pavel's idea of user interaction is via BUG_ON()
and panic(). That's obviously "cleaner", "less ugly", and "smaller".

Sorry, can't help being sarcastic after watching the tone of some of
these exchanges, particularly comments from pedestals that are being
made so often. I find it rather sad.

--J.

2004-11-27 08:45:58

by Pavel Machek

Subject: Re: Suspend 2 merge: 9/51: init/* changes.

Hi!

> >> 1) Make name_to_dev_t non init. Why should you need to reboot if all you
> >> want to do is change the device you're using to suspend? That's M_'s way
> >
> > Well, if you change it using /proc and forget to change kernel cmd line, you'll have
> > a problem. Do you really change this so often?
>
> name_to_dev_t needs to be non-init in order to make it possible to
> trigger a resume when the block device driver isn't static. Pavel, would
> you be willing to consider a patch to make it possible to trigger swsusp
> resume from userspace? That gets things working with initrd kernels.
> I've been using something along these lines for a few weeks now, and it
> hasn't eaten my filesystem yet.

Given it is not too intrusive... why not. Send it for comments.
I probably will not use this myself, so you'll need to test/maintain
it.

> Yes, I was thinking of this mostly from a distribution perspective.
> There's always potential for data-loss if users resume after touching
> the root filesystem. On the other hand, it's currently possible for them
> to do that anyway (think booting a different kernel without swsusp
> support, then rebooting back into the swsusp one)

Yep...
--
64 bytes from 195.113.31.123: icmp_seq=28 ttl=51 time=448769.1 ms

2004-11-27 09:32:16

by Herbert Xu

Subject: Re: Suspend 2 merge: 9/51: init/* changes.

Pavel Machek <[email protected]> wrote:
>
>> name_to_dev_t needs to be non-init in order to make it possible to
>> trigger a resume when the block device driver isn't static. Pavel, would
>> you be willing to consider a patch to make it possible to trigger swsusp
>> resume from userspace? That gets things working with initrd kernels.
>> I've been using something along these lines for a few weeks now, and it
>> hasn't eaten my filesystem yet.
>
> Given it is not too intrusive... why not. Send it for comments.
> I probably will not use this myself, so you'll need to test/maintain
> it.

This shouldn't be necessary. Since the resume is being initiated by
userspace, it can perform the function of name_to_dev_t and just feed
the numbers to the kernel. The code to do that is still in Debian's
initrd-tools.
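As a rough illustration of doing the lookup in userspace, the sketch below resolves a block device node with `stat()` and formats its device number as `major:minor`. It is not the initrd-tools code. `format_devnum()` and `path_to_devnum()` are hypothetical helpers, the `major()`/`minor()` macros are assumed to come from glibc's `<sys/sysmacros.h>`, and how the resulting string reaches the kernel is left open.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/sysmacros.h>
#include <sys/types.h>

/*
 * Format a device number as "major:minor" (hypothetical helper).
 * Returns 0 on success, -1 if the buffer is too small.
 */
static int format_devnum(dev_t dev, char *buf, size_t len)
{
	int n;

	assert(buf != NULL);
	n = snprintf(buf, len, "%u:%u", major(dev), minor(dev));
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Resolve a device node such as /dev/hda2 to "major:minor".
 * Returns -1 if the path cannot be stat()ed or is not a block device.
 */
static int path_to_devnum(const char *path, char *buf, size_t len)
{
	struct stat st;

	assert(path != NULL);
	if (stat(path, &st) != 0 || !S_ISBLK(st.st_mode))
		return -1;

	/* For device nodes, st_rdev holds the device the node refers to. */
	return format_devnum(st.st_rdev, buf, len);
}
```

An initrd script or small helper could call `path_to_devnum()` on the resume partition and pass the resulting string to whatever interface the kernel ends up exposing.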

Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

2004-11-27 13:21:17

by Matthew Garrett

Subject: Re: Suspend 2 merge: 9/51: init/* changes.

Herbert Xu <[email protected]> wrote:
> Pavel Machek <[email protected]> wrote:
>> Given it is not too intrusive... why not. Send it for comments.
>> I probably will not use this myself, so you'll need to test/maintain
>> it.
>
> This shouldn't be necessary. Since the resume is being initiated by
> userspace, it can perform the function of name_to_dev_t and just feed
> the numbers to the kernel. The code to do that is still in Debian's
> initrd-tools.

Good point. Ok, what's the best way to present this to userspace? Add a
/sys/power/resume and then echo a major:minor in there?

--
Matthew Garrett | [email protected]

2004-11-27 16:11:58

by Dave Hansen

Subject: Re: Suspend 2 merge: 43/51: Utility functions.

On Thu, 2004-11-25 at 16:04, Nigel Cunningham wrote:
> On Fri, 2004-11-26 at 10:46, Pavel Machek wrote:
> > How many bits do you need? Two? I'd rather use those two bits than have
> > yet another abstraction. Also note that it is doing big order
> > allocation.
>
> Three if checksumming is enabled IIRC. I'll happily use normal page
> flags, but we only need them when suspending, and I understood they were
> rarer than hen's teeth :>
>
> MM guys copied so they can tell me I'm wrong :>

Please remember that, in almost all cases, any use of page->flags can be
replaced by a simple list. Is a page marked foo? Well, just traverse
this data structure and see if the page is in there. It might be a
stinking slow check, but it will *work*.

I think we're up to using 1 bit in the memory hotplug code, but we don't
even need that if some operations can be implemented more slowly.

An extreme example:

struct list_head foo;

int PageSuspendFoo(page)
{
ret = 0;
lock();
list_for_each(foo, bar) {
if (page == bar)
ret = 1;
}
unlock();
return ret;
}
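A compilable userspace version of the same idea might look like the sketch below. It assumes a plain singly linked list rather than the kernel's `struct list_head`, leaves locking out entirely, and uses hypothetical names (`struct marked_page`, `set_page_suspend_foo()`, `page_suspend_foo()`); `struct page` here is just a placeholder type.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Placeholder for the kernel's struct page. */
struct page {
	int dummy;
};

/* One list node per page that is "marked foo". */
struct marked_page {
	const struct page *page;
	struct marked_page *next;
};

static struct marked_page *suspend_foo_list;

/* Mark a page by linking a caller-provided node onto the list. */
static void set_page_suspend_foo(struct marked_page *node,
				 const struct page *page)
{
	assert(node != NULL && page != NULL);
	node->page = page;
	node->next = suspend_foo_list;
	suspend_foo_list = node;
}

/* Slow O(n) replacement for testing a bit in page->flags. */
static bool page_suspend_foo(const struct page *page)
{
	const struct marked_page *pos;

	for (pos = suspend_foo_list; pos != NULL; pos = pos->next)
		if (pos->page == page)
			return true;
	return false;
}
```

The lookup cost grows with the number of marked pages, which is the trade-off described above: it is slow, but it uses no page flag at all.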

-- Dave

2004-11-27 16:16:50

by Hu Gang

Subject: Re: Suspend2 merge: 1/51: Device trees

On Fri, Nov 26, 2004 at 09:22:16AM +1100, Nigel Cunningham wrote:
> Hi.
>
.......
>
> There's obviously a misunderstanding here. What I do is:
>
> SUSPEND all but swap device and parents
> WRITE LRU pages
> SUSPEND swap device and parents (+sysdev)
> Snapshot
> RESUME swap device and parents (+sysdev)
> WRITE snapshot
> SUSPEND swap device and parents
> POWERDOWN everything
>
> I thought I wrote - perhaps I'm wrong here - that I understand that your
> new work in this area might make this unnecessary. I really only want to
> do it this way because I don't know what other drivers might be doing
> while we're writing the LRU pages. I'm not worried about them touching
> LRU. What I am worried about is them allocating memory and starving
> suspend so that we get hangs due to being oom. If they're suspended, we
> have more certainty as to how memory is being used. I don't remember
> what prompted me to do this in the first place, but I'm pretty sure it
> would have been a real observed issue.
>

Here is a patch that makes swsusp1 follow the safe LRU-saving sequence above :). Most of
the code is merged from suspend2, and it is still relative to Pavel's big patch.

-core.diff-
diff -urp 2.6.9-lzf/kernel/power/disk.c 2.6.9/kernel/power/disk.c
--- 2.6.9-lzf/kernel/power/disk.c 2004-11-27 17:33:12.000000000 +0800
+++ 2.6.9/kernel/power/disk.c 2004-11-28 00:11:30.000000000 +0800
@@ -16,10 +16,11 @@
#include <linux/device.h>
#include <linux/delay.h>
#include <linux/fs.h>
+#include <linux/reboot.h>
#include <linux/device.h>
#include "power.h"

-
+extern struct partial_device_tree *swsusp_dev_tree;
extern suspend_disk_method_t pm_disk_mode;
extern struct pm_ops * pm_ops;

@@ -29,6 +30,8 @@ extern int swsusp_read(void);
extern int swsusp_resume(void);
extern int swsusp_free(void);

+extern int swsusp_prepare_suspend(void);
+extern int swsusp_post_resume(void);

static int noresume = 0;
char resume_file[256] = CONFIG_PM_STD_PARTITION;
@@ -48,19 +51,20 @@ static void power_down(suspend_disk_meth
unsigned long flags;
int error = 0;

- local_irq_save(flags);
switch(mode) {
case PM_DISK_PLATFORM:
- device_power_down(PMSG_SUSPEND);
+ local_irq_save(flags);
error = pm_ops->enter(PM_SUSPEND_DISK);
+ local_irq_restore(flags);
break;
case PM_DISK_SHUTDOWN:
printk("Powering off system\n");
- device_shutdown();
+ notifier_call_chain(&reboot_notifier_list, SYS_POWER_OFF, NULL);
+ device_suspend_tree(PMSG_FREEZE, swsusp_dev_tree);
machine_power_off();
break;
case PM_DISK_REBOOT:
- device_shutdown();
+ device_suspend_tree(PMSG_FREEZE, swsusp_dev_tree);
machine_restart(NULL);
break;
}
@@ -74,38 +78,6 @@ static void power_down(suspend_disk_meth

static int in_suspend __nosavedata = 0;

-
-/**
- * free_some_memory - Try to free as much memory as possible
- *
- * ... but do not OOM-kill anyone
- *
- * Notice: all userland should be stopped at this point, or
- * livelock is possible.
- */
-
-static void free_some_memory(void)
-{
- int i;
- for (i=0; i<5; i++) {
- int i = 0, tmp;
- long pages = 0;
- char *p = "-\\|/";
-
- printk("Freeing memory... ");
- while ((tmp = shrink_all_memory(10000))) {
- pages += tmp;
- printk("\b%c", p[i]);
- i++;
- if (i > 3)
- i = 0;
- }
- printk("\bdone (%li pages freed)\n", pages);
- current->state = TASK_INTERRUPTIBLE;
- schedule_timeout(HZ/5);
- }
-}
-
static inline void platform_finish(void)
{
if (pm_disk_mode == PM_DISK_PLATFORM) {
@@ -116,7 +88,7 @@ static inline void platform_finish(void)

static void finish(void)
{
- device_resume();
+ swsusp_post_resume();
platform_finish();
enable_nonboot_cpus();
thaw_processes();
@@ -124,7 +96,7 @@ static void finish(void)
}


-static int prepare(void)
+static int prepare(int resume)
{
int error;

@@ -143,14 +115,11 @@ static int prepare(void)
}
}

- /* Free memory before shutting down devices. */
- free_some_memory();
-
disable_nonboot_cpus();
- if ((error = device_suspend(PMSG_FREEZE))) {
- printk("Some devices failed to suspend\n");
- goto Finish;
- }
+ if (!resume)
+ if ((error = swsusp_prepare_suspend())) {
+ goto Finish;
+ }

return 0;
Finish:
@@ -176,7 +145,7 @@ int pm_suspend_disk(void)
{
int error;

- if ((error = prepare()))
+ if ((error = prepare(0)))
return error;

pr_debug("PM: Attempting to suspend to disk.\n");
@@ -233,7 +202,7 @@ static int software_resume(void)

pr_debug("PM: Preparing system for restore.\n");

- if ((error = prepare()))
+ if ((error = prepare(1)))
goto Free;

barrier();
@@ -241,7 +210,7 @@ static int software_resume(void)

pr_debug("PM: Restoring saved image.\n");
swsusp_resume();
- pr_debug("PM: Restore failed, recovering.n");
+ pr_debug("PM: Restore failed, recovering.\n");
finish();
Free:
swsusp_free();
diff -urp 2.6.9-lzf/kernel/power/main.c 2.6.9/kernel/power/main.c
--- 2.6.9-lzf/kernel/power/main.c 2004-11-27 17:33:12.000000000 +0800
+++ 2.6.9/kernel/power/main.c 2004-11-26 00:26:22.000000000 +0800
@@ -4,7 +4,7 @@
* Copyright (c) 2003 Patrick Mochel
* Copyright (c) 2003 Open Source Development Lab
*
- * This file is release under the GPLv2
+ * This file is released under the GPLv2
*
*/

diff -urp 2.6.9-lzf/kernel/power/swsusp.c 2.6.9/kernel/power/swsusp.c
--- 2.6.9-lzf/kernel/power/swsusp.c 2004-11-27 17:33:12.000000000 +0800
+++ 2.6.9/kernel/power/swsusp.c 2004-11-28 00:02:05.000000000 +0800
@@ -63,6 +63,7 @@
#include <linux/console.h>
#include <linux/highmem.h>
#include <linux/bio.h>
+#include <linux/preempt.h>

#include <asm/uaccess.h>
#include <asm/mmu_context.h>
@@ -74,11 +75,8 @@
/* References to section boundaries */
extern char __nosave_begin, __nosave_end;

-/* Variables to be preserved over suspend */
-static int pagedir_order_check;
-
extern char resume_file[];
-static dev_t resume_device;
+static dev_t swsusp_resume_device;
/* Local variables that should not be affected by save */
unsigned int nr_copy_pages __nosavedata = 0;

@@ -97,7 +95,6 @@ unsigned int nr_copy_pages __nosavedata
*/
suspend_pagedir_t *pagedir_nosave __nosavedata = NULL;
static suspend_pagedir_t *pagedir_save;
-static int pagedir_order __nosavedata = 0;

#define SWSUSP_SIG "S1SUSPEND"

@@ -168,10 +165,11 @@ static int is_resume_device(const struct
struct inode *inode = file->f_dentry->d_inode;

return S_ISBLK(inode->i_mode) &&
- resume_device == MKDEV(imajor(inode), iminor(inode));
+ swsusp_resume_device == MKDEV(imajor(inode), iminor(inode));
}

-int swsusp_swap_check(void) /* This is called before saving image */
+/* This is called before saving image */
+int swsusp_swap_check(struct partial_device_tree *suspend_device_tree)
{
int i, len;

@@ -195,6 +193,7 @@ int swsusp_swap_check(void) /* This is c
if (is_resume_device(&swap_info[i])) {
swapfile_used[i] = SWAPFILE_SUSPEND;
root_swap = i;
+ device_switch_trees((swap_info[i].bdev)->bd_disk->driverfs_dev, suspend_device_tree);
} else {
swapfile_used[i] = SWAPFILE_IGNORED;
}
@@ -222,8 +221,105 @@ static void lock_swapdevices(void)
}
swap_list_unlock();
}
+
+#define ONE_PAGE_PBE_NUM (PAGE_SIZE/sizeof(struct pbe))
+#define PBE_IS_PAGE_END(x) \
+ ( PAGE_SIZE - sizeof(struct pbe) == ((x) - ((~(PAGE_SIZE - 1)) & (x))) )
+
+#define pgdir_for_each_safe(pos, n, head) \
+ for(pos = head, n = pos ? (suspend_pagedir_t*)pos->dummy.val : NULL; \
+ pos != NULL; \
+ pos = n, n = pos ? (suspend_pagedir_t *)pos->dummy.val : NULL)
+
+#define pbe_for_each_safe(pos, n, index, max, head) \
+ for(pos = head, index = 0, \
+ n = pos ? (struct pbe *)pos->dummy.val : NULL; \
+ (pos != NULL) && (index < max); \
+ pos = (PBE_IS_PAGE_END((unsigned long)pos)) ? n : \
+ ((struct pbe *)((unsigned long)pos + sizeof(struct pbe))), \
+ index ++, \
+ n = pos ? (struct pbe*)pos->dummy.val : NULL)
+
+/* free pagedir */
+static void pagedir_free(suspend_pagedir_t *head)
+{
+ suspend_pagedir_t *next, *cur;
+ pgdir_for_each_safe(cur, next, head) {
+ free_page((unsigned long)cur);
+ }
+}
+
+/* for_each_pbe_copy_back
+ *
+ * That usefuly for help us writing the code in assemble code.
+ *
+ */
+/*#define CREATE_ASM_CODE */
+#ifdef CREATE_ASM_CODE
+#if 0
+#define GET_ADDRESS(x) __pa(x)
+#else
+#define GET_ADDRESS(x) (x)
+#endif
+asmlinkage void for_each_pbe_copy_back(void)
+{
+ struct pbe *pgdir, *next;
+
+ pgdir = pagedir_nosave;
+ while (pgdir != NULL) {
+ unsigned long nums, i;
+ pgdir = (struct pbe *)GET_ADDRESS(pgdir);
+ next = (struct pbe*)pgdir->dummy.val;
+ for (nums = 0; nums < ONE_PAGE_PBE_NUM; nums++) {
+ register unsigned long *orig, *copy;
+ orig = (unsigned long *)pgdir->orig_address;
+ if (orig == 0) goto end;
+ orig = (unsigned long *)GET_ADDRESS(orig);
+ copy = (unsigned long *)GET_ADDRESS(pgdir->address);
+#if 0
+ memcpy(orig, copy, PAGE_SIZE);
+#else
+ for (i = 0; i < PAGE_SIZE / sizeof(unsigned long); i+=4) {
+ *(orig + i) = *(copy + i);
+ *(orig + i+1) = *(copy + i+1);
+ *(orig + i+2) = *(copy + i+2);
+ *(orig + i+3) = *(copy + i+3);
+ }
+#endif
+ pgdir ++;
+ }
+ pgdir = next;
+ }
+end:
+ panic("just asm code");
+}
+#endif

+/*
+ * find_pbe_by_index -
+ * @pgdir: the pgdir head
+ * @index:
+ *
+ * @return:
+ */
+static struct pbe *find_pbe_by_index(struct pbe *pgdir, int index)
+{
+ unsigned long p = 0;
+ struct pbe *pbe, *next;

+ pr_debug("find_pbe_by_index: %p, 0x%03x", pgdir, index);
+ pgdir_for_each_safe(pbe, next, pgdir) {
+ if (p == index / ONE_PAGE_PBE_NUM) {
+ pbe = (struct pbe *)((unsigned long)pbe +
+ (index % ONE_PAGE_PBE_NUM) * sizeof(struct pbe));
+ pr_debug(" %p, o{%p} c{%p}\n",
+ pbe, (void*)pbe->orig_address, (void*)pbe->address);
+ return pbe;
+ }
+ p ++;
+ }
+ return (NULL);
+}

/**
* write_swap_page - Write one page to a fresh swap location.
@@ -257,7 +353,6 @@ static int write_page(unsigned long addr
return error;
}

-
/**
* data_free - Free the swap entries used by the saved image.
*
@@ -267,43 +362,82 @@ static int write_page(unsigned long addr

static void data_free(void)
{
- swp_entry_t entry;
- int i;
+ int index;
+ struct pbe *pos, *next;

- for (i = 0; i < nr_copy_pages; i++) {
- entry = (pagedir_nosave + i)->swap_address;
+ pbe_for_each_safe(pos, next, index, nr_copy_pages, pagedir_nosave) {
+ swp_entry_t entry;
+
+ entry = pos->swap_address;
if (entry.val)
swap_free(entry);
- else
- break;
- (pagedir_nosave + i)->swap_address = (swp_entry_t){0};
+ pos->swap_address = (swp_entry_t){0};
}
}

+static int mod_progress = 1;
+
+static void inline mod_printk_progress(int i)
+{
+ if (mod_progress == 0) mod_progress = 1;
+ if (!(i%100))
+ printk( "\b\b\b\b%3d%%", i / mod_progress );
+}
+
+static int write_one_pbe(struct pbe *p, void *data, int cur)
+{
+ int error = 0;
+
+ mod_printk_progress(cur);
+
+ pr_debug("write_one_pbe: %p, o{%p} c{%p} %d ",
+ p, (void *)p->orig_address, (void *)p->address, cur);
+ error = write_page((unsigned long)data, &p->swap_address);
+ if (error) return error;
+
+ pr_debug("%lu\n", swp_offset(p->swap_address));
+
+ return 0;
+}
+
+static int bio_read_page(pgoff_t page_off, void * page);
+
+static int read_one_pbe(struct pbe *p, void *data, int cur)
+{
+ int error = 0;
+
+ mod_printk_progress(cur);
+
+ pr_debug("read_one_pbe: %p, o{%p} c{%p} %lu\n",
+ p, (void *)p->orig_address, data,
+ swp_offset(p->swap_address));
+
+ error = bio_read_page(swp_offset(p->swap_address), data);
+ if (error) return error;
+
+ return 0;
+}

/**
* data_write - Write saved image to swap.
*
* Walk the list of pages in the image and sync each one to swap.
*/
-
static int data_write(void)
{
- int error = 0;
- int i;
- unsigned int mod = nr_copy_pages / 100;
-
- if (!mod)
- mod = 1;
+ int error = 0, index;
+ struct pbe *pos, *next;
+
+ mod_progress = nr_copy_pages / 100;

- printk( "Writing data to swap (%d pages)... ", nr_copy_pages );
- for (i = 0; i < nr_copy_pages && !error; i++) {
- if (!(i%mod))
- printk( "\b\b\b\b%3d%%", i / mod );
- error = write_page((pagedir_nosave+i)->address,
- &((pagedir_nosave+i)->swap_address));
+ printk( "Writing data to swap (%d pages)... ", nr_copy_pages);
+ pbe_for_each_safe(pos, next, index, nr_copy_pages, pagedir_nosave) {
+ BUG_ON(pos->orig_address == 0);
+ error = write_one_pbe(pos, (void*)pos->address, index);
+ if (error) break;
}
printk("\b\b\b\bdone\n");
+
return error;
}

@@ -363,7 +497,6 @@ static void free_pagedir_entries(void)
swap_free(swsusp_info.pagedir[i]);
}

-
/**
* write_pagedir - Write the array of pages holding the page directory.
* @last: Last swap entry we write (needed for header).
@@ -371,15 +504,19 @@ static void free_pagedir_entries(void)

static int write_pagedir(void)
{
- unsigned long addr = (unsigned long)pagedir_nosave;
- int error = 0;
- int n = SUSPEND_PD_PAGES(nr_copy_pages);
- int i;
+ int error = 0, n = 0;
+ suspend_pagedir_t *pgdir, *next;

- swsusp_info.pagedir_pages = n;
+ pgdir_for_each_safe(pgdir, next, pagedir_nosave) {
+ error = write_page((unsigned long)pgdir, &swsusp_info.pagedir[n]);
+ if (error) {
+ break;
+ }
+ n++;
+ }
printk( "Writing pagedir (%d pages)\n", n);
- for (i = 0; i < n && !error; i++, addr += PAGE_SIZE)
- error = write_page(addr, &swsusp_info.pagedir[i]);
+ swsusp_info.pagedir_pages = n;
+
return error;
}

@@ -410,7 +547,6 @@ static int write_suspend_image(void)
goto Done;
}

-
#ifdef CONFIG_HIGHMEM
struct highmem_page {
char *data;
@@ -503,7 +639,526 @@ static int restore_highmem(void)
#endif
return 0;
}
+struct partial_device_tree *swsusp_dev_tree = NULL;
+
+static int free_suspend_device_tree(void)
+{
+ if (swsusp_dev_tree) {
+ device_merge_tree(swsusp_dev_tree, &default_device_tree);
+ device_destroy_tree(swsusp_dev_tree);
+ }
+ swsusp_dev_tree = NULL;
+ return 0;
+}
+
+static int setup_suspend_device_tree(void)
+{
+ struct class * class = NULL;
+
+ swsusp_dev_tree = device_create_tree();
+ if (IS_ERR(swsusp_dev_tree)) {
+ swsusp_dev_tree = NULL;
+ return -ENOMEM;
+ }
+ /* Now check for graphics class devices, so we can
+ * keep the display on while suspending */
+ class = class_find("graphics");
+ if (class) {
+ struct class_device * class_dev;
+ list_for_each_entry(class_dev, &class->children, node)
+ device_switch_trees(class_dev->dev, swsusp_dev_tree);
+ class_put(class);
+ }
+
+ return (0);
+}
+
+typedef int (*do_page_t)(struct page *page, int p);
+
+static int foreach_zone_page(struct zone *zone, do_page_t fun, int p)
+{
+ int inactive = 0, active = 0;
+
+ spin_lock_irq(&zone->lru_lock);
+ if (zone->nr_inactive) {
+ struct list_head * entry = zone->inactive_list.prev;
+ while (entry != &zone->inactive_list) {
+ if (fun) {
+ struct page * page = list_entry(entry, struct page, lru);
+ inactive += fun(page, p);
+ } else {
+ inactive ++;
+ }
+ entry = entry->prev;
+ }
+ }
+ if (zone->nr_active) {
+ struct list_head * entry = zone->active_list.prev;
+ while (entry != &zone->active_list) {
+ if (fun) {
+ struct page * page = list_entry(entry, struct page, lru);
+ active += fun(page, p);
+ } else {
+ active ++;
+ }
+ entry = entry->prev;
+ }
+ }
+ spin_unlock_irq(&zone->lru_lock);
+
+ return (active + inactive);
+}
+
+/* enable/disable pagecache suspend */
+int swsusp_pagecache = 0;
+
+/* I'll move this to include/linux/page-flags.h */
+#define PG_page_caches (PG_nosave_free + 1)

+#define SetPagePcs(page) set_bit(PG_page_caches, &(page)->flags)
+#define ClearPagePcs(page) clear_bit(PG_page_caches, &(page)->flags)
+#define PagePcs(page) test_bit(PG_page_caches, &(page)->flags)
+
+static suspend_pagedir_t *pagedir_cache = NULL;
+static int nr_copy_page_caches = 0;
+
+static int setup_page_caches_pe(struct page *page, int setup)
+{
+ unsigned long pfn = page_to_pfn(page);
+
+ BUG_ON(PageReserved(page) && PageNosave(page));
+ if (!pfn_valid(pfn)) {
+ printk("not valid page\n");
+ return 0;
+ }
+ if (PageNosave(page)) {
+ printk("nosave\n");
+ return 0;
+ }
+ if (PageReserved(page) /*&& pfn_is_nosave(pfn)*/) {
+ printk("[nosave]\n");
+ return 0;
+ }
+ if (PageSlab(page)) {
+ printk("slab\n");
+ return 0;
+ }
+ if (setup) {
+ struct pbe *p = find_pbe_by_index(pagedir_cache, nr_copy_page_caches);
+ BUG_ON(p == NULL);
+ p->address = (long)page_address(page);
+ BUG_ON(p->address == 0);
+ /*pr_debug("setup_page_caches: cur %p, o{%p}, d{%p}, nr %u\n",
+ (void*)p, (void*)p->orig_address,
+ (void*)p->address, nr_copy_page_caches);*/
+ nr_copy_page_caches ++;
+ }
+ SetPagePcs(page);
+
+ return (1);
+}
+
+static int count_page_caches(struct zone *zone, int p)
+{
+ if (swsusp_pagecache)
+ return foreach_zone_page(zone, setup_page_caches_pe, p);
+ return 0;
+}
+
+#define pointer2num(x) ((x - 0xc0000000) >> 12)
+#define num2pointer(x) ((x << 12) + 0xc0000000)
+
+static inline void collide_set_bit(unsigned char *bitmap,
+ unsigned long bitnum)
+{
+ bitnum = pointer2num(bitnum);
+ bitmap[bitnum / 8] |= (1 << (bitnum%8));
+}
+
+static inline int collide_is_bit_set(unsigned char *bitmap,
+ unsigned long bitnum)
+{
+ bitnum = pointer2num(bitnum);
+ return !!(bitmap[bitnum / 8] & (1 << (bitnum%8)));
+}
+
+static void collide_bitmap_free(unsigned char *bitmap)
+{
+ free_pages((unsigned long)bitmap, 2);
+}
+
+/*
+ * four pages are enough for bitmap
+ *
+ */
+static unsigned char *collide_bitmap_init(struct pbe *pgdir)
+{
+ unsigned char *bitmap =
+ (unsigned char *)__get_free_pages(GFP_ATOMIC | __GFP_COLD, 2);
+ struct pbe *next;
+
+ if (bitmap == NULL) {
+ return NULL;
+ }
+ memset(bitmap, 0, 4 * PAGE_SIZE);
+
+ /* do base check */
+ BUG_ON(collide_is_bit_set(bitmap, (unsigned long)bitmap) == 1);
+ collide_set_bit(bitmap, (unsigned long)bitmap);
+ BUG_ON(collide_is_bit_set(bitmap, (unsigned long)bitmap) == 0);
+
+ while (pgdir != NULL) {
+ unsigned long nums;
+ next = (struct pbe*)pgdir->dummy.val;
+ for (nums = 0; nums < ONE_PAGE_PBE_NUM; nums++) {
+ collide_set_bit(bitmap, (unsigned long)pgdir);
+ collide_set_bit(bitmap, (unsigned long)pgdir->orig_address);
+ pgdir ++;
+ }
+ pgdir = next;
+ }
+
+ return bitmap;
+}
+static void **eaten_memory = NULL;
+
+static void *swsusp_get_safe_free_page(unsigned char *collide)
+{
+ void *addr = NULL;
+ void **c = eaten_memory;
+
+ do {
+ if (addr) {
+ eaten_memory = (void**)addr;
+ *eaten_memory = c;
+ c = eaten_memory;
+ }
+ addr = (void*)__get_free_pages(GFP_ATOMIC | __GFP_COLD, 0);
+ if (!addr)
+ return NULL;
+ } while (collide && collide_is_bit_set(collide, (unsigned long)addr));
+
+ return addr;
+}
+/*
+ * Redefinition of the fields for the PageCache pagedir.
+ *
+ * struct pbe {
+ * unsigned long address;
+ * unsigned long orig_address; pointer of next struct pbe
+ * swp_entry_t swap_address;
+ * swp_entry_t dummy; current index
+ * }
+ *
+ */
+static suspend_pagedir_t * alloc_one_pagedir(suspend_pagedir_t *prev,
+ unsigned char *collide)
+{
+ suspend_pagedir_t *pgdir = NULL;
+ int i;
+
+ pgdir = (suspend_pagedir_t *)swsusp_get_safe_free_page(collide);
+
+ /*pr_debug("pgdir: %p, %p, %d\n",
+ pgdir, prev, sizeof(suspend_pagedir_t)); */
+ for (i = 0; i < ONE_PAGE_PBE_NUM; i++) {
+ pgdir[i].dummy.val = 0;
+ pgdir[i].address = 0;
+ pgdir[i].orig_address = 0;
+ if (prev)
+ prev[i].dummy.val= (unsigned long)pgdir;
+ }
+
+ return (pgdir);
+}
+
+/* calc_nums - Determine the number of allocations needed for pagedir_save. */
+static int calc_nums(int nr_copy)
+{
+ int diff = 0, ret = 0;
+ do {
+ diff = (nr_copy / ONE_PAGE_PBE_NUM) - ret + 1;
+ if (diff) {
+ ret += diff;
+ nr_copy += diff;
+ }
+ } while (diff);
+ return nr_copy;
+}
+
+
+/*
+ * alloc_pagedir
+ *
+ * @param pbe
+ * @param pbe_nums
+ * @param collide
+ * @param page_nums
+ *
+ */
+static int alloc_pagedir(struct pbe **pbe, int pbe_nums,
+ unsigned char *collide, int page_nums)
+{
+ unsigned int nums = 0;
+ unsigned int after_alloc = pbe_nums;
+ suspend_pagedir_t *prev = NULL, *cur = NULL;
+
+ if (page_nums)
+ after_alloc = ONE_PAGE_PBE_NUM * page_nums;
+ else
+ after_alloc = calc_nums(after_alloc);
+
+ pr_debug("alloc_pagedir: %d, %d\n", pbe_nums, after_alloc);
+ for (nums = 0 ; nums < after_alloc ; nums += ONE_PAGE_PBE_NUM) {
+ cur = alloc_one_pagedir(prev, collide);
+ pr_debug("alloc_one_pagedir: %p\n", cur);
+ if (!cur) { /* get page failed */
+ goto no_mem;
+ }
+ if (nums == 0) { /* setup the head */
+ *pbe = cur;
+ }
+ prev = cur;
+ }
+ return after_alloc - pbe_nums;
+
+no_mem:
+ pagedir_free(*pbe);
+ *pbe = NULL;
+
+ return (-ENOMEM);
+}
+
+static char *page_cache_buf = NULL;
+static int alloc_pagecache_buf(void)
+{
+ page_cache_buf = (char *)__get_free_pages(GFP_ATOMIC /*| __GFP_NOWARN*/, 0);
+ if (!page_cache_buf) {
+ /* FIXME try shrink memory */
+ return -ENOMEM;
+ }
+ return 0;
+}
+static int free_pagecache_buf(void)
+{
+ free_page((unsigned long)page_cache_buf);
+ return 0;
+}
+
+int swsusp_post_resume(void)
+{
+ int error = 0, index;
+ struct pbe *pos, *next;
+
+#ifdef CONFIG_PREEMPT
+ preempt_enable();
+#endif
+ if (swsusp_pagecache == 0) {
+ goto end;
+ }
+
+ local_irq_disable();
+ dpm_power_up_tree(swsusp_dev_tree);
+ local_irq_enable();
+ device_resume_tree(swsusp_dev_tree);
+
+ mod_progress = nr_copy_page_caches / 100;
+
+ printk( "Reading PageCaches from swap (%d pages)... ",
+ nr_copy_page_caches);
+ pbe_for_each_safe(pos, next, index, nr_copy_page_caches,
+ pagedir_cache) {
+ swp_entry_t entry;
+
+ error = read_one_pbe(pos, page_cache_buf, index);
+ if (error) break;
+ memcpy((void*)pos->address, page_cache_buf, PAGE_SIZE);
+ entry = pos->swap_address;
+ if (entry.val)
+ swap_free(entry);
+ }
+ printk("\b\b\b\bdone\n");
+
+ free_pagecache_buf();
+end:
+ local_irq_disable();
+ dpm_power_up_tree(&default_device_tree);
+ local_irq_enable();
+ device_resume_tree(&default_device_tree);
+ device_resume_tree(&default_device_tree);
+ free_suspend_device_tree();
+
+ return error;
+}
+
+static int page_caches_write(void)
+{
+ int error = 0, index;
+ struct pbe *pos, *next;
+
+ mod_progress = nr_copy_page_caches / 100;
+
+ printk( "Writing PageCaches to swap (%d pages)... ",
+ nr_copy_page_caches);
+ pbe_for_each_safe(pos, next, index, nr_copy_page_caches,
+ pagedir_cache) {
+ memcpy(page_cache_buf, (void*)pos->address, PAGE_SIZE);
+ error = write_one_pbe(pos, page_cache_buf, index);
+ if (error) break;
+ }
+ printk("\b\b\b\bdone\n");
+
+ return error;
+}
+
+static int setup_pagedir_pbe(void)
+{
+ struct zone *zone;
+
+ nr_copy_page_caches = 0;
+ for_each_zone(zone) {
+ if (!is_highmem(zone)) {
+ count_page_caches(zone, 1);
+ }
+ }
+
+ return 0;
+}
+
+static void count_data_pages(void);
+static int swsusp_alloc(void);
+
+static int page_caches_recal(int resume)
+{
+ struct zone *zone;
+ int i;
+
+ if (swsusp_pagecache == 0 || resume == 1) return 0;
+
+ for (i = 0; i < max_mapnr; i++)
+ ClearPagePcs(mem_map+i);
+
+ nr_copy_page_caches = 0;
+ drain_local_pages();
+ for_each_zone(zone) {
+ if (!is_highmem(zone)) {
+ nr_copy_page_caches += count_page_caches(zone, 0);
+ }
+ }
+ i = calc_nums(nr_copy_page_caches);
+
+ return (i / ONE_PAGE_PBE_NUM + 1);
+}
+
+static int inline swsusp_need_pages(int resume)
+{
+ return nr_copy_pages + page_caches_recal(resume) + PAGES_FOR_IO;
+}
+
+static int swsusp_check_memory(int resume)
+{
+ int retry = 10 * 5; /* give up after 50 fruitless shrink attempts (about 10 sec at HZ/5 each) */
+
+ if (!resume) {
+ count_data_pages();
+ }
+
+ printk("swsusp: need %d pages, freed %d pages, shrinking ",
+ swsusp_need_pages(resume), nr_free_pages());
+ if (nr_free_pages() > swsusp_need_pages(resume)) {
+ printk(" done\n");
+ return 0;
+ }
+
+ do {
+ int diff = swsusp_need_pages(resume) - nr_free_pages();
+
+ if (diff < 0) break;
+ if (shrink_all_memory(diff * 2) == 0) {
+ retry --;
+ }
+ current->state = TASK_INTERRUPTIBLE;
+ schedule_timeout(HZ/5);
+ if (!resume) {
+ drain_local_pages();
+ count_data_pages();
+ }
+ printk("\b\b\b\b\b%5d", diff);
+ } while (retry);
+
+ printk("\nswsusp: need %d pages, freed %d pages ... ",
+ swsusp_need_pages(resume), nr_free_pages());
+
+ if (nr_free_pages() < swsusp_need_pages(resume)) {
+ printk(" failed\n");
+ return -ENOMEM;
+ }
+ printk(" done\n");
+
+ return 0;
+}
+
+int swsusp_prepare_suspend(void)
+{
+ int error = 0;
+
+ if ((error = setup_suspend_device_tree())) {
+ return error;
+ }
+ if (swsusp_check_memory(0)) {
+ free_suspend_device_tree();
+ return -ENOMEM;
+ }
+ /* exclude the swap device and its parents from the tree */
+ if ((error = swsusp_swap_check(swsusp_dev_tree))) {
+ free_suspend_device_tree();
+ return error;
+ }
+
+ /* power down all devices except the swap device and its parents */
+ BUG_ON(irqs_disabled());
+ device_suspend_tree(PMSG_FREEZE, &default_device_tree);
+ local_irq_disable();
+ device_power_down_tree(PMSG_FREEZE, &default_device_tree);
+ local_irq_enable();
+
+ if (swsusp_pagecache) {
+ if ((error = alloc_pagecache_buf())) {
+ swsusp_pagecache = 0;
+ }
+ }
+ if (swsusp_pagecache) {
+ if (alloc_pagedir(&pagedir_cache, nr_copy_page_caches, NULL, 0) < 0)
+ swsusp_pagecache = 0;
+ }
+
+ drain_local_pages();
+ count_data_pages();
+ error = swsusp_alloc();
+ if (error) {
+ printk("swsusp_alloc failed, %d\n", error);
+ free_suspend_device_tree();
+ return error;
+ }
+
+ drain_local_pages();
+ count_data_pages();
+ printk("swsusp: need to copy %u pages, %u page_caches\n",
+ nr_copy_pages, nr_copy_page_caches);
+
+ if (swsusp_pagecache) {
+ setup_pagedir_pbe();
+ pr_debug("after setup_pagedir_pbe \n");
+
+ error = page_caches_write();
+ if (error) {
+ free_suspend_device_tree();
+ return error;
+ }
+ }
+
+ return 0;
+}

static int pfn_is_nosave(unsigned long pfn)
{
@@ -539,7 +1194,10 @@ static int saveable(struct zone * zone,
}
if (PageNosaveFree(page))
return 0;
-
+ if (PagePcs(page) && swsusp_pagecache) {
+ BUG_ON(zone->nr_inactive == 0 && zone->nr_active == 0);
+ return 0;
+ }
return 1;
}

@@ -559,12 +1217,10 @@ static void count_data_pages(void)
}
}

-
static void copy_data_pages(void)
{
struct zone *zone;
unsigned long zone_pfn;
- struct pbe * pbe = pagedir_nosave;
int pages_copied = 0;

for_each_zone(zone) {
@@ -574,11 +1230,16 @@ static void copy_data_pages(void)
for (zone_pfn = 0; zone_pfn < zone->spanned_pages; ++zone_pfn) {
if (saveable(zone, &zone_pfn)) {
struct page * page;
+ struct pbe * pbe = find_pbe_by_index(pagedir_nosave,
+ pages_copied);
+ BUG_ON(pbe == NULL);
+ if (pbe->address == 0)
+ panic("copy_data_pages: %d copied\n", pages_copied);
page = pfn_to_page(zone_pfn + zone->zone_start_pfn);
pbe->orig_address = (long) page_address(page);
+ BUG_ON(pbe->orig_address == 0);
/* copy_page is not usable for copying task structs. */
memcpy((void *)pbe->address, (void *)pbe->orig_address, PAGE_SIZE);
- pbe++;
pages_copied++;
}
}
@@ -587,85 +1248,18 @@ static void copy_data_pages(void)
nr_copy_pages = pages_copied;
}

-
-/**
- * calc_order - Determine the order of allocation needed for pagedir_save.
- *
- * This looks tricky, but is just subtle. Please fix it some time.
- * Since there are %nr_copy_pages worth of pages in the snapshot, we need
- * to allocate enough contiguous space to hold
- * (%nr_copy_pages * sizeof(struct pbe)),
- * which has the saved/orig locations of the page..
- *
- * SUSPEND_PD_PAGES() tells us how many pages we need to hold those
- * structures, then we call get_bitmask_order(), which will tell us the
- * last bit set in the number, starting with 1. (If we need 30 pages, that
- * is 0x0000001e in hex. The last bit is the 5th, which is the order we
- * would use to allocate 32 contiguous pages).
- *
- * Since we also need to save those pages, we add the number of pages that
- * we need to nr_copy_pages, and in case of an overflow, do the
- * calculation again to update the number of pages needed.
- *
- * With this model, we will tend to waste a lot of memory if we just cross
- * an order boundary. Plus, the higher the order of allocation that we try
- * to do, the more likely we are to fail in a low-memory situtation
- * (though we're unlikely to get this far in such a case, since swsusp
- * requires half of memory to be free anyway).
- */
-
-
-static void calc_order(void)
-{
- int diff = 0;
- int order = 0;
-
- do {
- diff = get_bitmask_order(SUSPEND_PD_PAGES(nr_copy_pages)) - order;
- if (diff) {
- order += diff;
- nr_copy_pages += 1 << diff;
- }
- } while(diff);
- pagedir_order = order;
-}
-
-
-/**
- * alloc_pagedir - Allocate the page directory.
- *
- * First, determine exactly how many contiguous pages we need and
- * allocate them.
- */
-
-static int alloc_pagedir(void)
-{
- calc_order();
- pagedir_save = (suspend_pagedir_t *)__get_free_pages(GFP_ATOMIC | __GFP_COLD,
- pagedir_order);
- if (!pagedir_save)
- return -ENOMEM;
- memset(pagedir_save, 0, (1 << pagedir_order) * PAGE_SIZE);
- pagedir_nosave = pagedir_save;
- return 0;
-}
-
/**
* free_image_pages - Free pages allocated for snapshot
*/
-
static void free_image_pages(void)
{
- struct pbe * p;
- int i;
+ struct pbe *pos, *next;
+ int index;

- p = pagedir_save;
- for (i = 0, p = pagedir_save; i < nr_copy_pages; i++, p++) {
- if (p->address) {
- ClearPageNosave(virt_to_page(p->address));
- free_page(p->address);
- p->address = 0;
- }
+ pbe_for_each_safe(pos, next, index, nr_copy_pages, pagedir_save) {
+ ClearPageNosave(virt_to_page(pos->address));
+ free_page(pos->address);
+ pos->address = 0;
}
}

@@ -673,17 +1267,16 @@ static void free_image_pages(void)
* alloc_image_pages - Allocate pages for the snapshot.
*
*/
-
static int alloc_image_pages(void)
{
- struct pbe * p;
- int i;
+ struct pbe *pos, *next;
+ int index;

- for (i = 0, p = pagedir_save; i < nr_copy_pages; i++, p++) {
- p->address = get_zeroed_page(GFP_ATOMIC | __GFP_COLD);
- if (!p->address)
+ pbe_for_each_safe(pos, next, index, nr_copy_pages, pagedir_save) {
+ pos->address = (unsigned long)get_zeroed_page(GFP_ATOMIC | __GFP_COLD);
+ if (!pos->address)
return -ENOMEM;
- SetPageNosave(virt_to_page(p->address));
+ SetPageNosave(virt_to_page(pos->address));
}
return 0;
}
@@ -693,28 +1286,9 @@ void swsusp_free(void)
BUG_ON(PageNosave(virt_to_page(pagedir_save)));
BUG_ON(PageNosaveFree(virt_to_page(pagedir_save)));
free_image_pages();
- free_pages((unsigned long) pagedir_save, pagedir_order);
+ pagedir_free(pagedir_save);
}

-
-/**
- * enough_free_mem - Make sure we enough free memory to snapshot.
- *
- * Returns TRUE or FALSE after checking the number of available
- * free pages.
- */
-
-static int enough_free_mem(void)
-{
- if (nr_free_pages() < (nr_copy_pages + PAGES_FOR_IO)) {
- pr_debug("swsusp: Not enough free pages: Have %d\n",
- nr_free_pages());
- return 0;
- }
- return 1;
-}
-
-
/**
* enough_swap - Make sure we have enough swap to save the image.
*
@@ -730,7 +1304,7 @@ static int enough_swap(void)
struct sysinfo i;

si_swapinfo(&i);
- if (i.freeswap < (nr_copy_pages + PAGES_FOR_IO)) {
+ if (i.freeswap < (nr_copy_pages + nr_copy_page_caches + PAGES_FOR_IO)) {
pr_debug("swsusp: Not enough swap. Need %ld\n",i.freeswap);
return 0;
}
@@ -741,34 +1315,30 @@ static int swsusp_alloc(void)
{
int error;

- pr_debug("suspend: (pages needed: %d + %d free: %d)\n",
- nr_copy_pages, PAGES_FOR_IO, nr_free_pages());
-
pagedir_nosave = NULL;
- if (!enough_free_mem())
- return -ENOMEM;

if (!enough_swap())
return -ENOSPC;
-
- if ((error = alloc_pagedir())) {
- pr_debug("suspend: Allocating pagedir failed.\n");
- return error;
+ error = alloc_pagedir(&pagedir_save, nr_copy_pages, NULL, 0);
+ if (error < 0) {
+ printk("suspend: Allocating pagedir failed.\n");
+ return -ENOMEM;
}
+ pr_debug("alloc_pagedir: addon %d\n", error);
+ nr_copy_pages += error;
if ((error = alloc_image_pages())) {
- pr_debug("suspend: Allocating image pages failed.\n");
+ printk("suspend: Allocating image pages failed.\n");
swsusp_free();
return error;
}
+ pagedir_nosave = pagedir_save;

- pagedir_order_check = pagedir_order;
return 0;
}

int suspend_prepare_image(void)
{
- unsigned int nr_needed_pages;
- int error;
+ BUG_ON(!irqs_disabled());

pr_debug("swsusp: critical section: \n");
if (save_highmem()) {
@@ -777,15 +1347,6 @@ int suspend_prepare_image(void)
return -ENOMEM;
}

- drain_local_pages();
- count_data_pages();
- printk("swsusp: Need to copy %u pages\n",nr_copy_pages);
- nr_needed_pages = nr_copy_pages + PAGES_FOR_IO;
-
- error = swsusp_alloc();
- if (error)
- return error;
-
/* During allocating of suspend pagedir, new cold pages may appear.
* Kill them.
*/
@@ -811,7 +1372,6 @@ int suspend_prepare_image(void)
int swsusp_write(void)
{
int error;
- device_resume();
lock_swapdevices();
error = write_suspend_image();
/* This will unlock ignored swap devices since writing is finished */
@@ -820,17 +1380,11 @@ int swsusp_write(void)

}

-
extern asmlinkage int swsusp_arch_suspend(void);
extern asmlinkage int swsusp_arch_resume(void);

-
asmlinkage int swsusp_save(void)
{
- int error = 0;
-
- if ((error = swsusp_swap_check()))
- return error;
return suspend_prepare_image();
}

@@ -839,34 +1393,66 @@ int swsusp_suspend(void)
int error;
if ((error = arch_prepare_suspend()))
return error;
+
+ BUG_ON(irqs_disabled());
+ /* suspend swap device */
+ device_suspend_tree(PMSG_FREEZE, swsusp_dev_tree);
+
+ mb();
+ barrier();
+
+#ifdef CONFIG_PREEMPT
+ preempt_disable();
+#endif
local_irq_disable();
+ device_power_down_tree(PMSG_FREEZE, swsusp_dev_tree);
sysdev_suspend(PMSG_FREEZE);
+
save_processor_state();
error = swsusp_arch_suspend();
/* Restore control flow magically appears here */
restore_processor_state();
restore_highmem();
+
+ BUG_ON(!irqs_disabled());
sysdev_resume();
+
+ dpm_power_up_tree(swsusp_dev_tree);
local_irq_enable();
+ device_resume_tree(swsusp_dev_tree);
+
return error;
}


asmlinkage int swsusp_restore(void)
{
- BUG_ON (pagedir_order_check != pagedir_order);
-
/* Even mappings of "global" things (vmalloc) need to be fixed */
+#if defined(CONFIG_X86) || defined(CONFIG_X86_64)
__flush_tlb_global();
wbinvd(); /* Nigel says wbinvd here is good idea... */
+#endif
return 0;
}

int swsusp_resume(void)
{
int error;
+
+ /* power down all devices except the swap device and its parents */
+ BUG_ON(irqs_disabled());
+ device_suspend_tree(PMSG_FREEZE, &default_device_tree);
+ local_irq_disable();
+ device_power_down_tree(PMSG_FREEZE, &default_device_tree);
+ local_irq_enable();
+
+#ifdef CONFIG_PREEMPT
+ preempt_disable();
+#endif
+
local_irq_disable();
sysdev_suspend(PMSG_FREEZE);
+
/* We'll ignore saved state, but this gets preempt count (etc) right */
save_processor_state();
error = swsusp_arch_resume();
@@ -881,99 +1467,6 @@ int swsusp_resume(void)
return error;
}

-
-
-/* More restore stuff */
-
-#define does_collide(addr) does_collide_order(pagedir_nosave, addr, 0)
-
-/*
- * Returns true if given address/order collides with any orig_address
- */
-static int __init does_collide_order(suspend_pagedir_t *pagedir, unsigned long addr,
- int order)
-{
- int i;
- unsigned long addre = addr + (PAGE_SIZE<<order);
-
- for (i=0; i < nr_copy_pages; i++)
- if ((pagedir+i)->orig_address >= addr &&
- (pagedir+i)->orig_address < addre)
- return 1;
-
- return 0;
-}
-
-/*
- * We check here that pagedir & pages it points to won't collide with pages
- * where we're going to restore from the loaded pages later
- */
-static int __init check_pagedir(void)
-{
- int i;
-
- for(i=0; i < nr_copy_pages; i++) {
- unsigned long addr;
-
- do {
- addr = get_zeroed_page(GFP_ATOMIC);
- if(!addr)
- return -ENOMEM;
- } while (does_collide(addr));
-
- (pagedir_nosave+i)->address = addr;
- }
- return 0;
-}
-
-static int __init swsusp_pagedir_relocate(void)
-{
- /*
- * We have to avoid recursion (not to overflow kernel stack),
- * and that's why code looks pretty cryptic
- */
- suspend_pagedir_t *old_pagedir = pagedir_nosave;
- void **eaten_memory = NULL;
- void **c = eaten_memory, *m, *f;
- int ret = 0;
-
- printk("Relocating pagedir ");
-
- if (!does_collide_order(old_pagedir, (unsigned long)old_pagedir, pagedir_order)) {
- printk("not necessary\n");
- return check_pagedir();
- }
-
- while ((m = (void *) __get_free_pages(GFP_ATOMIC, pagedir_order)) != NULL) {
- if (!does_collide_order(old_pagedir, (unsigned long)m, pagedir_order))
- break;
- eaten_memory = m;
- printk( "." );
- *eaten_memory = c;
- c = eaten_memory;
- }
-
- if (!m) {
- printk("out of memory\n");
- ret = -ENOMEM;
- } else {
- pagedir_nosave =
- memcpy(m, old_pagedir, PAGE_SIZE << pagedir_order);
- }
-
- c = eaten_memory;
- while (c) {
- printk(":");
- f = c;
- c = *c;
- free_pages((unsigned long)f, pagedir_order);
- }
- if (ret)
- return ret;
- printk("|\n");
- return check_pagedir();
-}
-
/**
* Using bio to read from swap.
* This code requires a bit more work than just using buffer heads
@@ -1038,12 +1531,12 @@ static int submit(int rw, pgoff_t page_o
return error;
}

-int bio_read_page(pgoff_t page_off, void * page)
+static int bio_read_page(pgoff_t page_off, void * page)
{
return submit(READ, page_off, page);
}

-int bio_write_page(pgoff_t page_off, void * page)
+static int bio_write_page(pgoff_t page_off, void * page)
{
return submit(WRITE, page_off, page);
}
@@ -1088,7 +1581,6 @@ static int __init check_header(void)
return -EPERM;
}
nr_copy_pages = swsusp_info.image_pages;
- pagedir_order = get_bitmask_order(SUSPEND_PD_PAGES(nr_copy_pages));
return error;
}

@@ -1115,62 +1607,167 @@ static int __init check_sig(void)
return error;
}

+
+static void __init eat_progress(void)
+{
+ char *eaten_progess = "-\\|/";
+ static int eaten_i = 0;
+
+ printk("\b%c", eaten_progess[eaten_i]);
+ eaten_i ++;
+ if (eaten_i > 3) eaten_i = 0;
+}
+
+static int __init check_one_pbe(struct pbe *p, void *collide, int cur)
+{
+ unsigned long addr = 0;
+
+ pr_debug("check_one_pbe: %p %lu o{%p} ",
+ p, p->swap_address.val, (void*)p->orig_address);
+ addr = (unsigned long)swsusp_get_safe_free_page(collide);
+ if(!addr)
+ return -ENOMEM;
+ pr_debug("c{%p} done\n", (void*)addr);
+ p->address = addr;
+
+ return 0;
+}
+
+static void __init swsusp_copy_pagedir(suspend_pagedir_t *d_pgdir,
+ suspend_pagedir_t *s_pgdir)
+{
+ int i = 0;
+
+ while (s_pgdir != NULL) {
+ suspend_pagedir_t *s_next = (suspend_pagedir_t *)s_pgdir->dummy.val;
+ suspend_pagedir_t *d_next = (suspend_pagedir_t *)d_pgdir->dummy.val;
+ for (i = 0; i < ONE_PAGE_PBE_NUM; i++) {
+ d_pgdir->address = s_pgdir->address;
+ d_pgdir->orig_address = s_pgdir->orig_address;
+ d_pgdir->swap_address = s_pgdir->swap_address;
+ s_pgdir ++; d_pgdir ++;
+ }
+ d_pgdir = d_next;
+ s_pgdir = s_next;
+ };
+}
+/*
+ * We check here that pagedir & pages it points to won't collide with pages
+ * where we're going to restore from the loaded pages later
+ */
+static int __init check_pagedir(void)
+{
+ void **c, *f;
+ struct pbe *next, *pos;
+ int error, index;
+ suspend_pagedir_t *addr = NULL;
+ unsigned char *bitmap = collide_bitmap_init(pagedir_nosave);
+
+ BUG_ON(bitmap == NULL);
+
+ printk("Relocating pagedir ... ");
+ error = alloc_pagedir(&addr, nr_copy_pages, bitmap,
+ swsusp_info.pagedir_pages);
+ if (error < 0) {
+ return error;
+ }
+ swsusp_copy_pagedir(addr, pagedir_nosave);
+ pagedir_free(pagedir_nosave);
+
+ /* check copy address */
+ pbe_for_each_safe(pos, next, index, nr_copy_pages, addr) {
+ error = check_one_pbe(pos, bitmap, index);
+ BUG_ON(error);
+ }
+
+ /* free eaten memory */
+ c = eaten_memory;
+ while (c) {
+ eat_progress();
+ f = c;
+ c = *c;
+ free_pages((unsigned long)f, 0);
+ }
+ /* free unused memory */
+ collide_bitmap_free(bitmap);
+ printk(" done\n");
+
+ pagedir_nosave = addr;
+
+ return 0;
+}
+
/**
* swsusp_read_data - Read image pages from swap.
*
- * You do not need to check for overlaps, check_pagedir()
- * already did that.
*/
-
static int __init data_read(void)
{
- struct pbe * p;
- int error;
- int i;
- int mod = nr_copy_pages / 100;
-
- if (!mod)
- mod = 1;
+ int error = 0, index;
+ struct pbe *pos, *next;

- if ((error = swsusp_pagedir_relocate()))
+ if ((error = swsusp_check_memory(1))) {
return error;
+ }
+
+ if ((error = check_pagedir())) {
+ return -ENOMEM;
+ }
+
+ mod_progress = nr_copy_pages / 100;

printk( "Reading image data (%d pages): ", nr_copy_pages );
- for(i = 0, p = pagedir_nosave; i < nr_copy_pages && !error; i++, p++) {
- if (!(i%mod))
- printk( "\b\b\b\b%3d%%", i / mod );
- error = bio_read_page(swp_offset(p->swap_address),
- (void *)p->address);
+ pbe_for_each_safe(pos, next, index, nr_copy_pages, pagedir_nosave) {
+ error = read_one_pbe(pos, (void*)pos->address, index);
+ if (error) break;
}
- printk(" %d done.\n",i);
- return error;
+ printk(" %d done.\n", index);

+ return error;
}

extern dev_t __init name_to_dev_t(const char *line);

-static int __init read_pagedir(void)
+static int __init read_one_pagedir(suspend_pagedir_t *pgdir, int i)
{
- unsigned long addr;
- int i, n = swsusp_info.pagedir_pages;
+ unsigned long offset = swp_offset(swsusp_info.pagedir[i]);
+ unsigned long next;
int error = 0;

- addr = __get_free_pages(GFP_ATOMIC, pagedir_order);
- if (!addr)
- return -ENOMEM;
- pagedir_nosave = (struct pbe *)addr;
+ next = pgdir->dummy.val;
+ pr_debug("read_one_pagedir: %p, %d, %lu, %p\n",
+ pgdir, i, offset, (void*)next);
+ if ((error = bio_read_page(offset, (void *)pgdir))) {
+ return error;
+ }
+ pgdir->dummy.val = next;

- pr_debug("pmdisk: Reading pagedir (%d Pages)\n",n);
+ return error;
+}

- for (i = 0; i < n && !error; i++, addr += PAGE_SIZE) {
- unsigned long offset = swp_offset(swsusp_info.pagedir[i]);
- if (offset)
- error = bio_read_page(offset, (void *)addr);
- else
- error = -EFAULT;
- }
- if (error)
- free_pages((unsigned long)pagedir_nosave, pagedir_order);
+/*
+ * reading pagedir from swap device
+ */
+static int __init read_pagedir(void)
+{
+ int i = 0, n = swsusp_info.pagedir_pages;
+ int error = 0;
+ suspend_pagedir_t *pgdir, *next;
+
+ error = alloc_pagedir(&pagedir_nosave, nr_copy_pages, NULL, n);
+ if (error < 0)
+ return -ENOMEM;
+
+ printk("pmdisk: Reading pagedir (%d Pages)\n",n);
+ pgdir_for_each_safe(pgdir, next, pagedir_nosave) {
+ error = read_one_pagedir(pgdir, i);
+ if (error) break;
+ i++;
+ }
+ BUG_ON(i != n);
+ if (error)
+ pagedir_free(pagedir_nosave);
+
return error;
}

@@ -1185,7 +1782,7 @@ static int __init read_suspend_image(voi
if ((error = read_pagedir()))
return error;
if ((error = data_read()))
- free_pages((unsigned long)pagedir_nosave, pagedir_order);
+ pagedir_free(pagedir_nosave);
return error;
}

@@ -1200,14 +1797,14 @@ int __init swsusp_read(void)
if (!strlen(resume_file))
return -ENOENT;

- resume_device = name_to_dev_t(resume_file);
+ swsusp_resume_device = name_to_dev_t(resume_file);
pr_debug("swsusp: Resume From Partition: %s\n", resume_file);

- resume_bdev = open_by_devnum(resume_device, FMODE_READ);
+ resume_bdev = open_by_devnum(swsusp_resume_device, FMODE_READ);
if (!IS_ERR(resume_bdev)) {
set_blocksize(resume_bdev, PAGE_SIZE);
error = read_suspend_image();
- blkdev_put(resume_bdev);
+ /* blkdev_put(resume_bdev); */
} else
error = PTR_ERR(resume_bdev);


--
Hu Gang / Steve
Linux Registered User 204016
GPG Public Key: http://soulinfo.com/~hugang/hugang.asc

2004-11-27 16:18:21

by Hu Gang

[permalink] [raw]
Subject: Re: Suspend2 merge: 1/51: Device trees

>
> SUSPEND all but swap device and parents
> WRITE LRU pages
> SUSPEND swap device and parents (+sysdev)
> Snapshot
> RESUME swap device and parents (+sysdev)
> WRITE snapshot
> SUSPEND swap device and parents
> POWERDOWN everything
>
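
The ordering quoted above can be walked through with a small userspace
model. This is only a sketch under stated assumptions: the patch below
keeps devices on list_heads inside struct partial_device_tree and drives
them with device_suspend_tree() and device_resume_tree(), whereas here a
"tree" is just a tag on each device, and the names model_dev,
set_tree_state, count_active and suspend_sequence are made up for the
illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: a tree is a tag on each device, not a set of lists. */
enum model_tree  { TREE_DEFAULT, TREE_SWAP };
enum model_state { DEV_ACTIVE, DEV_OFF };

struct model_dev {
	const char *name;
	enum model_tree tree;
	enum model_state state;
};

/* Put every device of one tree into the given state; returns how many changed. */
size_t set_tree_state(struct model_dev *devs, size_t n,
		      enum model_tree tree, enum model_state state)
{
	size_t i, changed = 0;

	for (i = 0; i < n; i++) {
		if (devs[i].tree == tree && devs[i].state != state) {
			devs[i].state = state;
			changed++;
		}
	}
	return changed;
}

/* Count devices that are still running, across all trees. */
size_t count_active(const struct model_dev *devs, size_t n)
{
	size_t i, active = 0;

	for (i = 0; i < n; i++)
		if (devs[i].state == DEV_ACTIVE)
			active++;
	return active;
}

/*
 * Walk the eight quoted steps. Returns 0 on success, or -1 if no
 * swap-tree device is still active after the first step.
 */
int suspend_sequence(struct model_dev *devs, size_t n)
{
	assert(devs != NULL);

	/* SUSPEND all but swap device and parents */
	set_tree_state(devs, n, TREE_DEFAULT, DEV_OFF);
	if (count_active(devs, n) == 0)
		return -1;	/* nothing left to write LRU pages to */
	/* WRITE LRU pages would happen here */

	/* SUSPEND swap device and parents (+sysdev) */
	set_tree_state(devs, n, TREE_SWAP, DEV_OFF);
	/* Snapshot: every device must be quiet at this point */
	assert(count_active(devs, n) == 0);

	/* RESUME swap device and parents (+sysdev) */
	set_tree_state(devs, n, TREE_SWAP, DEV_ACTIVE);
	/* WRITE snapshot would happen here */

	/* SUSPEND swap device and parents, then POWERDOWN everything */
	set_tree_state(devs, n, TREE_SWAP, DEV_OFF);
	return 0;
}
```

The point of splitting devices into two groups is visible in the model:
only the swap device and its parents are ever brought back up between
the snapshot and the final power-down.
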
-device-tree.diff-
diff -urp 2.6.9-lzf/drivers/base/class.c 2.6.9/drivers/base/class.c
--- 2.6.9-lzf/drivers/base/class.c 2004-11-25 14:13:02.000000000 +0800
+++ 2.6.9/drivers/base/class.c 2004-11-27 19:10:38.000000000 +0800
@@ -465,6 +465,25 @@ void class_device_put(struct class_devic
kobject_put(&class_dev->kobj);
}

+struct class * class_find(char * name)
+{
+ struct class * this_class;
+
+ if (!name)
+ return NULL;
+
+ down_read(&class_subsys.rwsem);
+ list_for_each_entry(this_class, &class_subsys.kset.list, subsys.kset.kobj.entry) {
+ if (!(strcmp(this_class->name, name))) {
+ class_get(this_class);
+ up_read(&class_subsys.rwsem);
+ return this_class;
+ }
+ }
+ up_read(&class_subsys.rwsem);
+
+ return NULL;
+}

int class_interface_register(struct class_interface *class_intf)
{
@@ -547,3 +566,5 @@ EXPORT_SYMBOL(class_device_remove_file);

EXPORT_SYMBOL(class_interface_register);
EXPORT_SYMBOL(class_interface_unregister);
+
+EXPORT_SYMBOL(class_find);
diff -urp 2.6.9-lzf/drivers/base/power/main.c 2.6.9/drivers/base/power/main.c
--- 2.6.9-lzf/drivers/base/power/main.c 2004-11-25 14:13:02.000000000 +0800
+++ 2.6.9/drivers/base/power/main.c 2004-11-27 18:38:54.000000000 +0800
@@ -4,6 +4,9 @@
* Copyright (c) 2003 Patrick Mochel
* Copyright (c) 2003 Open Source Development Lab
*
+ * Partial tree additions
+ * Copyright (c) 2004 Nigel Cunningham
+ *
* This file is released under the GPLv2
*
*
@@ -23,10 +26,18 @@
#include <linux/device.h>
#include "power.h"

-LIST_HEAD(dpm_active);
-LIST_HEAD(dpm_off);
-LIST_HEAD(dpm_off_irq);
-
+struct partial_device_tree default_device_tree =
+{
+ .dpm_active = LIST_HEAD_INIT(default_device_tree.dpm_active),
+ .dpm_off = LIST_HEAD_INIT(default_device_tree.dpm_off),
+ .dpm_off_irq = LIST_HEAD_INIT(default_device_tree.dpm_off_irq),
+};
+EXPORT_SYMBOL(default_device_tree);
+
+/*
+ * One mutex for all trees because we can be moving items
+ * between trees.
+ */
DECLARE_MUTEX(dpm_sem);

/*
@@ -76,7 +87,9 @@ int device_pm_add(struct device * dev)
dev->bus ? dev->bus->name : "No Bus", dev->kobj.name);
atomic_set(&dev->power.pm_users, 0);
down(&dpm_sem);
- list_add_tail(&dev->power.entry, &dpm_active);
+ list_add_tail(&dev->power.entry, &default_device_tree.dpm_active);
+ dev->current_list = DEVICE_LIST_DPM_ACTIVE;
+ dev->tree = &default_device_tree;
device_pm_set_parent(dev, dev->parent);
if ((error = dpm_sysfs_add(dev)))
list_del(&dev->power.entry);
@@ -92,6 +105,7 @@ void device_pm_remove(struct device * de
dpm_sysfs_remove(dev);
device_pm_release(dev->power.pm_parent);
list_del(&dev->power.entry);
+ dev->current_list = DEVICE_LIST_NONE;
up(&dpm_sem);
}

diff -urp 2.6.9-lzf/drivers/base/power/Makefile 2.6.9/drivers/base/power/Makefile
--- 2.6.9-lzf/drivers/base/power/Makefile 2004-11-25 14:13:03.000000000 +0800
+++ 2.6.9/drivers/base/power/Makefile 2004-11-27 18:38:54.000000000 +0800
@@ -1,5 +1,5 @@
obj-y := shutdown.o
-obj-$(CONFIG_PM) += main.o suspend.o resume.o runtime.o sysfs.o
+obj-$(CONFIG_PM) += main.o suspend.o resume.o runtime.o sysfs.o tree.o

ifeq ($(CONFIG_DEBUG_DRIVER),y)
EXTRA_CFLAGS += -DDEBUG
diff -urp 2.6.9-lzf/drivers/base/power/power.h 2.6.9/drivers/base/power/power.h
--- 2.6.9-lzf/drivers/base/power/power.h 2004-11-27 17:33:21.000000000 +0800
+++ 2.6.9/drivers/base/power/power.h 2004-11-27 18:38:54.000000000 +0800
@@ -30,10 +30,22 @@ extern struct semaphore dpm_sem;
/*
* The PM lists.
*/
-extern struct list_head dpm_active;
-extern struct list_head dpm_off;
-extern struct list_head dpm_off_irq;

+struct partial_device_tree
+{
+ struct list_head dpm_active;
+ struct list_head dpm_off;
+ struct list_head dpm_off_irq;
+};
+
+enum {
+ DEVICE_LIST_NONE,
+ DEVICE_LIST_DPM_ACTIVE,
+ DEVICE_LIST_DPM_OFF,
+ DEVICE_LIST_DPM_OFF_IRQ,
+};
+
+extern struct partial_device_tree default_device_tree;

static inline struct dev_pm_info * to_pm_info(struct list_head * entry)
{
@@ -59,7 +71,9 @@ extern void dpm_sysfs_remove(struct devi
* resume.c
*/

+extern void dpm_resume_tree(struct partial_device_tree * tree);
extern void dpm_resume(void);
+extern void dpm_power_up_tree(struct partial_device_tree * tree);
extern void dpm_power_up(void);
extern int resume_device(struct device *);

diff -urp 2.6.9-lzf/drivers/base/power/resume.c 2.6.9/drivers/base/power/resume.c
--- 2.6.9-lzf/drivers/base/power/resume.c 2004-11-27 17:33:21.000000000 +0800
+++ 2.6.9/drivers/base/power/resume.c 2004-11-27 18:56:49.000000000 +0800
@@ -29,20 +29,25 @@ int resume_device(struct device * dev)



-void dpm_resume(void)
+void dpm_resume_tree(struct partial_device_tree * tree)
{
- while(!list_empty(&dpm_off)) {
- struct list_head * entry = dpm_off.next;
+ while(!list_empty(&tree->dpm_off)) {
+ struct list_head * entry = tree->dpm_off.next;
struct device * dev = to_device(entry);
list_del_init(entry);

if (dev->power.prev_state == PMSG_ON)
resume_device(dev);

- list_add_tail(entry, &dpm_active);
+ list_add_tail(entry, &tree->dpm_active);
+ dev->current_list = DEVICE_LIST_DPM_ACTIVE;
}
}

+void dpm_resume(void)
+{
+ dpm_resume_tree(&default_device_tree);
+}

/**
* device_resume - Restore state of each device in system.
@@ -60,6 +65,14 @@ void device_resume(void)

EXPORT_SYMBOL(device_resume);

+void device_resume_tree(struct partial_device_tree * tree)
+{
+ down(&dpm_sem);
+ dpm_resume_tree(tree);
+ up(&dpm_sem);
+}
+
+EXPORT_SYMBOL(device_resume_tree);

/**
* device_power_up_irq - Power on some devices.
@@ -72,16 +85,23 @@ EXPORT_SYMBOL(device_resume);
* Interrupts must be disabled when calling this.
*/

-void dpm_power_up(void)
+void dpm_power_up_tree(struct partial_device_tree * tree)
{
- while(!list_empty(&dpm_off_irq)) {
- struct list_head * entry = dpm_off_irq.next;
+ while(!list_empty(&tree->dpm_off_irq)) {
+ struct list_head * entry = tree->dpm_off_irq.next;
+ struct device * dev = to_device(entry);
list_del_init(entry);
- resume_device(to_device(entry));
- list_add_tail(entry, &dpm_active);
+ resume_device(dev);
+ list_add_tail(entry, &tree->dpm_active);
+ dev->current_list = DEVICE_LIST_DPM_ACTIVE;
}
}
+EXPORT_SYMBOL(dpm_power_up_tree);

+void dpm_power_up(void)
+{
+ dpm_power_up_tree(&default_device_tree);
+}

/**
* device_pm_power_up - Turn on all devices that need special attention.
@@ -97,6 +117,58 @@ void device_power_up(void)
dpm_power_up();
}

+#if 0
+
+/**
+ *
+ * pci_find_class_storage
+ *
+ * Find a PCI storage device.
+ * Based upon pci_find_class, but less strict.
+ */
+
+static struct pci_dev *
+pci_find_class_storage(unsigned int class, const struct pci_dev *from)
+{
+ struct list_head *n;
+ struct pci_dev *dev;
+
+ spin_lock(&pci_bus_lock);
+ n = from ? from->global_list.next : pci_devices.next;
+
+ while (n && (n != &pci_devices)) {
+ dev = pci_dev_g(n);
+ if (((dev->class & 0xff00) >> 16) == class)
+ goto exit;
+ n = n->next;
+ }
+ dev = NULL;
+exit:
+ spin_unlock(&pci_bus_lock);
+ return dev;
+}
+
+
+/**
+ * device_resume_type - Resume some devices.
+ *
+ * Resume devices of a specific type and their parents.
+ * Interrupts must be disabled when calling this.
+ *
+ * Note that we only handle pci devices at the moment.
+ * We have no way that I can tell of getting the class
+ * of devices not on the pci bus.
+ */
+void device_resume_type(type)
+{
+ struct device * dev_dev;
+ struct pci_dev * pci_dev = NULL;
+
+ while ((dev = pci_find_class(PCI_BASE_CLASS_STORAGE, dev))) {
+ }
+}
+#endif
+
EXPORT_SYMBOL(device_power_up);


diff -urp 2.6.9-lzf/drivers/base/power/shutdown.c 2.6.9/drivers/base/power/shutdown.c
--- 2.6.9-lzf/drivers/base/power/shutdown.c 2004-11-27 17:33:21.000000000 +0800
+++ 2.6.9/drivers/base/power/shutdown.c 2004-11-27 18:38:54.000000000 +0800
@@ -66,3 +66,4 @@ void device_shutdown(void)
sysdev_shutdown();
}

+EXPORT_SYMBOL(device_shutdown);
diff -urp 2.6.9-lzf/drivers/base/power/suspend.c 2.6.9/drivers/base/power/suspend.c
--- 2.6.9-lzf/drivers/base/power/suspend.c 2004-11-27 17:33:21.000000000 +0800
+++ 2.6.9/drivers/base/power/suspend.c 2004-11-28 00:09:08.000000000 +0800
@@ -51,7 +51,7 @@ int suspend_device(struct device * dev,


/**
- * device_suspend - Save state and stop all devices in system.
+ * device_suspend_tree - Save state and stop all devices in system.
* @state: Power state to put each device in.
*
* Walk the dpm_active list, call ->suspend() for each device, and move
@@ -60,7 +60,7 @@ int suspend_device(struct device * dev,
* the device to the dpm_off list. If it returns -EAGAIN, we move it to
* the dpm_off_irq list. If we get a different error, try and back out.
*
- * If we hit a failure with any of the devices, call device_resume()
+ * If we hit a failure with any of the devices, call device_resume_tree()
* above to bring the suspended devices back to life.
*
* Note this function leaves dpm_sem held to
@@ -70,22 +70,24 @@ int suspend_device(struct device * dev,
*
*/

-int device_suspend(pm_message_t state)
+int device_suspend_tree(pm_message_t state, struct partial_device_tree * tree)
{
int error = 0;

down(&dpm_sem);
- while(!list_empty(&dpm_active)) {
- struct list_head * entry = dpm_active.prev;
+ while(!list_empty(&tree->dpm_active)) {
+ struct list_head * entry = tree->dpm_active.prev;
struct device * dev = to_device(entry);
error = suspend_device(dev, state);

if (!error) {
list_del(&dev->power.entry);
- list_add(&dev->power.entry, &dpm_off);
+ list_add(&dev->power.entry, &tree->dpm_off);
+ dev->current_list = DEVICE_LIST_DPM_OFF;
} else if (error == -EAGAIN) {
list_del(&dev->power.entry);
- list_add(&dev->power.entry, &dpm_off_irq);
+ list_add(&dev->power.entry, &tree->dpm_off_irq);
+ dev->current_list = DEVICE_LIST_DPM_OFF_IRQ;
} else {
printk(KERN_ERR "Could not suspend device %s: "
"error %d\n", kobject_name(&dev->kobj), error);
@@ -96,10 +98,15 @@ int device_suspend(pm_message_t state)
up(&dpm_sem);
return error;
Error:
- dpm_resume();
+ dpm_resume_tree(tree);
goto Done;
}
+EXPORT_SYMBOL(device_suspend_tree);

+int device_suspend(pm_message_t state)
+{
+ return device_suspend_tree(state, &default_device_tree);
+}
EXPORT_SYMBOL(device_suspend);


@@ -112,19 +119,17 @@ EXPORT_SYMBOL(device_suspend);
* done, power down system devices.
*/

-int device_power_down(pm_message_t state)
+int device_power_down_tree(pm_message_t state, struct partial_device_tree * tree)
{
int error = 0;
struct device * dev;

- list_for_each_entry_reverse(dev, &dpm_off_irq, power.entry) {
+ list_for_each_entry_reverse(dev, &tree->dpm_off_irq, power.entry) {
if ((error = suspend_device(dev, state)))
break;
}
if (error)
goto Error;
- if ((error = sysdev_suspend(state)))
- goto Error;
Done:
return error;
Error:
@@ -132,5 +137,14 @@ int device_power_down(pm_message_t state
goto Done;
}

-EXPORT_SYMBOL(device_power_down);
+EXPORT_SYMBOL(device_power_down_tree);

+int device_power_down(pm_message_t state)
+{
+ int error;
+
+ if (!(error = device_power_down_tree(state, &default_device_tree)))
+ error = sysdev_suspend(state);
+ return error;
+}
+EXPORT_SYMBOL(device_power_down);
Only in 2.6.9/drivers/base/power: tree.c
diff -urp 2.6.9-lzf/drivers/base/sys.c 2.6.9/drivers/base/sys.c
--- 2.6.9-lzf/drivers/base/sys.c 2004-11-25 14:13:03.000000000 +0800
+++ 2.6.9/drivers/base/sys.c 2004-11-27 18:38:54.000000000 +0800
@@ -337,7 +337,7 @@ int sysdev_suspend(u32 state)
}
return 0;
}
-
+EXPORT_SYMBOL(sysdev_suspend);

/**
* sysdev_resume - Bring system devices back to life.
@@ -384,6 +384,7 @@ int sysdev_resume(void)
}
return 0;
}
+EXPORT_SYMBOL(sysdev_resume);


int __init system_bus_init(void)
--
--
Hu Gang / Steve
Linux Registered User 204016
GPG Public Key: http://soulinfo.com/~hugang/hugang.asc

2004-11-27 16:20:35

by Pavel Machek

[permalink] [raw]
Subject: Re: Suspend 2 merge: 9/51: init/* changes.

Hi!

> >> Given it is not too intrusive... why not. Send it for comments.
> >> I probably will not use this myself, so you'll need to test/maintain
> >> it.
> >
> > This shouldn't be necessary. Since the resume is being initiated by
> > userspace, it can perform the function of name_to_dev_t and just feed
> > the numbers to the kernel. The code to do that is still in Debian's
> > initrd-tools.
>
> Good point. Ok, what's the best way to present this to userspace? Add a
> /sys/power/resume and then echo a major:minor in there?

Yes, that sounds reasonable. Plus documentation and a big warning about
usage of /sys/power/resume...
Pavel
--
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!
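
The "echo a major:minor" idea discussed above could look roughly like
this on the parsing side. It is a minimal userspace sketch, not code
from any patch in this thread: the helper names format_resume_arg and
parse_resume_arg are hypothetical, and the exact text format (decimal
"major:minor" with an optional trailing newline) is an assumption.
Userspace would normally obtain the two numbers from the st_rdev field
returned by stat(2) on the device node.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: build the "major:minor\n" string userspace would write. */
int format_resume_arg(char *buf, size_t len, unsigned int major, unsigned int minor)
{
	int n;

	assert(buf != NULL);
	n = snprintf(buf, len, "%u:%u\n", major, minor);
	/* Fail if snprintf reported an error or the output was truncated. */
	return (n > 0 && (size_t)n < len) ? 0 : -1;
}

/*
 * Hypothetical helper: parse "major:minor" with an optional trailing
 * newline. Returns 0 on success, -1 on malformed input.
 */
int parse_resume_arg(const char *buf, unsigned int *major, unsigned int *minor)
{
	unsigned long ma, mi;
	char *end;

	assert(buf != NULL && major != NULL && minor != NULL);

	/* Reject signs and leading whitespace, which strtoul would accept. */
	if (!isdigit((unsigned char)buf[0]))
		return -1;
	errno = 0;
	ma = strtoul(buf, &end, 10);
	if (errno || ma > UINT_MAX || *end != ':' ||
	    !isdigit((unsigned char)end[1]))
		return -1;
	mi = strtoul(end + 1, &end, 10);
	if (errno || mi > UINT_MAX)
		return -1;
	if (*end == '\n')
		end++;
	if (*end != '\0')
		return -1;	/* trailing garbage */

	*major = (unsigned int)ma;
	*minor = (unsigned int)mi;
	return 0;
}
```

Strict parsing matters here because, as noted above, a wrong device
number handed to resume is dangerous; rejecting anything that is not
exactly two decimal numbers is the conservative choice.
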

2004-11-27 16:26:04

by Pavel Machek

[permalink] [raw]
Subject: Re: Suspend2 merge: 1/51: Device trees

Hi!

> > SUSPEND all but swap device and parents
> > WRITE LRU pages
> > SUSPEND swap device and parents (+sysdev)
> > Snapshot
> > RESUME swap device and parents (+sysdev)
> > WRITE snapshot
> > SUSPEND swap device and parents
> > POWERDOWN everything
> >
> -device-tree.diff-

(snipped 420 lines of diff)

No, this one should not be necessary. It is there only to solve some
memory problems, and it does not solve them anyway.
Pavel
--
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!

2004-11-27 17:23:36

by Pavel Machek

[permalink] [raw]
Subject: Re: Suspend 2 merge: 21/51: Refrigerator upgrade.

Hi!

> > > > > > Silently doing nothing when user asked for sync is not nice,
> > > > > > either. BUG() is better solution than that.
> > > > >
> > > > > I don't think we should BUG because the user presses Sys-Rq S while
> > > > > suspending. I'll make it BUG_ON() and make the Sys_Rq printk & ignore
> > > > > when suspending. Sound reasonable?
> > > >
> > > > Yes, that's better. ... only that it means just another hook somewhere
> > > > :-(.
> > >
> > > :<. But we're only talking two or three lines. Let's keep it in
> > > perspective.
> >
> > I think even three lines are bad. It means that swsusp is no longer
> > self-contained subsystem, but that it has its hooks all over the
> > place. And those hooks need to be maintained, too.
>
> Yes, but suspending can't practically be a self contained system. We can
> try to convince ourselves that we're making it self contained by hiding
> behind the driver model, but in reality, the driver model is just a nice
> name for our sticky little fingers in all the other drivers, ensuring
> they do the right thing when we want to go to sleep. Hooks in other code
> is just the equivalent, but without the nice name. Perhaps I should
> invent one. How about the "quiescing subsystem"? :>

Actually, "quiescing subsystem" with defined (and documented!)
interface might be an improvement ;-).
Pavel

--
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!

2004-11-27 17:25:44

by Pavel Machek

[permalink] [raw]
Subject: Re: Suspend 2 merge: 35/51: Code always built in to the kernel.

On So 27-11-04 01:00:59, Jan Rychter wrote:
> >>>>> "Nigel" == Nigel Cunningham <[email protected]>:
> Nigel> Hi.
> Nigel> On Fri, 2004-11-26 at 10:32, Pavel Machek wrote:
> [...]
> >> Plus kernel now actually expects user interaction to solve problems
> >> during boot. No, no.
>
> Nigel> You want your cake and to eat it too? :> We don't want to warn
> Nigel> the user before they shoot themselves in the foot, but not
> Nigel> loudly enough that they can't help notice and choose to do
> Nigel> something before the damage is done?
>
> You're forgetting that Pavel's idea of user interaction is via BUG_ON()
> and panic(). That's obviously "cleaner", "less ugly", and "smaller".

If you have a "can't happen" condition, it is just plain wrong to
return 0 and succeed. If you can't understand that, well, that's your
problem, not mine.

Now, if you want kernel that asks user "really mount ext3 on /dev/hda3
filesystem to /, WARNING: you should run fsck first, press f to do
that", that's your option, feel free to start your own kernel fork.

Pavel
--
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!

2004-11-27 22:07:58

by George Spelvin

[permalink] [raw]
Subject: Re: Suspend 2 merge

> My machine suspends in 7 seconds, and that's swsusp1. According to
> your numbers, suspend2 should suspend it in 1 second and LZE
> compressed should be .5 second.
>
> I'd say "who cares". 7 seconds seems like fast enough for me. And I'm
> *not* going to add 2000 lines of code for 500msec speedup during
> suspend.

Lucky you. My machine takes minutes.
(To be precise, it prints about a line and a half of dots in the
count_data_pages() loop, and often takes 2 seconds per dot.)

AMD Athlon XP, 1066 MHz, 768K RAM, VIA KT133 chipset.
Stock 2.6.10-rc1.

I could really use a speedup.


Remember, Linux is the aggregate of a lot of people scratching their
itches. It's okay to criticize *how* people go about addressing
what's annoying them, since that has a long-term maintenance effect,
if nothing else. But complaining that it doesn't annoy *you* isn't the
most valid argument.

That's what's fundamentally wrong with people complaining about
wanting to "stabilize" 2.6.x. Stability is in the eye of the beholder.
Unless you want no changes at all (and you can get that easily enough),
what it means is that the bugs that particularly annoy you get fixed.

But the point is, every bug fixed particularly annoys *someone*;
that's why it's getting fixed.

2004-11-27 22:26:48

by Pavel Machek

[permalink] [raw]
Subject: Re: Suspend 2 merge

Hi!

> > My machine suspends in 7 seconds, and that's swsusp1. According to
> > your numbers, suspend2 should suspend it in 1 second and LZE
> > compressed should be .5 second.
> >
> > I'd say "who cares". 7 seconds seems like fast enough for me. And I'm
> > *not* going to add 2000 lines of code for 500msec speedup during
> > suspend.
>
> Lucky you. My machine takes minutes.
> (To be precise, it prints about a line and a half of dots in the
> count_data_pages() loop, and often takes 2 seconds per dot.)
>
> AMD Athlon XP, 1066 MHz, 768K RAM, VIA KT133 chipset.
> Stock 2.6.10-rc1.
>
> I could really use a speedup.

Yep, that's an O(n^2) algorithm slowing it down. I have a fix for it, but
2.6.10 is now too frozen for a performance fix to go in. See the "bigdiff" I
sent to hugang, or wait a few minutes and you'll get a really ugly diff in
private email that should solve it, too.

[I'll be glad when you report results. It should make count_data_pages
< 1 second].

> if nothing else. But complaining that it doesn't annoy *you* isn't the
> most valid argument.

Ok, it is the scale. Half a second of speedup is not enough to justify
a new compression algorithm in the kernel.
Pavel
--
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!

2004-11-28 08:29:15

by Wichert Akkerman

[permalink] [raw]
Subject: Re: Suspend 2 merge

Previously [email protected] wrote:
> Lucky you. My machine takes minutes.
> (To be precise, it prints about a line and a half of dots in the
> count_data_pages() loop, and often takes 2 seconds per dot.)

It also seems to vary wildly. Most of the time it goes pretty fast for
me (under one minute) but occasionally it will take well over 10 minutes.
Never managed to time it exactly since my battery tends to run out in
the middle of a suspend when that happens.

Wichert.

--
Wichert Akkerman <[email protected]> It is simple to make things.
http://www.wiggy.net/ It is hard to make things simple.

2004-11-28 15:10:39

by Pavel Machek

[permalink] [raw]
Subject: Re: Suspend 2 merge

Hi!

> > Lucky you. My machine takes minutes.
> > (To be precise, it prints about a line and a half of dots in the
> > count_data_pages() loop, and often takes 2 seconds per dot.)
>
> It also seems to vary wildly. Most of the time it goes pretty fast for
> me (under one minute) but occasionally it will take well over 10 minutes.
> Never managed to time it exactly since my battery tends to run out in
> the middle of a suspend when that happens.

It depends on memory fragmentation; after updatedb it tends to be slow.
Patch exists, see archives.
Pavel
--
64 bytes from 195.113.31.123: icmp_seq=28 ttl=51 time=448769.1 ms

2004-11-28 16:29:20

by Hu Gang

[permalink] [raw]
Subject: software suspend patch [1/6]

Hi Pavel Machek, Nigel Cunningham:

device-tree.diff
Based on suspend2, with a few small changes.

core.diff
1: redefine struct pbe so that the pagedir no longer has to be contiguous.
2: shrink memory as little as possible.
3: use a bitmap to speed up the collision check when relocating pages.
4: pagecache saving is ready.

i386.diff
ppc.diff
i386 and powerpc suspend updates.

pagecachs_addon.diff
Required if pagecache saving is enabled; it makes saving the
pagecache safe. The idea comes from suspend2.

ppcfix.diff
Fixes a compile error.
$ gcc -v
....
gcc version 2.95.4 20011002 (Debian prerelease)

I'm using 2.6.9-ck3. With the above patches, swsusp1 works perfectly on my
PowerPC and x86 PCs with the highmem and preempt options enabled.

I hope core.diff@1,@2,@3, i386.diff and ppc.diff will be merged into the
mainline kernel ASAP :). From my point of view, device-tree.diff is
very useful when using pagecache saving, and pagecachs_addon.diff
is really a hack to make pagecache saving safe.
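
Item 3 of core.diff (a bitmap for the collision check during page
relocation) can be illustrated with a short userspace sketch. This is
not the code from core.diff, which is not shown in this message; the
names pfn_bitmap, pfn_bitmap_set and pfn_collides are made up for the
example, and it assumes pages are identified by a page frame number
below a known maximum. The idea is that one bit per page frame turns
"is this page one of the restore targets?" into a constant-time test
instead of a scan over the whole pagedir.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* One bit per page frame number; a set bit means "this frame is in use". */
struct pfn_bitmap {
	unsigned long *words;
	unsigned long nr_pfns;
};

/* Allocate a zeroed bitmap covering pfns 0 .. nr_pfns-1. Returns 0 or -1. */
int pfn_bitmap_init(struct pfn_bitmap *bm, unsigned long nr_pfns)
{
	size_t nwords = (nr_pfns + BITS_PER_WORD - 1) / BITS_PER_WORD;

	assert(bm != NULL);
	bm->words = calloc(nwords ? nwords : 1, sizeof(unsigned long));
	bm->nr_pfns = nr_pfns;
	return bm->words ? 0 : -1;
}

void pfn_bitmap_free(struct pfn_bitmap *bm)
{
	free(bm->words);
	bm->words = NULL;
	bm->nr_pfns = 0;
}

/* Mark a frame as a restore target. */
void pfn_bitmap_set(struct pfn_bitmap *bm, unsigned long pfn)
{
	assert(pfn < bm->nr_pfns);
	bm->words[pfn / BITS_PER_WORD] |= 1UL << (pfn % BITS_PER_WORD);
}

/* Constant-time check: would a page placed at this frame collide? */
int pfn_collides(const struct pfn_bitmap *bm, unsigned long pfn)
{
	if (pfn >= bm->nr_pfns)
		return 0;
	return (int)((bm->words[pfn / BITS_PER_WORD] >>
		      (pfn % BITS_PER_WORD)) & 1UL);
}
```

With such a bitmap filled once from the pagedir, each candidate page
costs a single lookup, so relocating n pages needs on the order of n
checks rather than n scans of the pagedir.
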


--- 2.6.9-lzf//drivers/base/class.c 2004-11-25 14:13:02.000000000 +0800
+++ 2.6.9/drivers/base/class.c 2004-11-28 23:17:00.000000000 +0800
@@ -465,6 +465,25 @@ void class_device_put(struct class_devic
kobject_put(&class_dev->kobj);
}

+struct class * class_find(char * name)
+{
+ struct class * this_class;
+
+ if (!name)
+ return NULL;
+
+ down_read(&class_subsys.rwsem);
+ list_for_each_entry(this_class, &class_subsys.kset.list, subsys.kset.kobj.entry) {
+ if (!(strcmp(this_class->name, name))) {
+ class_get(this_class);
+ up_read(&class_subsys.rwsem);
+ return this_class;
+ }
+ }
+ up_read(&class_subsys.rwsem);
+
+ return NULL;
+}

int class_interface_register(struct class_interface *class_intf)
{
@@ -547,3 +566,5 @@ EXPORT_SYMBOL(class_device_remove_file);

EXPORT_SYMBOL(class_interface_register);
EXPORT_SYMBOL(class_interface_unregister);
+
+EXPORT_SYMBOL(class_find);
--- 2.6.9-lzf//drivers/base/power/Makefile 2004-11-25 14:13:03.000000000 +0800
+++ 2.6.9/drivers/base/power/Makefile 2004-11-28 23:17:01.000000000 +0800
@@ -1,5 +1,5 @@
obj-y := shutdown.o
-obj-$(CONFIG_PM) += main.o suspend.o resume.o runtime.o sysfs.o
+obj-$(CONFIG_PM) += main.o suspend.o resume.o runtime.o sysfs.o tree.o

ifeq ($(CONFIG_DEBUG_DRIVER),y)
EXTRA_CFLAGS += -DDEBUG
--- 2.6.9-lzf//drivers/base/power/main.c 2004-11-25 14:13:02.000000000 +0800
+++ 2.6.9/drivers/base/power/main.c 2004-11-28 23:17:01.000000000 +0800
@@ -4,6 +4,9 @@
* Copyright (c) 2003 Patrick Mochel
* Copyright (c) 2003 Open Source Development Lab
*
+ * Partial tree additions
+ * Copyright (c) 2004 Nigel Cunningham
+ *
* This file is released under the GPLv2
*
*
@@ -23,10 +26,18 @@
#include <linux/device.h>
#include "power.h"

-LIST_HEAD(dpm_active);
-LIST_HEAD(dpm_off);
-LIST_HEAD(dpm_off_irq);
-
+struct partial_device_tree default_device_tree =
+{
+ .dpm_active = LIST_HEAD_INIT(default_device_tree.dpm_active),
+ .dpm_off = LIST_HEAD_INIT(default_device_tree.dpm_off),
+ .dpm_off_irq = LIST_HEAD_INIT(default_device_tree.dpm_off_irq),
+};
+EXPORT_SYMBOL(default_device_tree);
+
+/*
+ * One mutex for all trees because we can be moving items
+ * between trees.
+ */
DECLARE_MUTEX(dpm_sem);

/*
@@ -76,7 +87,9 @@ int device_pm_add(struct device * dev)
dev->bus ? dev->bus->name : "No Bus", dev->kobj.name);
atomic_set(&dev->power.pm_users, 0);
down(&dpm_sem);
- list_add_tail(&dev->power.entry, &dpm_active);
+ list_add_tail(&dev->power.entry, &default_device_tree.dpm_active);
+ dev->current_list = DEVICE_LIST_DPM_ACTIVE;
+ dev->tree = &default_device_tree;
device_pm_set_parent(dev, dev->parent);
if ((error = dpm_sysfs_add(dev)))
list_del(&dev->power.entry);
@@ -92,6 +105,7 @@ void device_pm_remove(struct device * de
dpm_sysfs_remove(dev);
device_pm_release(dev->power.pm_parent);
list_del(&dev->power.entry);
+ dev->current_list = DEVICE_LIST_NONE;
up(&dpm_sem);
}

--- 2.6.9-lzf//drivers/base/power/power.h 2004-11-28 23:17:29.000000000 +0800
+++ 2.6.9/drivers/base/power/power.h 2004-11-28 23:17:00.000000000 +0800
@@ -30,10 +30,22 @@ extern struct semaphore dpm_sem;
/*
* The PM lists.
*/
-extern struct list_head dpm_active;
-extern struct list_head dpm_off;
-extern struct list_head dpm_off_irq;

+struct partial_device_tree
+{
+ struct list_head dpm_active;
+ struct list_head dpm_off;
+ struct list_head dpm_off_irq;
+};
+
+enum {
+ DEVICE_LIST_NONE,
+ DEVICE_LIST_DPM_ACTIVE,
+ DEVICE_LIST_DPM_OFF,
+ DEVICE_LIST_DPM_OFF_IRQ,
+};
+
+extern struct partial_device_tree default_device_tree;

static inline struct dev_pm_info * to_pm_info(struct list_head * entry)
{
@@ -59,7 +71,9 @@ extern void dpm_sysfs_remove(struct devi
* resume.c
*/

+extern void dpm_resume_tree(struct partial_device_tree * tree);
extern void dpm_resume(void);
+extern void dpm_power_up_tree(struct partial_device_tree * tree);
extern void dpm_power_up(void);
extern int resume_device(struct device *);

--- 2.6.9-lzf//drivers/base/power/resume.c 2004-11-28 23:17:29.000000000 +0800
+++ 2.6.9/drivers/base/power/resume.c 2004-11-28 23:17:00.000000000 +0800
@@ -29,20 +29,25 @@ int resume_device(struct device * dev)



-void dpm_resume(void)
+void dpm_resume_tree(struct partial_device_tree * tree)
{
- while(!list_empty(&dpm_off)) {
- struct list_head * entry = dpm_off.next;
+ while(!list_empty(&tree->dpm_off)) {
+ struct list_head * entry = tree->dpm_off.next;
struct device * dev = to_device(entry);
list_del_init(entry);

if (dev->power.prev_state == PMSG_ON)
resume_device(dev);

- list_add_tail(entry, &dpm_active);
+ list_add_tail(entry, &tree->dpm_active);
+ dev->current_list = DEVICE_LIST_DPM_ACTIVE;
}
}

+void dpm_resume(void)
+{
+ dpm_resume_tree(&default_device_tree);
+}

/**
* device_resume - Restore state of each device in system.
@@ -60,6 +65,14 @@ void device_resume(void)

EXPORT_SYMBOL(device_resume);

+void device_resume_tree(struct partial_device_tree * tree)
+{
+ down(&dpm_sem);
+ dpm_resume_tree(tree);
+ up(&dpm_sem);
+}
+
+EXPORT_SYMBOL(device_resume_tree);

/**
* device_power_up_irq - Power on some devices.
@@ -72,16 +85,23 @@ EXPORT_SYMBOL(device_resume);
* Interrupts must be disabled when calling this.
*/

-void dpm_power_up(void)
+void dpm_power_up_tree(struct partial_device_tree * tree)
{
- while(!list_empty(&dpm_off_irq)) {
- struct list_head * entry = dpm_off_irq.next;
+ while(!list_empty(&tree->dpm_off_irq)) {
+ struct list_head * entry = tree->dpm_off_irq.next;
+ struct device * dev = to_device(entry);
list_del_init(entry);
- resume_device(to_device(entry));
- list_add_tail(entry, &dpm_active);
+ resume_device(dev);
+ list_add_tail(entry, &tree->dpm_active);
+ dev->current_list = DEVICE_LIST_DPM_ACTIVE;
}
}
+EXPORT_SYMBOL(dpm_power_up_tree);

+void dpm_power_up(void)
+{
+ dpm_power_up_tree(&default_device_tree);
+}

/**
* device_pm_power_up - Turn on all devices that need special attention.
@@ -97,6 +117,58 @@ void device_power_up(void)
dpm_power_up();
}

+#if 0
+
+/**
+ *
+ * pci_find_class_storage
+ *
+ * Find a PCI storage device.
+ * Based upon pci_find_class, but less strict.
+ */
+
+static struct pci_dev *
+pci_find_class_storage(unsigned int class, const struct pci_dev *from)
+{
+ struct list_head *n;
+ struct pci_dev *dev;
+
+ spin_lock(&pci_bus_lock);
+ n = from ? from->global_list.next : pci_devices.next;
+
+ while (n && (n != &pci_devices)) {
+ dev = pci_dev_g(n);
+ if (((dev->class & 0xff00) >> 16) == class)
+ goto exit;
+ n = n->next;
+ }
+ dev = NULL;
+exit:
+ spin_unlock(&pci_bus_lock);
+ return dev;
+}
+
+
+/**
+ * device_resume_type - Resume some devices.
+ *
+ * Resume devices of a specific type and their parents.
+ * Interrupts must be disabled when calling this.
+ *
+ * Note that we only handle pci devices at the moment.
+ * We have no way that I can tell of getting the class
+ * of devices not on the pci bus.
+ */
+void device_resume_type(type)
+{
+ struct device * dev_dev;
+ struct pci_dev * pci_dev = NULL;
+
+ while ((dev = pci_find_class(PCI_BASE_CLASS_STORAGE, dev))) {
+ }
+}
+#endif
+
EXPORT_SYMBOL(device_power_up);


--- 2.6.9-lzf//drivers/base/power/shutdown.c 2004-11-28 23:17:29.000000000 +0800
+++ 2.6.9/drivers/base/power/shutdown.c 2004-11-28 23:17:01.000000000 +0800
@@ -66,3 +66,4 @@ void device_shutdown(void)
sysdev_shutdown();
}

+EXPORT_SYMBOL(device_shutdown);
--- 2.6.9-lzf//drivers/base/power/suspend.c 2004-11-28 23:17:29.000000000 +0800
+++ 2.6.9/drivers/base/power/suspend.c 2004-11-28 23:17:00.000000000 +0800
@@ -51,7 +51,7 @@ int suspend_device(struct device * dev,


/**
- * device_suspend - Save state and stop all devices in system.
+ * device_suspend_tree - Save state and stop all devices in system.
* @state: Power state to put each device in.
*
* Walk the dpm_active list, call ->suspend() for each device, and move
@@ -60,7 +60,7 @@ int suspend_device(struct device * dev,
* the device to the dpm_off list. If it returns -EAGAIN, we move it to
* the dpm_off_irq list. If we get a different error, try and back out.
*
- * If we hit a failure with any of the devices, call device_resume()
+ * If we hit a failure with any of the devices, call device_resume_tree()
* above to bring the suspended devices back to life.
*
* Note this function leaves dpm_sem held to
@@ -70,22 +70,24 @@ int suspend_device(struct device * dev,
*
*/

-int device_suspend(pm_message_t state)
+int device_suspend_tree(pm_message_t state, struct partial_device_tree * tree)
{
int error = 0;

down(&dpm_sem);
- while(!list_empty(&dpm_active)) {
- struct list_head * entry = dpm_active.prev;
+ while(!list_empty(&tree->dpm_active)) {
+ struct list_head * entry = tree->dpm_active.prev;
struct device * dev = to_device(entry);
error = suspend_device(dev, state);

if (!error) {
list_del(&dev->power.entry);
- list_add(&dev->power.entry, &dpm_off);
+ list_add(&dev->power.entry, &tree->dpm_off);
+ dev->current_list = DEVICE_LIST_DPM_OFF;
} else if (error == -EAGAIN) {
list_del(&dev->power.entry);
- list_add(&dev->power.entry, &dpm_off_irq);
+ list_add(&dev->power.entry, &tree->dpm_off_irq);
+ dev->current_list = DEVICE_LIST_DPM_OFF_IRQ;
} else {
printk(KERN_ERR "Could not suspend device %s: "
"error %d\n", kobject_name(&dev->kobj), error);
@@ -96,10 +98,15 @@ int device_suspend(pm_message_t state)
up(&dpm_sem);
return error;
Error:
- dpm_resume();
+ dpm_resume_tree(tree);
goto Done;
}
+EXPORT_SYMBOL(device_suspend_tree);

+int device_suspend(pm_message_t state)
+{
+ return device_suspend_tree(state, &default_device_tree);
+}
EXPORT_SYMBOL(device_suspend);


@@ -112,19 +119,17 @@ EXPORT_SYMBOL(device_suspend);
* done, power down system devices.
*/

-int device_power_down(pm_message_t state)
+int device_power_down_tree(pm_message_t state, struct partial_device_tree * tree)
{
int error = 0;
struct device * dev;

- list_for_each_entry_reverse(dev, &dpm_off_irq, power.entry) {
+ list_for_each_entry_reverse(dev, &tree->dpm_off_irq, power.entry) {
if ((error = suspend_device(dev, state)))
break;
}
if (error)
goto Error;
- if ((error = sysdev_suspend(state)))
- goto Error;
Done:
return error;
Error:
@@ -132,5 +137,14 @@ int device_power_down(pm_message_t state
goto Done;
}

-EXPORT_SYMBOL(device_power_down);
+EXPORT_SYMBOL(device_power_down_tree);

+int device_power_down(pm_message_t state)
+{
+ int error;
+
+ if (!(error = device_power_down_tree(state, &default_device_tree)))
+ error = sysdev_suspend(state);
+ return error;
+}
+EXPORT_SYMBOL(device_power_down);
--- /dev/null 2004-06-07 18:45:47.000000000 +0800
+++ 2.6.9/drivers/base/power/tree.c 2004-11-28 23:17:00.000000000 +0800
@@ -0,0 +1,105 @@
+/*
+ * tree.c - Functions for moving devices between trees.
+ *
+ * Copyright (c) 2004 Nigel Cunningham
+ *
+ * This file is released under the GPLv2
+ *
+ */
+
+#include <linux/device.h>
+#include <linux/err.h>
+#include "power.h"
+
+/*
+ * device_merge_tree - Move an entire tree into another tree
+ * @source: The tree to be moved
+ * @dest : The destination tree
+ */
+
+void device_merge_tree( struct partial_device_tree * source,
+ struct partial_device_tree * dest)
+{
+ down(&dpm_sem);
+ list_splice_init(&source->dpm_active, &dest->dpm_active);
+ list_splice_init(&source->dpm_off, &dest->dpm_off);
+ list_splice_init(&source->dpm_off_irq, &dest->dpm_off_irq);
+ up(&dpm_sem);
+}
+EXPORT_SYMBOL(device_merge_tree);
+
+/*
+ * device_switch_trees - Move a device and its ancestors to a new tree
+ * @dev: The lowest device to be moved.
+ * @tree: The destination tree.
+ *
+ * Note that siblings can be left in the original tree. This is because
+ * we want to be able to keep part of a tree in one state while part is
+ * in another.
+ *
+ * Since we iterate all the way back to the top, and may move entries
+ * already in the destination tree, we will never violate the depth
+ * first property of the destination tree.
+ */
+
+void device_switch_trees(struct device * dev, struct partial_device_tree * tree)
+{
+ down(&dpm_sem);
+ while (dev) {
+ list_del(&dev->power.entry);
+ switch (dev->current_list) {
+ case DEVICE_LIST_DPM_ACTIVE:
+ list_add(&dev->power.entry, &tree->dpm_active);
+ break;
+ case DEVICE_LIST_DPM_OFF:
+ list_add(&dev->power.entry, &tree->dpm_off);
+ break;
+ case DEVICE_LIST_DPM_OFF_IRQ:
+ list_add(&dev->power.entry, &tree->dpm_off_irq);
+ break;
+ }
+
+ dev = dev->parent;
+ }
+ up(&dpm_sem);
+}
+
+EXPORT_SYMBOL(device_switch_trees);
+
+/*
+ * device_create_tree - Create a new device tree
+ */
+
+struct partial_device_tree * device_create_tree(void)
+{
+ struct partial_device_tree * new_tree;
+
+ new_tree = (struct partial_device_tree *)
+ kmalloc(sizeof(struct partial_device_tree), GFP_ATOMIC);
+
+ if (new_tree) {
+ INIT_LIST_HEAD(&new_tree->dpm_active);
+ INIT_LIST_HEAD(&new_tree->dpm_off);
+ INIT_LIST_HEAD(&new_tree->dpm_off_irq);
+ }
+
+ return new_tree;
+}
+EXPORT_SYMBOL(device_create_tree);
+
+/*
+ * device_destroy_tree - Destroy a dynamically created tree
+ */
+
+void device_destroy_tree(struct partial_device_tree * tree)
+{
+ BUG_ON(tree == &default_device_tree);
+
+ BUG_ON(!list_empty(&tree->dpm_active));
+ BUG_ON(!list_empty(&tree->dpm_off));
+ BUG_ON(!list_empty(&tree->dpm_off_irq));
+
+ kfree(tree);
+}
+
+EXPORT_SYMBOL(device_destroy_tree);
--- 2.6.9-lzf//drivers/base/sys.c 2004-11-25 14:13:03.000000000 +0800
+++ 2.6.9/drivers/base/sys.c 2004-11-28 23:17:01.000000000 +0800
@@ -337,7 +337,7 @@ int sysdev_suspend(u32 state)
}
return 0;
}
-
+EXPORT_SYMBOL(sysdev_suspend);

/**
* sysdev_resume - Bring system devices back to life.
@@ -384,6 +384,7 @@ int sysdev_resume(void)
}
return 0;
}
+EXPORT_SYMBOL(sysdev_resume);


int __init system_bus_init(void)
--- 2.6.9-lzf//include/linux/pm.h 2004-11-28 23:17:16.000000000 +0800
+++ 2.6.9/include/linux/pm.h 2004-11-28 23:16:55.000000000 +0800
@@ -231,13 +231,25 @@ struct dev_pm_info {
};

extern void device_pm_set_parent(struct device * dev, struct device * parent);
+struct partial_device_tree;
+extern struct partial_device_tree default_device_tree;

extern int device_suspend(pm_message_t state);
+extern int device_suspend_tree(pm_message_t state, struct partial_device_tree * tree);
extern int device_power_down(pm_message_t state);
+extern int device_power_down_tree(pm_message_t state, struct partial_device_tree * tree);
extern void device_power_up(void);
+extern void device_power_up_tree(struct partial_device_tree * tree);
extern void device_resume(void);
-
-
+extern void device_resume_tree(struct partial_device_tree * tree);
+extern void device_merge_tree(struct partial_device_tree * source,
+ struct partial_device_tree * dest);
+extern void device_switch_trees(struct device * dev, struct partial_device_tree * tree);
+extern void dpm_power_up_tree(struct partial_device_tree * tree);
+extern int sysdev_suspend(u32 state);
+extern int sysdev_resume(void);
+extern struct partial_device_tree * device_create_tree(void);
+extern void device_destroy_tree(struct partial_device_tree * tree);
#endif /* __KERNEL__ */

#endif /* _LINUX_PM_H */
--- 2.6.9-lzf//include/linux/device.h 2004-11-28 23:17:16.000000000 +0800
+++ 2.6.9/include/linux/device.h 2004-11-28 23:16:56.000000000 +0800
@@ -162,6 +162,7 @@ extern void class_unregister(struct clas

extern struct class * class_get(struct class *);
extern void class_put(struct class *);
+extern struct class * class_find(char * name);


struct class_attribute {
@@ -288,6 +289,11 @@ struct device {
override */

void (*release)(struct device * dev);
+
+ struct partial_device_tree * tree; /* Which tree of devices this
+ device is in */
+ int current_list; /* Which list within the tree the
+ device is on (speeds moving) */
};

static inline struct device *

--
Hu Gang / Steve
Linux Registered User 204016
GPG Public Key: http://soulinfo.com/~hugang/hugang.asc
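
For readers skimming the device-tree patch above, here is a minimal
user-space sketch of what device_switch_trees() does: it walks from a
device up through its parents and moves each one into the destination
tree, leaving siblings behind. The struct and function names below are
hypothetical stand-ins, not the kernel's struct device or
struct partial_device_tree, and a per-tree counter replaces the three
dpm_* lists to keep the example short.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model types; not the kernel structures from the patch. */
struct tree {
	const char *name;
	int ndevs;		/* devices currently held by this tree */
};

struct dev {
	const char *name;
	struct dev *parent;	/* NULL for the root device */
	struct tree *tree;	/* tree the device currently sits in */
};

/*
 * Move dev and all of its ancestors into dest. Siblings are not
 * touched, so part of the hierarchy can stay in the old tree.
 */
static void switch_trees(struct dev *dev, struct tree *dest)
{
	assert(dest != NULL);
	while (dev) {
		if (dev->tree != dest) {
			if (dev->tree)
				dev->tree->ndevs--;
			dev->tree = dest;
			dest->ndevs++;
		}
		dev = dev->parent;
	}
}
```

With a disk hanging off an IDE controller on a PCI bus, switching the
disk into a second tree takes the controller and the bus with it, while
an unrelated USB device on the same bus stays where it was. That is the
property the patch relies on to keep the swap device usable while the
rest of the system is suspended.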

2004-11-28 16:39:27

by Hu Gang

[permalink] [raw]
Subject: Re: software suspend patch [2/6]

On Mon, Nov 29, 2004 at 12:23:20AM +0800, [email protected] wrote:
> Hi Pavel Machek, Nigel Cunningham:
>
> device-tree.diff
> based on suspend2, with a few changes.
>
> core.diff
> 1: redefine struct pbe so the pagedir no longer has to be contiguous.
> 2: shrink memory as little as possible.
> 3: use a bitmap to speed up the collision check when relocating pages.
> 4: pagecache saving is ready.
>
> i386.diff
> ppc.diff
> i386 and powerpc suspend update.
>
> pagecachs_addon.diff
> must be applied if pagecache saving is enabled; it makes saving the
> pagecache safe. Idea from suspend2.
>
> ppcfix.diff
> fix compile error.
> $ gcc -v
> ....
> gcc version 2.95.4 20011002 (Debian prerelease)
>
> I'm using 2.6.9-ck3. With the above patches, swsusp1 works perfectly on my
> PowerPC and x86 PC with the highmem and preempt options enabled.
>
> I hope core.diff@1,@2,@3, i386.diff and ppc.diff will be merged into the
> mainline kernel ASAP, :). From my point of view device-tree.diff is
> very useful when using pagecache saving, and pagecachs_addon.diff
> really is a hack for making pagecache saving safe.
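
Item 1 of core.diff (a pagedir that need not be contiguous) can be
sketched in user space roughly as follows. This is only an illustration
under stated assumptions: struct entry, ENTRIES_PER_PAGE and the
chain_* helpers are hypothetical names, calloc() stands in for the page
allocator, and a dedicated next_page pointer plays the role that
dummy.val plays in the patch.

```c
#include <assert.h>
#include <stdlib.h>

#define ENTRIES_PER_PAGE 4	/* hypothetical; the patch uses PAGE_SIZE / sizeof(struct pbe) */

struct entry {
	unsigned long address;
	unsigned long orig_address;
	struct entry *next_page;	/* start of the next page in the chain, or NULL */
};

/* Free every page in the chain, reading the link before freeing. */
static void chain_free(struct entry *head)
{
	while (head) {
		struct entry *next = head->next_page;
		free(head);
		head = next;
	}
}

/* Allocate npages separately allocated pages and link them together. */
static struct entry *chain_alloc(int npages)
{
	struct entry *head = NULL, *prev = NULL;

	for (int p = 0; p < npages; p++) {
		struct entry *page = calloc(ENTRIES_PER_PAGE, sizeof(*page));
		if (!page) {
			chain_free(head);
			return NULL;
		}
		if (prev) {
			/* every entry of the previous page points at the new page */
			for (int i = 0; i < ENTRIES_PER_PAGE; i++)
				prev[i].next_page = page;
		} else {
			head = page;
		}
		prev = page;
	}
	return head;
}

/* Return the entry at a global index, or NULL if the chain is too short. */
static struct entry *chain_find(struct entry *head, int index)
{
	assert(index >= 0);
	for (int page = 0; head; head = head->next_page, page++)
		if (page == index / ENTRIES_PER_PAGE)
			return &head[index % ENTRIES_PER_PAGE];
	return NULL;
}
```

The trade-off is that a lookup by index walks the chain page by page
instead of doing plain pointer arithmetic, but no high-order contiguous
allocation is needed, which is the point of the change.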

--- 2.6.9-lzf//include/linux/reboot.h 2004-11-26 12:33:39.000000000 +0800
+++ 2.6.9/include/linux/reboot.h 2004-11-28 23:16:56.000000000 +0800
@@ -42,6 +42,8 @@
extern int register_reboot_notifier(struct notifier_block *);
extern int unregister_reboot_notifier(struct notifier_block *);

+/* For use by swsusp only */
+extern struct notifier_block *reboot_notifier_list;

/*
* Architecture-specific implementations of sys_reboot commands.
--- 2.6.9-lzf//include/linux/suspend.h 2004-11-28 23:17:18.000000000 +0800
+++ 2.6.9/include/linux/suspend.h 2004-11-28 23:16:56.000000000 +0800
@@ -1,7 +1,7 @@
#ifndef _LINUX_SWSUSP_H
#define _LINUX_SWSUSP_H

-#ifdef CONFIG_X86
+#if (defined(CONFIG_X86)) || (defined (CONFIG_PPC32))
#include <asm/suspend.h>
#endif
#include <linux/swap.h>
--- 2.6.9-lzf//include/linux/sysctl.h 2004-11-28 23:17:15.000000000 +0800
+++ 2.6.9/include/linux/sysctl.h 2004-11-28 23:16:55.000000000 +0800
@@ -170,6 +170,7 @@ enum
VM_VFS_CACHE_PRESSURE=26, /* dcache/icache reclaim pressure */
VM_LEGACY_VA_LAYOUT=27, /* legacy/compatibility virtual address space layout */
VM_HARDMAPLIMIT=28, /* Make mapped a hard limit */
+ VM_SWSUSP_PAGECACHE=29, /* Enable/Disable Suspend PageCaches */
};


--- 2.6.9-lzf//kernel/power/disk.c 2004-11-28 23:17:11.000000000 +0800
+++ 2.6.9/kernel/power/disk.c 2004-11-28 23:16:54.000000000 +0800
@@ -16,10 +16,11 @@
#include <linux/device.h>
#include <linux/delay.h>
#include <linux/fs.h>
+#include <linux/reboot.h>
#include <linux/device.h>
#include "power.h"

-
+extern struct partial_device_tree *swsusp_dev_tree;
extern suspend_disk_method_t pm_disk_mode;
extern struct pm_ops * pm_ops;

@@ -29,6 +30,8 @@ extern int swsusp_read(void);
extern int swsusp_resume(void);
extern int swsusp_free(void);

+extern int swsusp_prepare_suspend(void);
+extern int swsusp_post_resume(void);

static int noresume = 0;
char resume_file[256] = CONFIG_PM_STD_PARTITION;
@@ -48,19 +51,20 @@ static void power_down(suspend_disk_meth
unsigned long flags;
int error = 0;

- local_irq_save(flags);
switch(mode) {
case PM_DISK_PLATFORM:
- device_power_down(PMSG_SUSPEND);
+ local_irq_save(flags);
error = pm_ops->enter(PM_SUSPEND_DISK);
+ local_irq_restore(flags);
break;
case PM_DISK_SHUTDOWN:
printk("Powering off system\n");
- device_shutdown();
+ notifier_call_chain(&reboot_notifier_list, SYS_POWER_OFF, NULL);
+ device_suspend_tree(PMSG_FREEZE, swsusp_dev_tree);
machine_power_off();
break;
case PM_DISK_REBOOT:
- device_shutdown();
+ device_suspend_tree(PMSG_FREEZE, swsusp_dev_tree);
machine_restart(NULL);
break;
}
@@ -74,38 +78,6 @@ static void power_down(suspend_disk_meth

static int in_suspend __nosavedata = 0;

-
-/**
- * free_some_memory - Try to free as much memory as possible
- *
- * ... but do not OOM-kill anyone
- *
- * Notice: all userland should be stopped at this point, or
- * livelock is possible.
- */
-
-static void free_some_memory(void)
-{
- int i;
- for (i=0; i<5; i++) {
- int i = 0, tmp;
- long pages = 0;
- char *p = "-\\|/";
-
- printk("Freeing memory... ");
- while ((tmp = shrink_all_memory(10000))) {
- pages += tmp;
- printk("\b%c", p[i]);
- i++;
- if (i > 3)
- i = 0;
- }
- printk("\bdone (%li pages freed)\n", pages);
- current->state = TASK_INTERRUPTIBLE;
- schedule_timeout(HZ/5);
- }
-}
-
static inline void platform_finish(void)
{
if (pm_disk_mode == PM_DISK_PLATFORM) {
@@ -116,7 +88,7 @@ static inline void platform_finish(void)

static void finish(void)
{
- device_resume();
+ swsusp_post_resume();
platform_finish();
enable_nonboot_cpus();
thaw_processes();
@@ -124,7 +96,7 @@ static void finish(void)
}


-static int prepare(void)
+static int prepare(int resume)
{
int error;

@@ -143,14 +115,11 @@ static int prepare(void)
}
}

- /* Free memory before shutting down devices. */
- free_some_memory();
-
disable_nonboot_cpus();
- if ((error = device_suspend(PMSG_FREEZE))) {
- printk("Some devices failed to suspend\n");
- goto Finish;
- }
+ if (!resume)
+ if ((error = swsusp_prepare_suspend())) {
+ goto Finish;
+ }

return 0;
Finish:
@@ -176,7 +145,7 @@ int pm_suspend_disk(void)
{
int error;

- if ((error = prepare()))
+ if ((error = prepare(0)))
return error;

pr_debug("PM: Attempting to suspend to disk.\n");
@@ -233,7 +202,7 @@ static int software_resume(void)

pr_debug("PM: Preparing system for restore.\n");

- if ((error = prepare()))
+ if ((error = prepare(1)))
goto Free;

barrier();
@@ -241,7 +210,7 @@ static int software_resume(void)

pr_debug("PM: Restoring saved image.\n");
swsusp_resume();
- pr_debug("PM: Restore failed, recovering.n");
+ pr_debug("PM: Restore failed, recovering.\n");
finish();
Free:
swsusp_free();
--- 2.6.9-lzf//kernel/power/main.c 2004-11-28 23:17:11.000000000 +0800
+++ 2.6.9/kernel/power/main.c 2004-11-28 23:16:54.000000000 +0800
@@ -4,7 +4,7 @@
* Copyright (c) 2003 Patrick Mochel
* Copyright (c) 2003 Open Source Development Lab
*
- * This file is release under the GPLv2
+ * This file is released under the GPLv2
*
*/

--- 2.6.9-lzf//kernel/power/swsusp.c 2004-11-28 23:17:11.000000000 +0800
+++ 2.6.9/kernel/power/swsusp.c 2004-11-28 23:16:54.000000000 +0800
@@ -63,6 +63,7 @@
#include <linux/console.h>
#include <linux/highmem.h>
#include <linux/bio.h>
+#include <linux/preempt.h>

#include <asm/uaccess.h>
#include <asm/mmu_context.h>
@@ -74,11 +75,8 @@
/* References to section boundaries */
extern char __nosave_begin, __nosave_end;

-/* Variables to be preserved over suspend */
-static int pagedir_order_check;
-
extern char resume_file[];
-static dev_t resume_device;
+static dev_t swsusp_resume_device;
/* Local variables that should not be affected by save */
unsigned int nr_copy_pages __nosavedata = 0;

@@ -97,7 +95,6 @@ unsigned int nr_copy_pages __nosavedata
*/
suspend_pagedir_t *pagedir_nosave __nosavedata = NULL;
static suspend_pagedir_t *pagedir_save;
-static int pagedir_order __nosavedata = 0;

#define SWSUSP_SIG "S1SUSPEND"

@@ -168,10 +165,11 @@ static int is_resume_device(const struct
struct inode *inode = file->f_dentry->d_inode;

return S_ISBLK(inode->i_mode) &&
- resume_device == MKDEV(imajor(inode), iminor(inode));
+ swsusp_resume_device == MKDEV(imajor(inode), iminor(inode));
}

-int swsusp_swap_check(void) /* This is called before saving image */
+/* This is called before saving image */
+int swsusp_swap_check(struct partial_device_tree *suspend_device_tree)
{
int i, len;

@@ -195,6 +193,7 @@ int swsusp_swap_check(void) /* This is c
if (is_resume_device(&swap_info[i])) {
swapfile_used[i] = SWAPFILE_SUSPEND;
root_swap = i;
+ device_switch_trees((swap_info[i].bdev)->bd_disk->driverfs_dev, suspend_device_tree);
} else {
swapfile_used[i] = SWAPFILE_IGNORED;
}
@@ -222,8 +221,105 @@ static void lock_swapdevices(void)
}
swap_list_unlock();
}
+
+#define ONE_PAGE_PBE_NUM (PAGE_SIZE/sizeof(struct pbe))
+#define PBE_IS_PAGE_END(x) \
+ ( PAGE_SIZE - sizeof(struct pbe) == ((x) - ((~(PAGE_SIZE - 1)) & (x))) )
+
+#define pgdir_for_each_safe(pos, n, head) \
+ for(pos = head, n = pos ? (suspend_pagedir_t*)pos->dummy.val : NULL; \
+ pos != NULL; \
+ pos = n, n = pos ? (suspend_pagedir_t *)pos->dummy.val : NULL)
+
+#define pbe_for_each_safe(pos, n, index, max, head) \
+ for(pos = head, index = 0, \
+ n = pos ? (struct pbe *)pos->dummy.val : NULL; \
+ (pos != NULL) && (index < max); \
+ pos = (PBE_IS_PAGE_END((unsigned long)pos)) ? n : \
+ ((struct pbe *)((unsigned long)pos + sizeof(struct pbe))), \
+ index ++, \
+ n = pos ? (struct pbe*)pos->dummy.val : NULL)
+
+/* free pagedir */
+static void pagedir_free(suspend_pagedir_t *head)
+{
+ suspend_pagedir_t *next, *cur;
+ pgdir_for_each_safe(cur, next, head) {
+ free_page((unsigned long)cur);
+ }
+}
+
+/* for_each_pbe_copy_back
+ *
+ * Useful as a C reference when writing the copy-back loop in assembly.
+ *
+ */
+/*#define CREATE_ASM_CODE */
+#ifdef CREATE_ASM_CODE
+#if 0
+#define GET_ADDRESS(x) __pa(x)
+#else
+#define GET_ADDRESS(x) (x)
+#endif
+asmlinkage void for_each_pbe_copy_back(void)
+{
+ struct pbe *pgdir, *next;
+
+ pgdir = pagedir_nosave;
+ while (pgdir != NULL) {
+ unsigned long nums, i;
+ pgdir = (struct pbe *)GET_ADDRESS(pgdir);
+ next = (struct pbe*)pgdir->dummy.val;
+ for (nums = 0; nums < ONE_PAGE_PBE_NUM; nums++) {
+ register unsigned long *orig, *copy;
+ orig = (unsigned long *)pgdir->orig_address;
+ if (orig == 0) goto end;
+ orig = (unsigned long *)GET_ADDRESS(orig);
+ copy = (unsigned long *)GET_ADDRESS(pgdir->address);
+#if 0
+ memcpy(orig, copy, PAGE_SIZE);
+#else
+ for (i = 0; i < PAGE_SIZE / sizeof(unsigned long); i+=4) {
+ *(orig + i) = *(copy + i);
+ *(orig + i+1) = *(copy + i+1);
+ *(orig + i+2) = *(copy + i+2);
+ *(orig + i+3) = *(copy + i+3);
+ }
+#endif
+ pgdir ++;
+ }
+ pgdir = next;
+ }
+end:
+ panic("just asm code");
+}
+#endif

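+/*
+ * find_pbe_by_index - Look up a pbe by its position in the chained pagedir.
+ * @pgdir: the pgdir head
+ * @index: zero-based index of the pbe, counted across all pagedir pages
+ *
+ * @return: the pbe at @index, or NULL if the chain is shorter than that.
+ */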
+static struct pbe *find_pbe_by_index(struct pbe *pgdir, int index)
+{
+ unsigned long p = 0;
+ struct pbe *pbe, *next;

+ pr_debug("find_pbe_by_index: %p, 0x%03x", pgdir, index);
+ pgdir_for_each_safe(pbe, next, pgdir) {
+ if (p == index / ONE_PAGE_PBE_NUM) {
+ pbe = (struct pbe *)((unsigned long)pbe +
+ (index % ONE_PAGE_PBE_NUM) * sizeof(struct pbe));
+ pr_debug(" %p, o{%p} c{%p}\n",
+ pbe, (void*)pbe->orig_address, (void*)pbe->address);
+ return pbe;
+ }
+ p ++;
+ }
+ return (NULL);
+}

/**
* write_swap_page - Write one page to a fresh swap location.
@@ -257,7 +353,6 @@ static int write_page(unsigned long addr
return error;
}

-
/**
* data_free - Free the swap entries used by the saved image.
*
@@ -267,43 +362,82 @@ static int write_page(unsigned long addr

static void data_free(void)
{
- swp_entry_t entry;
- int i;
+ int index;
+ struct pbe *pos, *next;

- for (i = 0; i < nr_copy_pages; i++) {
- entry = (pagedir_nosave + i)->swap_address;
+ pbe_for_each_safe(pos, next, index, nr_copy_pages, pagedir_nosave) {
+ swp_entry_t entry;
+
+ entry = pos->swap_address;
if (entry.val)
swap_free(entry);
- else
- break;
- (pagedir_nosave + i)->swap_address = (swp_entry_t){0};
+ pos->swap_address = (swp_entry_t){0};
}
}

+static int mod_progress = 1;
+
+static void inline mod_printk_progress(int i)
+{
+ if (mod_progress == 0) mod_progress = 1;
+ if (!(i%100))
+ printk( "\b\b\b\b%3d%%", i / mod_progress );
+}
+
+static int write_one_pbe(struct pbe *p, void *data, int cur)
+{
+ int error = 0;
+
+ mod_printk_progress(cur);
+
+ pr_debug("write_one_pbe: %p, o{%p} c{%p} %d ",
+ p, (void *)p->orig_address, (void *)p->address, cur);
+ error = write_page((unsigned long)data, &p->swap_address);
+ if (error) return error;
+
+ pr_debug("%lu\n", swp_offset(p->swap_address));
+
+ return 0;
+}
+
+static int bio_read_page(pgoff_t page_off, void * page);
+
+static int read_one_pbe(struct pbe *p, void *data, int cur)
+{
+ int error = 0;
+
+ mod_printk_progress(cur);
+
+ pr_debug("read_one_pbe: %p, o{%p} c{%p} %lu\n",
+ p, (void *)p->orig_address, data,
+ swp_offset(p->swap_address));
+
+ error = bio_read_page(swp_offset(p->swap_address), data);
+ if (error) return error;
+
+ return 0;
+}

/**
* data_write - Write saved image to swap.
*
* Walk the list of pages in the image and sync each one to swap.
*/
-
static int data_write(void)
{
- int error = 0;
- int i;
- unsigned int mod = nr_copy_pages / 100;
-
- if (!mod)
- mod = 1;
+ int error = 0, index;
+ struct pbe *pos, *next;
+
+ mod_progress = nr_copy_pages / 100;

- printk( "Writing data to swap (%d pages)... ", nr_copy_pages );
- for (i = 0; i < nr_copy_pages && !error; i++) {
- if (!(i%mod))
- printk( "\b\b\b\b%3d%%", i / mod );
- error = write_page((pagedir_nosave+i)->address,
- &((pagedir_nosave+i)->swap_address));
+ printk( "Writing data to swap (%d pages)... ", nr_copy_pages);
+ pbe_for_each_safe(pos, next, index, nr_copy_pages, pagedir_nosave) {
+ BUG_ON(pos->orig_address == 0);
+ error = write_one_pbe(pos, (void*)pos->address, index);
+ if (error) break;
}
printk("\b\b\b\bdone\n");
+
return error;
}

@@ -363,7 +497,6 @@ static void free_pagedir_entries(void)
swap_free(swsusp_info.pagedir[i]);
}

-
/**
* write_pagedir - Write the array of pages holding the page directory.
* @last: Last swap entry we write (needed for header).
@@ -371,15 +504,19 @@ static void free_pagedir_entries(void)

static int write_pagedir(void)
{
- unsigned long addr = (unsigned long)pagedir_nosave;
- int error = 0;
- int n = SUSPEND_PD_PAGES(nr_copy_pages);
- int i;
+ int error = 0, n = 0;
+ suspend_pagedir_t *pgdir, *next;

- swsusp_info.pagedir_pages = n;
+ pgdir_for_each_safe(pgdir, next, pagedir_nosave) {
+ error = write_page((unsigned long)pgdir, &swsusp_info.pagedir[n]);
+ if (error) {
+ break;
+ }
+ n++;
+ }
printk( "Writing pagedir (%d pages)\n", n);
- for (i = 0; i < n && !error; i++, addr += PAGE_SIZE)
- error = write_page(addr, &swsusp_info.pagedir[i]);
+ swsusp_info.pagedir_pages = n;
+
return error;
}

@@ -410,7 +547,6 @@ static int write_suspend_image(void)
goto Done;
}

-
#ifdef CONFIG_HIGHMEM
struct highmem_page {
char *data;
@@ -503,7 +639,533 @@ static int restore_highmem(void)
#endif
return 0;
}
+struct partial_device_tree *swsusp_dev_tree = NULL;
+
+static int free_suspend_device_tree(void)
+{
+ if (swsusp_dev_tree) {
+ device_merge_tree(swsusp_dev_tree, &default_device_tree);
+ device_destroy_tree(swsusp_dev_tree);
+ }
+ swsusp_dev_tree = NULL;
+ return 0;
+}
+
+static int setup_suspend_device_tree(void)
+{
+ struct class * class = NULL;
+
+ swsusp_dev_tree = device_create_tree();
+ if (!swsusp_dev_tree) {
+ swsusp_dev_tree = NULL;
+ return -ENOMEM;
+ }
+ /* Now check for graphics class devices, so we can
+ * keep the display on while suspending */
+ class = class_find("graphics");
+ if (class) {
+ struct class_device * class_dev;
+ list_for_each_entry(class_dev, &class->children, node)
+ device_switch_trees(class_dev->dev, swsusp_dev_tree);
+ class_put(class);
+ }
+
+ return (0);
+}
+
+typedef int (*do_page_t)(struct page *page, int p);
+
+static int foreach_zone_page(struct zone *zone, do_page_t fun, int p)
+{
+ int inactive = 0, active = 0;
+
+ spin_lock_irq(&zone->lru_lock);
+ if (zone->nr_inactive) {
+ struct list_head * entry = zone->inactive_list.prev;
+ while (entry != &zone->inactive_list) {
+ if (fun) {
+ struct page * page = list_entry(entry, struct page, lru);
+ inactive += fun(page, p);
+ } else {
+ inactive ++;
+ }
+ entry = entry->prev;
+ }
+ }
+ if (zone->nr_active) {
+ struct list_head * entry = zone->active_list.prev;
+ while (entry != &zone->active_list) {
+ if (fun) {
+ struct page * page = list_entry(entry, struct page, lru);
+ active += fun(page, p);
+ } else {
+ active ++;
+ }
+ entry = entry->prev;
+ }
+ }
+ spin_unlock_irq(&zone->lru_lock);
+
+ return (active + inactive);
+}
+
+/* enable/disable pagecache suspend */
+int swsusp_pagecache = 0;
+
+/* I'll move this to include/linux/page-flags.h */
+#define PG_page_caches (PG_nosave_free + 1)
+
+#define SetPagePcs(page) set_bit(PG_page_caches, &(page)->flags)
+#define ClearPagePcs(page) clear_bit(PG_page_caches, &(page)->flags)
+#define PagePcs(page) test_bit(PG_page_caches, &(page)->flags)
+
+static suspend_pagedir_t *pagedir_cache = NULL;
+static int nr_copy_page_caches = 0;
+
+static int setup_page_caches_pe(struct page *page, int setup)
+{
+ unsigned long pfn = page_to_pfn(page);
+
+ BUG_ON(PageReserved(page) && PageNosave(page));
+ if (!pfn_valid(pfn)) {
+ printk("not valid page\n");
+ return 0;
+ }
+ if (PageNosave(page)) {
+ printk("nosave\n");
+ return 0;
+ }
+ if (PageReserved(page) /*&& pfn_is_nosave(pfn)*/) {
+ printk("[nosave]\n");
+ return 0;
+ }
+ if (PageSlab(page)) {
+ printk("slab\n");
+ return 0;
+ }
+ if (setup) {
+ struct pbe *p = find_pbe_by_index(pagedir_cache, nr_copy_page_caches);
+ BUG_ON(p == NULL);
+ p->address = (long)page_address(page);
+ BUG_ON(p->address == 0);
+ /*pr_debug("setup_page_caches: cur %p, o{%p}, d{%p}, nr %u\n",
+ (void*)p, (void*)p->orig_address,
+ (void*)p->address, nr_copy_page_caches);*/
+ nr_copy_page_caches ++;
+ }
+ SetPagePcs(page);
+
+ return (1);
+}
+
+static int count_page_caches(struct zone *zone, int p)
+{
+ if (swsusp_pagecache)
+ return foreach_zone_page(zone, setup_page_caches_pe, p);
+ return 0;
+}
+
+#define pointer2num(x) ((x - 0xc0000000) >> 12)
+#define num2pointer(x) ((x << 12) + 0xc0000000)
+
+static inline void collide_set_bit(unsigned char *bitmap,
+ unsigned long bitnum)
+{
+ bitnum = pointer2num(bitnum);
+ bitmap[bitnum / 8] |= (1 << (bitnum%8));
+}
+
+static inline int collide_is_bit_set(unsigned char *bitmap,
+ unsigned long bitnum)
+{
+ bitnum = pointer2num(bitnum);
+ return !!(bitmap[bitnum / 8] & (1 << (bitnum%8)));
+}
+
+static void collide_bitmap_free(unsigned char *bitmap)
+{
+ free_pages((unsigned long)bitmap, 2);
+}
+
+/*
+ * four pages are enough for bitmap
+ *
+ */
+static unsigned char *collide_bitmap_init(struct pbe *pgdir)
+{
+ unsigned char *bitmap =
+ (unsigned char *)__get_free_pages(GFP_ATOMIC | __GFP_COLD, 2);
+ struct pbe *next;
+
+ if (bitmap == NULL) {
+ return NULL;
+ }
+ memset(bitmap, 0, 4 * PAGE_SIZE);
+
+ /* do base check */
+ BUG_ON(collide_is_bit_set(bitmap, (unsigned long)bitmap) == 1);
+ collide_set_bit(bitmap, (unsigned long)bitmap);
+ BUG_ON(collide_is_bit_set(bitmap, (unsigned long)bitmap) == 0);
+
+ while (pgdir != NULL) {
+ unsigned long nums;
+ next = (struct pbe*)pgdir->dummy.val;
+ for (nums = 0; nums < ONE_PAGE_PBE_NUM; nums++) {
+ collide_set_bit(bitmap, (unsigned long)pgdir);
+ collide_set_bit(bitmap, (unsigned long)pgdir->orig_address);
+ pgdir ++;
+ }
+ pgdir = next;
+ }
+
+ return bitmap;
+}
+static void **eaten_memory = NULL;
+
+static void *swsusp_get_safe_free_page(unsigned char *collide)
+{
+ void *addr = NULL;
+ void **c = eaten_memory;
+
+ do {
+ if (addr) {
+ eaten_memory = (void**)addr;
+ *eaten_memory = c;
+ c = eaten_memory;
+ }
+ addr = (void*)__get_free_pages(GFP_ATOMIC | __GFP_COLD, 0);
+ if (!addr)
+ return NULL;
+ } while (collide && collide_is_bit_set(collide, (unsigned long)addr));
+
+ return addr;
+}
+/*
+ * struct pbe as redefined for the PageCache pagedir.
+ *
+ * struct pbe {
+ * unsigned long address;
+ * unsigned long orig_address; pointer of next struct pbe
+ * swp_entry_t swap_address;
+ * swp_entry_t dummy; current index
+ * }
+ *
+ */
+static suspend_pagedir_t * alloc_one_pagedir(suspend_pagedir_t *prev,
+ unsigned char *collide)
+{
+ suspend_pagedir_t *pgdir = NULL;
+ int i;
+
+ pgdir = (suspend_pagedir_t *)swsusp_get_safe_free_page(collide);
+
+ /*pr_debug("pgdir: %p, %p, %d\n",
+ pgdir, prev, sizeof(suspend_pagedir_t)); */
+ for (i = 0; i < ONE_PAGE_PBE_NUM; i++) {
+ pgdir[i].dummy.val = 0;
+ pgdir[i].address = 0;
+ pgdir[i].orig_address = 0;
+ if (prev)
+ prev[i].dummy.val= (unsigned long)pgdir;
+ }
+
+ return (pgdir);
+}
+
+/* calc_nums - Return nr_copy plus the number of pagedir pages needed to hold it. */
+static int calc_nums(int nr_copy)
+{
+ int diff = 0, ret = 0;
+ do {
+ diff = (nr_copy / ONE_PAGE_PBE_NUM) - ret + 1;
+ if (diff) {
+ ret += diff;
+ nr_copy += diff;
+ }
+ } while (diff);
+ return nr_copy;
+}
+
+
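+/*
+ * alloc_pagedir - Allocate a chained pagedir.
+ *
+ * @param pbe receives the head of the new chain (NULL on failure)
+ * @param pbe_nums number of pbes the caller needs
+ * @param collide bitmap of addresses to avoid, or NULL for no check
+ * @param page_nums if non-zero, allocate exactly this many pagedir pages
+ *
+ */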
+static int alloc_pagedir(struct pbe **pbe, int pbe_nums,
+ unsigned char *collide, int page_nums)
+{
+ unsigned int nums = 0;
+ unsigned int after_alloc = pbe_nums;
+ suspend_pagedir_t *prev = NULL, *cur = NULL;
+
+ if (page_nums)
+ after_alloc = ONE_PAGE_PBE_NUM * page_nums;
+ else
+ after_alloc = calc_nums(after_alloc);
+
+ pr_debug("alloc_pagedir: %d, %d\n", pbe_nums, after_alloc);
+ for (nums = 0 ; nums < after_alloc ; nums += ONE_PAGE_PBE_NUM) {
+ cur = alloc_one_pagedir(prev, collide);
+ pr_debug("alloc_one_pagedir: %p\n", cur);
+ if (!cur) { /* get page failed */
+ goto no_mem;
+ }
+ if (nums == 0) { /* setup the head */
+ *pbe = cur;
+ }
+ prev = cur;
+ }
+ return after_alloc - pbe_nums;
+
+no_mem:
+ pagedir_free(*pbe);
+ *pbe = NULL;
+
+ return (-ENOMEM);
+}
+
+static char *page_cache_buf = NULL;
+static int alloc_pagecache_buf(void)
+{
+ page_cache_buf = (char *)__get_free_pages(GFP_ATOMIC /*| __GFP_NOWARN*/, 0);
+ if (!page_cache_buf) {
+ /* FIXME try shrink memory */
+ return -ENOMEM;
+ }
+ return 0;
+}
+static int free_pagecache_buf(void)
+{
+ free_page((unsigned long)page_cache_buf);
+ return 0;
+}
+
+int swsusp_post_resume(void)
+{
+ int error = 0, index;
+ struct pbe *pos, *next;
+
+#ifdef CONFIG_PREEMPT
+ preempt_enable();
+#endif
+ if (swsusp_pagecache == 0) {
+ goto end;
+ }
+
+ local_irq_disable();
+ dpm_power_up_tree(swsusp_dev_tree);
+ local_irq_enable();
+ device_resume_tree(swsusp_dev_tree);
+
+ mod_progress = nr_copy_page_caches / 100;
+
+ printk( "Reading PageCaches from swap (%d pages)... ",
+ nr_copy_page_caches);
+ pbe_for_each_safe(pos, next, index, nr_copy_page_caches,
+ pagedir_cache) {
+ swp_entry_t entry;
+
+ error = read_one_pbe(pos, page_cache_buf, index);
+ if (error) break;
+ memcpy((void*)pos->address, page_cache_buf, PAGE_SIZE);
+ entry = pos->swap_address;
+ if (entry.val)
+ swap_free(entry);
+ }
+ printk("\b\b\b\bdone\n");
+
+ free_pagecache_buf();
+ swsusp_pagecache = 1;
+end:
+ local_irq_disable();
+ dpm_power_up_tree(&default_device_tree);
+ local_irq_enable();
+ device_resume_tree(&default_device_tree);
+ device_resume_tree(&default_device_tree);
+ free_suspend_device_tree();
+
+ return error;
+}
+
+static int page_caches_write(void)
+{
+ int error = 0, index;
+ struct pbe *pos, *next;
+
+ mod_progress = nr_copy_page_caches / 100;
+
+ printk( "Writing PageCaches to swap (%d pages)... ",
+ nr_copy_page_caches);
+ pbe_for_each_safe(pos, next, index, nr_copy_page_caches,
+ pagedir_cache) {
+ memcpy(page_cache_buf, (void*)pos->address, PAGE_SIZE);
+ error = write_one_pbe(pos, page_cache_buf, index);
+ if (error) break;
+ }
+ printk("\b\b\b\bdone\n");
+
+ return error;
+}
+
+static int setup_pagedir_pbe(void)
+{
+ struct zone *zone;
+
+ nr_copy_page_caches = 0;
+ for_each_zone(zone) {
+ if (!is_highmem(zone)) {
+ count_page_caches(zone, 1);
+ }
+ }
+
+ return 0;
+}
+
+static void count_data_pages(void);
+static int swsusp_alloc(void);
+
+static int page_caches_recal(int resume)
+{
+ struct zone *zone;
+ int i;
+
+ if (swsusp_pagecache == 0 || resume == 1) return 0;
+
+ for (i = 0; i < max_mapnr; i++)
+ ClearPagePcs(mem_map+i);
+
+ nr_copy_page_caches = 0;
+ drain_local_pages();
+ for_each_zone(zone) {
+ if (!is_highmem(zone)) {
+ nr_copy_page_caches += count_page_caches(zone, 0);
+ }
+ }
+ i = calc_nums(nr_copy_page_caches);
+
+ return (i / ONE_PAGE_PBE_NUM + 1);
+}
+
+static int inline swsusp_need_pages(int resume)
+{
+ return nr_copy_pages + page_caches_recal(resume) + PAGES_FOR_IO;
+}
+
+static int swsusp_check_memory(int resume)
+{
+ int retry = 20 * 5; /* retry for up to 20 sec while nothing can be freed */
+
+ if (!resume) {
+ count_data_pages();
+ }
+
+ printk("swsusp: need %d + %d pages, freed %d pages ... ",
+ nr_copy_pages + PAGES_FOR_IO, page_caches_recal(resume),
+ nr_free_pages());
+ if (nr_free_pages() > swsusp_need_pages(resume)) {
+ printk(" done\n");
+ return 0;
+ }
+
+ do {
+ int diff = swsusp_need_pages(resume) - nr_free_pages();
+
+ if (diff < 0) break;
+ if (shrink_all_memory(diff * 2) == 0) {
+ retry --;
+ } else {
+ retry = 0;
+ }
+ current->state = TASK_INTERRUPTIBLE;
+ schedule_timeout(HZ/5);
+ if (!resume) {
+ drain_local_pages();
+ count_data_pages();
+ }
+ printk("\b\b\b\b\b%5d", diff);
+ } while (retry);
+
+ printk("swsusp: need %d + %d pages, freed %d pages ... ",
+ nr_copy_pages + PAGES_FOR_IO, page_caches_recal(resume),
+ nr_free_pages());
+
+ if (nr_free_pages() < swsusp_need_pages(resume)) {
+ printk(" failed\n");
+ return -ENOMEM;
+ }
+ printk(" done\n");
+
+ return 0;
+}
+
+int swsusp_prepare_suspend(void)
+{
+ int error = 0;
+
+ if ((error = setup_suspend_device_tree())) {
+ return error;
+ }
+ if (swsusp_check_memory(0)) {
+ free_suspend_device_tree();
+ return -ENOMEM;
+ }
+ /* move the swap device and its parents out of the default tree */
+ if ((error = swsusp_swap_check(swsusp_dev_tree))) {
+ free_suspend_device_tree();
+ return error;
+ }
+
+ /* power down all devices except the swap device and its parents */
+ BUG_ON(irqs_disabled());
+ device_suspend_tree(PMSG_FREEZE, &default_device_tree);
+ local_irq_disable();
+ device_power_down_tree(PMSG_FREEZE, &default_device_tree);
+ local_irq_enable();
+
+ if (swsusp_pagecache) {
+ if ((error = alloc_pagecache_buf())) {
+ swsusp_pagecache = 0;
+ }
+ }
+ if (swsusp_pagecache) {
+ if (alloc_pagedir(&pagedir_cache, nr_copy_page_caches, NULL, 0) < 0)
+ swsusp_pagecache = 0;
+ else
+ swsusp_pagecache = 2;
+ }
+
+ drain_local_pages();
+ count_data_pages();
+ error = swsusp_alloc();
+ if (error) {
+ printk("swsusp_alloc failed, %d\n", error);
+ free_suspend_device_tree();
+ return error;
+ }

+ drain_local_pages();
+ count_data_pages();
+ printk("swsusp: need to copy %u pages, %u page_caches\n",
+ nr_copy_pages, nr_copy_page_caches);
+
+ if (swsusp_pagecache) {
+ setup_pagedir_pbe();
+ pr_debug("after setup_pagedir_pbe \n");
+
+ error = page_caches_write();
+ if (error) {
+ free_suspend_device_tree();
+ return error;
+ }
+ }
+
+ return 0;
+}

static int pfn_is_nosave(unsigned long pfn)
{
@@ -539,7 +1201,10 @@ static int saveable(struct zone * zone,
}
if (PageNosaveFree(page))
return 0;
-
+ if (PagePcs(page) && swsusp_pagecache) {
+ BUG_ON(zone->nr_inactive == 0 && zone->nr_active == 0);
+ return 0;
+ }
return 1;
}

@@ -559,12 +1224,10 @@ static void count_data_pages(void)
}
}

-
static void copy_data_pages(void)
{
struct zone *zone;
unsigned long zone_pfn;
- struct pbe * pbe = pagedir_nosave;
int pages_copied = 0;

for_each_zone(zone) {
@@ -574,11 +1237,16 @@ static void copy_data_pages(void)
for (zone_pfn = 0; zone_pfn < zone->spanned_pages; ++zone_pfn) {
if (saveable(zone, &zone_pfn)) {
struct page * page;
+ struct pbe * pbe = find_pbe_by_index(pagedir_nosave,
+ pages_copied);
+ BUG_ON(pbe == NULL);
+ if (pbe->address == 0)
+ panic("copy_data_pages: %d copied\n", pages_copied);
page = pfn_to_page(zone_pfn + zone->zone_start_pfn);
pbe->orig_address = (long) page_address(page);
+ BUG_ON(pbe->orig_address == 0);
/* copy_page is not usable for copying task structs. */
memcpy((void *)pbe->address, (void *)pbe->orig_address, PAGE_SIZE);
- pbe++;
pages_copied++;
}
}
@@ -587,85 +1255,18 @@ static void copy_data_pages(void)
nr_copy_pages = pages_copied;
}

-
-/**
- * calc_order - Determine the order of allocation needed for pagedir_save.
- *
- * This looks tricky, but is just subtle. Please fix it some time.
- * Since there are %nr_copy_pages worth of pages in the snapshot, we need
- * to allocate enough contiguous space to hold
- * (%nr_copy_pages * sizeof(struct pbe)),
- * which has the saved/orig locations of the page..
- *
- * SUSPEND_PD_PAGES() tells us how many pages we need to hold those
- * structures, then we call get_bitmask_order(), which will tell us the
- * last bit set in the number, starting with 1. (If we need 30 pages, that
- * is 0x0000001e in hex. The last bit is the 5th, which is the order we
- * would use to allocate 32 contiguous pages).
- *
- * Since we also need to save those pages, we add the number of pages that
- * we need to nr_copy_pages, and in case of an overflow, do the
- * calculation again to update the number of pages needed.
- *
- * With this model, we will tend to waste a lot of memory if we just cross
- * an order boundary. Plus, the higher the order of allocation that we try
- * to do, the more likely we are to fail in a low-memory situtation
- * (though we're unlikely to get this far in such a case, since swsusp
- * requires half of memory to be free anyway).
- */
-
-
-static void calc_order(void)
-{
- int diff = 0;
- int order = 0;
-
- do {
- diff = get_bitmask_order(SUSPEND_PD_PAGES(nr_copy_pages)) - order;
- if (diff) {
- order += diff;
- nr_copy_pages += 1 << diff;
- }
- } while(diff);
- pagedir_order = order;
-}
-
-
-/**
- * alloc_pagedir - Allocate the page directory.
- *
- * First, determine exactly how many contiguous pages we need and
- * allocate them.
- */
-
-static int alloc_pagedir(void)
-{
- calc_order();
- pagedir_save = (suspend_pagedir_t *)__get_free_pages(GFP_ATOMIC | __GFP_COLD,
- pagedir_order);
- if (!pagedir_save)
- return -ENOMEM;
- memset(pagedir_save, 0, (1 << pagedir_order) * PAGE_SIZE);
- pagedir_nosave = pagedir_save;
- return 0;
-}
-
/**
* free_image_pages - Free pages allocated for snapshot
*/
-
static void free_image_pages(void)
{
- struct pbe * p;
- int i;
+ struct pbe *pos, *next;
+ int index;

- p = pagedir_save;
- for (i = 0, p = pagedir_save; i < nr_copy_pages; i++, p++) {
- if (p->address) {
- ClearPageNosave(virt_to_page(p->address));
- free_page(p->address);
- p->address = 0;
- }
+ pbe_for_each_safe(pos, next, index, nr_copy_pages, pagedir_save) {
+ ClearPageNosave(virt_to_page(pos->address));
+ free_page(pos->address);
+ pos->address = 0;
}
}

@@ -673,17 +1274,16 @@ static void free_image_pages(void)
* alloc_image_pages - Allocate pages for the snapshot.
*
*/
-
static int alloc_image_pages(void)
{
- struct pbe * p;
- int i;
+ struct pbe *pos, *next;
+ int index;

- for (i = 0, p = pagedir_save; i < nr_copy_pages; i++, p++) {
- p->address = get_zeroed_page(GFP_ATOMIC | __GFP_COLD);
- if (!p->address)
+ pbe_for_each_safe(pos, next, index, nr_copy_pages, pagedir_save) {
+ pos->address = (unsigned long)get_zeroed_page(GFP_ATOMIC | __GFP_COLD);
+ if (!pos->address)
return -ENOMEM;
- SetPageNosave(virt_to_page(p->address));
+ SetPageNosave(virt_to_page(pos->address));
}
return 0;
}
@@ -693,28 +1293,9 @@ void swsusp_free(void)
BUG_ON(PageNosave(virt_to_page(pagedir_save)));
BUG_ON(PageNosaveFree(virt_to_page(pagedir_save)));
free_image_pages();
- free_pages((unsigned long) pagedir_save, pagedir_order);
+ pagedir_free(pagedir_save);
}

-
-/**
- * enough_free_mem - Make sure we enough free memory to snapshot.
- *
- * Returns TRUE or FALSE after checking the number of available
- * free pages.
- */
-
-static int enough_free_mem(void)
-{
- if (nr_free_pages() < (nr_copy_pages + PAGES_FOR_IO)) {
- pr_debug("swsusp: Not enough free pages: Have %d\n",
- nr_free_pages());
- return 0;
- }
- return 1;
-}
-
-
/**
* enough_swap - Make sure we have enough swap to save the image.
*
@@ -730,7 +1311,7 @@ static int enough_swap(void)
struct sysinfo i;

si_swapinfo(&i);
- if (i.freeswap < (nr_copy_pages + PAGES_FOR_IO)) {
+ if (i.freeswap < (nr_copy_pages + nr_copy_page_caches + PAGES_FOR_IO)) {
pr_debug("swsusp: Not enough swap. Need %ld\n",i.freeswap);
return 0;
}
@@ -741,34 +1322,30 @@ static int swsusp_alloc(void)
{
int error;

- pr_debug("suspend: (pages needed: %d + %d free: %d)\n",
- nr_copy_pages, PAGES_FOR_IO, nr_free_pages());
-
pagedir_nosave = NULL;
- if (!enough_free_mem())
- return -ENOMEM;

if (!enough_swap())
return -ENOSPC;
-
- if ((error = alloc_pagedir())) {
- pr_debug("suspend: Allocating pagedir failed.\n");
- return error;
+ error = alloc_pagedir(&pagedir_save, nr_copy_pages, NULL, 0);
+ if (error < 0) {
+ printk("suspend: Allocating pagedir failed.\n");
+ return -ENOMEM;
}
+ pr_debug("alloc_pagedir: addon %d\n", error);
+ nr_copy_pages += error;
if ((error = alloc_image_pages())) {
- pr_debug("suspend: Allocating image pages failed.\n");
+ printk("suspend: Allocating image pages failed.\n");
swsusp_free();
return error;
}
+ pagedir_nosave = pagedir_save;

- pagedir_order_check = pagedir_order;
return 0;
}

int suspend_prepare_image(void)
{
- unsigned int nr_needed_pages;
- int error;
+ BUG_ON(!irqs_disabled());

pr_debug("swsusp: critical section: \n");
if (save_highmem()) {
@@ -777,15 +1354,6 @@ int suspend_prepare_image(void)
return -ENOMEM;
}

- drain_local_pages();
- count_data_pages();
- printk("swsusp: Need to copy %u pages\n",nr_copy_pages);
- nr_needed_pages = nr_copy_pages + PAGES_FOR_IO;
-
- error = swsusp_alloc();
- if (error)
- return error;
-
/* During allocating of suspend pagedir, new cold pages may appear.
* Kill them.
*/
@@ -811,7 +1379,6 @@ int suspend_prepare_image(void)
int swsusp_write(void)
{
int error;
- device_resume();
lock_swapdevices();
error = write_suspend_image();
/* This will unlock ignored swap devices since writing is finished */
@@ -820,17 +1387,11 @@ int swsusp_write(void)

}

-
extern asmlinkage int swsusp_arch_suspend(void);
extern asmlinkage int swsusp_arch_resume(void);

-
asmlinkage int swsusp_save(void)
{
- int error = 0;
-
- if ((error = swsusp_swap_check()))
- return error;
return suspend_prepare_image();
}

@@ -839,34 +1400,66 @@ int swsusp_suspend(void)
int error;
if ((error = arch_prepare_suspend()))
return error;
+
+ BUG_ON(irqs_disabled());
+ /* suspend swap device */
+ device_suspend_tree(PMSG_FREEZE, swsusp_dev_tree);
+
+ mb();
+ barrier();
+
+#ifdef CONFIG_PREEMPT
+ preempt_disable();
+#endif
local_irq_disable();
+ device_power_down_tree(PMSG_FREEZE, swsusp_dev_tree);
sysdev_suspend(PMSG_FREEZE);
+
save_processor_state();
error = swsusp_arch_suspend();
/* Restore control flow magically appears here */
restore_processor_state();
restore_highmem();
+
+ BUG_ON(!irqs_disabled());
sysdev_resume();
+
+ dpm_power_up_tree(swsusp_dev_tree);
local_irq_enable();
+ device_resume_tree(swsusp_dev_tree);
+
return error;
}


asmlinkage int swsusp_restore(void)
{
- BUG_ON (pagedir_order_check != pagedir_order);
-
/* Even mappings of "global" things (vmalloc) need to be fixed */
+#if defined(CONFIG_X86) || defined(CONFIG_X86_64)
__flush_tlb_global();
wbinvd(); /* Nigel says wbinvd here is good idea... */
+#endif
return 0;
}

int swsusp_resume(void)
{
int error;
+
+ /* power down all devices except the swap device and its parent */
+ BUG_ON(irqs_disabled());
+ device_suspend_tree(PMSG_FREEZE, &default_device_tree);
+ local_irq_disable();
+ device_power_down_tree(PMSG_FREEZE, &default_device_tree);
+ local_irq_enable();
+
+#ifdef CONFIG_PREEMPT
+ preempt_disable();
+#endif
+
local_irq_disable();
sysdev_suspend(PMSG_FREEZE);
+
/* We'll ignore saved state, but this gets preempt count (etc) right */
save_processor_state();
error = swsusp_arch_resume();
@@ -881,99 +1474,6 @@ int swsusp_resume(void)
return error;
}

-
-
-/* More restore stuff */
-
-#define does_collide(addr) does_collide_order(pagedir_nosave, addr, 0)
-
-/*
- * Returns true if given address/order collides with any orig_address
- */
-static int __init does_collide_order(suspend_pagedir_t *pagedir, unsigned long addr,
- int order)
-{
- int i;
- unsigned long addre = addr + (PAGE_SIZE<<order);
-
- for (i=0; i < nr_copy_pages; i++)
- if ((pagedir+i)->orig_address >= addr &&
- (pagedir+i)->orig_address < addre)
- return 1;
-
- return 0;
-}
-
-/*
- * We check here that pagedir & pages it points to won't collide with pages
- * where we're going to restore from the loaded pages later
- */
-static int __init check_pagedir(void)
-{
- int i;
-
- for(i=0; i < nr_copy_pages; i++) {
- unsigned long addr;
-
- do {
- addr = get_zeroed_page(GFP_ATOMIC);
- if(!addr)
- return -ENOMEM;
- } while (does_collide(addr));
-
- (pagedir_nosave+i)->address = addr;
- }
- return 0;
-}
-
-static int __init swsusp_pagedir_relocate(void)
-{
- /*
- * We have to avoid recursion (not to overflow kernel stack),
- * and that's why code looks pretty cryptic
- */
- suspend_pagedir_t *old_pagedir = pagedir_nosave;
- void **eaten_memory = NULL;
- void **c = eaten_memory, *m, *f;
- int ret = 0;
-
- printk("Relocating pagedir ");
-
- if (!does_collide_order(old_pagedir, (unsigned long)old_pagedir, pagedir_order)) {
- printk("not necessary\n");
- return check_pagedir();
- }
-
- while ((m = (void *) __get_free_pages(GFP_ATOMIC, pagedir_order)) != NULL) {
- if (!does_collide_order(old_pagedir, (unsigned long)m, pagedir_order))
- break;
- eaten_memory = m;
- printk( "." );
- *eaten_memory = c;
- c = eaten_memory;
- }
-
- if (!m) {
- printk("out of memory\n");
- ret = -ENOMEM;
- } else {
- pagedir_nosave =
- memcpy(m, old_pagedir, PAGE_SIZE << pagedir_order);
- }
-
- c = eaten_memory;
- while (c) {
- printk(":");
- f = c;
- c = *c;
- free_pages((unsigned long)f, pagedir_order);
- }
- if (ret)
- return ret;
- printk("|\n");
- return check_pagedir();
-}
-
/**
* Using bio to read from swap.
* This code requires a bit more work than just using buffer heads
@@ -1038,12 +1538,12 @@ static int submit(int rw, pgoff_t page_o
return error;
}

-int bio_read_page(pgoff_t page_off, void * page)
+static int bio_read_page(pgoff_t page_off, void * page)
{
return submit(READ, page_off, page);
}

-int bio_write_page(pgoff_t page_off, void * page)
+static int bio_write_page(pgoff_t page_off, void * page)
{
return submit(WRITE, page_off, page);
}
@@ -1088,7 +1588,6 @@ static int __init check_header(void)
return -EPERM;
}
nr_copy_pages = swsusp_info.image_pages;
- pagedir_order = get_bitmask_order(SUSPEND_PD_PAGES(nr_copy_pages));
return error;
}

@@ -1115,62 +1614,167 @@ static int __init check_sig(void)
return error;
}

+
+static void __init eat_progress(void)
+{
+ char *eaten_progess = "-\\|/";
+ static int eaten_i = 0;
+
+ printk("\b%c", eaten_progess[eaten_i]);
+ eaten_i ++;
+ if (eaten_i > 3) eaten_i = 0;
+}
+
+static int __init check_one_pbe(struct pbe *p, void *collide, int cur)
+{
+ unsigned long addr = 0;
+
+ pr_debug("check_one_pbe: %p %lu o{%p} ",
+ p, p->swap_address.val, (void*)p->orig_address);
+ addr = (unsigned long)swsusp_get_safe_free_page(collide);
+ if(!addr)
+ return -ENOMEM;
+ pr_debug("c{%p} done\n", (void*)addr);
+ p->address = addr;
+
+ return 0;
+}
+
+static void __init swsusp_copy_pagedir(suspend_pagedir_t *d_pgdir,
+ suspend_pagedir_t *s_pgdir)
+{
+ int i = 0;
+
+ while (s_pgdir != NULL) {
+ suspend_pagedir_t *s_next = (suspend_pagedir_t *)s_pgdir->dummy.val;
+ suspend_pagedir_t *d_next = (suspend_pagedir_t *)d_pgdir->dummy.val;
+ for (i = 0; i < ONE_PAGE_PBE_NUM; i++) {
+ d_pgdir->address = s_pgdir->address;
+ d_pgdir->orig_address = s_pgdir->orig_address;
+ d_pgdir->swap_address = s_pgdir->swap_address;
+ s_pgdir ++; d_pgdir ++;
+ }
+ d_pgdir = d_next;
+ s_pgdir = s_next;
+ };
+}
+/*
+ * We check here that pagedir & pages it points to won't collide with pages
+ * where we're going to restore from the loaded pages later
+ */
+static int __init check_pagedir(void)
+{
+ void **c, *f;
+ struct pbe *next, *pos;
+ int error, index;
+ suspend_pagedir_t *addr = NULL;
+ unsigned char *bitmap = collide_bitmap_init(pagedir_nosave);
+
+ BUG_ON(bitmap == NULL);
+
+ printk("Relocating pagedir ... ");
+ error = alloc_pagedir(&addr, nr_copy_pages, bitmap,
+ swsusp_info.pagedir_pages);
+ if (error < 0) {
+ return error;
+ }
+ swsusp_copy_pagedir(addr, pagedir_nosave);
+ pagedir_free(pagedir_nosave);
+
+ /* check copy address */
+ pbe_for_each_safe(pos, next, index, nr_copy_pages, addr) {
+ error = check_one_pbe(pos, bitmap, index);
+ BUG_ON(error);
+ }
+
+ /* free eaten memory */
+ c = eaten_memory;
+ while (c) {
+ eat_progress();
+ f = c;
+ c = *c;
+ free_pages((unsigned long)f, 0);
+ }
+ /* free unused memory */
+ collide_bitmap_free(bitmap);
+ printk(" done\n");
+
+ pagedir_nosave = addr;
+
+ return 0;
+}
+
/**
* swsusp_read_data - Read image pages from swap.
*
- * You do not need to check for overlaps, check_pagedir()
- * already did that.
*/
-
static int __init data_read(void)
{
- struct pbe * p;
- int error;
- int i;
- int mod = nr_copy_pages / 100;
-
- if (!mod)
- mod = 1;
+ int error = 0, index;
+ struct pbe *pos, *next;

- if ((error = swsusp_pagedir_relocate()))
+ if ((error = swsusp_check_memory(1))) {
return error;
+ }
+
+ if ((error = check_pagedir())) {
+ return -ENOMEM;
+ }
+
+ mod_progress = nr_copy_pages / 100;

printk( "Reading image data (%d pages): ", nr_copy_pages );
- for(i = 0, p = pagedir_nosave; i < nr_copy_pages && !error; i++, p++) {
- if (!(i%mod))
- printk( "\b\b\b\b%3d%%", i / mod );
- error = bio_read_page(swp_offset(p->swap_address),
- (void *)p->address);
+ pbe_for_each_safe(pos, next, index, nr_copy_pages, pagedir_nosave) {
+ error = read_one_pbe(pos, (void*)pos->address, index);
+ if (error) break;
}
- printk(" %d done.\n",i);
- return error;
+ printk(" %d done.\n", index);

+ return error;
}

extern dev_t __init name_to_dev_t(const char *line);

-static int __init read_pagedir(void)
+static int __init read_one_pagedir(suspend_pagedir_t *pgdir, int i)
{
- unsigned long addr;
- int i, n = swsusp_info.pagedir_pages;
+ unsigned long offset = swp_offset(swsusp_info.pagedir[i]);
+ unsigned long next;
int error = 0;

- addr = __get_free_pages(GFP_ATOMIC, pagedir_order);
- if (!addr)
- return -ENOMEM;
- pagedir_nosave = (struct pbe *)addr;
+ next = pgdir->dummy.val;
+ pr_debug("read_one_pagedir: %p, %d, %lu, %p\n",
+ pgdir, i, offset, (void*)next);
+ if ((error = bio_read_page(offset, (void *)pgdir))) {
+ return error;
+ }
+ pgdir->dummy.val = next;

- pr_debug("pmdisk: Reading pagedir (%d Pages)\n",n);
+ return error;
+}

- for (i = 0; i < n && !error; i++, addr += PAGE_SIZE) {
- unsigned long offset = swp_offset(swsusp_info.pagedir[i]);
- if (offset)
- error = bio_read_page(offset, (void *)addr);
- else
- error = -EFAULT;
- }
- if (error)
- free_pages((unsigned long)pagedir_nosave, pagedir_order);
+/*
+ * reading pagedir from swap device
+ */
+static int __init read_pagedir(void)
+{
+ int i = 0, n = swsusp_info.pagedir_pages;
+ int error = 0;
+ suspend_pagedir_t *pgdir, *next;
+
+ error = alloc_pagedir(&pagedir_nosave, nr_copy_pages, NULL, n);
+ if (error < 0)
+ return -ENOMEM;
+
+ printk("pmdisk: Reading pagedir (%d Pages)\n",n);
+ pgdir_for_each_safe(pgdir, next, pagedir_nosave) {
+ error = read_one_pagedir(pgdir, i);
+ if (error) break;
+ i++;
+ }
+ BUG_ON(i != n);
+ if (error)
+ pagedir_free(pagedir_nosave);
+
return error;
}

@@ -1185,7 +1789,7 @@ static int __init read_suspend_image(voi
if ((error = read_pagedir()))
return error;
if ((error = data_read()))
- free_pages((unsigned long)pagedir_nosave, pagedir_order);
+ pagedir_free(pagedir_nosave);
return error;
}

@@ -1200,14 +1804,14 @@ int __init swsusp_read(void)
if (!strlen(resume_file))
return -ENOENT;

- resume_device = name_to_dev_t(resume_file);
+ swsusp_resume_device = name_to_dev_t(resume_file);
pr_debug("swsusp: Resume From Partition: %s\n", resume_file);

- resume_bdev = open_by_devnum(resume_device, FMODE_READ);
+ resume_bdev = open_by_devnum(swsusp_resume_device, FMODE_READ);
if (!IS_ERR(resume_bdev)) {
set_blocksize(resume_bdev, PAGE_SIZE);
error = read_suspend_image();
- blkdev_put(resume_bdev);
+ /* blkdev_put(resume_bdev); */
} else
error = PTR_ERR(resume_bdev);

--- 2.6.9-lzf//kernel/sys.c 2004-11-28 23:17:11.000000000 +0800
+++ 2.6.9/kernel/sys.c 2004-11-28 23:16:53.000000000 +0800
@@ -84,7 +84,7 @@ int cad_pid = 1;
* and the like.
*/

-static struct notifier_block *reboot_notifier_list;
+struct notifier_block *reboot_notifier_list;
rwlock_t notifier_lock = RW_LOCK_UNLOCKED;

/**
--- 2.6.9-lzf//kernel/sysctl.c 2004-11-28 23:17:12.000000000 +0800
+++ 2.6.9/kernel/sysctl.c 2004-11-28 23:16:55.000000000 +0800
@@ -66,6 +66,10 @@ extern int min_free_kbytes;
extern int printk_ratelimit_jiffies;
extern int printk_ratelimit_burst;

+#if defined(CONFIG_SOFTWARE_SUSPEND)
+extern int swsusp_pagecache;
+#endif
+
#if defined(CONFIG_X86_LOCAL_APIC) && defined(__i386__)
int unknown_nmi_panic;
extern int proc_unknown_nmi_panic(ctl_table *, int, struct file *,
@@ -792,6 +796,18 @@ static ctl_table vm_table[] = {
.strategy = &sysctl_intvec,
.extra1 = &zero,
},
+#if defined(CONFIG_SOFTWARE_SUSPEND)
+ {
+ .ctl_name = VM_SWSUSP_PAGECACHE,
+ .procname = "swsusp_pagecache",
+ .data = &swsusp_pagecache,
+ .maxlen = sizeof(swsusp_pagecache),
+ .mode = 0644,
+ .proc_handler = &proc_dointvec,
+ .strategy = &sysctl_intvec,
+ .extra1 = &zero,
+ },
+#endif
{
.ctl_name = VM_BLOCK_DUMP,
.procname = "block_dump",
--
--
Hu Gang / Steve
Linux Registered User 204016
GPG Public Key: http://soulinfo.com/~hugang/hugang.asc

2004-11-28 16:39:46

by Hu Gang

Subject: Re: software suspend patch [3/6]

On Mon, Nov 29, 2004 at 12:23:20AM +0800, [email protected] wrote:
> Hi Pavel Machek, Nigel Cunningham:
>
> device-tree.diff
> Based on suspend2, with a few changes.
>
> core.diff
> 1: redefine struct pbe so the pagedir no longer needs contiguous memory.
> 2: shrink memory as little as possible.
> 3: use a bitmap to speed up the collision check when relocating pages.
> 4: pagecache saving is ready.
>
> i386.diff
> ppc.diff
> i386 and powerpc suspend updates.
>
> pagecachs_addon.diff
> Required if page cache saving is enabled; it makes saving the
> page cache safe. Idea from suspend2.
>
> ppcfix.diff
> Fixes a compile error.
> $ gcc -v
> ....
> gcc version 2.95.4 20011002 (Debian prerelease)
>
> I'm using 2.6.9-ck3. With the above patches, swsusp1 works perfectly on my
> PowerPC and x86 PCs with the highmem and preempt options enabled.
>
> I hope core.diff@1,@2,@3, i386.diff and ppc.diff will be merged into the
> mainline kernel ASAP :). From my point of view, device-tree.diff is
> very useful when using pagecache saving, and pagecachs_addon.diff
> is really a hack to make pagecache saving safe.
>
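
For readers following item 1 of core.diff: as far as can be read from swsusp_copy_pagedir() and the i386 copy loop in the patch below, the pagedir becomes a chain of page-sized arrays of struct pbe, where the spare field of the first entry in each page points at the next page. The following is only a user-space sketch of that layout, not the patch's code. Every sketch_* name is hypothetical, malloc stands in for the kernel page allocator, and the field is called link here rather than dummy.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define SKETCH_PAGE_SIZE 4096u

/* Hypothetical stand-in for struct pbe; not the kernel definition. */
struct sketch_pbe {
	uintptr_t address;      /* where the saved copy lives */
	uintptr_t orig_address; /* where the page has to go back to */
	uintptr_t swap_address; /* swap slot, unused in this sketch */
	uintptr_t link;         /* entry 0 of each page: next page, or 0 */
};

#define SKETCH_PBES_PER_PAGE (SKETCH_PAGE_SIZE / sizeof(struct sketch_pbe))

/* The chain pointer is kept in the first entry of every page. */
static struct sketch_pbe *sketch_next_page(const struct sketch_pbe *page)
{
	return (struct sketch_pbe *)page[0].link;
}

/* Free every page of the chain, one page at a time. */
void sketch_pagedir_free(struct sketch_pbe *head)
{
	while (head) {
		struct sketch_pbe *next = sketch_next_page(head);
		free(head);
		head = next;
	}
}

/*
 * Allocate enough single pages to hold nr_entries entries and link them.
 * Only order-0 sized allocations are made. Returns NULL on failure
 * (and also for nr_entries == 0, which this sketch does not distinguish).
 */
struct sketch_pbe *sketch_pagedir_alloc(size_t nr_entries)
{
	size_t pages = (nr_entries + SKETCH_PBES_PER_PAGE - 1) / SKETCH_PBES_PER_PAGE;
	struct sketch_pbe *head = NULL, *tail = NULL;

	for (size_t i = 0; i < pages; i++) {
		struct sketch_pbe *page = calloc(1, SKETCH_PAGE_SIZE);
		if (!page) {
			sketch_pagedir_free(head);
			return NULL;
		}
		if (tail)
			tail[0].link = (uintptr_t)page;
		else
			head = page;
		tail = page;
	}
	return head;
}

/* Return entry number index, or NULL if the chain is too short. */
struct sketch_pbe *sketch_pagedir_find(struct sketch_pbe *head, size_t index)
{
	while (head && index >= SKETCH_PBES_PER_PAGE) {
		head = sketch_next_page(head);
		index -= SKETCH_PBES_PER_PAGE;
	}
	return head ? &head[index] : NULL;
}
```

The point of the layout is that a large image no longer needs one high-order contiguous allocation for its pagedir, which is what the removed calc_order() comment complained about. The cost is that lookup by index has to walk the chain.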

--- 2.6.9-lzf//arch/i386/kernel/signal.c 2004-11-28 23:17:23.000000000 +0800
+++ 2.6.9/arch/i386/kernel/signal.c 2004-11-28 23:16:59.000000000 +0800
@@ -587,6 +587,7 @@ int fastcall do_signal(struct pt_regs *r

if (current->flags & PF_FREEZE) {
refrigerator(0);
+ recalc_sigpending();
if (!signal_pending(current))
goto no_signal;
}
--- 2.6.9-lzf//arch/i386/power/swsusp.S 2004-11-26 12:32:45.000000000 +0800
+++ 2.6.9/arch/i386/power/swsusp.S 2004-11-28 23:16:59.000000000 +0800
@@ -31,24 +31,33 @@ ENTRY(swsusp_arch_resume)
movl $swsusp_pg_dir-__PAGE_OFFSET,%ecx
movl %ecx,%cr3

- movl pagedir_nosave, %ebx
- xorl %eax, %eax
- xorl %edx, %edx
- .p2align 4,,7
-
-copy_loop:
- movl 4(%ebx,%edx),%edi
- movl (%ebx,%edx),%esi
-
- movl $1024, %ecx
- rep
- movsl
-
- incl %eax
- addl $16, %edx
- cmpl nr_copy_pages,%eax
- jb copy_loop
- .p2align 4,,7
+ movl pagedir_nosave, %eax
+ test %eax, %eax
+ je copy_loop_end
+ movl $1024, %edx
+
+copy_loop_start:
+ movl 0xc(%eax), %ebp
+ xorl %ebx, %ebx
+ leal 0x0(%esi),%esi
+
+copy_one_pgdir:
+ movl 0x4(%eax),%edi
+ test %edi, %edi
+ je copy_loop_end
+
+ movl (%eax), %esi
+ movl %edx, %ecx
+ repz movsl %ds:(%esi),%es:(%edi)
+
+ incl %ebx
+ addl $0x10, %eax
+ cmpl $0xff, %ebx
+ jbe copy_one_pgdir
+ test %ebp, %ebp
+ movl %ebp, %eax
+ jne copy_loop_start
+copy_loop_end:

movl saved_context_esp, %esp
movl saved_context_ebp, %ebp
--
--
Hu Gang / Steve
Linux Registered User 204016
GPG Public Key: http://soulinfo.com/~hugang/hugang.asc

2004-11-28 16:44:37

by Hu Gang

Subject: Re: software suspend patch [5/6]

On Mon, Nov 29, 2004 at 12:23:20AM +0800, [email protected] wrote:
> Hi Pavel Machek, Nigel Cunningham:
>
> device-tree.diff
> Based on suspend2, with a few changes.
>
> core.diff
> 1: redefine struct pbe so the pagedir no longer needs contiguous memory.
> 2: shrink memory as little as possible.
> 3: use a bitmap to speed up the collision check when relocating pages.
> 4: pagecache saving is ready.
>
> i386.diff
> ppc.diff
> i386 and powerpc suspend updates.
>
> pagecachs_addon.diff
> Required if page cache saving is enabled; it makes saving the
> page cache safe. Idea from suspend2.
>
> ppcfix.diff
> Fixes a compile error.
> $ gcc -v
> ....
> gcc version 2.95.4 20011002 (Debian prerelease)
>
> I'm using 2.6.9-ck3. With the above patches, swsusp1 works perfectly on my
> PowerPC and x86 PCs with the highmem and preempt options enabled.
>
> I hope core.diff@1,@2,@3, i386.diff and ppc.diff will be merged into the
> mainline kernel ASAP :). From my point of view, device-tree.diff is
> very useful when using pagecache saving, and pagecachs_addon.diff
> is really a hack to make pagecache saving safe.
>
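
On item 3 of core.diff: the old does_collide_order() scanned every pagedir entry for each candidate page, so relocation cost grew with the square of the image size. A bitmap indexed by page frame number makes each check constant time. The sketch below shows only that idea in user space. The sketch_* names, the 4 KiB page shift, and the byte-array layout are assumptions of mine and are not taken from collide_bitmap_init() in the patch.

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define SKETCH_PAGE_SHIFT 12 /* assumes 4 KiB pages */

/* Hypothetical bitmap: one bit per page frame number (pfn). */
struct sketch_collide_map {
	unsigned char *bits;
	size_t nr_pfns;
};

/*
 * Build the map once: set one bit for every page that the image will
 * be restored to. Addresses beyond nr_pfns pages are ignored.
 * Returns 0 on success, -1 if the bitmap cannot be allocated.
 */
int sketch_collide_init(struct sketch_collide_map *map, size_t nr_pfns,
			const uintptr_t *orig_addrs, size_t nr_addrs)
{
	size_t bytes = nr_pfns / CHAR_BIT + 1; /* always at least one byte */

	map->bits = calloc(bytes, 1);
	if (!map->bits)
		return -1;
	map->nr_pfns = nr_pfns;

	for (size_t i = 0; i < nr_addrs; i++) {
		size_t pfn = orig_addrs[i] >> SKETCH_PAGE_SHIFT;
		if (pfn < nr_pfns)
			map->bits[pfn / CHAR_BIT] |=
				(unsigned char)(1u << (pfn % CHAR_BIT));
	}
	return 0;
}

/* Constant-time check: does addr fall in a page the image needs? */
int sketch_collides(const struct sketch_collide_map *map, uintptr_t addr)
{
	size_t pfn = addr >> SKETCH_PAGE_SHIFT;

	if (pfn >= map->nr_pfns)
		return 0;
	return (map->bits[pfn / CHAR_BIT] >> (pfn % CHAR_BIT)) & 1;
}

/* Release the bitmap and reset the structure. */
void sketch_collide_free(struct sketch_collide_map *map)
{
	free(map->bits);
	map->bits = NULL;
	map->nr_pfns = 0;
}
```

With one bit per 4 KiB page, a machine with 1 GiB of memory needs a 32 KiB bitmap, which is small next to the time saved when relocating tens of thousands of pages.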

--- 2.6.9-lzf/kernel/sched.c 2004-11-28 23:17:11.000000000 +0800
+++ 2.6.9/kernel/sched.c 2004-11-28 23:16:54.000000000 +0800
@@ -2656,6 +2656,12 @@ asmlinkage void __sched schedule(void)
* Otherwise, whine if we are scheduling when we should not be.
*/
if (likely(!(current->state & (TASK_DEAD | TASK_ZOMBIE)))) {
+#ifdef CONFIG_PM
+ extern int swsusp_pagecache;
+ if (unlikely(swsusp_pagecache == 2)) /* silence the warning message while
+ writing pagecache */
+#endif
+
if (unlikely(in_atomic())) {
printk(KERN_ERR "bad: scheduling while atomic!\n");
dump_stack();
--- 2.6.9-lzf/mm/page-writeback.c 2004-11-25 14:06:02.000000000 +0800
+++ 2.6.9/mm/page-writeback.c 2004-11-29 00:07:13.000000000 +0800
@@ -359,6 +359,9 @@ static void wb_kupdate(unsigned long arg
unsigned long start_jif;
unsigned long next_jif;
long nr_to_write;
+#ifdef CONFIG_PM
+ extern int swsusp_pagecache;
+#endif
struct writeback_state wbs;
struct writeback_control wbc = {
.bdi = NULL,
@@ -369,6 +372,14 @@ static void wb_kupdate(unsigned long arg
.for_kupdate = 1,
};

+#ifdef CONFIG_PM
+ if (unlikely(swsusp_pagecache == 2)) {
+ start_jif = jiffies;
+ next_jif = start_jif + (dirty_writeback_centisecs * HZ) / 100;
+ goto out;
+ }
+#endif
+
sync_supers();

get_writeback_state(&wbs);
@@ -389,6 +400,7 @@ static void wb_kupdate(unsigned long arg
}
nr_to_write -= MAX_WRITEBACK_PAGES - wbc.nr_to_write;
}
+out:
if (time_before(next_jif, jiffies + HZ))
next_jif = jiffies + HZ;
if (dirty_writeback_centisecs)
--
--
Hu Gang / Steve
Linux Registered User 204016
GPG Public Key: http://soulinfo.com/~hugang/hugang.asc

2004-11-28 16:44:08

by Hu Gang

Subject: Re: software suspend patch [4/6]

On Mon, Nov 29, 2004 at 12:23:20AM +0800, [email protected] wrote:
> Hi Pavel Machek, Nigel Cunningham:
>
> device-tree.diff
> Based on suspend2, with a few changes.
>
> core.diff
> 1: redefine struct pbe so the pagedir no longer needs contiguous memory.
> 2: shrink memory as little as possible.
> 3: use a bitmap to speed up the collision check when relocating pages.
> 4: pagecache saving is ready.
>
> i386.diff
> ppc.diff
> i386 and powerpc suspend updates.
>
> pagecachs_addon.diff
> Required if page cache saving is enabled; it makes saving the
> page cache safe. Idea from suspend2.
>
> ppcfix.diff
> Fixes a compile error.
> $ gcc -v
> ....
> gcc version 2.95.4 20011002 (Debian prerelease)
>
> I'm using 2.6.9-ck3. With the above patches, swsusp1 works perfectly on my
> PowerPC and x86 PCs with the highmem and preempt options enabled.
>
> I hope core.diff@1,@2,@3, i386.diff and ppc.diff will be merged into the
> mainline kernel ASAP :). From my point of view, device-tree.diff is
> very useful when using pagecache saving, and pagecachs_addon.diff
> is really a hack to make pagecache saving safe.

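A note on the "Get a stable timebase" loop in the swsusp.S added below: the 64-bit timebase is read as two 32-bit halves (mftbu, mftb, mftbu), and the read is repeated if the upper half changed in between, since that means the lower half wrapped mid-read. The same technique in C looks roughly like this. It is only a sketch: the callback interface and the fake counter are hypothetical and exist just so the retry can be exercised off-target.

```c
#include <stdint.h>

/* Hypothetical accessor type: returns one 32-bit half of a counter. */
typedef uint32_t (*sketch_read32_fn)(void *ctx);

/*
 * Read a 64-bit counter exposed as two 32-bit halves.
 * Read high, then low, then high again; if the two high reads differ,
 * the low half wrapped in between, so start over.
 */
uint64_t sketch_read_split_counter(sketch_read32_fn read_hi,
				   sketch_read32_fn read_lo, void *ctx)
{
	uint32_t hi, lo, hi2;

	do {
		hi = read_hi(ctx);
		lo = read_lo(ctx);
		hi2 = read_hi(ctx);
	} while (hi != hi2);

	return ((uint64_t)hi << 32) | lo;
}

/* Fake counter for testing: advances by step after each low-half read. */
struct sketch_fake_tb {
	uint64_t value;
	uint64_t step;
};

uint32_t sketch_fake_hi(void *ctx)
{
	const struct sketch_fake_tb *tb = ctx;
	return (uint32_t)(tb->value >> 32);
}

uint32_t sketch_fake_lo(void *ctx)
{
	struct sketch_fake_tb *tb = ctx;
	uint32_t lo = (uint32_t)tb->value;

	tb->value += tb->step; /* simulate the counter ticking mid-read */
	return lo;
}
```

Starting the fake counter at 0x00000001FFFFFFFF with a step of 1 forces one retry, because the low half wraps between the two high reads. Without the retry the caller would combine the old high word with the old low word after the carry had already happened and could see time jump.
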
--- 2.6.9-lzf/drivers/ide/ppc/pmac.c 2004-11-26 12:33:06.000000000 +0800
+++ 2.6.9/drivers/ide/ppc/pmac.c 2004-11-28 23:17:00.000000000 +0800
@@ -32,6 +32,7 @@
#include <linux/notifier.h>
#include <linux/reboot.h>
#include <linux/pci.h>
+#include <linux/pm.h>
#include <linux/adb.h>
#include <linux/pmu.h>

@@ -1364,7 +1365,7 @@ pmac_ide_macio_suspend(struct macio_dev
ide_hwif_t *hwif = (ide_hwif_t *)dev_get_drvdata(&mdev->ofdev.dev);
int rc = 0;

- if (state != mdev->ofdev.dev.power_state && state >= 2) {
+ if (state != mdev->ofdev.dev.power_state && state == PM_SUSPEND_MEM) {
rc = pmac_ide_do_suspend(hwif);
if (rc == 0)
mdev->ofdev.dev.power_state = state;
@@ -1472,7 +1473,7 @@ pmac_ide_pci_suspend(struct pci_dev *pde
ide_hwif_t *hwif = (ide_hwif_t *)pci_get_drvdata(pdev);
int rc = 0;

- if (state != pdev->dev.power_state && state >= 2) {
+ if (state != pdev->dev.power_state && state == PM_SUSPEND_MEM ) {
rc = pmac_ide_do_suspend(hwif);
if (rc == 0)
pdev->dev.power_state = state;
--- 2.6.9-lzf/drivers/macintosh/Kconfig 2004-11-26 12:33:06.000000000 +0800
+++ 2.6.9/drivers/macintosh/Kconfig 2004-11-28 23:17:00.000000000 +0800
@@ -80,7 +80,7 @@ config ADB_PMU

config PMAC_PBOOK
bool "Power management support for PowerBooks"
- depends on ADB_PMU
+ depends on PM && ADB_PMU
---help---
This provides support for putting a PowerBook to sleep; it also
enables media bay support. Power management works on the
@@ -97,11 +97,6 @@ config PMAC_PBOOK
have it autoloaded. The act of removing the module shuts down the
sound hardware for more power savings.

-config PM
- bool
- depends on PPC_PMAC && ADB_PMU && PMAC_PBOOK
- default y
-
config PMAC_APM_EMU
tristate "APM emulation"
depends on PMAC_PBOOK
--- 2.6.9-lzf/drivers/macintosh/via-pmu.c 2004-11-26 12:33:07.000000000 +0800
+++ 2.6.9/drivers/macintosh/via-pmu.c 2004-11-28 23:17:00.000000000 +0800
@@ -43,6 +43,7 @@
#include <linux/init.h>
#include <linux/interrupt.h>
#include <linux/device.h>
+#include <linux/sysdev.h>
#include <linux/suspend.h>
#include <linux/syscalls.h>
#include <asm/prom.h>
@@ -2326,7 +2327,7 @@ pmac_suspend_devices(void)
/* Sync the disks. */
/* XXX It would be nice to have some way to ensure that
* nobody is dirtying any new buffers while we wait. That
- * could be acheived using the refrigerator for processes
+ * could be achieved using the refrigerator for processes
* that swsusp uses
*/
sys_sync();
@@ -2379,7 +2380,6 @@ pmac_suspend_devices(void)

/* Wait for completion of async backlight requests */
while (!bright_req_1.complete || !bright_req_2.complete ||
-
!batt_req.complete)
pmu_poll();

@@ -3048,6 +3048,88 @@ pmu_polled_request(struct adb_request *r
}
#endif /* DEBUG_SLEEP */

+
+/* FIXME: This is a temporary set of callbacks to enable us
+ * to do suspend-to-disk.
+ */
+
+#ifdef CONFIG_PM
+
+static int pmu_sys_suspended = 0;
+
+static int pmu_sys_suspend(struct sys_device *sysdev, pm_message_t state)
+{
+ if (state != PMSG_FREEZE || pmu_sys_suspended)
+ return 0;
+
+ /* Suspend PMU event interrupts */
+ pmu_suspend();
+
+ pmu_sys_suspended = 1;
+ return 0;
+}
+
+static int pmu_sys_resume(struct sys_device *sysdev)
+{
+ struct adb_request req;
+
+ if (!pmu_sys_suspended)
+ return 0;
+
+ /* Tell PMU we are ready */
+ pmu_request(&req, NULL, 2, PMU_SYSTEM_READY, 2);
+ pmu_wait_complete(&req);
+
+ /* Resume PMU event interrupts */
+ pmu_resume();
+
+ pmu_sys_suspended = 0;
+
+ return 0;
+}
+
+#endif /* CONFIG_PM */
+
+static struct sysdev_class pmu_sysclass = {
+ set_kset_name("pmu"),
+};
+
+static struct sys_device device_pmu = {
+ .id = 0,
+ .cls = &pmu_sysclass,
+};
+
+static struct sysdev_driver driver_pmu = {
+#ifdef CONFIG_PM
+ .suspend = &pmu_sys_suspend,
+ .resume = &pmu_sys_resume,
+#endif /* CONFIG_PM */
+};
+
+static int __init init_pmu_sysfs(void)
+{
+ int rc;
+
+ rc = sysdev_class_register(&pmu_sysclass);
+ if (rc) {
+ printk(KERN_ERR "Failed registering PMU sys class\n");
+ return -ENODEV;
+ }
+ rc = sysdev_register(&device_pmu);
+ if (rc) {
+ printk(KERN_ERR "Failed registering PMU sys device\n");
+ return -ENODEV;
+ }
+ rc = sysdev_driver_register(&pmu_sysclass, &driver_pmu);
+ if (rc) {
+ printk(KERN_ERR "Failed registering PMU sys driver\n");
+ return -ENODEV;
+ }
+ return 0;
+}
+
+subsys_initcall(init_pmu_sysfs);
+
EXPORT_SYMBOL(pmu_request);
EXPORT_SYMBOL(pmu_poll);
EXPORT_SYMBOL(pmu_poll_adb);
--- /dev/null 2004-06-07 18:45:47.000000000 +0800
+++ 2.6.9/include/asm-ppc/suspend.h 2004-11-28 23:16:57.000000000 +0800
@@ -0,0 +1,12 @@
+static inline int arch_prepare_suspend(void)
+{
+ return 0;
+}
+
+static inline void save_processor_state(void)
+{
+}
+
+static inline void restore_processor_state(void)
+{
+}
--- 2.6.9-lzf/arch/ppc/Kconfig 2004-11-26 12:32:56.000000000 +0800
+++ 2.6.9/arch/ppc/Kconfig 2004-11-28 23:16:58.000000000 +0800
@@ -983,6 +983,8 @@ config PROC_HARDWARE

source "drivers/zorro/Kconfig"

+source kernel/power/Kconfig
+
endmenu

menu "Bus options"
--- 2.6.9-lzf/arch/ppc/kernel/Makefile 2004-11-26 12:32:56.000000000 +0800
+++ 2.6.9/arch/ppc/kernel/Makefile 2004-11-28 23:16:58.000000000 +0800
@@ -16,6 +16,7 @@ obj-y := entry.o traps.o irq.o idle.o
semaphore.o syscalls.o setup.o \
cputable.o ppc_htab.o
obj-$(CONFIG_6xx) += l2cr.o cpu_setup_6xx.o
+obj-$(CONFIG_SOFTWARE_SUSPEND) += swsusp.o
obj-$(CONFIG_POWER4) += cpu_setup_power4.o
obj-$(CONFIG_MODULES) += module.o ppc_ksyms.o
obj-$(CONFIG_NOT_COHERENT_CACHE) += dma-mapping.o
--- 2.6.9-lzf/arch/ppc/kernel/signal.c 2004-11-26 12:32:57.000000000 +0800
+++ 2.6.9/arch/ppc/kernel/signal.c 2004-11-28 23:16:58.000000000 +0800
@@ -28,6 +28,7 @@
#include <linux/elf.h>
#include <linux/tty.h>
#include <linux/binfmts.h>
+#include <linux/suspend.h>
#include <asm/ucontext.h>
#include <asm/uaccess.h>
#include <asm/pgtable.h>
@@ -604,6 +605,15 @@ int do_signal(sigset_t *oldset, struct p
unsigned long frame, newsp;
int signr, ret;

+ if (current->flags & PF_FREEZE) {
+ refrigerator(0);
+ signr = 0;
+ ret = regs->gpr[3];
+ recalc_sigpending();
+ if (!signal_pending(current))
+ goto no_signal;
+ }
+
if (!oldset)
oldset = &current->blocked;

@@ -626,6 +636,7 @@ int do_signal(sigset_t *oldset, struct p
regs->gpr[3] = EINTR;
/* note that the cr0.SO bit is already set */
} else {
+no_signal:
regs->nip -= 4; /* Back up & retry system call */
regs->result = 0;
regs->trap = 0;
--- /dev/null 2004-06-07 18:45:47.000000000 +0800
+++ 2.6.9/arch/ppc/kernel/swsusp.S 2004-11-28 23:16:57.000000000 +0800
@@ -0,0 +1,366 @@
+#include <linux/config.h>
+#include <linux/threads.h>
+#include <asm/processor.h>
+#include <asm/page.h>
+#include <asm/cputable.h>
+#include <asm/thread_info.h>
+#include <asm/ppc_asm.h>
+#include <asm/offsets.h>
+
+
+/*
+ * Structure for storing CPU registers on the save area.
+ */
+#define SL_SP 0
+#define SL_PC 4
+#define SL_MSR 8
+#define SL_SDR1 0xc
+#define SL_SPRG0 0x10 /* 4 sprg's */
+#define SL_DBAT0 0x20
+#define SL_IBAT0 0x28
+#define SL_DBAT1 0x30
+#define SL_IBAT1 0x38
+#define SL_DBAT2 0x40
+#define SL_IBAT2 0x48
+#define SL_DBAT3 0x50
+#define SL_IBAT3 0x58
+#define SL_TB 0x60
+#define SL_R2 0x68
+#define SL_CR 0x6c
+#define SL_LR 0x70
+#define SL_R12 0x74 /* r12 to r31 */
+#define SL_SIZE (SL_R12 + 80)
+
+ .section .data
+ .align 5
+
+_GLOBAL(swsusp_save_area)
+ .space SL_SIZE
+
+
+ .section .text
+ .align 5
+
+_GLOBAL(swsusp_arch_suspend)
+
+ lis r11,swsusp_save_area@h
+ ori r11,r11,swsusp_save_area@l
+
+ mflr r0
+ stw r0,SL_LR(r11)
+ mfcr r0
+ stw r0,SL_CR(r11)
+ stw r1,SL_SP(r11)
+ stw r2,SL_R2(r11)
+ stmw r12,SL_R12(r11)
+
+ /* Save MSR & SDR1 */
+ mfmsr r4
+ stw r4,SL_MSR(r11)
+ mfsdr1 r4
+ stw r4,SL_SDR1(r11)
+
+ /* Get a stable timebase and save it */
+1: mftbu r4
+ stw r4,SL_TB(r11)
+ mftb r5
+ stw r5,SL_TB+4(r11)
+ mftbu r3
+ cmpw r3,r4
+ bne 1b
+
+ /* Save SPRGs */
+ mfsprg r4,0
+ stw r4,SL_SPRG0(r11)
+ mfsprg r4,1
+ stw r4,SL_SPRG0+4(r11)
+ mfsprg r4,2
+ stw r4,SL_SPRG0+8(r11)
+ mfsprg r4,3
+ stw r4,SL_SPRG0+12(r11)
+
+ /* Save BATs */
+ mfdbatu r4,0
+ stw r4,SL_DBAT0(r11)
+ mfdbatl r4,0
+ stw r4,SL_DBAT0+4(r11)
+ mfdbatu r4,1
+ stw r4,SL_DBAT1(r11)
+ mfdbatl r4,1
+ stw r4,SL_DBAT1+4(r11)
+ mfdbatu r4,2
+ stw r4,SL_DBAT2(r11)
+ mfdbatl r4,2
+ stw r4,SL_DBAT2+4(r11)
+ mfdbatu r4,3
+ stw r4,SL_DBAT3(r11)
+ mfdbatl r4,3
+ stw r4,SL_DBAT3+4(r11)
+ mfibatu r4,0
+ stw r4,SL_IBAT0(r11)
+ mfibatl r4,0
+ stw r4,SL_IBAT0+4(r11)
+ mfibatu r4,1
+ stw r4,SL_IBAT1(r11)
+ mfibatl r4,1
+ stw r4,SL_IBAT1+4(r11)
+ mfibatu r4,2
+ stw r4,SL_IBAT2(r11)
+ mfibatl r4,2
+ stw r4,SL_IBAT2+4(r11)
+ mfibatu r4,3
+ stw r4,SL_IBAT3(r11)
+ mfibatl r4,3
+ stw r4,SL_IBAT3+4(r11)
+
+#if 0
+ /* Backup various CPU config stuffs */
+ bl __save_cpu_setup
+#endif
+ /* Call the low level suspend stuff (we should probably have made
+ * a stackframe...
+ */
+ bl swsusp_save
+
+ /* Restore LR from the save area */
+ lis r11,swsusp_save_area@h
+ ori r11,r11,swsusp_save_area@l
+ lwz r0,SL_LR(r11)
+ mtlr r0
+
+ blr
+
+
+/* Resume code */
+_GLOBAL(swsusp_arch_resume)
+
+ /* Stop pending altivec streams and memory accesses */
+BEGIN_FTR_SECTION
+ DSSALL
+END_FTR_SECTION_IFSET(CPU_FTR_ALTIVEC)
+ sync
+
+ /* Disable MSR:DR to make sure we don't take a TLB or
+ * hash miss during the copy, as our hash table will
+ * for a while be unuseable. For .text, we assume we are
+ * covered by a BAT. This works only for non-G5 at this
+ * point. G5 will need a better approach, possibly using
+ * a small temporary hash table filled with large mappings,
+ * disabling the MMU completely isn't a good option for
+ * performance reasons.
+ * (Note that 750's may have the same performance issue as
+ * the G5 in this case, we should investigate using moving
+ * BATs for these CPUs)
+ */
+ mfmsr r0
+ sync
+ rlwinm r0,r0,0,28,26 /* clear MSR_DR */
+ mtmsr r0
+ sync
+ isync
+
+ /* Load pointer to the list of pages to copy into r9 */
+ lis r9,pagedir_nosave@ha
+ addi r9,r9,pagedir_nosave@l
+ tophys(r9,r9)
+ lwz r9, 0(r9)
+#if 0
+ twi 31,r0,0 /* trigger trap */
+#endif
+ cmpwi r9, 0
+ beq copy_loop_end
+copy_loop:
+ tophys(r9,r9)
+ lwz r6, 12(r9)
+ li r10, 0
+copy_one_pgdir:
+ lwz r11, 4(r9)
+ addi r8,r10,1
+ cmpwi r11, 0
+ addi r7,r9,16
+ beq copy_loop_end
+ li r0, 256
+ mtctr r0
+ lwz r9,0(r9)
+#if 0
+ twi 31,r0,0 /* trigger trap */
+#endif
+ tophys(r10,r11)
+ tophys(r11,r9)
+copy_one_page:
+ lwz r0, 0(r11)
+ stw r0, 0(r10)
+ lwz r9, 4(r11)
+ stw r9, 4(r10)
+ lwz r0, 8(r11)
+ stw r0, 8(r10)
+ lwz r9, 12(r11)
+ addi r11,r11,16
+ stw r9, 12(r10)
+ addi r10,r10,16
+ bdnz copy_one_page
+ mr r10, r8
+ cmplwi r10, 255
+ mr r9, r7
+ ble copy_one_pgdir
+ mr r9, r6
+ bne copy_loop
+copy_loop_end:
+
+ /* Do a very simple cache flush/inval of the L1 to ensure
+ * coherency of the icache
+ */
+ lis r3,0x0002
+ mtctr r3
+ li r3, 0
+1:
+ lwz r0,0(r3)
+ addi r3,r3,0x0020
+ bdnz 1b
+ isync
+ sync
+
+ /* Now flush those cache lines */
+ lis r3,0x0002
+ mtctr r3
+ li r3, 0
+1:
+ dcbf 0,r3
+ addi r3,r3,0x0020
+ bdnz 1b
+ sync
+
+ /* Ok, we are now running with the kernel data of the old
+ * kernel fully restored. We can get to the save area
+ * easily now. As for the rest of the code, it assumes the
+ * loader kernel and the booted one are exactly identical
+ */
+ lis r11,swsusp_save_area@h
+ ori r11,r11,swsusp_save_area@l
+ tophys(r11,r11)
+
+#if 0
+ /* Restore various CPU config stuffs */
+ bl __restore_cpu_setup
+#endif
+ /* Restore the BATs, and SDR1. Then we can turn on the MMU.
+ * This is a bit hairy as we are running out of those BATs,
+ * but first, our code is probably in the icache, and we are
+ * writing the same value to the BAT, so that should be fine,
+ * though a better solution will have to be found long-term
+ */
+ lwz r4,SL_SDR1(r11)
+ mtsdr1 r4
+ lwz r4,SL_SPRG0(r11)
+ mtsprg 0,r4
+ lwz r4,SL_SPRG0+4(r11)
+ mtsprg 1,r4
+ lwz r4,SL_SPRG0+8(r11)
+ mtsprg 2,r4
+ lwz r4,SL_SPRG0+12(r11)
+ mtsprg 3,r4
+
+#if 0
+ lwz r4,SL_DBAT0(r11)
+ mtdbatu 0,r4
+ lwz r4,SL_DBAT0+4(r11)
+ mtdbatl 0,r4
+ lwz r4,SL_DBAT1(r11)
+ mtdbatu 1,r4
+ lwz r4,SL_DBAT1+4(r11)
+ mtdbatl 1,r4
+ lwz r4,SL_DBAT2(r11)
+ mtdbatu 2,r4
+ lwz r4,SL_DBAT2+4(r11)
+ mtdbatl 2,r4
+ lwz r4,SL_DBAT3(r11)
+ mtdbatu 3,r4
+ lwz r4,SL_DBAT3+4(r11)
+ mtdbatl 3,r4
+ lwz r4,SL_IBAT0(r11)
+ mtibatu 0,r4
+ lwz r4,SL_IBAT0+4(r11)
+ mtibatl 0,r4
+ lwz r4,SL_IBAT1(r11)
+ mtibatu 1,r4
+ lwz r4,SL_IBAT1+4(r11)
+ mtibatl 1,r4
+ lwz r4,SL_IBAT2(r11)
+ mtibatu 2,r4
+ lwz r4,SL_IBAT2+4(r11)
+ mtibatl 2,r4
+ lwz r4,SL_IBAT3(r11)
+ mtibatu 3,r4
+ lwz r4,SL_IBAT3+4(r11)
+ mtibatl 3,r4
+#endif
+
+BEGIN_FTR_SECTION
+ li r4,0
+ mtspr SPRN_DBAT4U,r4
+ mtspr SPRN_DBAT4L,r4
+ mtspr SPRN_DBAT5U,r4
+ mtspr SPRN_DBAT5L,r4
+ mtspr SPRN_DBAT6U,r4
+ mtspr SPRN_DBAT6L,r4
+ mtspr SPRN_DBAT7U,r4
+ mtspr SPRN_DBAT7L,r4
+ mtspr SPRN_IBAT4U,r4
+ mtspr SPRN_IBAT4L,r4
+ mtspr SPRN_IBAT5U,r4
+ mtspr SPRN_IBAT5L,r4
+ mtspr SPRN_IBAT6U,r4
+ mtspr SPRN_IBAT6L,r4
+ mtspr SPRN_IBAT7U,r4
+ mtspr SPRN_IBAT7L,r4
+END_FTR_SECTION_IFSET(CPU_FTR_HAS_HIGH_BATS)
+
+ /* Flush all TLBs */
+ lis r4,0x1000
+1: addic. r4,r4,-0x1000
+ tlbie r4
+ blt 1b
+ sync
+
+ /* restore the MSR and turn on the MMU */
+ lwz r3,SL_MSR(r11)
+ bl turn_on_mmu
+ tovirt(r11,r11)
+
+ /* Restore TB */
+ li r3,0
+ mttbl r3
+ lwz r3,SL_TB(r11)
+ lwz r4,SL_TB+4(r11)
+ mttbu r3
+ mttbl r4
+
+ /* Kick decrementer */
+ li r0,1
+ mtdec r0
+
+ /* Restore the callee-saved registers and return */
+ lwz r0,SL_CR(r11)
+ mtcr r0
+ lwz r2,SL_R2(r11)
+ lmw r12,SL_R12(r11)
+ lwz r1,SL_SP(r11)
+ lwz r0,SL_LR(r11)
+ mtlr r0
+
+ // XXX Note: we don't really need to call swsusp_resume
+
+ li r3,0
+ blr
+
+/* FIXME:This construct is actually not useful since we don't shut
+ * down the instruction MMU, we could just flip back MSR-DR on.
+ */
+turn_on_mmu:
+ mflr r4
+ mtsrr0 r4
+ mtsrr1 r3
+ sync
+ isync
+ rfi
+
--- 2.6.9-lzf/arch/ppc/kernel/vmlinux.lds.S 2004-11-26 12:32:57.000000000 +0800
+++ 2.6.9/arch/ppc/kernel/vmlinux.lds.S 2004-11-28 23:16:58.000000000 +0800
@@ -74,6 +74,12 @@ SECTIONS
CONSTRUCTORS
}

+ . = ALIGN(4096);
+ __nosave_begin = .;
+ .data_nosave : { *(.data.nosave) }
+ . = ALIGN(4096);
+ __nosave_end = .;
+
. = ALIGN(32);
.data.cacheline_aligned : { *(.data.cacheline_aligned) }

--- 2.6.9-lzf/arch/ppc/platforms/pmac_feature.c 2004-11-27 17:33:17.000000000 +0800
+++ 2.6.9/arch/ppc/platforms/pmac_feature.c 2004-11-28 23:16:59.000000000 +0800
@@ -2146,7 +2146,7 @@ static struct pmac_mb_def pmac_mb_defs[]
},
{ "PowerBook6,1", "PowerBook G4 12\"",
PMAC_TYPE_UNKNOWN_INTREPID, intrepid_features,
- PMAC_MB_HAS_FW_POWER | PMAC_MB_MOBILE,
+ PMAC_MB_CAN_SLEEP | PMAC_MB_HAS_FW_POWER | PMAC_MB_MOBILE,
},
{ "PowerBook6,2", "PowerBook G4",
PMAC_TYPE_UNKNOWN_INTREPID, intrepid_features,
--- 2.6.9-lzf/arch/ppc/platforms/pmac_setup.c 2004-11-26 12:32:57.000000000 +0800
+++ 2.6.9/arch/ppc/platforms/pmac_setup.c 2004-11-28 23:16:59.000000000 +0800
@@ -51,6 +51,7 @@
#include <linux/irq.h>
#include <linux/seq_file.h>
#include <linux/root_dev.h>
+#include <linux/suspend.h>

#include <asm/reg.h>
#include <asm/sections.h>
@@ -70,6 +71,8 @@
#include <asm/pmac_feature.h>
#include <asm/time.h>
#include <asm/of_device.h>
+#include <asm/mmu_context.h>
+
#include "pmac_pic.h"
#include "mem_pieces.h"

@@ -420,11 +423,67 @@ find_boot_device(void)
#endif
}

+/* TODO: Merge the suspend-to-ram with the common code !!!
+ * currently, this is a stub implementation for suspend-to-disk
+ * only
+ */
+
+#ifdef CONFIG_PM
+
+extern void enable_kernel_altivec(void);
+
+static int pmac_pm_prepare(suspend_state_t state)
+{
+ printk(KERN_DEBUG "pmac_pm_prepare(%d)\n", state);
+
+ return 0;
+}
+
+static int pmac_pm_enter(suspend_state_t state)
+{
+ printk(KERN_DEBUG "pmac_pm_enter(%d)\n", state);
+
+ /* Giveup the lazy FPU & vec so we don't have to back them
+ * up from the low level code
+ */
+ enable_kernel_fp();
+
+#ifdef CONFIG_ALTIVEC
+ if (cur_cpu_spec[0]->cpu_features & CPU_FTR_ALTIVEC)
+ enable_kernel_altivec();
+#endif /* CONFIG_ALTIVEC */
+
+ return 0;
+}
+
+static int pmac_pm_finish(suspend_state_t state)
+{
+ printk(KERN_DEBUG "pmac_pm_finish(%d)\n", state);
+
+ /* Restore userland MMU context */
+ set_context(current->active_mm->context, current->active_mm->pgd);
+
+ return 0;
+}
+
+static struct pm_ops pmac_pm_ops = {
+ .pm_disk_mode = PM_DISK_SHUTDOWN,
+ .prepare = pmac_pm_prepare,
+ .enter = pmac_pm_enter,
+ .finish = pmac_pm_finish,
+};
+
+#endif /* CONFIG_PM */
+
static int initializing = 1;

static int pmac_late_init(void)
{
initializing = 0;
+
+#ifdef CONFIG_PM
+ pm_set_ops(&pmac_pm_ops);
+#endif /* CONFIG_PM */
return 0;
}

--
--
Hu Gang / Steve
Linux Registered User 204016
GPG Public Key: http://soulinfo.com/~hugang/hugang.asc
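
A note on the "Get a stable timebase and save it" loop in swsusp.S above: the 64-bit timebase is read as two 32-bit halves, so the upper half is read twice and the read is retried if it changed in between. A minimal userspace sketch of the same idea, assuming a hypothetical struct split_counter in place of the real mftbu/mftb instructions:

```c
#include <stdint.h>

/* Hypothetical stand-in for a 64-bit counter exposed as two 32-bit halves. */
struct split_counter {
	volatile uint32_t upper;
	volatile uint32_t lower;
};

/*
 * Read upper, lower, then upper again. If the upper half changed, the
 * lower half wrapped between the two reads and the pair is inconsistent,
 * so try again. This mirrors the mftbu / mftb / mftbu / cmpw / bne loop.
 */
static uint64_t read_stable(const struct split_counter *tb)
{
	uint32_t hi, lo, hi_again;

	do {
		hi = tb->upper;
		lo = tb->lower;
		hi_again = tb->upper;
	} while (hi != hi_again);

	return ((uint64_t)hi << 32) | lo;
}
```

The retry only triggers when the lower half overflows between the reads, so in practice the loop body runs once.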

2004-11-28 16:51:31

by Hu Gang

Subject: Re: software suspend patch [1/6]

On Mon, Nov 29, 2004 at 12:23:20AM +0800, [email protected] wrote:
> Hi Pavel Machek, Nigel Cunningham:
>
> device-tree.diff
> based on suspend2, with a few changes.
>
> core.diff
> 1: redefine struct pbe so the pagedir no longer has to be contiguous.
> 2: shrink memory as little as possible.
> 3: use a bitmap to speed up the collision check in page relocation.
> 4: pagecache saving is ready.
>
> i386.diff
> ppc.diff
> i386 and powerpc suspend updates.
>
> pagecachs_addon.diff
> must be applied if pagecache saving is enabled; it makes saving
> the pagecache safe. The idea comes from suspend2.
>
> ppcfix.diff
> fixes a compile error.
> $ gcc -v
> ....
> gcc version 2.95.4 20011002 (Debian prerelease)
>
> I'm using 2.6.9-ck3. With the above patches, swsusp1 works perfectly on my
> PowerPC and x86 PC with the highmem and preempt options enabled.
>
> I hope core.diff@1,@2,@3, i386.diff and ppc.diff will be merged into the
> mainline kernel ASAP, :). From my point of view, device-tree.diff is
> very useful when using pagecache saving, and pagecachs_addon.diff
> is really a hack to make pagecache saving safe.
>

--- 2.6.9-lzf/arch/ppc/syslib/open_pic.c 2004-11-26 12:32:58.000000000 +0800
+++ 2.6.9/arch/ppc/syslib/open_pic.c 2004-11-28 23:16:58.000000000 +0800
@@ -776,7 +776,8 @@ static void openpic_mapirq(u_int irq, cp
if (ISR[irq] == 0)
return;
if (!cpus_empty(keepmask)) {
- cpumask_t irqdest = { .bits[0] = openpic_read(&ISR[irq]->Destination) };
+ cpumask_t irqdest;
+ irqdest.bits[0] = openpic_read(&ISR[irq]->Destination);
cpus_and(irqdest, irqdest, keepmask);
cpus_or(physmask, physmask, irqdest);
}
--
--
Hu Gang / Steve
Linux Registered User 204016
GPG Public Key: http://soulinfo.com/~hugang/hugang.asc

2004-11-28 16:59:36

by Pavel Machek

Subject: Re: software suspend patch [1/6]

Hi!

I cannot merge anything before 2.6.10. As you have seen, I have quite
a lot of patches in my tree, and I do not want to mix them with these...

> device-tree.diff
> based on suspend2, with a few changes.

I do not want this one.

> core.diff
> 1: redefine struct pbe so the pagedir no longer has to be contiguous.

Can I get this one as a separate diff?

> 2: shrink memory as little as possible.
> 3: use a bitmap to speed up the collision check in page relocation.
> 4: pagecache saving is ready.
>
> i386.diff
> ppc.diff
> i386 and powerpc suspend updates.

ppc changes look good, you should send them to ppc maintainer...

> pagecachs_addon.diff
> must be applied if pagecache saving is enabled; it makes saving
> the pagecache safe. The idea comes from suspend2.
>
> ppcfix.diff
> fixes a compile error.
> $ gcc -v
> ....
> gcc version 2.95.4 20011002 (Debian prerelease)

Send this one to Andrew Morton, now, it is a bugfix.
Pavel

--
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!

2004-11-28 17:06:38

by Pavel Machek

Subject: Re: software suspend patch [2/6]

Hi!

> @@ -222,8 +221,105 @@ static void lock_swapdevices(void)
> }
> swap_list_unlock();
> }
> +
> +#define ONE_PAGE_PBE_NUM (PAGE_SIZE/sizeof(struct pbe))
> +#define PBE_IS_PAGE_END(x) \
> + ( PAGE_SIZE - sizeof(struct pbe) == ((x) - ((~(PAGE_SIZE - 1)) & (x))) )
> +
> +#define pgdir_for_each_safe(pos, n, head) \
> + for(pos = head, n = pos ? (suspend_pagedir_t*)pos->dummy.val : NULL; \
> + pos != NULL; \
> + pos = n, n = pos ? (suspend_pagedir_t *)pos->dummy.val : NULL)
> +
> +#define pbe_for_each_safe(pos, n, index, max, head) \
> + for(pos = head, index = 0, \
> + n = pos ? (struct pbe *)pos->dummy.val : NULL; \
> + (pos != NULL) && (index < max); \
> + pos = (PBE_IS_PAGE_END((unsigned long)pos)) ? n : \
> + ((struct pbe *)((unsigned long)pos + sizeof(struct pbe))), \
> + index ++, \
> + n = pos ? (struct pbe*)pos->dummy.val : NULL)
> +

The _safe suffix means it is safe to delete entries while traversing. I do
not think your macros can handle that, so they should not have the _safe
suffix.
Pavel
--
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!
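
To illustrate the convention behind the _safe suffix discussed in the message above: a traversal macro is "safe" when it reads the successor before the loop body runs, so the body may free the current entry. The following is a minimal userspace sketch on a plain singly linked list. The names struct node, node_for_each_safe, push and free_all are made up for this example, and it does not settle whether the quoted pgdir/pbe macros meet the convention.

```c
#include <stdlib.h>

struct node {
	int value;
	struct node *next;
};

/*
 * Hypothetical _safe-style iterator: n always holds the successor of pos,
 * fetched before the body executes, so the body may free pos.
 */
#define node_for_each_safe(pos, n, head) \
	for ((pos) = (head), (n) = (pos) ? (pos)->next : NULL; \
	     (pos) != NULL; \
	     (pos) = (n), (n) = (pos) ? (pos)->next : NULL)

/* Prepend a node; on allocation failure the list is returned unchanged. */
static struct node *push(struct node *head, int value)
{
	struct node *node = malloc(sizeof(*node));

	if (!node)
		return head;
	node->value = value;
	node->next = head;
	return node;
}

/* Free every node while traversing; returns the number of nodes freed. */
static int free_all(struct node *head)
{
	struct node *pos, *n;
	int freed = 0;

	node_for_each_safe(pos, n, head) {
		free(pos);	/* pos is never touched again after this */
		freed++;
	}
	return freed;
}
```

An iterator that computes the next position from pos itself after the body has run (for example by pointer arithmetic on pos) does not have this property if the body releases the memory pos lives in.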

2004-11-28 17:13:18

by Pavel Machek

Subject: Re: software suspend patch [1/6]

Hi!

> > device-tree.diff
> > based on suspend2, with a few changes.
> >
> > core.diff
> > 1: redefine struct pbe so the pagedir no longer has to be contiguous.
> > 2: shrink memory as little as possible.
> > 3: use a bitmap to speed up the collision check in page relocation.
> > 4: pagecache saving is ready.
> >
> > i386.diff
> > ppc.diff
> > i386 and powerpc suspend updates.
> >
> > pagecachs_addon.diff
> > must be applied if pagecache saving is enabled; it makes saving
> > the pagecache safe. The idea comes from suspend2.
> >
> > ppcfix.diff
> > fixes a compile error.
> > $ gcc -v
> > ....
> > gcc version 2.95.4 20011002 (Debian prerelease)
> >
> > I'm using 2.6.9-ck3. With the above patches, swsusp1 works perfectly on my
> > PowerPC and x86 PC with the highmem and preempt options enabled.
> >
> > I hope core.diff@1,@2,@3, i386.diff and ppc.diff will be merged into the
> > mainline kernel ASAP, :). From my point of view, device-tree.diff is
> > very useful when using pagecache saving, and pagecachs_addon.diff
> > is really a hack to make pagecache saving safe.
> >
>
> --- 2.6.9-lzf/arch/ppc/syslib/open_pic.c 2004-11-26 12:32:58.000000000 +0800
> +++ 2.6.9/arch/ppc/syslib/open_pic.c 2004-11-28 23:16:58.000000000 +0800
> @@ -776,7 +776,8 @@ static void openpic_mapirq(u_int irq, cp
> if (ISR[irq] == 0)
> return;
> if (!cpus_empty(keepmask)) {
> - cpumask_t irqdest = { .bits[0] = openpic_read(&ISR[irq]->Destination) };
> + cpumask_t irqdest;
> + irqdest.bits[0] = openpic_read(&ISR[irq]->Destination);
> cpus_and(irqdest, irqdest, keepmask);
> cpus_or(physmask, physmask, irqdest);
> }

ACK. Send this to Andrew Morton, Cc: Rusty trivial patch monkey
Russell <[email protected]>.
Pavel

--
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!
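
For context on the open_pic.c change acked above: the old line used a nested designated initializer (.bits[0] = ...), which the thread reports as a compile error with gcc 2.95.4, and the patch replaces it with a declaration followed by an assignment. A small sketch of the two forms, using a made-up struct mask with two words rather than the kernel's cpumask_t:

```c
#define MASK_WORDS 2	/* illustrative size, not taken from the thread */

struct mask {
	unsigned long bits[MASK_WORDS];
};

/* C99 form: sets bits[0]; all members not named are zero-initialized. */
static struct mask mask_init_designated(unsigned long word0)
{
	struct mask m = { .bits[0] = word0 };

	return m;
}

/*
 * Form that also builds with older compilers: plain assignment. Unlike
 * the initializer, assignment alone leaves the other words untouched,
 * so this sketch clears them explicitly.
 */
static struct mask mask_init_assigned(unsigned long word0)
{
	struct mask m;
	unsigned int i;

	for (i = 1; i < MASK_WORDS; i++)
		m.bits[i] = 0;
	m.bits[0] = word0;
	return m;
}
```

The patch itself only assigns bits[0]. That is equivalent to the old initializer when the mask is a single word; whether that holds for cpumask_t here depends on NR_CPUS, which the thread does not state.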

2004-11-28 17:19:09

by Pavel Machek

Subject: Re: software suspend patch [4/6]

Hi!

> --- 2.6.9-lzf/drivers/ide/ppc/pmac.c 2004-11-26 12:33:06.000000000 +0800
> +++ 2.6.9/drivers/ide/ppc/pmac.c 2004-11-28 23:17:00.000000000 +0800
> @@ -32,6 +32,7 @@
> #include <linux/notifier.h>
> #include <linux/reboot.h>
> #include <linux/pci.h>
> +#include <linux/pm.h>
> #include <linux/adb.h>
> #include <linux/pmu.h>
>
> @@ -1364,7 +1365,7 @@ pmac_ide_macio_suspend(struct macio_dev
> ide_hwif_t *hwif = (ide_hwif_t *)dev_get_drvdata(&mdev->ofdev.dev);
> int rc = 0;
>
> - if (state != mdev->ofdev.dev.power_state && state >= 2) {
> + if (state != mdev->ofdev.dev.power_state && state == PM_SUSPEND_MEM) {
> rc = pmac_ide_do_suspend(hwif);
> if (rc == 0)
> mdev->ofdev.dev.power_state = state;
> @@ -1472,7 +1473,7 @@ pmac_ide_pci_suspend(struct pci_dev *pde
> ide_hwif_t *hwif = (ide_hwif_t *)pci_get_drvdata(pdev);
> int rc = 0;
>
> - if (state != pdev->dev.power_state && state >= 2) {
> + if (state != pdev->dev.power_state && state == PM_SUSPEND_MEM ) {
> rc = pmac_ide_do_suspend(hwif);
> if (rc == 0)
> pdev->dev.power_state = state;

Please wait with this one.

> --- 2.6.9-lzf/drivers/macintosh/Kconfig 2004-11-26 12:33:06.000000000 +0800
> +++ 2.6.9/drivers/macintosh/Kconfig 2004-11-28 23:17:00.000000000 +0800
> @@ -80,7 +80,7 @@ config ADB_PMU
>
> config PMAC_PBOOK
> bool "Power management support for PowerBooks"
> - depends on ADB_PMU
> + depends on PM && ADB_PMU
> ---help---
> This provides support for putting a PowerBook to sleep; it also
> enables media bay support. Power management works on the
> @@ -97,11 +97,6 @@ config PMAC_PBOOK
> have it autoloaded. The act of removing the module shuts down the
> sound hardware for more power savings.
>
> -config PM
> - bool
> - depends on PPC_PMAC && ADB_PMU && PMAC_PBOOK
> - default y
> -
> config PMAC_APM_EMU
> tristate "APM emulation"
> depends on PMAC_PBOOK

Ok, merge with BenH.

> --- 2.6.9-lzf/drivers/macintosh/via-pmu.c 2004-11-26 12:33:07.000000000 +0800
> +++ 2.6.9/drivers/macintosh/via-pmu.c 2004-11-28 23:17:00.000000000 +0800
> @@ -43,6 +43,7 @@
> #include <linux/init.h>
> #include <linux/interrupt.h>
> #include <linux/device.h>
> +#include <linux/sysdev.h>
> #include <linux/suspend.h>
> #include <linux/syscalls.h>
> #include <asm/prom.h>
> @@ -2326,7 +2327,7 @@ pmac_suspend_devices(void)
> /* Sync the disks. */
> /* XXX It would be nice to have some way to ensure that
> * nobody is dirtying any new buffers while we wait. That
> - * could be acheived using the refrigerator for processes
> + * could be achieved using the refrigerator for processes
> * that swsusp uses
> */
> sys_sync();
> @@ -2379,7 +2380,6 @@ pmac_suspend_devices(void)
>
> /* Wait for completion of async backlight requests */
> while (!bright_req_1.complete || !bright_req_2.complete ||
> -
> !batt_req.complete)
> pmu_poll();
>
> @@ -3048,6 +3048,88 @@ pmu_polled_request(struct adb_request *r
> }
> #endif /* DEBUG_SLEEP */
>
> +
> +/* FIXME: This is a temporary set of callbacks to enable us
> + * to do suspend-to-disk.
> + */
> +
> +#ifdef CONFIG_PM
> +
> +static int pmu_sys_suspended = 0;
> +
> +static int pmu_sys_suspend(struct sys_device *sysdev, pm_message_t state)
> +{
> + if (state != PMSG_FREEZE || pmu_sys_suspended)
> + return 0;
> +
> + /* Suspend PMU event interrupts */
> + pmu_suspend();
> +
> + pmu_sys_suspended = 1;
> + return 0;
> +}
> +
> +static int pmu_sys_resume(struct sys_device *sysdev)
> +{
> + struct adb_request req;
> +
> + if (!pmu_sys_suspended)
> + return 0;
> +
> + /* Tell PMU we are ready */
> + pmu_request(&req, NULL, 2, PMU_SYSTEM_READY, 2);
> + pmu_wait_complete(&req);
> +
> + /* Resume PMU event interrupts */
> + pmu_resume();
> +
> + pmu_sys_suspended = 0;
> +
> + return 0;
> +}
> +
> +#endif /* CONFIG_PM */
> +
> +static struct sysdev_class pmu_sysclass = {
> + set_kset_name("pmu"),
> +};
> +
> +static struct sys_device device_pmu = {
> + .id = 0,
> + .cls = &pmu_sysclass,
> +};
> +
> +static struct sysdev_driver driver_pmu = {
> +#ifdef CONFIG_PM
> + .suspend = &pmu_sys_suspend,
> + .resume = &pmu_sys_resume,
> +#endif /* CONFIG_PM */
> +};
> +
> +static int __init init_pmu_sysfs(void)
> +{
> + int rc;
> +
> + rc = sysdev_class_register(&pmu_sysclass);
> + if (rc) {
> + printk(KERN_ERR "Failed registering PMU sys class\n");
> + return -ENODEV;
> + }
> + rc = sysdev_register(&device_pmu);
> + if (rc) {
> + printk(KERN_ERR "Failed registering PMU sys device\n");
> + return -ENODEV;
> + }
> + rc = sysdev_driver_register(&pmu_sysclass, &driver_pmu);
> + if (rc) {
> + printk(KERN_ERR "Failed registering PMU sys driver\n");
> + return -ENODEV;
> + }
> + return 0;
> +}

The error handling is not okay:
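
The message does not say which part of the error handling is meant. One possible reading, offered purely as an assumption, is that init_pmu_sysfs() leaves earlier registrations in place when a later step fails and replaces the real error code with -ENODEV. A userspace sketch of the usual unwind pattern, with hypothetical step_register()/step_unregister() helpers standing in for the three sysdev calls:

```c
#include <stdio.h>

/* Test hooks for the sketch; none of these names exist in the kernel. */
static int fail_step;	/* set to 1, 2 or 3 to make that step fail */
static int registered;	/* bitmask of steps currently registered */

static int step_register(int step)
{
	if (fail_step == step)
		return -12;	/* arbitrary negative error code */
	registered |= 1 << step;
	return 0;
}

static void step_unregister(int step)
{
	registered &= ~(1 << step);
}

/* Register class, device, driver; undo completed steps on failure. */
static int init_sketch(void)
{
	int rc;

	rc = step_register(1);		/* class */
	if (rc)
		goto out;
	rc = step_register(2);		/* device */
	if (rc)
		goto out_class;
	rc = step_register(3);		/* driver */
	if (rc)
		goto out_device;
	return 0;

out_device:
	step_unregister(2);
out_class:
	step_unregister(1);
out:
	fprintf(stderr, "registration failed: %d\n", rc);
	return rc;			/* propagate the original error */
}
```

With this shape a failure at any step leaves nothing registered, and the caller sees the code returned by the step that failed.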

> --- /dev/null 2004-06-07 18:45:47.000000000 +0800
> +++ 2.6.9/include/asm-ppc/suspend.h 2004-11-28 23:16:57.000000000 +0800
> @@ -0,0 +1,12 @@
> +static inline int arch_prepare_suspend(void)
> +{
> + return 0;
> +}
> +
> +static inline void save_processor_state(void)
> +{
> +}
> +
> +static inline void restore_processor_state(void)
> +{
> +}
> --- 2.6.9-lzf/arch/ppc/Kconfig 2004-11-26 12:32:56.000000000 +0800
> +++ 2.6.9/arch/ppc/Kconfig 2004-11-28 23:16:58.000000000 +0800
> @@ -983,6 +983,8 @@ config PROC_HARDWARE
>
> source "drivers/zorro/Kconfig"
>
> +source kernel/power/Kconfig
> +
> endmenu
>
> menu "Bus options"
> --- 2.6.9-lzf/arch/ppc/kernel/Makefile 2004-11-26 12:32:56.000000000 +0800
> +++ 2.6.9/arch/ppc/kernel/Makefile 2004-11-28 23:16:58.000000000 +0800
> @@ -16,6 +16,7 @@ obj-y := entry.o traps.o irq.o idle.o
> semaphore.o syscalls.o setup.o \
> cputable.o ppc_htab.o
> obj-$(CONFIG_6xx) += l2cr.o cpu_setup_6xx.o
> +obj-$(CONFIG_SOFTWARE_SUSPEND) += swsusp.o
> obj-$(CONFIG_POWER4) += cpu_setup_power4.o
> obj-$(CONFIG_MODULES) += module.o ppc_ksyms.o
> obj-$(CONFIG_NOT_COHERENT_CACHE) += dma-mapping.o
> --- 2.6.9-lzf/arch/ppc/kernel/signal.c 2004-11-26 12:32:57.000000000 +0800
> +++ 2.6.9/arch/ppc/kernel/signal.c 2004-11-28 23:16:58.000000000 +0800
> @@ -28,6 +28,7 @@
> #include <linux/elf.h>
> #include <linux/tty.h>
> #include <linux/binfmts.h>
> +#include <linux/suspend.h>
> #include <asm/ucontext.h>
> #include <asm/uaccess.h>
> #include <asm/pgtable.h>
> @@ -604,6 +605,15 @@ int do_signal(sigset_t *oldset, struct p
> unsigned long frame, newsp;
> int signr, ret;
>
> + if (current->flags & PF_FREEZE) {
> + refrigerator(0);
> + signr = 0;
> + ret = regs->gpr[3];
> + recalc_sigpending();
> + if (!signal_pending(current))
> + goto no_signal;
> + }
> +
> if (!oldset)
> oldset = &current->blocked;
>
> @@ -626,6 +636,7 @@ int do_signal(sigset_t *oldset, struct p
> regs->gpr[3] = EINTR;
> /* note that the cr0.SO bit is already set */
> } else {
> +no_signal:
> regs->nip -= 4; /* Back up & retry system call */
> regs->result = 0;
> regs->trap = 0;

Ok, merge with BenH.

> --- /dev/null 2004-06-07 18:45:47.000000000 +0800
> +++ 2.6.9/arch/ppc/kernel/swsusp.S 2004-11-28 23:16:57.000000000 +0800
> @@ -0,0 +1,366 @@
> +#include <linux/config.h>
> +#include <linux/threads.h>
> +#include <asm/processor.h>
> +#include <asm/page.h>
> +#include <asm/cputable.h>
> +#include <asm/thread_info.h>
> +#include <asm/ppc_asm.h>
> +#include <asm/offsets.h>
> +
> +
> +/*
> + * Structure for storing CPU registers on the save area.
> + */
> +#define SL_SP 0
> +#define SL_PC 4
> +#define SL_MSR 8
> +#define SL_SDR1 0xc
> +#define SL_SPRG0 0x10 /* 4 sprg's */
> +#define SL_DBAT0 0x20
> +#define SL_IBAT0 0x28
> +#define SL_DBAT1 0x30
> +#define SL_IBAT1 0x38
> +#define SL_DBAT2 0x40
> +#define SL_IBAT2 0x48
> +#define SL_DBAT3 0x50
> +#define SL_IBAT3 0x58
> +#define SL_TB 0x60
> +#define SL_R2 0x68
> +#define SL_CR 0x6c
> +#define SL_LR 0x70
> +#define SL_R12 0x74 /* r12 to r31 */
> +#define SL_SIZE (SL_R12 + 80)
> +
> + .section .data
> + .align 5
> +
> +_GLOBAL(swsusp_save_area)
> + .space SL_SIZE
> +
> +
> + .section .text
> + .align 5
> +
> +_GLOBAL(swsusp_arch_suspend)
> +
> + lis r11,swsusp_save_area@h
> + ori r11,r11,swsusp_save_area@l
> +
> + mflr r0
> + stw r0,SL_LR(r11)
> + mfcr r0
> + stw r0,SL_CR(r11)
> + stw r1,SL_SP(r11)
> + stw r2,SL_R2(r11)
> + stmw r12,SL_R12(r11)
> +
> + /* Save MSR & SDR1 */
> + mfmsr r4
> + stw r4,SL_MSR(r11)
> + mfsdr1 r4
> + stw r4,SL_SDR1(r11)
> +
> + /* Get a stable timebase and save it */
> +1: mftbu r4
> + stw r4,SL_TB(r11)
> + mftb r5
> + stw r5,SL_TB+4(r11)
> + mftbu r3
> + cmpw r3,r4
> + bne 1b
> +
> + /* Save SPRGs */
> + mfsprg r4,0
> + stw r4,SL_SPRG0(r11)
> + mfsprg r4,1
> + stw r4,SL_SPRG0+4(r11)
> + mfsprg r4,2
> + stw r4,SL_SPRG0+8(r11)
> + mfsprg r4,3
> + stw r4,SL_SPRG0+12(r11)
> +
> + /* Save BATs */
> + mfdbatu r4,0
> + stw r4,SL_DBAT0(r11)
> + mfdbatl r4,0
> + stw r4,SL_DBAT0+4(r11)
> + mfdbatu r4,1
> + stw r4,SL_DBAT1(r11)
> + mfdbatl r4,1
> + stw r4,SL_DBAT1+4(r11)
> + mfdbatu r4,2
> + stw r4,SL_DBAT2(r11)
> + mfdbatl r4,2
> + stw r4,SL_DBAT2+4(r11)
> + mfdbatu r4,3
> + stw r4,SL_DBAT3(r11)
> + mfdbatl r4,3
> + stw r4,SL_DBAT3+4(r11)
> + mfibatu r4,0
> + stw r4,SL_IBAT0(r11)
> + mfibatl r4,0
> + stw r4,SL_IBAT0+4(r11)
> + mfibatu r4,1
> + stw r4,SL_IBAT1(r11)
> + mfibatl r4,1
> + stw r4,SL_IBAT1+4(r11)
> + mfibatu r4,2
> + stw r4,SL_IBAT2(r11)
> + mfibatl r4,2
> + stw r4,SL_IBAT2+4(r11)
> + mfibatu r4,3
> + stw r4,SL_IBAT3(r11)
> + mfibatl r4,3
> + stw r4,SL_IBAT3+4(r11)
> +
> +#if 0
> + /* Backup various CPU config stuffs */
> + bl __save_cpu_setup
> +#endif
> + /* Call the low level suspend stuff (we should probably have made
> + * a stackframe...
> + */
> + bl swsusp_save
> +
> + /* Restore LR from the save area */
> + lis r11,swsusp_save_area@h
> + ori r11,r11,swsusp_save_area@l
> + lwz r0,SL_LR(r11)
> + mtlr r0
> +
> + blr
> +
> +
> +/* Resume code */
> +_GLOBAL(swsusp_arch_resume)
> +
> + /* Stop pending altivec streams and memory accesses */
> +BEGIN_FTR_SECTION
> + DSSALL
> +END_FTR_SECTION_IFSET(CPU_FTR_ALTIVEC)
> + sync
> +
> + /* Disable MSR:DR to make sure we don't take a TLB or
> + * hash miss during the copy, as our hash table will
> + * for a while be unuseable. For .text, we assume we are
> + * covered by a BAT. This works only for non-G5 at this
> + * point. G5 will need a better approach, possibly using
> + * a small temporary hash table filled with large mappings,
> + * disabling the MMU completely isn't a good option for
> + * performance reasons.
> + * (Note that 750's may have the same performance issue as
> + * the G5 in this case, we should investigate using moving
> + * BATs for these CPUs)
> + */
> + mfmsr r0
> + sync
> + rlwinm r0,r0,0,28,26 /* clear MSR_DR */
> + mtmsr r0
> + sync
> + isync
> +
> + /* Load pointer to the list of pages to copy into r9 */
> + lis r9,pagedir_nosave@ha
> + addi r9,r9,pagedir_nosave@l
> + tophys(r9,r9)
> + lwz r9, 0(r9)
> +#if 0
> + twi 31,r0,0 /* trigger trap */
> +#endif
> + cmpwi r9, 0
> + beq copy_loop_end
> +copy_loop:
> + tophys(r9,r9)
> + lwz r6, 12(r9)
> + li r10, 0
> +copy_one_pgdir:
> + lwz r11, 4(r9)
> + addi r8,r10,1
> + cmpwi r11, 0
> + addi r7,r9,16
> + beq copy_loop_end
> + li r0, 256
> + mtctr r0
> + lwz r9,0(r9)
> +#if 0
> + twi 31,r0,0 /* trigger trap */
> +#endif
> + tophys(r10,r11)
> + tophys(r11,r9)
> +copy_one_page:
> + lwz r0, 0(r11)
> + stw r0, 0(r10)
> + lwz r9, 4(r11)
> + stw r9, 4(r10)
> + lwz r0, 8(r11)
> + stw r0, 8(r10)
> + lwz r9, 12(r11)
> + addi r11,r11,16
> + stw r9, 12(r10)
> + addi r10,r10,16
> + bdnz copy_one_page
> + mr r10, r8
> + cmplwi r10, 255
> + mr r9, r7
> + ble copy_one_pgdir
> + mr r9, r6
> + bne copy_loop
> +copy_loop_end:
> +
> + /* Do a very simple cache flush/inval of the L1 to ensure
> + * coherency of the icache
> + */
> + lis r3,0x0002
> + mtctr r3
> + li r3, 0
> +1:
> + lwz r0,0(r3)
> + addi r3,r3,0x0020
> + bdnz 1b
> + isync
> + sync
> +
> + /* Now flush those cache lines */
> + lis r3,0x0002
> + mtctr r3
> + li r3, 0
> +1:
> + dcbf 0,r3
> + addi r3,r3,0x0020
> + bdnz 1b
> + sync
> +
> + /* Ok, we are now running with the kernel data of the old
> + * kernel fully restored. We can get to the save area
> + * easily now. As for the rest of the code, it assumes the
> + * loader kernel and the booted one are exactly identical
> + */
> + lis r11,swsusp_save_area@h
> + ori r11,r11,swsusp_save_area@l
> + tophys(r11,r11)
> +
> +#if 0
> + /* Restore various CPU config stuffs */
> + bl __restore_cpu_setup
> +#endif
> + /* Restore the BATs, and SDR1. Then we can turn on the MMU.
> + * This is a bit hairy as we are running out of those BATs,
> + * but first, our code is probably in the icache, and we are
> + * writing the same value to the BAT, so that should be fine,
> + * though a better solution will have to be found long-term
> + */
> + lwz r4,SL_SDR1(r11)
> + mtsdr1 r4
> + lwz r4,SL_SPRG0(r11)
> + mtsprg 0,r4
> + lwz r4,SL_SPRG0+4(r11)
> + mtsprg 1,r4
> + lwz r4,SL_SPRG0+8(r11)
> + mtsprg 2,r4
> + lwz r4,SL_SPRG0+12(r11)
> + mtsprg 3,r4
> +
> +#if 0
> + lwz r4,SL_DBAT0(r11)
> + mtdbatu 0,r4
> + lwz r4,SL_DBAT0+4(r11)
> + mtdbatl 0,r4
> + lwz r4,SL_DBAT1(r11)
> + mtdbatu 1,r4
> + lwz r4,SL_DBAT1+4(r11)
> + mtdbatl 1,r4
> + lwz r4,SL_DBAT2(r11)
> + mtdbatu 2,r4
> + lwz r4,SL_DBAT2+4(r11)
> + mtdbatl 2,r4
> + lwz r4,SL_DBAT3(r11)
> + mtdbatu 3,r4
> + lwz r4,SL_DBAT3+4(r11)
> + mtdbatl 3,r4
> + lwz r4,SL_IBAT0(r11)
> + mtibatu 0,r4
> + lwz r4,SL_IBAT0+4(r11)
> + mtibatl 0,r4
> + lwz r4,SL_IBAT1(r11)
> + mtibatu 1,r4
> + lwz r4,SL_IBAT1+4(r11)
> + mtibatl 1,r4
> + lwz r4,SL_IBAT2(r11)
> + mtibatu 2,r4
> + lwz r4,SL_IBAT2+4(r11)
> + mtibatl 2,r4
> + lwz r4,SL_IBAT3(r11)
> + mtibatu 3,r4
> + lwz r4,SL_IBAT3+4(r11)
> + mtibatl 3,r4
> +#endif
> +
> +BEGIN_FTR_SECTION
> + li r4,0
> + mtspr SPRN_DBAT4U,r4
> + mtspr SPRN_DBAT4L,r4
> + mtspr SPRN_DBAT5U,r4
> + mtspr SPRN_DBAT5L,r4
> + mtspr SPRN_DBAT6U,r4
> + mtspr SPRN_DBAT6L,r4
> + mtspr SPRN_DBAT7U,r4
> + mtspr SPRN_DBAT7L,r4
> + mtspr SPRN_IBAT4U,r4
> + mtspr SPRN_IBAT4L,r4
> + mtspr SPRN_IBAT5U,r4
> + mtspr SPRN_IBAT5L,r4
> + mtspr SPRN_IBAT6U,r4
> + mtspr SPRN_IBAT6L,r4
> + mtspr SPRN_IBAT7U,r4
> + mtspr SPRN_IBAT7L,r4
> +END_FTR_SECTION_IFSET(CPU_FTR_HAS_HIGH_BATS)
> +
> + /* Flush all TLBs */
> + lis r4,0x1000
> +1: addic. r4,r4,-0x1000
> + tlbie r4
> + blt 1b
> + sync
> +
> + /* restore the MSR and turn on the MMU */
> + lwz r3,SL_MSR(r11)
> + bl turn_on_mmu
> + tovirt(r11,r11)
> +
> + /* Restore TB */
> + li r3,0
> + mttbl r3
> + lwz r3,SL_TB(r11)
> + lwz r4,SL_TB+4(r11)
> + mttbu r3
> + mttbl r4
> +
> + /* Kick decrementer */
> + li r0,1
> + mtdec r0
> +
> + /* Restore the callee-saved registers and return */
> + lwz r0,SL_CR(r11)
> + mtcr r0
> + lwz r2,SL_R2(r11)
> + lmw r12,SL_R12(r11)
> + lwz r1,SL_SP(r11)
> + lwz r0,SL_LR(r11)
> + mtlr r0
> +
> + // XXX Note: we don't really need to call swsusp_resume
> +
> + li r3,0
> + blr
> +
> +/* FIXME:This construct is actually not useful since we don't shut
> + * down the instruction MMU, we could just flip back MSR-DR on.
> + */
> +turn_on_mmu:
> + mflr r4
> + mtsrr0 r4
> + mtsrr1 r3
> + sync
> + isync
> + rfi
> +

This version will probably not work with the 2.6.9 kernel (if you have a
version that works with 2.6.9, it would be even better to merge that).
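
For readers following the quoted swsusp.S, the "rlwinm r0,r0,0,28,26 /* clear MSR_DR */" line relies on a wrap-around mask: with mask begin 28 greater than mask end 26 (IBM bit numbering, bit 0 is the most significant), every bit except bit 27 is kept, and bit 27 is 0x10. A small C model of the instruction, written for this note and not taken from any kernel header:

```c
#include <stdint.h>

/*
 * Mask with bits mb..me set, IBM numbering (bit 0 = MSB, bit 31 = LSB).
 * When mb > me the mask wraps around and covers mb..31 plus 0..me.
 */
static uint32_t ppc_mask(unsigned int mb, unsigned int me)
{
	uint32_t from_mb = 0xFFFFFFFFu >> mb;		/* bits mb..31 */
	uint32_t to_me = 0xFFFFFFFFu << (31 - me);	/* bits 0..me */

	return (mb <= me) ? (from_mb & to_me) : (from_mb | to_me);
}

static uint32_t rotl32(uint32_t x, unsigned int sh)
{
	sh &= 31;
	return sh ? (x << sh) | (x >> (32 - sh)) : x;
}

/* Model of rlwinm: rotate left by sh, then AND with the mb..me mask. */
static uint32_t rlwinm(uint32_t rs, unsigned int sh,
		       unsigned int mb, unsigned int me)
{
	return rotl32(rs, sh) & ppc_mask(mb, me);
}
```

So rlwinm(msr, 0, 28, 26) is the same as msr & ~0x10, a one-instruction way to clear a single bit without loading a constant.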

> --- 2.6.9-lzf/arch/ppc/kernel/vmlinux.lds.S 2004-11-26 12:32:57.000000000 +0800
> +++ 2.6.9/arch/ppc/kernel/vmlinux.lds.S 2004-11-28 23:16:58.000000000 +0800
> @@ -74,6 +74,12 @@ SECTIONS
> CONSTRUCTORS
> }
>
> + . = ALIGN(4096);
> + __nosave_begin = .;
> + .data_nosave : { *(.data.nosave) }
> + . = ALIGN(4096);
> + __nosave_end = .;
> +
> . = ALIGN(32);
> .data.cacheline_aligned : { *(.data.cacheline_aligned) }
>
> --- 2.6.9-lzf/arch/ppc/platforms/pmac_feature.c 2004-11-27 17:33:17.000000000 +0800
> +++ 2.6.9/arch/ppc/platforms/pmac_feature.c 2004-11-28 23:16:59.000000000 +0800
> @@ -2146,7 +2146,7 @@ static struct pmac_mb_def pmac_mb_defs[]
> },
> { "PowerBook6,1", "PowerBook G4 12\"",
> PMAC_TYPE_UNKNOWN_INTREPID, intrepid_features,
> - PMAC_MB_HAS_FW_POWER | PMAC_MB_MOBILE,
> + PMAC_MB_CAN_SLEEP | PMAC_MB_HAS_FW_POWER | PMAC_MB_MOBILE,
> },
> { "PowerBook6,2", "PowerBook G4",
> PMAC_TYPE_UNKNOWN_INTREPID, intrepid_features,

Ok, merge with BenH.

> --- 2.6.9-lzf/arch/ppc/platforms/pmac_setup.c 2004-11-26 12:32:57.000000000 +0800
> +++ 2.6.9/arch/ppc/platforms/pmac_setup.c 2004-11-28 23:16:59.000000000 +0800
> @@ -51,6 +51,7 @@
> #include <linux/irq.h>
> #include <linux/seq_file.h>
> #include <linux/root_dev.h>
> +#include <linux/suspend.h>
>
> #include <asm/reg.h>
> #include <asm/sections.h>
> @@ -70,6 +71,8 @@
> #include <asm/pmac_feature.h>
> #include <asm/time.h>
> #include <asm/of_device.h>
> +#include <asm/mmu_context.h>
> +
> #include "pmac_pic.h"
> #include "mem_pieces.h"
>
> @@ -420,11 +423,67 @@ find_boot_device(void)
> #endif
> }
>
> +/* TODO: Merge the suspend-to-ram with the common code !!!
> + * currently, this is a stub implementation for suspend-to-disk
> + * only
> + */
> +
> +#ifdef CONFIG_PM
> +
> +extern void enable_kernel_altivec(void);
> +
> +static int pmac_pm_prepare(suspend_state_t state)
> +{
> + printk(KERN_DEBUG "pmac_pm_prepare(%d)\n", state);
> +
> + return 0;
> +}
> +
> +static int pmac_pm_enter(suspend_state_t state)
> +{
> + printk(KERN_DEBUG "pmac_pm_enter(%d)\n", state);
> +
> + /* Giveup the lazy FPU & vec so we don't have to back them
> + * up from the low level code
> + */
> + enable_kernel_fp();
> +
> +#ifdef CONFIG_ALTIVEC
> + if (cur_cpu_spec[0]->cpu_features & CPU_FTR_ALTIVEC)
> + enable_kernel_altivec();
> +#endif /* CONFIG_ALTIVEC */
> +
> + return 0;
> +}
> +
> +static int pmac_pm_finish(suspend_state_t state)
> +{
> + printk(KERN_DEBUG "pmac_pm_finish(%d)\n", state);
> +
> + /* Restore userland MMU context */
> + set_context(current->active_mm->context, current->active_mm->pgd);
> +
> + return 0;
> +}
> +
> +static struct pm_ops pmac_pm_ops = {
> + .pm_disk_mode = PM_DISK_SHUTDOWN,
> + .prepare = pmac_pm_prepare,
> + .enter = pmac_pm_enter,
> + .finish = pmac_pm_finish,
> +};
> +
> +#endif /* CONFIG_PM */
> +
> static int initializing = 1;
>
> static int pmac_late_init(void)
> {
> initializing = 0;
> +
> +#ifdef CONFIG_PM
> + pm_set_ops(&pmac_pm_ops);
> +#endif /* CONFIG_PM */
> return 0;
> }

Ok, merge with BenH.
Pavel

--
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!

2004-11-28 21:39:45

by Nigel Cunningham

Subject: Re: Suspend 2 merge: 43/51: Utility functions.

Hi.

On Sun, 2004-11-28 at 03:11, Dave Hansen wrote:
> On Thu, 2004-11-25 at 16:04, Nigel Cunningham wrote:
> > On Fri, 2004-11-26 at 10:46, Pavel Machek wrote:
> > > How many bits do you need? Two? I'd rather use those two bits than have
> > > yet another abstraction. Also note that it is doing big order
> > > allocation.
> >
> > Three if checksumming is enabled IIRC. I'll happily use normal page
> > flags, but we only need them when suspending, and I understood they were
> > rarer than hen's teeth :>
> >
> > MM guys copied so they can tell me I'm wrong :>
>
> Please remember that, in almost all cases, any use of page->flags can be
> replaced by a simple list. Is a page marked foo? Well, just traverse
> this data structure and see if the page is in there. It might be a
> stinking slow check, but it will *work*.
>
> I think we're up to using 1 bit in the memory hotplug code, but we don't
> even need that if some operations can be implemented more slowly.

Yes. That's how suspend initially did things like checking which pages
were free. The bitmap was added to turn O(n^2) into O(n). Since the
calculations can potentially be done several times (as memory is freed
so we can suspend), using a bitmap was a big gain.
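
To illustrate the trade-off discussed above, here is a minimal userspace
sketch, not suspend2's actual code. It contrasts a list-style membership
check (a scan per lookup, so O(n^2) when done for every page) with a
one-bit-per-page bitmap (constant time per lookup, so O(n) overall). All
names are hypothetical, and pages are identified by a plain page frame
number.

```c
#include <limits.h>
#include <stddef.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* List-style check: scan every marked page frame number, O(n) per lookup. */
static int list_is_marked(const unsigned long *marked, size_t count,
                          unsigned long pfn)
{
    size_t i;

    for (i = 0; i < count; i++)
        if (marked[i] == pfn)
            return 1;
    return 0;
}

/* Bitmap-style mark: set the single bit that belongs to this page. */
static void bitmap_mark(unsigned long *bitmap, unsigned long pfn)
{
    bitmap[pfn / BITS_PER_WORD] |= 1UL << (pfn % BITS_PER_WORD);
}

/* Bitmap-style check: test one bit, O(1) per lookup. */
static int bitmap_is_marked(const unsigned long *bitmap, unsigned long pfn)
{
    return (int)((bitmap[pfn / BITS_PER_WORD] >> (pfn % BITS_PER_WORD)) & 1UL);
}
```

The cost of the bitmap is memory proportional to the total number of
pages rather than to the number of marked pages, which is the allocation
concern raised earlier in the thread.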

Regards,

Nigel
--
Nigel Cunningham
Pastoral Worker
Christian Reformed Church of Tuggeranong
PO Box 1004, Tuggeranong, ACT 2901

You see, at just the right time, when we were still powerless, Christ
died for the ungodly. -- Romans 5:6

2004-11-28 21:44:11

by Nigel Cunningham

Subject: Re: Suspend 2 merge

Hi.

On Fri, 2004-11-26 at 11:39, Pavel Machek wrote:
> I'm not *that* concerned about speed. Getting rid of order-8 is
> for preventing "sorry, not enough RAM to suspend to disk".

That's fine, but you're only expressing your preference. I'm going to
ignore the other postings from the weekend that essentially say the same
thing; there's no point.

--
Nigel Cunningham
Pastoral Worker
Christian Reformed Church of Tuggeranong
PO Box 1004, Tuggeranong, ACT 2901

You see, at just the right time, when we were still powerless, Christ
died for the ungodly. -- Romans 5:6

2004-11-28 22:38:29

by Nigel Cunningham

Subject: Re: Suspend 2 merge: 24/51: Keyboard and serial console hooks.

Hi.

On Fri, 2004-11-26 at 06:28, Pavel Machek wrote:
> > > Here we add simple hooks so that the user can interact with suspend
> > > while it is running. (Hmm. The serial console condition could be
> > > simplified :>). The hooks allow you to do such things as:

> > > - change the amount of detail of debugging info shown
>
> Use sysrq-X as you do during runtime.

No, I don't do this anymore. When I did, I had problems post-resume with
the keyboard handler sometimes thinking SysRq was still pressed.
--
Nigel Cunningham
Pastoral Worker
Christian Reformed Church of Tuggeranong
PO Box 1004, Tuggeranong, ACT 2901

You see, at just the right time, when we were still powerless, Christ
died for the ungodly. -- Romans 5:6

2004-11-28 22:38:23

by Nigel Cunningham

Subject: Re: Suspend 2 merge: 19/51: Remove MTRR sysdev support.

On Fri, 2004-11-26 at 05:22, Pavel Machek wrote:
> Hi!
>
> > This patch removes sysdev support for MTRRs (potential SMP hang and
> > shouldn't be done with interrupts done anyway). Instead, we save and
> > restore MTRRs when entering and exiting the processor freezers (ie when
> > saving the registers & context for each CPU via an SMP call).
>
> This will break acpi s3...

MTRR support via sysdev is broken by design (SMP deadlock possible),
so you need to add it to the right place in your S3 code. (I.e., it's not
that I'm breaking S3. It's already broken, but works while you only
support suspending !SMP.)

Regards,

Nigel
--
Nigel Cunningham
Pastoral Worker
Christian Reformed Church of Tuggeranong
PO Box 1004, Tuggeranong, ACT 2901

You see, at just the right time, when we were still powerless, Christ
died for the ungodly. -- Romans 5:6

2004-11-28 22:40:12

by Nigel Cunningham

Subject: Re: Suspend 2 merge

Hi.

On Fri, 2004-11-26 at 23:38, Pavel Machek wrote:
> My machine suspends in 7 seconds, and that's swsusp1. According to
> your numbers, suspend2 should suspend it in 1 second and LZE
> compressed should be .5 second.

Seven seconds? How much memory is in use when you start, and how much is
actually written to disk? If you're starting with 1GB of RAM in use,
I'll sit up and listen, but I suspect you're talking about something
closer to 20MB and init S :>

These discussions are getting really unreasonable. "I don't want that
feature, therefore it shouldn't be merged" isn't a valid argument.
Neither is "Well, I can suspend in seven seconds with hardly any memory
in use." If you just don't want suspend2 in the kernel, come out and say
it. But please, stop giving me lame arguments (more below deleted rather
than replied to).

Nigel
--
Nigel Cunningham
Pastoral Worker
Christian Reformed Church of Tuggeranong
PO Box 1004, Tuggeranong, ACT 2901

You see, at just the right time, when we were still powerless, Christ
died for the ungodly. -- Romans 5:6

2004-11-28 22:44:20

by Nigel Cunningham

Subject: Re: Suspend 2 merge: 35/51: Code always built in to the kernel.

Hi Matthew.

On Sat, 2004-11-27 at 13:19, Matthew Garrett wrote:
> We have userspace to do this, surely? Make the standard method of
> triggering resume involve an initrd, and have a small application that
> does sanity checks before the resume. In case of failure, have it prompt
> the user. As long as it doesn't do bad things to the filesystem,
> there's no danger. There's no reason to do this in the kernel.

It was originally done in kernel space prior to us having initrd
support, as a small extension on what was already there. I don't see a
good reason to move it to working from an initrd because:

1) We're then assuming that everyone uses an initrd/initramfs, which is
not true
2) We need to provide a way for this userspace program to obtain from
the kernel the signature of the image and information about what we want
the signature to look like. It will also then need to be able to tell
the kernel to delete the image.
3) If you want the userspace program to actually read the signature
itself, the kernel still needs to tell the userspace program where to
find that signature (what device, block and blocksize). That device
can't be mounted/swapon'd to do this; it needs to be a raw read.
4) This whole method means there's even more code to maintain!
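
Point 3 above can be sketched roughly as follows. This is only an
illustration of a raw signature check once userspace has been told the
block, blocksize and offset; it is not code from either implementation.
The function name and the offset parameter are assumptions, and real
code would need 64-bit offsets rather than fseek()'s long.

```c
#include <stdio.h>
#include <string.h>

/*
 * Look for an image signature at a known location on an already opened
 * device (or file). Returns 1 if the signature is present, 0 if it is
 * not, and -1 on a bad argument or read error.
 */
static int signature_present(FILE *dev, unsigned long block,
                             unsigned long blocksize, unsigned long offset,
                             const char *sig)
{
    char buf[32];
    size_t len = strlen(sig);

    /* The signature must fit in our buffer and inside a single block. */
    if (len == 0 || len > sizeof(buf) || offset + len > blocksize)
        return -1;
    if (fseek(dev, (long)(block * blocksize + offset), SEEK_SET) != 0)
        return -1;
    if (fread(buf, 1, len, dev) != len)
        return -1;
    return memcmp(buf, sig, len) == 0;
}
```

Note that the caller still has to learn the device, block and blocksize
from somewhere, which is exactly the interface question raised in the
list above.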

Regards,

Nigel
--
Nigel Cunningham
Pastoral Worker
Christian Reformed Church of Tuggeranong
PO Box 1004, Tuggeranong, ACT 2901

You see, at just the right time, when we were still powerless, Christ
died for the ungodly. -- Romans 5:6

2004-11-28 22:48:18

by Nigel Cunningham

Subject: Re: Suspend 2 merge: 9/51: init/* changes.

Hi.

On Sun, 2004-11-28 at 00:21, Matthew Garrett wrote:
> Herbert Xu wrote:
> > Pavel Machek wrote:
> >> Given it is not too intrusive... why not. Send it for comments.
> >> I probably will not use this myself, so you'll need to test/maintain
> >> it.
> >
> > This shouldn't be necessary. Since the resume is being initiated by
> > userspace, it can perform the function of name_to_dev_t and just feed
> > the numbers to the kernel. The code to do that is still in Debian's
> > initrd-tools.
>
> Good point. Ok, what's the best way to present this to userspace? Add a
> /sys/power/resume and then echo a major:minor in there?

If you're ever going to add swapfile support, you're also going to want
to be able to specify the blocksize and block at which the swapheader
begins.
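
As a rough sketch of what parsing such an interface might involve, the
following accepts "major:minor" and, for the swapfile case, an optional
":block@blocksize" suffix. The string format, the struct and the
function name are all made up for illustration; nothing here is an
existing kernel interface.

```c
#include <stdio.h>

/* Hypothetical description of where the image header lives. */
struct resume_target {
    unsigned int major;
    unsigned int minor;
    unsigned long block;     /* block at which the swap header begins */
    unsigned long blocksize; /* 0 means no block/blocksize was given */
};

/* Parse "major:minor" or "major:minor:block@blocksize"; 0 on success. */
static int parse_resume_target(const char *s, struct resume_target *t)
{
    int used = 0;

    t->block = 0;
    t->blocksize = 0;
    if (sscanf(s, "%u:%u%n", &t->major, &t->minor, &used) != 2)
        return -1;
    s += used;
    if (*s == ':') {
        used = 0;
        if (sscanf(s, ":%lu@%lu%n", &t->block, &t->blocksize, &used) != 2)
            return -1;
        if (t->blocksize == 0)
            return -1;
        s += used;
    }
    if (*s == '\n')          /* tolerate the newline that echo appends */
        s++;
    return *s == '\0' ? 0 : -1;
}
```

With this sketch, writing "8:1" would select a whole partition, while
something like "8:1:2048@4096" would also carry the block and blocksize
needed for a swapfile.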

Regards,

Nigel
--
Nigel Cunningham
Pastoral Worker
Christian Reformed Church of Tuggeranong
PO Box 1004, Tuggeranong, ACT 2901

You see, at just the right time, when we were still powerless, Christ
died for the ungodly. -- Romans 5:6

2004-11-28 23:40:11

by Pavel Machek

Subject: Re: Suspend 2 merge: 24/51: Keyboard and serial console hooks.

Hi!

> > > > Here we add simple hooks so that the user can interact with suspend
> > > > while it is running. (Hmm. The serial console condition could be
> > > > simplified :>). The hooks allow you to do such things as:
>
> > > > - change the amount of detail of debugging info shown
> >
> > Use sysrq-X as you do during runtime.
>
> No, I don't do this anymore. When I did, I had problems post-resume with
> the keyboard handler sometimes thinking SysRq was still pressed.

Fix keyboard handler, then... It probably happens with other keys
beside SysRq, right?
Pavel
--
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!

2004-11-28 23:56:25

by Pavel Machek

Subject: Re: Suspend 2 merge

Hi!

> > My machine suspends in 7 seconds, and that's swsusp1. According to
> > your numbers, suspend2 should suspend it in 1 second and LZE
> > compressed should be .5 second.
>
> Seven seconds? How much memory is in use when you start, and how much is
> actually written to disk? If you're starting with 1GB of RAM in use,
> I'll sit up and listen, but I suspect you're talking about something
> closer to 20MB and init S :>

It was on .5GB machine, with X running, IIRC. Specify how should I
load the system and I'll try it here. swsusp1 got some speedups with
O(n^2) killing (not yet merged).

> These discussions are getting really unreasonable. "I don't want that
> feature, therefore it shouldn't be merged" isn't a valid argument.
> Neither is "Well, I can suspend in seven seconds with hardly any memory
> in use." If you just don't want suspend2 in the kernel, come out and say
> it.

Ok, "I do not want suspend2 in kernel". Not what you'd call suspend2,
anyway. I thought that stripping down suspend2 then merging it is
reasonable way to go, but now it seems to me that enhancing swsusp1 is
easier way to go. At least I'll be able to do it incrementally.

I'm sorry about all the confusion, and you can still get that jpeg for
"put pavel into doom3".
Pavel
--
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!

2004-11-29 03:23:59

by Nigel Cunningham

Subject: Re: Suspend 2 merge

Hi.

On Mon, 2004-11-29 at 10:55, Pavel Machek wrote:
> Hi!
>
> > > My machine suspends in 7 seconds, and that's swsusp1. According to
> > > your numbers, suspend2 should suspend it in 1 second and LZE
> > > compressed should be .5 second.
> >
> > Seven seconds? How much memory is in use when you start, and how much is
> > actually written to disk? If you're starting with 1GB of RAM in use,
> > I'll sit up and listen, but I suspect you're talking about something
> > closer to 20MB and init S :>
>
> It was on .5GB machine, with X running, IIRC. Specify how should I
> load the system and I'll try it here. swsusp1 got some speedups with
> O(n^2) killing (not yet merged).

So it wrote .5GB of memory in seven seconds, or started with .5GB of RAM
in use?

If we want to compare apples with apples, we're going to have to make
the only difference which code is run. A normal load on my computer is
evolution, cyrus imapd, opera, win4lin running Libronix and a kernel
tree in the cache (last image sizes were 1000, 1002, 995, 949 and
910MB). I'm happy to run your sped-up code for some tests, if you'd
like. You know where to find mine if you want to make sure I'm not
cheating :>

> > These discussions are getting really unreasonable. "I don't want that
> > feature, therefore it shouldn't be merged" isn't a valid argument.
> > Neither is "Well, I can suspend in seven seconds with hardly any memory
> > in use." If you just don't want suspend2 in the kernel, come out and say
> > it.
>
> Ok, "I do not want suspend2 in kernel". Not what you'd call suspend2,
> anyway. I thought that stripping down suspend2 then merging it is
> reasonable way to go, but now it seems to me that enhancing swsusp1 is
> easier way to go. At least I'll be able to do it incrementally.

You'll be able to do that within limits, but once you do seriously give
up on the max-half-of-memory limit, you'll need some major redesigning.
If that's the way you want to go, okay. Assuming nothing else changes,
I'll just keep suspend2 alive outside of the kernel tree until you get
sick of users asking, and continue to enhance it.

> I'm sorry about all the confusion, and you can still get that jpeg for
> "put pavel into doom3".

I'm not taking it personally at all. I did find some of the objections
pretty petty and some of the comparisons grossly unfair, but I'm not
taking it personally.
--
Nigel Cunningham
Pastoral Worker
Christian Reformed Church of Tuggeranong
PO Box 1004, Tuggeranong, ACT 2901

You see, at just the right time, when we were still powerless, Christ
died for the ungodly. -- Romans 5:6

2004-11-29 10:58:28

by Rob Landley

Subject: Re: Suspend 2 merge: 49/51: Checksumming

On Wednesday 24 November 2004 08:02 am, Nigel Cunningham wrote:
> A plugin for verifying the consistency of an image. Working with kdb, it
> can look up the locations of variations. There will always be some
> variations shown, simply because we're touching memory before we get
> here and as we check the image.

A while back I suggested checking the last mount time of the mounted local
filesystems as a quick and dirty sanity check between loading the image and
unfreezing all the processes. (Since a read-only mount shouldn't touch this,
triggering swsusp resume from userspace after prodding various hardware
shouldn't cause a major problem either...) Does that sound like a good idea?

Haven't had time to look into it myself, though. (Just recently got time
enough to bang on busybox again. Somewhere around 2.6.7, software suspend
stopped working for me and I haven't even had a chance to track _that_ down
yet. Hopefully fixed in 2.6.9 or 2.6.10, I haven't played with it
recently...)
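
The mount-time check suggested above could look something like the
sketch below. It assumes the last mount time of each local filesystem
was recorded in the image at suspend and can be read again at resume;
the struct and function names are hypothetical, and how the times are
obtained from each filesystem is left out.

```c
#include <stddef.h>
#include <time.h>

/* Hypothetical record, one per local filesystem mounted at suspend. */
struct mount_stamp {
    unsigned long dev;   /* device number identifying the filesystem */
    time_t last_mount;   /* last mount time read from its superblock */
};

/*
 * Compare the stamps saved in the image with those read at resume.
 * Returns -1 if every saved filesystem is still present with the same
 * mount time, otherwise the index of the first saved entry that is
 * missing or has been mounted since the image was written.
 */
static long first_stale_mount(const struct mount_stamp *saved, size_t nsaved,
                              const struct mount_stamp *now, size_t nnow)
{
    size_t i, j;

    for (i = 0; i < nsaved; i++) {
        int ok = 0;

        for (j = 0; j < nnow; j++) {
            if (now[j].dev == saved[i].dev) {
                ok = (now[j].last_mount == saved[i].last_mount);
                break;
            }
        }
        if (!ok)
            return (long)i;
    }
    return -1;
}
```

A non-negative result would mean the image no longer matches the disks
and should not be resumed from.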

Rob

2004-11-29 11:08:09

by Stefan Seyfried

Subject: Re: Suspend 2 merge

Nigel Cunningham wrote:

> The cryptoapi provides support for both compression and encryption. I'd
> happily make use of that, but we still need a way for the user to choose
> what compression/encryption they want and configure it. I'm not at all

And encryption is in fact much more needed than compression. Remember,
you are writing everything in memory (including maybe ssh passphrases or
gpg keys) to swap in clear text. Not nice. And I agree that compression
is nice to have, too.

>>>:> But not everyone who uses 2.6.9 uses swsusp. :>

and not everyone who downloads suspend2 uses it ;-)

> change a parameter or forcing them to do an ls in /dev with obscure
> parameters (to get the major and minor numbers) when they already know
> they want /dev/sda1 isn't user friendly. Obviously user friendliness is

This can easily be done by a userspace helper. You do use the
(userspace) X server to display your GUI, don't you?
Putting only the absolutely necessary things into the kernel (the same
is true for the interactive resume thing - if someone wants interactive
startup at a failing resume, he has to use an initrd, I don't see a
problem with that) will probably increase the acceptance a bit :-)

Best regards,

Stefan

2004-11-29 13:06:30

by Pavel Machek

Subject: Re: Suspend 2 merge

Hi!

> > > > My machine suspends in 7 seconds, and that's swsusp1. According to
> > > > your numbers, suspend2 should suspend it in 1 second and LZE
> > > > compressed should be .5 second.
> > >
> > > Seven seconds? How much memory is in use when you start, and how much is
> > > actually written to disk? If you're starting with 1GB of RAM in use,
> > > I'll sit up and listen, but I suspect you're talking about something
> > > closer to 20MB and init S :>
> >
> > It was on .5GB machine, with X running, IIRC. Specify how should I
> > load the system and I'll try it here. swsusp1 got some speedups with
> > O(n^2) killing (not yet merged).
>
> So it wrote .5GB of memory in seven seconds, or started with .5GB of RAM
> in use?

Machine had .5GB total, not sure how much was really used.

> If we want to compare apples with apples, we're going to have to make
> the only difference which code is run. A normal load on my computer is
> evolution, cyrus imapd, opera, win4lin running Libronix and a kernel
> tree in the cache (last image sizes were 1000, 1002, 995, 949 and
> 910MB). I'm happy to run your sped-up code for some tests, if you'd
> like. You know where to find mine if you want to make sure I'm not
> cheating :>

Okay, I started galeon (no opera here :-(), evolution, xpdf,
oowriter. Well, it is not going to be too much "apples-to-apples"
since swsusp1 cheats and discards caches (etc). Machine has 1GB memory
total, before suspend attempt 800MB were in use. Suspend took 20
seconds, after resume (and some swap-in) 250MB was in use.

> > > These discussions are getting really unreasonable. "I don't want that
> > > feature, therefore it shouldn't be merged" isn't a valid argument.
> > > Neither is "Well, I can suspend in seven seconds with hardly any memory
> > > in use." If you just don't want suspend2 in the kernel, come out and say
> > > it.
> >
> > Ok, "I do not want suspend2 in kernel". Not what you'd call suspend2,
> > anyway. I thought that stripping down suspend2 then merging it is
> > reasonable way to go, but now it seems to me that enhancing swsusp1 is
> > easier way to go. At least I'll be able to do it incrementally.
>
> You'll be able to do that within limits, but once you do seriously give
> up on the max-half-of-memory limit, you'll need some major redesigning.
> If that's the way you want to go, okay. Assuming nothing else changes,

I'm not sure if I want to do full page-cache saving (and without that,
half-of-memory limit does not bite too badly). "Everything is swapped
out" problem is actually not limited to swsusp, updatedb overnight
tends to have the same effect. Perhaps more generic solution is
needed...

cat `cat /proc/[0-9]*/maps | grep / | sed 's:.* /:/:' | sort -u` > /dev/null

does solve part of the problem. (Another problem is how to actually
measure improvements in this area).
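
For readers unfamiliar with the one-liner above: it collects every
file-backed mapping from /proc/*/maps, strips each line down to the
path, removes duplicates, and reads the files so they are paged back
in. The path-stripping step (the sed expression) can be sketched in C
as below; the function name is made up, and like the greedy sed pattern
it keeps everything from the last " /" in the line.

```c
#include <string.h>

/*
 * Return a pointer to the path in one /proc/<pid>/maps line, or NULL if
 * the line has no file-backed mapping. Mirrors sed 's:.* /:/:' by
 * taking the last occurrence of a space followed by a slash.
 */
static const char *maps_line_path(const char *line)
{
    const char *hit = NULL;
    const char *p = line;

    while ((p = strstr(p, " /")) != NULL) {
        hit = p + 1;   /* remember where the slash is */
        p += 1;        /* keep searching for a later match */
    }
    return hit;
}
```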
Pavel
--
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!

2004-11-29 15:51:35

by Hu Gang

Subject: Re: software suspend patch [1/6]

On Sun, Nov 28, 2004 at 05:58:35PM +0100, Pavel Machek wrote:
> Hi!
>
> I can not merge anything before 2.6.10. As you have seen, I have quite
> a lot of patches in my tree, and I do not want mix them with these...
>
> > device-tree.diff
> > base from suspend2 with a little changed.
>
> I do not want this one.
>
> > core.diff
> > 1: redefine struct pbe for using _no_ continuous as pagedir.
>
> Can I get this one as a separate diff?

Here it is.
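
As a rough illustration of the non-contiguous pagedir idea in the patch
that follows, here is a small userspace model. It is a sketch under
stated assumptions, not the patch's code: the directory is a chain of
page-sized arrays, every entry in a page carries a link to the next
page (much as dummy.val is used in the patch), and calloc() stands in
for single-page allocation. All names and the 4096-byte page size are
hypothetical.

```c
#include <stddef.h>
#include <stdlib.h>

#define DIR_PAGE_SIZE 4096u /* assumed page size for this sketch */

struct entry {
    unsigned long address;
    unsigned long orig_address;
    struct entry *next_page; /* start of the next directory page */
};

#define ENTRIES_PER_PAGE (DIR_PAGE_SIZE / sizeof(struct entry))

/* Free every page in the chain, one page at a time. */
static void dir_free(struct entry *head)
{
    while (head) {
        struct entry *next = head->next_page;

        free(head);
        head = next;
    }
}

/* Allocate enough single pages to hold nr_entries and chain them. */
static struct entry *dir_alloc(size_t nr_entries)
{
    struct entry *head = NULL, *prev = NULL;
    size_t pages = (nr_entries + ENTRIES_PER_PAGE - 1) / ENTRIES_PER_PAGE;
    size_t p, i;

    for (p = 0; p < pages; p++) {
        struct entry *cur = calloc(ENTRIES_PER_PAGE, sizeof(*cur));

        if (!cur) {
            dir_free(head);
            return NULL;
        }
        if (prev)
            for (i = 0; i < ENTRIES_PER_PAGE; i++)
                prev[i].next_page = cur;
        else
            head = cur;
        prev = cur;
    }
    return head;
}

/* Find entry number index by walking whole pages, then indexing. */
static struct entry *dir_lookup(struct entry *head, size_t index)
{
    size_t page = index / ENTRIES_PER_PAGE;

    while (head && page--)
        head = head->next_page;
    return head ? &head[index % ENTRIES_PER_PAGE] : NULL;
}
```

The point of the layout is that no allocation is ever larger than one
page, at the cost of a walk over the chain for each indexed lookup.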

=== kernel/power/swsusp.c
==================================================================
--- kernel/power/swsusp.c (revision 24520)
+++ kernel/power/swsusp.c (local)
@@ -74,9 +74,6 @@
/* References to section boundaries */
extern char __nosave_begin, __nosave_end;

-/* Variables to be preserved over suspend */
-static int pagedir_order_check;
-
extern char resume_file[];
static dev_t resume_device;
/* Local variables that should not be affected by save */
@@ -97,7 +94,6 @@
*/
suspend_pagedir_t *pagedir_nosave __nosavedata = NULL;
static suspend_pagedir_t *pagedir_save;
-static int pagedir_order __nosavedata = 0;

#define SWSUSP_SIG "S1SUSPEND"

@@ -223,9 +219,63 @@
swap_list_unlock();
}

+#define ONE_PAGE_PBE_NUM (PAGE_SIZE/sizeof(struct pbe))
+#define PBE_IS_PAGE_END(x) \
+ ( PAGE_SIZE - sizeof(struct pbe) == ((x) - ((~(PAGE_SIZE - 1)) & (x))) )

+#define pgdir_for_each(pos, n, head) \
+ for(pos = head, n = pos ? (suspend_pagedir_t*)pos->dummy.val : NULL; \
+ pos != NULL; \
+ pos = n, n = pos ? (suspend_pagedir_t *)pos->dummy.val : NULL)

+#define pbe_for_each(pos, n, index, max, head) \
+ for(pos = head, index = 0, \
+ n = pos ? (struct pbe *)pos->dummy.val : NULL; \
+ (pos != NULL) && (index < max); \
+ pos = (PBE_IS_PAGE_END((unsigned long)pos)) ? n : \
+ ((struct pbe *)((unsigned long)pos + sizeof(struct pbe))), \
+ index ++, \
+ n = pos ? (struct pbe*)pos->dummy.val : NULL)
/**
+ * find_pbe_by_index -
+ * @pgdir:
+ * @index:
+ *
+ *
+ */
+static struct pbe *find_pbe_by_index(struct pbe *pgdir, int index)
+{
+ unsigned long p = 0;
+ struct pbe *pbe, *next;
+
+ pr_debug("find_pbe_by_index: %p, 0x%03x", pgdir, index);
+ pgdir_for_each(pbe, next, pgdir) {
+ if (p == index / ONE_PAGE_PBE_NUM) {
+ pbe = (struct pbe *)((unsigned long)pbe +
+ (index % ONE_PAGE_PBE_NUM) * sizeof(struct pbe));
+ pr_debug(" %p, o{%p} c{%p}\n",
+ pbe, (void*)pbe->orig_address, (void*)pbe->address);
+ return pbe;
+ }
+ p ++;
+ }
+ return (NULL);
+}
+
+/**
+ * pagedir_free -
+ * @head:
+ *
+ */
+static void pagedir_free(suspend_pagedir_t *head)
+{
+ suspend_pagedir_t *next, *cur;
+ pgdir_for_each(cur, next, head)
+ free_page((unsigned long)cur);
+}
+
+
+/**
* write_swap_page - Write one page to a fresh swap location.
* @addr: Address we're writing.
* @loc: Place to store the entry we used.
@@ -269,19 +319,76 @@
{
swp_entry_t entry;
int i;
+ struct pbe *next, *pos;

- for (i = 0; i < nr_copy_pages; i++) {
- entry = (pagedir_nosave + i)->swap_address;
+ pbe_for_each(pos, next, i, nr_copy_pages, pagedir_nosave) {
+ entry = pos->swap_address;
if (entry.val)
swap_free(entry);
else
break;
- (pagedir_nosave + i)->swap_address = (swp_entry_t){0};
+ pos->swap_address = (swp_entry_t){0};
}
}

+static int mod_progress = 1;

+static void inline mod_printk_progress(int i)
+{
+ if (mod_progress == 0) mod_progress = 1;
+ if (!(i%100))
+ printk( "\b\b\b\b%3d%%", i / mod_progress );
+}
+
/**
+ * write_one_pbe -
+ * @p:
+ * @data:
+ * @cur:
+ *
+ */
+static int write_one_pbe(struct pbe *p, void *data, int cur)
+{
+ int error = 0;
+
+ mod_printk_progress(cur);
+
+ pr_debug("write_one_pbe: %p, o{%p} c{%p} %d ",
+ p, (void *)p->orig_address, (void *)p->address, cur);
+ error = write_page((unsigned long)data, &p->swap_address);
+ if (error) return error;
+ pr_debug("%lu\n", swp_offset(p->swap_address));
+
+ return 0;
+}
+
+static int bio_read_page(pgoff_t page_off, void * page);
+
+/**
+ * read_one_pbe -
+ * @p:
+ * @data:
+ * @cur:
+ *
+ */
+static int read_one_pbe(struct pbe *p, void *data, int cur)
+{
+ int error = 0;
+
+ mod_printk_progress(cur);
+
+ pr_debug("read_one_pbe: %p, o{%p} c{%p} %lu\n",
+ p, (void *)p->orig_address, data,
+ swp_offset(p->swap_address));
+
+ error = bio_read_page(swp_offset(p->swap_address), data);
+ if (error) return error;
+
+ return 0;
+}
+
+
+/**
* data_write - Write saved image to swap.
*
* Walk the list of pages in the image and sync each one to swap.
@@ -291,17 +398,15 @@
{
int error = 0;
int i;
- unsigned int mod = nr_copy_pages / 100;
+ struct pbe *pos, *next;

- if (!mod)
- mod = 1;
+ mod_progress = nr_copy_pages / 100;

printk( "Writing data to swap (%d pages)... ", nr_copy_pages );
- for (i = 0; i < nr_copy_pages && !error; i++) {
- if (!(i%mod))
- printk( "\b\b\b\b%3d%%", i / mod );
- error = write_page((pagedir_nosave+i)->address,
- &((pagedir_nosave+i)->swap_address));
+ pbe_for_each(pos, next, i, nr_copy_pages, pagedir_nosave) {
+ BUG_ON(pos->orig_address == 0);
+ error = write_one_pbe(pos, (void*)pos->address, i);
+ if (error) break;
}
printk("\b\b\b\bdone\n");
return error;
@@ -371,15 +476,17 @@

static int write_pagedir(void)
{
- unsigned long addr = (unsigned long)pagedir_nosave;
int error = 0;
- int n = SUSPEND_PD_PAGES(nr_copy_pages);
- int i;
+ int n = 0;
+ suspend_pagedir_t *pgdir, *next;

+ pgdir_for_each(pgdir, next, pagedir_nosave) {
+ error = write_page((unsigned long)pgdir, &swsusp_info.pagedir[n]);
+ if (error) break;
+ n ++;
+ }
+ printk( "Writing pagedir (%d pages)\n", n);
swsusp_info.pagedir_pages = n;
- printk( "Writing pagedir (%d pages)\n", n);
- for (i = 0; i < n && !error; i++, addr += PAGE_SIZE)
- error = write_page(addr, &swsusp_info.pagedir[i]);
return error;
}

@@ -564,7 +671,7 @@
{
struct zone *zone;
unsigned long zone_pfn;
- struct pbe * pbe = pagedir_nosave;
+ struct pbe * pbe = NULL;
int pages_copied = 0;

for_each_zone(zone) {
@@ -574,11 +681,14 @@
for (zone_pfn = 0; zone_pfn < zone->spanned_pages; ++zone_pfn) {
if (saveable(zone, &zone_pfn)) {
struct page * page;
+ pbe = find_pbe_by_index(pagedir_nosave, pages_copied);
+ BUG_ON(pbe == NULL);
page = pfn_to_page(zone_pfn + zone->zone_start_pfn);
pbe->orig_address = (long) page_address(page);
+ BUG_ON(pbe->orig_address == 0);
+ BUG_ON(pbe->address == 0);
/* copy_page is not usable for copying task structs. */
memcpy((void *)pbe->address, (void *)pbe->orig_address, PAGE_SIZE);
- pbe++;
pages_copied++;
}
}
@@ -587,67 +697,160 @@
nr_copy_pages = pages_copied;
}

+#define pointer2num(x) ((x - PAGE_OFFSET) >> 12)
+#define num2pointer(x) ((x << 12) + PAGE_OFFSET)
+static inline void collide_set_bit(unsigned char *bitmap,
+ unsigned long bitnum)
+{
+ bitnum = pointer2num(bitnum);
+ bitmap[bitnum / 8] |= (1 << (bitnum%8));
+}
+static inline int collide_is_bit_set(unsigned char *bitmap,
+ unsigned long bitnum)
+{
+ bitnum = pointer2num(bitnum);
+ return !!(bitmap[bitnum / 8] & (1 << (bitnum%8)));
+}
+static void collide_bitmap_free(unsigned char *bitmap)
+{
+ free_pages((unsigned long)bitmap, 2);
+}

+/* ((1 << COLLIDE_BITMAP_ORDER) * PAGE_SIZE * 8) << 12 + PAGE_OFFSET */
+#define COLLIDE_BITMAP_ORDER 3
+
+static unsigned char *collide_bitmap_init(struct pbe *pgdir)
+{
+ unsigned char *bitmap =
+ (unsigned char *)__get_free_pages(GFP_ATOMIC | __GFP_COLD,
+ COLLIDE_BITMAP_ORDER);
+ struct pbe *next;
+
+ if (bitmap == NULL) {
+ return NULL;
+ }
+ memset(bitmap, 0, (1 << COLLIDE_BITMAP_ORDER) * PAGE_SIZE);
+
+ /* do base check */
+ BUG_ON(collide_is_bit_set(bitmap, (unsigned long)bitmap) == 1);
+ collide_set_bit(bitmap, (unsigned long)bitmap);
+ BUG_ON(collide_is_bit_set(bitmap, (unsigned long)bitmap) == 0);
+ while (pgdir != NULL) {
+ unsigned long nums;
+ next = (struct pbe*)pgdir->dummy.val;
+ for (nums = 0; nums < ONE_PAGE_PBE_NUM; nums++) {
+ collide_set_bit(bitmap, (unsigned long)pgdir);
+ collide_set_bit(bitmap, (unsigned long)pgdir->orig_address);
+ pgdir ++;
+ }
+ pgdir = next;
+ }
+ return bitmap;
+}
+
+static void **eaten_memory = NULL;
+
+static void *swsusp_get_safe_free_page(unsigned char *collide)
+{
+ void *addr = NULL;
+ void **c = eaten_memory;
+
+ do {
+ if (addr) {
+ eaten_memory = (void**)addr;
+ *eaten_memory = c;
+ c = eaten_memory;
+ }
+ addr = (void*)__get_free_pages(GFP_ATOMIC | __GFP_COLD, 0);
+ if (!addr)
+ return NULL;
+ } while (collide && collide_is_bit_set(collide, (unsigned long)addr));
+
+ return addr;
+}
+
/**
- * calc_order - Determine the order of allocation needed for pagedir_save.
+ * alloc_one_pagedir -
+ * @prev:
+ * @collide:
*
- * This looks tricky, but is just subtle. Please fix it some time.
- * Since there are %nr_copy_pages worth of pages in the snapshot, we need
- * to allocate enough contiguous space to hold
- * (%nr_copy_pages * sizeof(struct pbe)),
- * which has the saved/orig locations of the page..
- *
- * SUSPEND_PD_PAGES() tells us how many pages we need to hold those
- * structures, then we call get_bitmask_order(), which will tell us the
- * last bit set in the number, starting with 1. (If we need 30 pages, that
- * is 0x0000001e in hex. The last bit is the 5th, which is the order we
- * would use to allocate 32 contiguous pages).
- *
- * Since we also need to save those pages, we add the number of pages that
- * we need to nr_copy_pages, and in case of an overflow, do the
- * calculation again to update the number of pages needed.
- *
- * With this model, we will tend to waste a lot of memory if we just cross
- * an order boundary. Plus, the higher the order of allocation that we try
- * to do, the more likely we are to fail in a low-memory situtation
- * (though we're unlikely to get this far in such a case, since swsusp
- * requires half of memory to be free anyway).
*/
+static suspend_pagedir_t * alloc_one_pagedir(suspend_pagedir_t *prev,
+ unsigned char *collide)
+{
+ suspend_pagedir_t *pgdir = NULL;
+ int i;

+ pgdir = (suspend_pagedir_t *)swsusp_get_safe_free_page(collide);

-static void calc_order(void)
+ /*pr_debug("pgdir: %p, %p, %d\n",
+ pgdir, prev, sizeof(suspend_pagedir_t)); */
+ for (i = 0; i < ONE_PAGE_PBE_NUM; i++) {
+ pgdir[i].dummy.val = 0;
+ pgdir[i].address = 0;
+ pgdir[i].orig_address = 0;
+ if (prev)
+ prev[i].dummy.val= (unsigned long)pgdir;
+ }
+
+ return (pgdir);
+}
+
+/* calc_nums - Determine the nums of allocation needed for pagedir_save. */
+static int calc_nums(int nr_copy)
{
- int diff = 0;
- int order = 0;
-
+ int diff = 0, ret = 0;
do {
- diff = get_bitmask_order(SUSPEND_PD_PAGES(nr_copy_pages)) - order;
+ diff = (nr_copy / ONE_PAGE_PBE_NUM) - ret + 1;
if (diff) {
- order += diff;
- nr_copy_pages += 1 << diff;
+ ret += diff;
+ nr_copy += diff;
}
- } while(diff);
- pagedir_order = order;
+ } while (diff);
+ return nr_copy;
}

-
/**
* alloc_pagedir - Allocate the page directory.
+ * @pbe:
+ * @pbe_nums:
+ * @collide:
+ * @page_nums:
*
* First, determine exactly how many contiguous pages we need and
* allocate them.
*/

-static int alloc_pagedir(void)
+static int alloc_pagedir(struct pbe **pbe, int pbe_nums,
+ unsigned char *collide, int page_nums)
{
- calc_order();
- pagedir_save = (suspend_pagedir_t *)__get_free_pages(GFP_ATOMIC | __GFP_COLD,
- pagedir_order);
- if (!pagedir_save)
- return -ENOMEM;
- memset(pagedir_save, 0, (1 << pagedir_order) * PAGE_SIZE);
- pagedir_nosave = pagedir_save;
- return 0;
+ unsigned int nums = 0;
+ unsigned int after_alloc = pbe_nums;
+ suspend_pagedir_t *prev = NULL, *cur = NULL;
+
+ if (page_nums)
+ after_alloc = ONE_PAGE_PBE_NUM * page_nums;
+ else
+ after_alloc = calc_nums(after_alloc);
+ pr_debug("alloc_pagedir: %d, %d\n", pbe_nums, after_alloc);
+ for (nums = 0 ; nums < after_alloc ; nums += ONE_PAGE_PBE_NUM) {
+ cur = alloc_one_pagedir(prev, collide);
+ pr_debug("alloc_one_pagedir: %p\n", cur);
+ if (!cur) { /* get page failed */
+ goto no_mem;
+ }
+ if (nums == 0) { /* setup the head */
+ *pbe = cur;
+ }
+ prev = cur;
+ }
+ return after_alloc - pbe_nums;
+
+no_mem:
+ pagedir_free(*pbe);
+ *pbe = NULL;
+
+ return (-ENOMEM);
}

/**
@@ -656,11 +859,10 @@

static void free_image_pages(void)
{
- struct pbe * p;
+ struct pbe * p, * n;
int i;

- p = pagedir_save;
- for (i = 0, p = pagedir_save; i < nr_copy_pages; i++, p++) {
+ pbe_for_each(p, n, i, nr_copy_pages, pagedir_save) {
if (p->address) {
ClearPageNosave(virt_to_page(p->address));
free_page(p->address);
@@ -676,10 +878,10 @@

static int alloc_image_pages(void)
{
- struct pbe * p;
+ struct pbe * p, * n;
int i;

- for (i = 0, p = pagedir_save; i < nr_copy_pages; i++, p++) {
+ pbe_for_each(p, n, i, nr_copy_pages, pagedir_save) {
p->address = get_zeroed_page(GFP_ATOMIC | __GFP_COLD);
if (!p->address)
return -ENOMEM;
@@ -693,7 +895,7 @@
BUG_ON(PageNosave(virt_to_page(pagedir_save)));
BUG_ON(PageNosaveFree(virt_to_page(pagedir_save)));
free_image_pages();
- free_pages((unsigned long) pagedir_save, pagedir_order);
+ pagedir_free(pagedir_save);
}


@@ -751,17 +953,20 @@
if (!enough_swap())
return -ENOSPC;

- if ((error = alloc_pagedir())) {
+ error = alloc_pagedir(&pagedir_save, nr_copy_pages, NULL, 0);
+ if (error < 0) {
pr_debug("suspend: Allocating pagedir failed.\n");
return error;
}
+ pr_debug("alloc_pagedir: addon %d\n", error);
+ nr_copy_pages += error;
if ((error = alloc_image_pages())) {
pr_debug("suspend: Allocating image pages failed.\n");
swsusp_free();
return error;
}
+ pagedir_nosave = pagedir_save;

- pagedir_order_check = pagedir_order;
return 0;
}

@@ -854,8 +1059,6 @@

asmlinkage int swsusp_restore(void)
{
- BUG_ON (pagedir_order_check != pagedir_order);
-
/* Even mappings of "global" things (vmalloc) need to be fixed */
__flush_tlb_global();
wbinvd(); /* Nigel says wbinvd here is good idea... */
@@ -882,98 +1085,6 @@
}


-
-/* More restore stuff */
-
-#define does_collide(addr) does_collide_order(pagedir_nosave, addr, 0)
-
-/*
- * Returns true if given address/order collides with any orig_address
- */
-static int __init does_collide_order(suspend_pagedir_t *pagedir, unsigned long addr,
- int order)
-{
- int i;
- unsigned long addre = addr + (PAGE_SIZE<<order);
-
- for (i=0; i < nr_copy_pages; i++)
- if ((pagedir+i)->orig_address >= addr &&
- (pagedir+i)->orig_address < addre)
- return 1;
-
- return 0;
-}
-
-/*
- * We check here that pagedir & pages it points to won't collide with pages
- * where we're going to restore from the loaded pages later
- */
-static int __init check_pagedir(void)
-{
- int i;
-
- for(i=0; i < nr_copy_pages; i++) {
- unsigned long addr;
-
- do {
- addr = get_zeroed_page(GFP_ATOMIC);
- if(!addr)
- return -ENOMEM;
- } while (does_collide(addr));
-
- (pagedir_nosave+i)->address = addr;
- }
- return 0;
-}
-
-static int __init swsusp_pagedir_relocate(void)
-{
- /*
- * We have to avoid recursion (not to overflow kernel stack),
- * and that's why code looks pretty cryptic
- */
- suspend_pagedir_t *old_pagedir = pagedir_nosave;
- void **eaten_memory = NULL;
- void **c = eaten_memory, *m, *f;
- int ret = 0;
-
- printk("Relocating pagedir ");
-
- if (!does_collide_order(old_pagedir, (unsigned long)old_pagedir, pagedir_order)) {
- printk("not necessary\n");
- return check_pagedir();
- }
-
- while ((m = (void *) __get_free_pages(GFP_ATOMIC, pagedir_order)) != NULL) {
- if (!does_collide_order(old_pagedir, (unsigned long)m, pagedir_order))
- break;
- eaten_memory = m;
- printk( "." );
- *eaten_memory = c;
- c = eaten_memory;
- }
-
- if (!m) {
- printk("out of memory\n");
- ret = -ENOMEM;
- } else {
- pagedir_nosave =
- memcpy(m, old_pagedir, PAGE_SIZE << pagedir_order);
- }
-
- c = eaten_memory;
- while (c) {
- printk(":");
- f = c;
- c = *c;
- free_pages((unsigned long)f, pagedir_order);
- }
- if (ret)
- return ret;
- printk("|\n");
- return check_pagedir();
-}
-
/**
* Using bio to read from swap.
* This code requires a bit more work than just using buffer heads
@@ -1038,12 +1149,12 @@
return error;
}

-int bio_read_page(pgoff_t page_off, void * page)
+static int bio_read_page(pgoff_t page_off, void * page)
{
return submit(READ, page_off, page);
}

-int bio_write_page(pgoff_t page_off, void * page)
+static int bio_write_page(pgoff_t page_off, void * page)
{
return submit(WRITE, page_off, page);
}
@@ -1088,7 +1199,6 @@
return -EPERM;
}
nr_copy_pages = swsusp_info.image_pages;
- pagedir_order = get_bitmask_order(SUSPEND_PD_PAGES(nr_copy_pages));
return error;
}

@@ -1115,7 +1225,96 @@
return error;
}

+static void __init eat_progress(void)
+{
+ char *eaten_progess = "-\\|/";
+ static int eaten_i = 0;
+
+ printk("\b%c", eaten_progess[eaten_i]);
+ eaten_i ++;
+ if (eaten_i > 3) eaten_i = 0;
+}
+
+static int __init check_one_pbe(struct pbe *p, void *collide, int cur)
+{
+ unsigned long addr = 0;
+
+ pr_debug("check_one_pbe: %p %lu o{%p} ",
+ p, p->swap_address.val, (void*)p->orig_address);
+ addr = (unsigned long)swsusp_get_safe_free_page(collide);
+ if(!addr)
+ return -ENOMEM;
+ pr_debug("c{%p} done\n", (void*)addr);
+ p->address = addr;
+
+ return 0;
+}
+
+static void __init swsusp_copy_pagedir(suspend_pagedir_t *d_pgdir,
+ suspend_pagedir_t *s_pgdir)
+{
+ int i = 0;
+
+ while (s_pgdir != NULL) {
+ suspend_pagedir_t *s_next = (suspend_pagedir_t *)s_pgdir->dummy.val;
+ suspend_pagedir_t *d_next = (suspend_pagedir_t *)d_pgdir->dummy.val;
+ for (i = 0; i < ONE_PAGE_PBE_NUM; i++) {
+ d_pgdir->address = s_pgdir->address;
+ d_pgdir->orig_address = s_pgdir->orig_address;
+ d_pgdir->swap_address = s_pgdir->swap_address;
+ s_pgdir ++; d_pgdir ++;
+ }
+ d_pgdir = d_next;
+ s_pgdir = s_next;
+ };
+}
/**
+ * We check here that pagedir & pages it points to won't collide with pages
+ * where we're going to restore from the loaded pages later
+ */
+static int __init check_pagedir(void)
+{
+ void **c, *f;
+ struct pbe *next, *pos;
+ int error, index;
+ suspend_pagedir_t *addr = NULL;
+ unsigned char *bitmap = collide_bitmap_init(pagedir_nosave);
+
+ BUG_ON(bitmap == NULL);
+
+ printk("Relocating pagedir ... ");
+ error = alloc_pagedir(&addr, nr_copy_pages, bitmap,
+ swsusp_info.pagedir_pages);
+ if (error < 0) {
+ return error;
+ }
+ swsusp_copy_pagedir(addr, pagedir_nosave);
+ pagedir_free(pagedir_nosave);
+
+ /* check copy address */
+ pbe_for_each(pos, next, index, nr_copy_pages, addr) {
+ error = check_one_pbe(pos, bitmap, index);
+ BUG_ON(error);
+ }
+
+ /* free eaten memory */
+ c = eaten_memory;
+ while (c) {
+ eat_progress();
+ f = c;
+ c = *c;
+ free_pages((unsigned long)f, 0);
+ }
+ /* free unused memory */
+ collide_bitmap_free(bitmap);
+ printk(" done\n");
+
+ pagedir_nosave = addr;
+
+ return 0;
+}
+
+/**
* swsusp_read_data - Read image pages from swap.
*
* You do not need to check for overlaps, check_pagedir()
@@ -1124,53 +1323,67 @@

static int __init data_read(void)
{
- struct pbe * p;
+ struct pbe * p, * n;
int error;
int i;
- int mod = nr_copy_pages / 100;

- if (!mod)
- mod = 1;
+ if ((error = check_pagedir())) {
+ return -ENOMEM;
+ }

- if ((error = swsusp_pagedir_relocate()))
- return error;
+ mod_progress = nr_copy_pages / 100;

printk( "Reading image data (%d pages): ", nr_copy_pages );
- for(i = 0, p = pagedir_nosave; i < nr_copy_pages && !error; i++, p++) {
- if (!(i%mod))
- printk( "\b\b\b\b%3d%%", i / mod );
- error = bio_read_page(swp_offset(p->swap_address),
- (void *)p->address);
+ pbe_for_each(p, n, i, nr_copy_pages, pagedir_nosave) {
+ error = read_one_pbe(p, (void*)p->address, i);
+ if (error) break;
}
printk(" %d done.\n",i);
return error;
-
}

extern dev_t __init name_to_dev_t(const char *line);

+static int __init read_one_pagedir(suspend_pagedir_t *pgdir, int i)
+{
+ unsigned long offset = swp_offset(swsusp_info.pagedir[i]);
+ unsigned long next;
+ int error = 0;
+
+ next = pgdir->dummy.val;
+ pr_debug("read_one_pagedir: %p, %d, %lu, %p\n",
+ pgdir, i, offset, (void*)next);
+ if ((error = bio_read_page(offset, (void *)pgdir))) {
+ return error;
+ }
+ pgdir->dummy.val = next;
+
+ return error;
+}
+
+/*
+ * reading pagedir from swap device
+ */
static int __init read_pagedir(void)
{
- unsigned long addr;
- int i, n = swsusp_info.pagedir_pages;
+ int i = 0, n = swsusp_info.pagedir_pages;
int error = 0;
+ suspend_pagedir_t *pgdir, *next;

- addr = __get_free_pages(GFP_ATOMIC, pagedir_order);
- if (!addr)
+ error = alloc_pagedir(&pagedir_nosave, nr_copy_pages, NULL, n);
+ if (error < 0)
return -ENOMEM;
- pagedir_nosave = (struct pbe *)addr;

- pr_debug("pmdisk: Reading pagedir (%d Pages)\n",n);
+ printk("pmdisk: Reading pagedir (%d Pages)\n",n);

- for (i = 0; i < n && !error; i++, addr += PAGE_SIZE) {
- unsigned long offset = swp_offset(swsusp_info.pagedir[i]);
- if (offset)
- error = bio_read_page(offset, (void *)addr);
- else
- error = -EFAULT;
+ pgdir_for_each(pgdir, next, pagedir_nosave) {
+ error = read_one_pagedir(pgdir, i);
+ if (error) break;
+ i++;
}
+ BUG_ON(i != n);
if (error)
- free_pages((unsigned long)pagedir_nosave, pagedir_order);
+ pagedir_free(pagedir_nosave);
return error;
}

@@ -1185,7 +1398,7 @@
if ((error = read_pagedir()))
return error;
if ((error = data_read()))
- free_pages((unsigned long)pagedir_nosave, pagedir_order);
+ pagedir_free(pagedir_nosave);
return error;
}

@@ -1217,3 +1430,50 @@
pr_debug("pmdisk: Error %d resuming\n", error);
return error;
}
+
+/**
+ * for_each_pbe_copy_back -
+ *
+ * Useful as a C reference when writing the copy-back loop in assembly.
+ *
+ */
+/* #define CREATE_ASM_CODE */
+#ifdef CREATE_ASM_CODE
+#if 0 /* if your copy back code is running in real mode, enable it */
+#define GET_ADDRESS(x) __pa(x)
+#else
+#define GET_ADDRESS(x) (x)
+#endif
+asmlinkage void for_each_pbe_copy_back(void)
+{
+ struct pbe *pgdir, *next;
+
+ pgdir = pagedir_nosave;
+ while (pgdir != NULL) {
+ unsigned long nums, i;
+ pgdir = (struct pbe *)GET_ADDRESS(pgdir);
+ next = (struct pbe*)pgdir->dummy.val;
+ for (nums = 0; nums < ONE_PAGE_PBE_NUM; nums++) {
+ register unsigned long *orig, *copy;
+ orig = (unsigned long *)pgdir->orig_address;
+ if (orig == 0) goto end;
+ orig = (unsigned long *)GET_ADDRESS(orig);
+ copy = (unsigned long *)GET_ADDRESS(pgdir->address);
+#if 0
+ memcpy(orig, copy, PAGE_SIZE);
+#else
+ for (i = 0; i < PAGE_SIZE / sizeof(unsigned long); i+=4) {
+ *(orig + i) = *(copy + i);
+ *(orig + i+1) = *(copy + i+1);
+ *(orig + i+2) = *(copy + i+2);
+ *(orig + i+3) = *(copy + i+3);
+ }
+#endif
+ pgdir ++;
+ }
+ pgdir = next;
+ }
+end:
+ panic("just asm code");
+}
+#endif
=== arch/i386/power/swsusp.S
==================================================================
--- arch/i386/power/swsusp.S (revision 24520)
+++ arch/i386/power/swsusp.S (local)
@@ -31,25 +31,34 @@
movl $swsusp_pg_dir-__PAGE_OFFSET,%ecx
movl %ecx,%cr3

- movl pagedir_nosave, %ebx
- xorl %eax, %eax
- xorl %edx, %edx
- .p2align 4,,7
+ movl pagedir_nosave, %eax
+ test %eax, %eax
+ je copy_loop_end
+ movl $1024, %edx

-copy_loop:
- movl 4(%ebx,%edx),%edi
- movl (%ebx,%edx),%esi
+copy_loop_start:
+ movl 0xc(%eax), %ebp
+ xorl %ebx, %ebx
+ leal 0x0(%esi),%esi

- movl $1024, %ecx
- rep
- movsl
+copy_one_pgdir:
+ movl 0x4(%eax),%edi
+ test %edi, %edi
+ je copy_loop_end

- incl %eax
- addl $16, %edx
- cmpl nr_copy_pages,%eax
- jb copy_loop
- .p2align 4,,7
+ movl (%eax), %esi
+ movl %edx, %ecx
+ repz movsl %ds:(%esi),%es:(%edi)

+ incl %ebx
+ addl $0x10, %eax
+ cmpl $0xff, %ebx
+ jbe copy_one_pgdir
+ test %ebp, %ebp
+ movl %ebp, %eax
+ jne copy_loop_start
+copy_loop_end:
+
movl saved_context_esp, %esp
movl saved_context_ebp, %ebp
movl saved_context_ebx, %ebx
--
Hu Gang / Steve
Linux Registered User 204016
GPG Public Key: http://soulinfo.com/~hugang/hugang.asc
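
The sizing loop in calc_nums() and the page chaining in alloc_one_pagedir() are easier to follow outside the kernel. What follows is a userspace sketch only, not the code from the patch. It assumes ONE_PAGE_PBE_NUM is 256 (the assembly above advances 0x10 bytes per entry and loops while the index is <= 0xff), uses a plain `next` pointer where the patch stores the link in dummy.val, and substitutes calloc() for swsusp_get_safe_free_page(). The names alloc_chain(), free_chain() and count_pages() are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

#define ONE_PAGE_PBE_NUM 256	/* assumed: 4096-byte page / 16-byte pbe */

/* Simplified stand-in for the kernel's struct pbe. */
struct pbe {
	unsigned long address;		/* where the copy lives */
	unsigned long orig_address;	/* where the page is restored to */
	unsigned long swap_address;	/* swp_entry_t in the kernel */
	struct pbe *next;		/* plays the role of dummy.val */
};

/*
 * Same fixed-point loop as calc_nums() in the patch: every pagedir page
 * has to be saved too, so each one adds an entry, which may in turn
 * require one more pagedir page.
 */
int calc_nums(int nr_copy)
{
	int diff, ret = 0;

	do {
		diff = nr_copy / ONE_PAGE_PBE_NUM - ret + 1;
		ret += diff;
		nr_copy += diff;
	} while (diff);
	return nr_copy;
}

/* Free every page of a chain by following the link in its first entry. */
void free_chain(struct pbe *page)
{
	while (page) {
		struct pbe *next = page->next;

		free(page);
		page = next;
	}
}

/* Allocate enough chained pages to hold nr_entries entries. */
struct pbe *alloc_chain(int nr_entries)
{
	struct pbe *head = NULL, *prev = NULL;
	int n, i;

	assert(nr_entries > 0);
	for (n = 0; n < nr_entries; n += ONE_PAGE_PBE_NUM) {
		struct pbe *cur = calloc(ONE_PAGE_PBE_NUM, sizeof(*cur));

		if (!cur) {
			free_chain(head);
			return NULL;
		}
		if (prev) {
			/* As in the patch, every entry of the previous
			 * page points at the new page. */
			for (i = 0; i < ONE_PAGE_PBE_NUM; i++)
				prev[i].next = cur;
		} else {
			head = cur;
		}
		prev = cur;
	}
	return head;
}

/* Walk the chain page by page, as the restore loop does. */
int count_pages(const struct pbe *page)
{
	int pages = 0;

	for (; page; page = page->next)
		pages++;
	return pages;
}
```

With 1000 pages to copy, calc_nums() settles on 1004 entries, which alloc_chain() spreads over four chained pages. With 1022 pages the extra entries tip over into a fifth page, so the result is 1027.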

2004-11-29 22:20:55

by Nigel Cunningham

Subject: Re: Suspend 2 merge: 24/51: Keyboard and serial console hooks.

Hi.

On Mon, 2004-11-29 at 10:39, Pavel Machek wrote:
> Hi!
>
> > > > > Here we add simple hooks so that the user can interact with suspend
> > > > > while it is running. (Hmm. The serial console condition could be
> > > > > simplified :>). The hooks allow you to do such things as:
> >
> > > > > - change the amount of detail of debugging info shown
> > >
> > > Use sysrq-X as you do during runtime.
> >
> > No, I don't do this anymore. When I did, I had problems post-resume with
> > the keyboard handler sometimes thinking SysRq was still pressed.
>
> Fix keyboard handler, then... It probably happens with other keys
> beside SysRq, right?

I guess it would. Nevertheless, it's ugly to have to press SysRq +
level; why make things more awkward than they need to be?

Regards,

Nigel
--
Nigel Cunningham
Pastoral Worker
Christian Reformed Church of Tuggeranong
PO Box 1004, Tuggeranong, ACT 2901

You see, at just the right time, when we were still powerless, Christ
died for the ungodly. -- Romans 5:6

2004-11-29 22:25:16

by Nigel Cunningham

Subject: Re: Suspend 2 merge

Hi.

On Mon, 2004-11-29 at 20:34, Stefan Seyfried wrote:
> Nigel Cunningham wrote:
>
> > The cryptoapi provides support for both compression and encryption. I'd
> > happily make use of that, but we still need a way for the user to choose
> > what compression/encryption they want and configure it. I'm not at all
>
> And encryption is in fact much more needed than compression. Remember,
> you are writing everything in memory (including maybe ssh passphrases or
> gpg keys) to swap in clear text. Not nice. And i agree that compression
> is nice to have, too.
>
> >>>:> But not everyone who uses 2.6.9 uses swsusp. :>
>
> and not everyone who downloads suspend2 uses it ;-)

Yes... I'd say the relative percentage would be much higher, though.

> > change a parameter or forcing them to do an ls in /dev with obscure
> > parameters (to get the major and minor numbers) when they already know
> > they want /dev/sda1 isn't user friendly. Obviously user friendliness is
>
> This can easily be done by a userspace helper. You do use the
> (userspace) X server to display your GUI, don't you?

No. Not at all. All of userspace is well and truly wedged in a block of
ice by then.

> Putting only the absolutely necessary things into the kernel (the same
> is true for the interactive resume thing - if someone wants interactive
> startup at a failing resume, he has to use an initrd, i don't see a
> problem with that) will probably increase the acceptance a bit :-)

That's fine if your initrd is properly configured and you're willing to
add extra cruft to the kernel so userspace can get the info it needs,
and report what the user wants to do. If, however, you don't use an
initrd, you're sunk.

Regarding acceptance, there's no point in getting it accepted into the
kernel if we end up with something that's user-unfriendly. I think it
will help a lot if we agree that suspend does need to blur the lines
between kernel and userspace a little, in the interests of providing
software that is superior.

Regards,

Nigel
--
Nigel Cunningham
Pastoral Worker
Christian Reformed Church of Tuggeranong
PO Box 1004, Tuggeranong, ACT 2901

You see, at just the right time, when we were still powerless, Christ
died for the ungodly. -- Romans 5:6

2004-11-29 22:43:32

by Pavel Machek

Subject: Re: Suspend 2 merge

Hi!

> > >>>:> But not everyone who uses 2.6.9 uses swsusp. :>
> >
> > and not everyone who downloads suspend2 uses it ;-)
>
> Yes... I'd say the relative percentage would be much higher, though.

Agreed.

> > > change a parameter or forcing them to do an ls in /dev with obscure
> > > parameters (to get the major and minor numbers) when they already know
> > > they want /dev/sda1 isn't user friendly. Obviously user friendliness is
> >
> > This can easily be done by a userspace helper. You do use the
> > (userspace) X server to display your GUI, don't you?
>
> No. Not at all. All of userspace is well and truly wedged in a block of
> ice by then.

I think that was not what Stefan wanted to say.

> Regarding acceptance, there's no point in getting it accepted into the
> kernel if we end up with something that's user-unfriendly. I think it
> will help a lot if we agree that suspend does need to blur the lines
> between kernel and userspace a little, in the interests of providing
> software that is superior.

I guess we'll have to agree to disagree here. I do not think suspend
is special enough to blur the lines...
Pavel
--
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!

2004-11-30 00:28:09

by Nigel Cunningham

Subject: Re: Suspend 2 merge: 49/51: Checksumming

Hi.

On Mon, 2004-11-29 at 20:55, Rob Landley wrote:
> On Wednesday 24 November 2004 08:02 am, Nigel Cunningham wrote:
> > A plugin for verifying the consistency of an image. Working with kdb, it
> > can look up the locations of variations. There will always be some
> > variations shown, simply because we're touching memory before we get
> > here and as we check the image.
>
> A while back I suggested checking the last mount time of the mounted local
> filesystems as a quick and dirty sanity check between loading the image and
> unfreezing all the processes. (Since a read-only mount shouldn't touch this,
> triggering swsusp resume from userspace after prodding various hardware
> shouldn't cause a major problem either...) Does that sound like a good idea?

If I recall correctly, someone replied that even a read only mount under
one filesystem (XFS? Not sure), would replay the journal, so it wasn't a
goer.

> Haven't had time to look into it myself, though. (Just recently got time
> enough to bang on busybox again. Somewhere around 2.6.7, software suspend
> stopped working for me and I haven't even had a chance to track _that_ down
> yet. Hopefully fixed in 2.6.9 or 2.6.10, I haven't played with it
> recently...)

If you mean suspend2, I might be able to help if given more info.

Regards,

Nigel
--
Nigel Cunningham
Pastoral Worker
Christian Reformed Church of Tuggeranong
PO Box 1004, Tuggeranong, ACT 2901

You see, at just the right time, when we were still powerless, Christ
died for the ungodly. -- Romans 5:6

2004-11-30 00:28:54

by Nigel Cunningham

Subject: Re: Suspend 2 merge

Hi.

On Tue, 2004-11-30 at 00:03, Pavel Machek wrote:
> > If we want to compare apples with apples, we're going to have to make
> > the only difference which code is run. A normal load on my computer is
> > evolution, cyrus imapd, opera, win4lin running Libronix and a kernel
> > tree in the cache (last image sizes were 1000, 1002, 995, 949 and
> > 910MB). I'm happy to run your sped-up code for some tests, if you'd
> > like. You know where to find mine if you want to make sure I'm not
> > cheating :>
>
> Okay, I started galeon (no opera here :-(), evolution, xpdf,
> oowriter. Well, it is not going to be too much "apples-to-apples"
> since swsusp1 cheats and discards caches (etc). Machine has 1GB memory
> total, before suspend attempt 800MB were in use. Suspend took 20
> seconds, after resume (and some swap-in) 250MB was in use.

Are you able to time up to when the swap in is finished? Without that,
we're not really comparing apples with apples, it seems.

> > > > These discussions are getting really unreasonable. "I don't want that
> > > > feature, therefore it shouldn't be merged" isn't a valid argument.
> > > > Neither is "Well, I can suspend in seven seconds with hardly any memory
> > > > in use." If you just don't want suspend2 in the kernel, come out and say
> > > > it.
> > >
> > > Ok, "I do not want suspend2 in kernel". Not what you'd call suspend2,
> > > anyway. I thought that stripping down suspend2 then merging it is
> > > reasonable way to go, but now it seems to me that enhancing swsusp1 is
> > > easier way to go. At least I'll be able to do it incrementally.
> >
> > You'll be able to do that within limits, but once you do seriously give
> > up on the max-half-of-memory limit, you'll need some major redesigning.
> > If that's the way you want to go, okay. Assuming nothing else changes,
>
> I'm not sure if I want to do full page-cache saving (and without that,
> half-of-memory limit does not bite too badly). "Everything is swapped
> out" problem is actually not limited to swsusp, updatedb overnight
> tends to have the same effect. Perhaps more generic solution is
> needed...

Would increases in the amount of memory machines have make this bite
more and more over time?

I guess the more generic solution would be to abandon using bio calls
and have your own device driver that could write the whole image to disk
without having to do the atomic copy. You'd have to write a lot of
support for drivers, though. I'd find it hard to imagine it being worth
the effort.

> cat `cat /proc/[0-9]*/maps | grep / | sed 's:.* /:/:' | sort -u` > /dev/null

What does this do?

> does solve part of the problem. (Another problem is how to actually
> measure improvements in this area).

Yes; that's always an 'interesting' issue :>

Nigel
--
Nigel Cunningham
Pastoral Worker
Christian Reformed Church of Tuggeranong
PO Box 1004, Tuggeranong, ACT 2901

You see, at just the right time, when we were still powerless, Christ
died for the ungodly. -- Romans 5:6

2004-11-30 00:32:26

by Rob Landley

Subject: Re: Suspend 2 merge: 49/51: Checksumming

On Monday 29 November 2004 07:24 pm, Nigel Cunningham wrote:
> Hi.
>
> On Mon, 2004-11-29 at 20:55, Rob Landley wrote:
> > On Wednesday 24 November 2004 08:02 am, Nigel Cunningham wrote:
> > > A plugin for verifying the consistency of an image. Working with kdb,
> > > it can look up the locations of variations. There will always be some
> > > variations shown, simply because we're touching memory before we get
> > > here and as we check the image.
> >
> > A while back I suggested checking the last mount time of the mounted
> > local filesystems as a quick and dirty sanity check between loading the
> > image and unfreezing all the processes. (Since a read-only mount
> > shouldn't touch this, triggering swsusp resume from userspace after
> > prodding various hardware shouldn't cause a major problem either...)
> > Does that sound like a good idea?
>
> If I recall correctly, someone replied that even a read only mount under
> one filesystem (XFS? Not sure), would replay the journal, so it wasn't a
> goer.

You could always special case the broken one until they fix it... :)

> > Haven't had time to look into it myself, though. (Just recently got time
> > enough to bang on busybox again. Somewhere around 2.6.7, software
> > suspend stopped working for me and I haven't even had a chance to track
> > _that_ down yet. Hopefully fixed in 2.6.9 or 2.6.10, I haven't played
> > with it recently...)
>
> If you mean suspend2, I might be able to help if given more info.

Nah, the one that's built in. I'll try it again when I upgrade to 2.6.10 in a
few days.

> Regards,
>
> Nigel

Rob

2004-11-30 00:53:29

by Nigel Cunningham

Subject: Re: Suspend 2 merge: 49/51: Checksumming

Hi.

On Tue, 2004-11-30 at 10:30, Rob Landley wrote:
> On Monday 29 November 2004 07:24 pm, Nigel Cunningham wrote:
> > Hi.
> >
> > On Mon, 2004-11-29 at 20:55, Rob Landley wrote:
> > > On Wednesday 24 November 2004 08:02 am, Nigel Cunningham wrote:
> > > > A plugin for verifying the consistency of an image. Working with kdb,
> > > > it can look up the locations of variations. There will always be some
> > > > variations shown, simply because we're touching memory before we get
> > > > here and as we check the image.
> > >
> > > A while back I suggested checking the last mount time of the mounted
> > > local filesystems as a quick and dirty sanity check between loading the
> > > image and unfreezing all the processes. (Since a read-only mount
> > > shouldn't touch this, triggering swsusp resume from userspace after
> > > prodding various hardware shouldn't cause a major problem either...)
> > > Does that sound like a good idea?
> >
> > If I recall correctly, someone replied that even a read only mount under
> > one filesystem (XFS? Not sure), would replay the journal, so it wasn't a
> > goer.
>
> You could always special case the broken one until they fix it... :)

Mmm. I wonder how much code that would require us to add. I do like the
idea of not interacting where the answer is obvious :>. I still think,
however, that interacting when the answer isn't obvious is the right
thing to do. Take for example the case where we find an image, but the
device numbers look like they belong to 2.4 and we're a 2.6 kernel. We
can't read the header (we can't be sure that this is the cause). The
user - or their cat - might have selected the wrong boot image
unintentionally. Why shouldn't we give them the opportunity to reboot
and get the right one?

> > > Haven't had time to look into it myself, though. (Just recently got time
> > > enough to bang on busybox again. Somewhere around 2.6.7, software
> > > suspend stopped working for me and I haven't even had a chance to track
> > > _that_ down yet. Hopefully fixed in 2.6.9 or 2.6.10, I haven't played
> > > with it recently...)
> >
> > If you mean suspend2, I might be able to help if given more info.
>
> Nah, the one that's built in. I'll try it again when I upgrade to 2.6.10 in a
> few days.

Okay.

Nigel
--
Nigel Cunningham
Pastoral Worker
Christian Reformed Church of Tuggeranong
PO Box 1004, Tuggeranong, ACT 2901

You see, at just the right time, when we were still powerless, Christ
died for the ungodly. -- Romans 5:6

2004-11-30 10:21:09

by Pavel Machek

Subject: Re: Suspend 2 merge

Hi!

> > I'm not sure if I want to do full page-cache saving (and without that,
> > half-of-memory limit does not bite too badly). "Everything is swapped
> > out" problem is actually not limited to swsusp, updatedb overnight
> > tends to have the same effect. Perhaps more generic solution is
> > needed...
>
> Would increases in the amount of memory machines have make this bite
> more and more over time?

Actually, it should bite less and less, because the amount of memory
actually used does not seem to grow as fast as the amount of memory
available. On a 4MB machine, I could imagine the kernel using >2MB of memory
and the "half-memory-free" trick not working at all. On a 1GB
machine... well, the kernel will never use >512MB of memory, so we are safe.

> I guess the more generic solution would be to abandon using bio calls
> and have your own device driver that could write the whole image to disk
> without having to do the atomic copy. You'd have to write a lot of
> support for drivers, though. I'd find it hard to imagine it being worth
> the effort.

That would mean rewriting half of kernel.

> > cat `cat /proc/[0-9]*/maps | grep / | sed 's:.* /:/:' | sort -u` > /dev/null
>
> What does this do?

Attempts to load all the binaries into memory. Poor man's "make
machine responsive after swsusp".
Pavel
--
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!
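
For readers who find the shell one-liner above hard to parse, roughly the same idea can be written in C: collect the file-backed mappings listed in /proc/<pid>/maps and read each file once so it ends up in the page cache. This is a rough sketch under stated assumptions, not anything from the thread. It assumes a Linux /proc, the function names maps_line_path(), read_through() and warm_mapped_files() are hypothetical, and it differs from the shell version in two ways: it skips anything that is not a regular file, and it does not de-duplicate paths (re-reading an already cached file is cheap).

```c
#include <assert.h>
#include <glob.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/*
 * Return the path named in one /proc/<pid>/maps line, or NULL for an
 * anonymous mapping. Like the sed expression, this keeps everything
 * from the last " /" onward. The trailing newline is stripped in place.
 */
char *maps_line_path(char *line)
{
	char *path = NULL, *q = line;

	assert(line != NULL);
	while ((q = strstr(q, " /")) != NULL)
		path = ++q;		/* q now points at the '/' */
	if (path)
		path[strcspn(path, "\n")] = '\0';
	return path;
}

/* Read a regular file from start to end; returns bytes read or -1. */
long read_through(const char *path)
{
	char buf[65536];
	struct stat st;
	size_t n;
	long total = 0;
	FILE *f;

	if (stat(path, &st) != 0 || !S_ISREG(st.st_mode))
		return -1;
	f = fopen(path, "rb");
	if (!f)
		return -1;
	while ((n = fread(buf, 1, sizeof(buf), f)) > 0)
		total += (long)n;
	fclose(f);
	return total;
}

/* Touch every file mapped by any process; returns the number read. */
int warm_mapped_files(void)
{
	glob_t g;
	size_t i;
	int files = 0;
	char line[4096];

	if (glob("/proc/[0-9]*/maps", 0, NULL, &g) != 0)
		return 0;
	for (i = 0; i < g.gl_pathc; i++) {
		FILE *maps = fopen(g.gl_pathv[i], "r");

		if (!maps)
			continue;	/* process exited or not readable */
		while (fgets(line, sizeof(line), maps)) {
			char *path = maps_line_path(line);

			if (path && read_through(path) >= 0)
				files++;
		}
		fclose(maps);
	}
	globfree(&g);
	return files;
}
```

Calling warm_mapped_files() once after resume would play the part of the one-liner. Whether it helps depends on how much of the working set is file-backed, which is the measurement problem mentioned earlier in the thread.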

2004-11-30 12:16:39

by Stefan Seyfried

Subject: Re: Suspend 2 merge

Hi,

Nigel Cunningham wrote:

> On Mon, 2004-11-29 at 20:34, Stefan Seyfried wrote:

>>and not everyone who downloads suspend2 uses it ;-)
>
> Yes... I'd say the relative percentage would be much higher, though.

you are probably right here :-)

>>>change a parameter or forcing them to do an ls in /dev with obscure
>>>parameters (to get the major and minor numbers) when they already know
>>>they want /dev/sda1 isn't user friendly. Obviously user friendliness is
>>
>>This can easily be done by a userspace helper. You do use the
>>(userspace) X server to display your GUI, don't you?
>
> No. Not at all. All of userspace is well and truly wedged in a block of
> ice by then.

you are not changing the suspend device after freezing userspace, or i
am getting something horribly wrong here.

so if you have 2 choices of an interface:
1) more complex kernel code, but you can do "echo /dev/name > /proc/foo"
2) less complex kernel code, now you have a userspace helper e.g.
"suspend_ctl foodev /dev/name" which then does the magic number
calculations in userspace and puts the magic number into the kernel.

I think that interface 2) would be preferred by most kernel developers.
Especially since this is code only needed on a relatively small subset
of all linux installations.

There is a "top" userspace program to parse kernel numbers, we don't
have "/proc/top".

>>Putting only the absolutely necessary things into the kernel (the same
>>is true for the interactive resume thing - if someone wants interactive
>>startup at a failing resume, he has to use an initrd, i don't see a
>>problem with that) will probably increase the acceptance a bit :-)
>
> That's fine if your initrd is properly configured and you're willing to

This is something distributions have to take care of.

> add extra cruft to the kernel so userspace can get the info it needs,

not much extra cruft is needed. The "echo resume > /sys/power/state"
just returns (which it wouldn't if the resume was successful), then you
can decide what to do next.

> and report what the user wants to do. If, however, you don't use an
> initrd, you're sunk.

yes. There are other prerequisites for suspend than using an initrd
though (you need a computer :-). If you don't use an initrd, you cannot
use the interactive features but have to decide at compile time which
way to go if the resume fails. That's life.

> Regarding acceptance, there's no point in getting it accepted into the
> kernel if we end up with something that's user-unfriendly. I think it
> will help a lot if we agree that suspend does need to blur the lines
> between kernel and userspace a little, in the interests of providing
> software that is superior.

User-friendliness is a joint effort of kernel and userspace. The user
does not care who does the work when he clicks on his "hibernation" icon
in the taskbar. (The same is true for users of a hibernation script.)
Actually, the thing that makes suspend2 more reliable than swsusp is
probably the very good hibernation script (userspace) that saves users
the reading of documentation since it automatically unloads all critical
modules etc. For me, Pavel's later versions as in SUSE 9.2 have worked
out of the box on every non-SMP i386 notebook I have laid my hands on in
the last 6 months (thanks to userspace taking care of bad modules etc).

Regards,

Stefan
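
Interface 2) from the message above, a userspace helper that turns a device name into numbers for the kernel, might look roughly like the following. This is only a sketch under stated assumptions: the "major:minor" text format and the control file passed to set_resume_device() are hypothetical (the thread itself only uses the placeholder /proc/foo), as are all three function names. stat(), S_ISBLK() and the major()/minor() macros from <sys/sysmacros.h> are the standard glibc interfaces.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/sysmacros.h>
#include <sys/types.h>

/* Format a device number as "major:minor". Returns 0 on success. */
int format_devnum(dev_t dev, char *out, size_t len)
{
	int n = snprintf(out, len, "%u:%u", major(dev), minor(dev));

	return (n > 0 && (size_t)n < len) ? 0 : -1;
}

/* Look up the device number behind a block device node such as /dev/sda1. */
int resolve_block_device(const char *path, dev_t *dev)
{
	struct stat st;

	assert(path != NULL && dev != NULL);
	if (stat(path, &st) != 0 || !S_ISBLK(st.st_mode))
		return -1;
	*dev = st.st_rdev;
	return 0;
}

/*
 * Resolve dev_path and write "major:minor" to ctl_path.
 * ctl_path is a hypothetical kernel control file; no such file is
 * defined anywhere in this thread.
 */
int set_resume_device(const char *dev_path, const char *ctl_path)
{
	char buf[32];
	dev_t dev;
	FILE *ctl;

	if (resolve_block_device(dev_path, &dev) != 0)
		return -1;
	if (format_devnum(dev, buf, sizeof(buf)) != 0)
		return -1;
	ctl = fopen(ctl_path, "w");
	if (!ctl)
		return -1;
	fprintf(ctl, "%s\n", buf);
	return fclose(ctl) == 0 ? 0 : -1;
}
```

A suspend_ctl-style tool would call set_resume_device("/dev/sda1", ...) and report failure to the user in whatever language or output medium the distribution prefers, which keeps both the name lookup and the user interaction out of the kernel.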

2004-11-30 13:24:40

by Pavel Machek

Subject: Re: Suspend 2 merge: 49/51: Checksumming

Hi!

> > > If I recall correctly, someone replied that even a read only mount under
> > > one filesystem (XFS? Not sure), would replay the journal, so it wasn't a
> > > goer.
> >
> > You could always special case the broken one until they fix it... :)
>
> Mmm. I wonder how much code that would require us to add. I do like the
> idea of not interacting where the answer is obvious :>. I still think,
> however, that interacting when the answer isn't obvious is the right
> thing to do. Take for example the case where we find an image, but the
> device numbers look like they belong to 2.4 and we're a 2.6 kernel. We
> can't read the header (we can't be sure that this is the cause). The
> user - or their cat - might have selected the wrong boot image
> unintentionally. Why shouldn't we give them the opportunity to reboot
> and get the right one?

Well, kernel depending on user feedback has some interesting issues...
...like user not speaking English or user using speech output.
That's why pushing "Shall I reboot?" etc. prompts into userland
is a good idea. (Distros probably will not get it right, either, but at least
they get a chance.)
Pavel
--
64 bytes from 195.113.31.123: icmp_seq=28 ttl=51 time=448769.1 ms

2004-11-30 13:24:37

by Pavel Machek

Subject: Re: Suspend 2 merge: 49/51: Checksumming

Hi!

> > A plugin for verifying the consistency of an image. Working with kdb, it
> > can look up the locations of variations. There will always be some
> > variations shown, simply because we're touching memory before we get
> > here and as we check the image.
>
> A while back I suggested checking the last mount time of the mounted local
> filesystems as a quick and dirty sanity check between loading the image and
> unfreezing all the processes. (Since a read-only mount shouldn't touch this,
> triggering swsusp resume from userspace after prodding various hardware
> shouldn't cause a major problem either...) Does that sound like a good idea?

Yes, it would be a good sanity check. ext3 replays journals even on
read-only mount so your / will need to be ext2...
Pavel
--
64 bytes from 195.113.31.123: icmp_seq=28 ttl=51 time=448769.1 ms

2004-11-30 13:38:54

by Matthew Garrett

Subject: Re: Suspend 2 merge: 49/51: Checksumming

Pavel Machek wrote:

> Yes, it would be a good sanity check. ext3 replays journals even on
> read-only mount so your / will need to be ext2...

The alternative is to have a userspace application that can check these
things without having to replay the log.
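
As a rough illustration of such a userspace check (not code from this
thread), the sketch below reads the last-mount time straight out of an
ext2/ext3 superblock buffer, so nothing is mounted and no journal is
replayed. The field offsets follow the on-disk ext2 superblock layout; the
names ext2_last_mount_time and mounted_since_suspend are made up for this
example, and the caller is assumed to have read 1024 bytes starting at byte
offset 1024 of the device.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EXT2_SB_MTIME 44      /* s_mtime: last mount time, 32-bit LE */
#define EXT2_SB_MAGIC 56      /* s_magic: 16-bit LE */
#define EXT2_MAGIC    0xEF53  /* shared by ext2 and ext3 */

/* Decode little-endian on-disk fields without assuming host byte order. */
static uint16_t le16(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * Hypothetical helper: sb points at the raw superblock bytes.
 * Returns 0 and stores s_mtime, or -1 if the buffer is too short or
 * does not carry the ext2/ext3 magic number.
 */
int ext2_last_mount_time(const unsigned char *sb, size_t len, uint32_t *mtime)
{
    assert(sb && mtime);
    if (len < EXT2_SB_MAGIC + 2)
        return -1;
    if (le16(sb + EXT2_SB_MAGIC) != EXT2_MAGIC)
        return -1;
    *mtime = le32(sb + EXT2_SB_MTIME);
    return 0;
}

/* Returns 1 if the recorded mount time changed since the image was written. */
int mounted_since_suspend(uint32_t mtime_at_suspend, uint32_t mtime_now)
{
    return mtime_at_suspend != mtime_now;
}
```

A resume helper could record the value at suspend time and refuse to resume
if the filesystem reports a different one afterwards.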

--
Matthew Garrett

2004-11-30 21:20:46

by Nigel Cunningham

Subject: Re: Suspend 2 merge

Hi.

On Tue, 2004-11-30 at 23:16, Stefan Seyfried wrote:
> >>>change a parameter or forcing them to do an ls in /dev with obscure
> >>>parameters (to get the major and minor numbers) when they already know
> >>>they want /dev/sda1 isn't user friendly. Obviously user friendliness is
> >>
> >>This can easily be done by a userspace helper. You do use the
> >>(userspace) X server to display your GUI, don't you?
> >
> > No. Not at all. All of userspace is well and truly wedged in a block of
> > ice by then.
>
> you are not changing the suspend device after freezing userspace, or i
> am getting something horribly wrong here.

No, it doesn't change once userspace is frozen; you're correct.

> so if you have 2 choices of an interface:
> 1) more complex kernel code, but you can do "echo /dev/name > /proc/foo"
> 2) less complex kernel code, now you have a userspace helper e.g.
> "suspend_ctl foodev /dev/name" which then does the magic number
> calculations in userspace and puts the magic number into the kernel.
>
> I think that interface 2) would be preferred by most kernel developers.
> Especially since this is code only needed on a relatively small subset
> of all linux installations.
>
> There is a "top" userspace program to parse kernel numbers, we don't
> have "/proc/top".

Forgive me for asking a stupid question, but why all this fuss when the
code is already in the kernel? It isn't really that complex anyway.
Instead of whatever would be needed for parsing a major and minor, we have

resume_device = name_to_dev_t(commandline);

Is it really worth all this heat for that call and for making two routines
(name_to_dev_t and try_name, IIRC) not be __init? It seems to me that
it's far more complex to create some userspace program to do this stuff.
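
For comparison, the userspace side of interface 2) could be as small as the
following sketch. It is only an illustration: suspend_ctl is the
hypothetical helper named in the quoted mail, the function names here are
invented, and how the resulting major:minor pair would be handed to the
kernel is left out because no such interface is defined in this thread.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/sysmacros.h>
#include <sys/types.h>

/*
 * Look up the device numbers behind a node such as /dev/sda1.
 * Returns 0 on success, -1 if the path cannot be stat()ed,
 * -2 if it exists but is not a block device.
 */
int resolve_block_device(const char *path, unsigned int *maj, unsigned int *min)
{
    struct stat st;

    assert(path && maj && min);
    if (stat(path, &st) != 0)
        return -1;
    if (!S_ISBLK(st.st_mode))
        return -2;
    *maj = major(st.st_rdev);
    *min = minor(st.st_rdev);
    return 0;
}

/*
 * Format the pair as "major:minor".
 * Returns the number of characters written, or -1 if buf is too small.
 */
int format_dev_numbers(unsigned int maj, unsigned int min, char *buf, size_t len)
{
    int n = snprintf(buf, len, "%u:%u", maj, min);

    return (n < 0 || (size_t)n >= len) ? -1 : n;
}
```

The helper would call resolve_block_device() on the name the user typed and
pass the formatted string on; the kernel then never has to parse a path.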

> >>Putting only the absolutely necessary things into the kernel (the same
> >>is true for the interactive resume thing - if someone wants interactive
> >>startup at a failing resume, he has to use an initrd, i don't see a
> >>problem with that) will probably increase the acceptance a bit :-)
> >
> > That's fine if your initrd is properly configured and you're willing to
>
> This is something distributions have to take care of.

No; it's something the users will have to take care of. Distro makers
might make the process more automated, but in the end it's the user's
problem if it doesn't work.

> > add extra cruft to the kernel so userspace can get the info it needs,
>
> not much extra cruft is needed. The "echo resume > /sys/power/state"
> just returns (which it wouldn't if the resume was successful), then you
> can decide what to do next.
>
> > and report what the user wants to do. If, however, you don't use an
> > initrd, you're sunk.
>
> yes. There are other prerequisites for suspend than using an initrd
> though (you need a computer :-). If you don't use an initrd, you cannot
> use the interactive features but have to decide at compile time which
> way to go if the resume fails. That's life.

Have you looked at the code for handling this? It's really very simple.
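
For reference, the initrd-side flow described in the quoted text (write
"resume" to /sys/power/state and treat a return as "the resume did not
happen") might be sketched as below. The path and the "resume" keyword are
taken from the quoted mail; try_resume is a hypothetical name, and whether
a given kernel accepts this write at all is an assumption.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Ask the kernel to resume by writing "resume" to state_path.
 * If the resume succeeds, the running system is replaced by the saved
 * image and this function never returns. If it does return:
 *    1  the write completed, so no resume took place
 *   -1  the file could not be opened or written
 * Either way the caller (an initrd script or program) decides what to do
 * next, for example asking the user or continuing a normal boot.
 */
int try_resume(const char *state_path)
{
    static const char cmd[] = "resume";
    ssize_t n;
    int fd;

    assert(state_path);
    fd = open(state_path, O_WRONLY);
    if (fd < 0)
        return -1;
    n = write(fd, cmd, strlen(cmd));
    close(fd);
    return n == (ssize_t)strlen(cmd) ? 1 : -1;
}
```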

> > Regarding acceptance, there's no point in getting it accepted into the
> > kernel if we end up with something that's user-unfriendly. I think it
> > will help a lot if we agree that suspend does need to blur the lines
> > between kernel and userspace a little, in the interests of providing
> > software that is superior.
>
> User-friendliness is a joint effort of kernel and userspace. The user
> does not care who does the work when he clicks on his "hibernation" icon
> in the taskbar. (The same is true for users of a hibernation script.)
> Actually, the thing that makes suspend2 more reliable than swsusp is
> probably the very good hibernation script (userspace) that saves users
> the reading of documentation since it automatically unloads all critical
> modules etc. For me, pavel's later versions as in SUSE 9.2 have worked
> out of the box on every non-SMP i386 notebook i have laid my hands on in
> the last 6 months (thanks to userspace taking care of bad modules etc).

Have those boxes had DRI enabled or serious USB usage? I'd be surprised
if you haven't run into the same problems we have.

Regards,

Nigel

2004-11-30 21:49:18

by Nigel Cunningham

Subject: Re: Suspend 2 merge: 49/51: Checksumming

Hi.

On Wed, 2004-12-01 at 00:07, Pavel Machek wrote:
> > Mmm. I wonder how much code that would require us to add. I do like the
> > idea of not interacting where the answer is obvious :>. I still think,
> > however, that interacting when the answer isn't obvious is the right
> > thing to do. Take for example the case where we find an image, but the
> > device numbers look like they belong to 2.4 and we're a 2.6 kernel. We
> > can't read the header (we can't be sure that this is the cause). The
> > user - or their cat - might have selected the wrong boot image
> > unintentionally. Why shouldn't we give them the opportunity to reboot
> > and get the right one?
>
> Well, kernel depending on user feedback has some interesting issues...
> ...like user not speaking English or user using speech output.
> That's why pushing "Shall I reboot?" etc. prompts into userland
> is a good idea. (Distros probably will not get it right, either, but at least
> they get a chance.)

And if we don't have userspace yet? (No initrd/initramfs).

The language issue is a good point; the whole issue of kernel messages
and languages needs a more general solution.

It probably also helps to remember the point to this:
- avoid file system corruption
- give the user a chance to confirm/fix actions that look wrong

Would making interaction a compile time option make you happy? (That
said, I'm not looking forward to trying to guess what the system should
do in some of these cases if not allowed to ask).

Regards,

Nigel

2004-11-30 22:32:34

by Pavel Machek

Subject: Re: Suspend 2 merge

Hi!

> > >>Putting only the absolutely necessary things into the kernel (the same
> > >>is true for the interactive resume thing - if someone wants interactive
> > >>startup at a failing resume, he has to use an initrd, i don't see a
> > >>problem with that) will probably increase the acceptance a bit :-)
> > >
> > > That's fine if your initrd is properly configured and you're willing to
> >
> > This is something distributions have to take care of.
>
> No; it's something the users will have to take care of. Distro makers
> might make the process more automated, but in the end it's the user's
> problem if it doesn't work.

Actually, no, it's not like that.

The user will click an icon in KDE, and if it does not suspend & resume
properly, the distribution has a problem to fix. And yes, it works well in
SUSE 9.2.
Pavel
--
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!

2004-11-30 22:43:39

by Pavel Machek

Subject: Re: Suspend 2 merge: 49/51: Checksumming

Hi!

> > Yes, it would be a good sanity check. ext3 replays journals even on
> > read-only mount so your / will need to be ext2...
>
> The alternative is to have a userspace application that can check these
> things without having to replay the log.

Well, that works as long as you do not have your application on ext3
filesystem :-). If your root filesystem is ext2, you have no problem,
and whether or not checking is done in kernelspace does not matter.

Well, you could probably mount ext3 as read-only ext2...

Pavel
--
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!

2005-01-09 22:43:49

by Pavel Machek

Subject: Re: software suspend patch [1/6]

Hi!

> > I can not merge anything before 2.6.10. As you have seen, I have quite
> > a lot of patches in my tree, and I do not want mix them with these...
> >
> > > device-tree.diff
> > > base from suspend2 with a little changed.
> >
> > I do not want this one.
> >
> > > core.diff
> > > 1: redefine struct pbe for using _no_ continuous as pagedir.
> >
> > Can I get this one as a separate diff?
>
> Here is it.

Do you have any updates? It would be nice to separate non-continuous
pagedir from speeding up check_pagedir.

...plus check_pagedir should really use the PageNosaveFree flag instead of
allocating its own (big!) bitmaps. It should also make the code
simpler...
Pavel
--
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!

2005-01-11 02:06:31

by Hu Gang

Subject: Re: software suspend patch [1/6]

On Sun, Jan 09, 2005 at 11:43:25PM +0100, Pavel Machek wrote:
> Hi!
>
> Do you have any updates? It would be nice to separate non-continuous
> pagedir from speeding up check_pagedir.
>
> ...plus check_pagedir should really use the PageNosaveFree flag instead of
> allocating its own (big!) bitmaps. It should also make the code
> simpler...
> Pavel

I'm very happy with the current swsusp; it is stable for me:
2.6.10-mm1 + ppc patch from
http://honk.physik.uni-konstanz.de/~agx/linux-ppc/kernel/
+ your free some memory patch

I have been using it for a week; it has never failed and never oopsed. :)

The only problem is that relocating is a little slow.

Now I don't think a non-continuous pagedir is really needed. Anyway, I'll
prepare a patch to make swsusp use a non-continuous pagedir.

Any comments?

--
Hu Gang .-.
/v\
// \\
Linux User /( )\ [204016]
GPG Key ID ^^-^^ http://soulinfo.com/~hugang/hugang.asc

2005-01-11 03:19:09

by Pavel Machek

Subject: Re: software suspend patch [1/6]

Hi!

> > Do you have any updates? It would be nice to separate non-continuous
> > pagedir from speeding up check_pagedir.
> >
> > ...plus check_pagedir should really use the PageNosaveFree flag instead of
> > allocating its own (big!) bitmaps. It should also make the code
> > simpler...
>
> I'm very happy with the current swsusp; it is stable for me:
> 2.6.10-mm1 + ppc patch from
> http://honk.physik.uni-konstanz.de/~agx/linux-ppc/kernel/
> + your free some memory patch
>
> I have been using it for a week; it has never failed and never oopsed. :)
>
> The only problem is that relocating is a little slow.

I just got a very nice patch from Lukas Hejtmanek to relocate
faster... It would be great if you could test it.

> Now I don't think a non-continuous pagedir is really needed. Anyway, I'll
> prepare a patch to make swsusp use a non-continuous pagedir.

Thanks.

Pavel

--
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!