Hi,
The following patch limits the size of the suspend image to approx. 500 MB,
which should improve the overall performance of swsusp on systems with
more than 1 GB of RAM.
It introduces the constant IMAGE_SIZE that can be set to the preferred size
of the image (in MB) and modifies the memory-shrinking part of
swsusp to take this constant into account (500 is the default value
of IMAGE_SIZE).
Please apply (Pavel, please ack if that's ok).
Greetings,
Rafael
Signed-off-by: Rafael J. Wysocki <[email protected]>
kernel/power/power.h | 8 +++-----
kernel/power/swsusp.c | 17 ++++++++---------
2 files changed, 11 insertions(+), 14 deletions(-)
Index: linux-2.6.15-rc5-mm1/kernel/power/power.h
===================================================================
--- linux-2.6.15-rc5-mm1.orig/kernel/power/power.h 2005-12-05 22:07:12.000000000 +0100
+++ linux-2.6.15-rc5-mm1/kernel/power/power.h 2005-12-07 22:16:55.000000000 +0100
@@ -53,12 +53,10 @@
extern struct pbe *pagedir_nosave;
/*
- * This compilation switch determines the way in which memory will be freed
- * during suspend. If defined, only as much memory will be freed as needed
- * to complete the suspend, which will make it go faster. Otherwise, the
- * largest possible amount of memory will be freed.
+ * Preferred image size in MB (set it to zero to get the smallest
+ * image possible)
*/
-#define FAST_FREE 1
+#define IMAGE_SIZE 500
extern asmlinkage int swsusp_arch_suspend(void);
extern asmlinkage int swsusp_arch_resume(void);
Index: linux-2.6.15-rc5-mm1/kernel/power/swsusp.c
===================================================================
--- linux-2.6.15-rc5-mm1.orig/kernel/power/swsusp.c 2005-12-05 22:07:12.000000000 +0100
+++ linux-2.6.15-rc5-mm1/kernel/power/swsusp.c 2005-12-07 14:15:40.000000000 +0100
@@ -626,7 +626,7 @@
int swsusp_shrink_memory(void)
{
- long tmp;
+ long size, tmp;
struct zone *zone;
unsigned long pages = 0;
unsigned int i = 0;
@@ -634,11 +634,11 @@
printk("Shrinking memory... ");
do {
-#ifdef FAST_FREE
- tmp = 2 * count_highmem_pages();
- tmp += tmp / 50 + count_data_pages();
- tmp += (tmp + PBES_PER_PAGE - 1) / PBES_PER_PAGE +
+ size = 2 * count_highmem_pages();
+ size += size / 50 + count_data_pages();
+ size += (size + PBES_PER_PAGE - 1) / PBES_PER_PAGE +
PAGES_FOR_IO;
+ tmp = size;
for_each_zone (zone)
if (!is_highmem(zone))
tmp -= zone->free_pages;
@@ -647,11 +647,10 @@
if (!tmp)
return -ENOMEM;
pages += tmp;
+ } else if (size > (IMAGE_SIZE * 1024 * 1024) / PAGE_SIZE) {
+ tmp = shrink_all_memory(SHRINK_BITE);
+ pages += tmp;
}
-#else
- tmp = shrink_all_memory(SHRINK_BITE);
- pages += tmp;
-#endif
printk("\b%c", p[i++%4]);
} while (tmp > 0);
printk("\bdone (%lu pages freed)\n", pages);
On St 07-12-05 22:46:05, Rafael J. Wysocki wrote:
> Hi,
>
> The following patch limits the size of the suspend image to approx. 500 MB,
> which should improve the overall performance of swsusp on systems with
> more than 1 GB of RAM.
>
> It introduces the constant IMAGE_SIZE that can be set to the preferred size
> of the image (in MB) and modifies the memory-shrinking part of
> swsusp to take this constant into account (500 is the default value
> of IMAGE_SIZE).
>
> Please apply (Pavel, please ack if that's ok).
ACK.
--
Thanks, Sharp!
Rafael J. Wysocki wrote:
> The following patch limits the size of the suspend image to approx. 500 MB,
> which should improve the overall performance of swsusp on systems with
> more than 1 GB of RAM.
>
> It introduces the constant IMAGE_SIZE that can be set to the preferred size
> of the image (in MB) and modifies the memory-shrinking part of
> swsusp to take this constant into account (500 is the default value
> of IMAGE_SIZE).
What happens if IMAGE_SIZE is bigger than free swap? Do we "try harder"
or do we fail?
--
Stefan Seyfried \ "I didn't want to write for pay. I
QA / R&D Team Mobile Devices \ wanted to be paid for what I write."
SUSE LINUX Products GmbH, N?rnberg \ -- Leonard Cohen
Rafael J. Wysocki wrote:
> The following patch limits the size of the suspend image to approx. 500 MB,
> which should improve the overall performance of swsusp on systems with
> more than 1 GB of RAM.
>
> It introduces the constant IMAGE_SIZE that can be set to the preferred size
> of the image (in MB) and modifies the memory-shrinking part of
> swsusp to take this constant into account (500 is the default value
> of IMAGE_SIZE).
What happens if IMAGE_SIZE is bigger than free swap? Do we "try harder"
or do we fail?
--
Stefan Seyfried \ "I didn't want to write for pay. I
QA / R&D Team Mobile Devices \ wanted to be paid for what I write."
SUSE LINUX Products GmbH, N?rnberg \ -- Leonard Cohen
On Friday, 9 December 2005 16:48, Stefan Seyfried wrote:
> Rafael J. Wysocki wrote:
>
> > The following patch limits the size of the suspend image to approx. 500 MB,
> > which should improve the overall performance of swsusp on systems with
> > more than 1 GB of RAM.
> >
> > It introduces the constant IMAGE_SIZE that can be set to the preferred size
> > of the image (in MB) and modifies the memory-shrinking part of
> > swsusp to take this constant into account (500 is the default value
> > of IMAGE_SIZE).
>
> What happens if IMAGE_SIZE is bigger than free swap? Do we "try harder"
> or do we fail?
First, with swsusp the image can't be bigger than 1/2 of lowmem (1/2 of RAM
on x86-64) and the too great values of IMAGE_SIZE have no effect. Still, if
the amount of free swap is smaller than 1/2 of RAM and the image happens
to be bigger, we will fail.
Greetings,
Rafael
--
Beer is proof that God loves us and wants us to be happy - Benjamin Franklin
Rafael J. Wysocki wrote:
> On Friday, 9 December 2005 16:48, Stefan Seyfried wrote:
>
>> What happens if IMAGE_SIZE is bigger than free swap? Do we "try harder"
>> or do we fail?
>
> First, with swsusp the image can't be bigger than 1/2 of lowmem (1/2 of RAM
> on x86-64) and the too great values of IMAGE_SIZE have no effect. Still, if
> the amount of free swap is smaller than 1/2 of RAM and the image happens
> to be bigger, we will fail.
ok. This is not nice since we might fail without any _real_ need. Can we
make this parameter userspace-tweakable, so that my userspace app can do
something like (pseudocode):
echo 500 > /sys/power/swsusp/imagesize
echo disk > /sys/power/state
R=$?
if [ $R -eq $ENOMEM ]; then
echo 100 > /sys/power/swsusp/imagesize # try again
echo disk > /sys/power/state
R=$?
fi
if [ $R -ne 0 ]; then
pop_up_some_loud_beeping_window "suspend failed!"
fi
This would at least give us a chance for a second try. I know that Pavel
dislikes userspace tunables, but i dislike failing suspends ;-)
Best regards,
Stefan
--
Stefan Seyfried \ "I didn't want to write for pay. I
QA / R&D Team Mobile Devices \ wanted to be paid for what I write."
SUSE LINUX Products GmbH, N?rnberg \ -- Leonard Cohen
Hi!
> >> What happens if IMAGE_SIZE is bigger than free swap? Do we "try harder"
> >> or do we fail?
> >
> > First, with swsusp the image can't be bigger than 1/2 of lowmem (1/2 of RAM
> > on x86-64) and the too great values of IMAGE_SIZE have no effect. Still, if
> > the amount of free swap is smaller than 1/2 of RAM and the image happens
> > to be bigger, we will fail.
>
> ok. This is not nice since we might fail without any _real_ need. Can we
> make this parameter userspace-tweakable, so that my userspace app can do
> something like (pseudocode):
>
> echo 500 > /sys/power/swsusp/imagesize
> echo disk > /sys/power/state
> R=$?
> if [ $R -eq $ENOMEM ]; then
> echo 100 > /sys/power/swsusp/imagesize # try again
You can do 'echo 0' -- as small as possible.
> echo disk > /sys/power/state
> R=$?
> fi
> if [ $R -ne 0 ]; then
> pop_up_some_loud_beeping_window "suspend failed!"
> fi
>
> This would at least give us a chance for a second try. I know that Pavel
> dislikes userspace tunables, but i dislike failing suspends ;-)
Can we do that when we start seeing failed suspends? I think it will
not happen. If we have reasonably-sized swap partition, it should be
ok.
Pavel
--
Thanks, Sharp!
Hi,
On Friday, 9 December 2005 20:17, Pavel Machek wrote:
> > >> What happens if IMAGE_SIZE is bigger than free swap? Do we "try harder"
> > >> or do we fail?
> > >
> > > First, with swsusp the image can't be bigger than 1/2 of lowmem (1/2 of RAM
> > > on x86-64) and the too great values of IMAGE_SIZE have no effect. Still, if
> > > the amount of free swap is smaller than 1/2 of RAM and the image happens
> > > to be bigger, we will fail.
> >
> > ok. This is not nice since we might fail without any _real_ need.
That depends a good deal on how you define "real". :-)
> > Can we make this parameter userspace-tweakable, so that my userspace app
> > can do something like (pseudocode):
> >
> > echo 500 > /sys/power/swsusp/imagesize
> > echo disk > /sys/power/state
> > R=$?
> > if [ $R -eq $ENOMEM ]; then
> > echo 100 > /sys/power/swsusp/imagesize # try again
>
> You can do 'echo 0' -- as small as possible.
>
> > echo disk > /sys/power/state
> > R=$?
> > fi
> > if [ $R -ne 0 ]; then
> > pop_up_some_loud_beeping_window "suspend failed!"
> > fi
> >
> > This would at least give us a chance for a second try. I know that Pavel
> > dislikes userspace tunables, but i dislike failing suspends ;-)
>
> Can we do that when we start seeing failed suspends? I think it will
> not happen. If we have reasonably-sized swap partition, it should be
> ok.
Yes. Moreover, we can do something like that without a userspace
tunable, if we check for the free swap before we try to shrink memory.
This would take some time to implement, though, and I'd rather
like to do the userland interface first.
The tunable may be useful to people who'd like to achieve the
maximum efficiency of suspend/resume and it would be a nice
feature to have, I think, but let's say we'll try to implement it
in the future, if still needed/wanted.
Greetings,
Rafael
--
Beer is proof that God loves us and wants us to be happy - Benjamin Franklin
Hi,
On Friday, 9 December 2005 23:08, Rafael J. Wysocki wrote:
}-- snip --{
> > >
> > > This would at least give us a chance for a second try. I know that Pavel
> > > dislikes userspace tunables, but i dislike failing suspends ;-)
> >
> > Can we do that when we start seeing failed suspends? I think it will
> > not happen. If we have reasonably-sized swap partition, it should be
> > ok.
>
> Yes. Moreover, we can do something like that without a userspace
> tunable, if we check for the free swap before we try to shrink memory.
> This would take some time to implement, though, and I'd rather
> like to do the userland interface first.
>
> The tunable may be useful to people who'd like to achieve the
> maximum efficiency of suspend/resume and it would be a nice
> feature to have, I think, but let's say we'll try to implement it
> in the future, if still needed/wanted.
Actually the tunable turned out to be quite easy to implement and I think
I'll need it for userspace swsusp (the suspend-handling userspace
process will need to tell the kernel how much space there is for the
image).
Please give it a try.
Greetings,
Rafael
Signed-off-by: Rafael J. Wysocki <[email protected]>
kernel/power/disk.c | 25 +++++++++++++++++++++++++
kernel/power/power.h | 6 ++++--
kernel/power/swsusp.c | 4 +++-
3 files changed, 32 insertions(+), 3 deletions(-)
Index: linux-2.6.15-rc5-mm1/kernel/power/disk.c
===================================================================
--- linux-2.6.15-rc5-mm1.orig/kernel/power/disk.c 2005-12-10 13:12:18.000000000 +0100
+++ linux-2.6.15-rc5-mm1/kernel/power/disk.c 2005-12-10 13:20:38.000000000 +0100
@@ -367,9 +367,34 @@
power_attr(resume);
+static ssize_t image_size_show(struct subsystem * subsys, char *buf)
+{
+ return sprintf(buf, "%u\n", image_size);
+}
+
+static ssize_t image_size_store(struct subsystem * subsys, const char * buf, size_t n)
+{
+ int len;
+ char *p;
+ unsigned int size;
+
+ p = memchr(buf, '\n', n);
+ len = p ? p - buf : n;
+
+ if (sscanf(buf, "%u", &size) == 1) {
+ image_size = size < MAX_IMAGE_SIZE ? size : MAX_IMAGE_SIZE;
+ return n;
+ }
+
+ return -EINVAL;
+}
+
+power_attr(image_size);
+
static struct attribute * g[] = {
&disk_attr.attr,
&resume_attr.attr,
+ &image_size_attr.attr,
NULL,
};
Index: linux-2.6.15-rc5-mm1/kernel/power/power.h
===================================================================
--- linux-2.6.15-rc5-mm1.orig/kernel/power/power.h 2005-12-10 13:12:18.000000000 +0100
+++ linux-2.6.15-rc5-mm1/kernel/power/power.h 2005-12-10 13:20:46.000000000 +0100
@@ -53,10 +53,12 @@
extern struct pbe *pagedir_nosave;
/*
- * Preferred image size in MB (set it to zero to get the smallest
+ * Maximum image size in MB (set it to zero to get the smallest
* image possible)
*/
-#define IMAGE_SIZE 500
+#define MAX_IMAGE_SIZE 500
+
+extern unsigned int image_size;
extern asmlinkage int swsusp_arch_suspend(void);
extern asmlinkage int swsusp_arch_resume(void);
Index: linux-2.6.15-rc5-mm1/kernel/power/swsusp.c
===================================================================
--- linux-2.6.15-rc5-mm1.orig/kernel/power/swsusp.c 2005-12-10 13:12:18.000000000 +0100
+++ linux-2.6.15-rc5-mm1/kernel/power/swsusp.c 2005-12-10 13:20:42.000000000 +0100
@@ -69,6 +69,8 @@
#include "power.h"
+unsigned int image_size = MAX_IMAGE_SIZE;
+
#ifdef CONFIG_HIGHMEM
unsigned int count_highmem_pages(void);
int save_highmem(void);
@@ -647,7 +649,7 @@
if (!tmp)
return -ENOMEM;
pages += tmp;
- } else if (size > (IMAGE_SIZE * 1024 * 1024) / PAGE_SIZE) {
+ } else if (size > (image_size * 1024 * 1024) / PAGE_SIZE) {
tmp = shrink_all_memory(SHRINK_BITE);
pages += tmp;
}
Hi!
> > The tunable may be useful to people who'd like to achieve the
> > maximum efficiency of suspend/resume and it would be a nice
> > feature to have, I think, but let's say we'll try to implement it
> > in the future, if still needed/wanted.
>
> Actually the tunable turned out to be quite easy to implement and I think
> I'll need it for userspace swsusp (the suspend-handling userspace
> process will need to tell the kernel how much space there is for the
> image).
Looks mostly okay to me, small comments below.
> Index: linux-2.6.15-rc5-mm1/kernel/power/disk.c
> ===================================================================
> --- linux-2.6.15-rc5-mm1.orig/kernel/power/disk.c 2005-12-10 13:12:18.000000000 +0100
> +++ linux-2.6.15-rc5-mm1/kernel/power/disk.c 2005-12-10 13:20:38.000000000 +0100
> @@ -367,9 +367,34 @@
>
> power_attr(resume);
>
> +static ssize_t image_size_show(struct subsystem * subsys, char *buf)
> +{
> + return sprintf(buf, "%u\n", image_size);
> +}
> +
> +static ssize_t image_size_store(struct subsystem * subsys, const char * buf, size_t n)
> +{
> + int len;
> + char *p;
> + unsigned int size;
> +
> + p = memchr(buf, '\n', n);
> + len = p ? p - buf : n;
len and p are unused.
> + if (sscanf(buf, "%u", &size) == 1) {
> + image_size = size < MAX_IMAGE_SIZE ? size : MAX_IMAGE_SIZE;
> + return n;
Why the limit? We may want to allow very big images when someone wants
"as similar to suspended system as possible".
> Index: linux-2.6.15-rc5-mm1/kernel/power/power.h
> ===================================================================
> --- linux-2.6.15-rc5-mm1.orig/kernel/power/power.h 2005-12-10 13:12:18.000000000 +0100
> +++ linux-2.6.15-rc5-mm1/kernel/power/power.h 2005-12-10 13:20:46.000000000 +0100
> @@ -53,10 +53,12 @@
> extern struct pbe *pagedir_nosave;
>
> /*
> - * Preferred image size in MB (set it to zero to get the smallest
> + * Maximum image size in MB (set it to zero to get the smallest
> * image possible)
> */
In fact this is not maximum. If more than 500MB can't be freed, well,
it will not be freed. Very improbable, but possible.
> -#define IMAGE_SIZE 500
> +#define MAX_IMAGE_SIZE 500
> +
With runtime configuration, we should not need additional
compile-time config. OTOH /proc tunables should have descritption in
Documentation/.
Pavel
--
Thanks, Sharp!
Hi,
On Saturday, 10 December 2005 17:06, Pavel Machek wrote:
> > > The tunable may be useful to people who'd like to achieve the
> > > maximum efficiency of suspend/resume and it would be a nice
> > > feature to have, I think, but let's say we'll try to implement it
> > > in the future, if still needed/wanted.
> >
> > Actually the tunable turned out to be quite easy to implement and I think
> > I'll need it for userspace swsusp (the suspend-handling userspace
> > process will need to tell the kernel how much space there is for the
> > image).
>
> Looks mostly okay to me, small comments below.
>
> > Index: linux-2.6.15-rc5-mm1/kernel/power/disk.c
> > ===================================================================
> > --- linux-2.6.15-rc5-mm1.orig/kernel/power/disk.c 2005-12-10 13:12:18.000000000 +0100
> > +++ linux-2.6.15-rc5-mm1/kernel/power/disk.c 2005-12-10 13:20:38.000000000 +0100
> > @@ -367,9 +367,34 @@
> >
> > power_attr(resume);
> >
> > +static ssize_t image_size_show(struct subsystem * subsys, char *buf)
> > +{
> > + return sprintf(buf, "%u\n", image_size);
> > +}
> > +
> > +static ssize_t image_size_store(struct subsystem * subsys, const char * buf, size_t n)
> > +{
> > + int len;
> > + char *p;
> > + unsigned int size;
> > +
> > + p = memchr(buf, '\n', n);
> > + len = p ? p - buf : n;
>
> len and p are unused.
Right. BTW, the same applies to resume_store().
>
> > + if (sscanf(buf, "%u", &size) == 1) {
> > + image_size = size < MAX_IMAGE_SIZE ? size : MAX_IMAGE_SIZE;
> > + return n;
>
> Why the limit? We may want to allow very big images when someone wants
> "as similar to suspended system as possible".
It's not really necessary. I'll remove it.
>
> > Index: linux-2.6.15-rc5-mm1/kernel/power/power.h
> > ===================================================================
> > --- linux-2.6.15-rc5-mm1.orig/kernel/power/power.h 2005-12-10 13:12:18.000000000 +0100
> > +++ linux-2.6.15-rc5-mm1/kernel/power/power.h 2005-12-10 13:20:46.000000000 +0100
> > @@ -53,10 +53,12 @@
> > extern struct pbe *pagedir_nosave;
> >
> > /*
> > - * Preferred image size in MB (set it to zero to get the smallest
> > + * Maximum image size in MB (set it to zero to get the smallest
> > * image possible)
> > */
>
> In fact this is not maximum. If more than 500MB can't be freed, well,
> it will not be freed. Very improbable, but possible.
>
> > -#define IMAGE_SIZE 500
> > +#define MAX_IMAGE_SIZE 500
> > +
>
> With runtime configuration, we should not need additional
> compile-time config. OTOH /proc tunables should have descritption in
> Documentation/.
This only is an experimental version. I'll write the description for the
final one.
Rafael
--
Beer is proof that God loves us and wants us to be happy - Benjamin Franklin
Update:
On Saturday, 10 December 2005 21:06, Rafael J. Wysocki wrote:
> On Saturday, 10 December 2005 17:06, Pavel Machek wrote:
> > > > The tunable may be useful to people who'd like to achieve the
> > > > maximum efficiency of suspend/resume and it would be a nice
> > > > feature to have, I think, but let's say we'll try to implement it
> > > > in the future, if still needed/wanted.
> > >
> > > Actually the tunable turned out to be quite easy to implement and I think
> > > I'll need it for userspace swsusp (the suspend-handling userspace
> > > process will need to tell the kernel how much space there is for the
> > > image).
> >
> > Looks mostly okay to me, small comments below.
A cleaner version with comments etc. follows.
Rafael
Signed-off-by: Rafael J. Wysocki <[email protected]>
Documentation/power/interface.txt | 11 +++++++++++
Documentation/power/swsusp.txt | 7 ++++++-
kernel/power/disk.c | 20 ++++++++++++++++++++
kernel/power/power.h | 7 ++-----
kernel/power/swsusp.c | 10 +++++++++-
5 files changed, 48 insertions(+), 7 deletions(-)
Index: linux-2.6.15-rc5-mm1/kernel/power/disk.c
===================================================================
--- linux-2.6.15-rc5-mm1.orig/kernel/power/disk.c 2005-12-07 23:19:03.000000000 +0100
+++ linux-2.6.15-rc5-mm1/kernel/power/disk.c 2005-12-10 21:10:53.000000000 +0100
@@ -367,9 +367,29 @@
power_attr(resume);
+static ssize_t image_size_show(struct subsystem * subsys, char *buf)
+{
+ return sprintf(buf, "%u\n", image_size);
+}
+
+static ssize_t image_size_store(struct subsystem * subsys, const char * buf, size_t n)
+{
+ unsigned int size;
+
+ if (sscanf(buf, "%u", &size) == 1) {
+ image_size = size;
+ return n;
+ }
+
+ return -EINVAL;
+}
+
+power_attr(image_size);
+
static struct attribute * g[] = {
&disk_attr.attr,
&resume_attr.attr,
+ &image_size_attr.attr,
NULL,
};
Index: linux-2.6.15-rc5-mm1/kernel/power/power.h
===================================================================
--- linux-2.6.15-rc5-mm1.orig/kernel/power/power.h 2005-12-10 20:59:52.000000000 +0100
+++ linux-2.6.15-rc5-mm1/kernel/power/power.h 2005-12-10 21:13:04.000000000 +0100
@@ -52,11 +52,8 @@
extern unsigned int nr_copy_pages;
extern struct pbe *pagedir_nosave;
-/*
- * Preferred image size in MB (set it to zero to get the smallest
- * image possible)
- */
-#define IMAGE_SIZE 500
+/* Preferred image size in MB (default 500) */
+extern unsigned int image_size;
extern asmlinkage int swsusp_arch_suspend(void);
extern asmlinkage int swsusp_arch_resume(void);
Index: linux-2.6.15-rc5-mm1/kernel/power/swsusp.c
===================================================================
--- linux-2.6.15-rc5-mm1.orig/kernel/power/swsusp.c 2005-12-10 20:59:52.000000000 +0100
+++ linux-2.6.15-rc5-mm1/kernel/power/swsusp.c 2005-12-10 22:21:25.000000000 +0100
@@ -69,6 +69,14 @@
#include "power.h"
+/*
+ * Preferred image size in MB (tunable via /sys/power/image_size).
+ * When it is set to N, swsusp will do its best to ensure the image
+ * size will not exceed N MB, but if that is impossible, it will
+ * try to create the smallest image possible.
+ */
+unsigned int image_size = 500;
+
#ifdef CONFIG_HIGHMEM
unsigned int count_highmem_pages(void);
int save_highmem(void);
@@ -647,7 +655,7 @@
if (!tmp)
return -ENOMEM;
pages += tmp;
- } else if (size > (IMAGE_SIZE * 1024 * 1024) / PAGE_SIZE) {
+ } else if (size > (image_size * 1024 * 1024) / PAGE_SIZE) {
tmp = shrink_all_memory(SHRINK_BITE);
pages += tmp;
}
Index: linux-2.6.15-rc5-mm1/Documentation/power/swsusp.txt
===================================================================
--- linux-2.6.15-rc5-mm1.orig/Documentation/power/swsusp.txt 2005-10-28 02:02:08.000000000 +0200
+++ linux-2.6.15-rc5-mm1/Documentation/power/swsusp.txt 2005-12-10 23:49:08.000000000 +0100
@@ -27,6 +27,11 @@
echo platform > /sys/power/disk; echo disk > /sys/power/state
+If you want to limit the suspend image size to N megabytes, do
+
+echo N > /sys/power/image_size
+
+before suspend (it is limited to 500 MB by default).
Encrypted suspend image:
------------------------
Index: linux-2.6.15-rc5-mm1/Documentation/power/interface.txt
===================================================================
--- linux-2.6.15-rc5-mm1.orig/Documentation/power/interface.txt 2005-10-28 02:02:08.000000000 +0200
+++ linux-2.6.15-rc5-mm1/Documentation/power/interface.txt 2005-12-10 23:42:39.000000000 +0100
@@ -41,3 +41,14 @@
It will only change to 'firmware' or 'platform' if the system supports
it.
+/sys/power/image_size controls the size of the image created by
+the suspend-to-disk mechanism. It can be written a string
+representing a non-negative integer that will be used as an upper
+limit of the image size, in megabytes. The suspend-to-disk mechanism will
+do its best to ensure the image size will not exceed that number. However,
+if this turns out to be impossible, it will try to suspend anyway using the
+smallest image possible. In particular, if "0" is written to this file, the
+suspend image will be as small as possible.
+
+Reading from this file will display the current image size limit, which
+is set to 500 MB by default.
Hi!
> > > +static ssize_t image_size_store(struct subsystem * subsys, const char * buf, size_t n)
> > > +{
> > > + int len;
> > > + char *p;
> > > + unsigned int size;
> > > +
> > > + p = memchr(buf, '\n', n);
> > > + len = p ? p - buf : n;
> >
> > len and p are unused.
>
> Right. BTW, the same applies to resume_store().
Thanks, fixed locally.
Pavel
--
Thanks, Sharp!
Hi!
> > > > > The tunable may be useful to people who'd like to achieve the
> > > > > maximum efficiency of suspend/resume and it would be a nice
> > > > > feature to have, I think, but let's say we'll try to implement it
> > > > > in the future, if still needed/wanted.
> > > >
> > > > Actually the tunable turned out to be quite easy to implement and I think
> > > > I'll need it for userspace swsusp (the suspend-handling userspace
> > > > process will need to tell the kernel how much space there is for the
> > > > image).
> > >
> > > Looks mostly okay to me, small comments below.
>
> A cleaner version with comments etc. follows.
Looks good to me. ACK.
Pavel
> Signed-off-by: Rafael J. Wysocki <[email protected]>
>
> Documentation/power/interface.txt | 11 +++++++++++
> Documentation/power/swsusp.txt | 7 ++++++-
> kernel/power/disk.c | 20 ++++++++++++++++++++
> kernel/power/power.h | 7 ++-----
> kernel/power/swsusp.c | 10 +++++++++-
> 5 files changed, 48 insertions(+), 7 deletions(-)
>
> Index: linux-2.6.15-rc5-mm1/kernel/power/disk.c
> ===================================================================
> --- linux-2.6.15-rc5-mm1.orig/kernel/power/disk.c 2005-12-07 23:19:03.000000000 +0100
> +++ linux-2.6.15-rc5-mm1/kernel/power/disk.c 2005-12-10 21:10:53.000000000 +0100
> @@ -367,9 +367,29 @@
>
> power_attr(resume);
>
> +static ssize_t image_size_show(struct subsystem * subsys, char *buf)
> +{
> + return sprintf(buf, "%u\n", image_size);
> +}
> +
> +static ssize_t image_size_store(struct subsystem * subsys, const char * buf, size_t n)
> +{
> + unsigned int size;
> +
> + if (sscanf(buf, "%u", &size) == 1) {
> + image_size = size;
> + return n;
> + }
> +
> + return -EINVAL;
> +}
> +
> +power_attr(image_size);
> +
> static struct attribute * g[] = {
> &disk_attr.attr,
> &resume_attr.attr,
> + &image_size_attr.attr,
> NULL,
> };
>
> Index: linux-2.6.15-rc5-mm1/kernel/power/power.h
> ===================================================================
> --- linux-2.6.15-rc5-mm1.orig/kernel/power/power.h 2005-12-10 20:59:52.000000000 +0100
> +++ linux-2.6.15-rc5-mm1/kernel/power/power.h 2005-12-10 21:13:04.000000000 +0100
> @@ -52,11 +52,8 @@
> extern unsigned int nr_copy_pages;
> extern struct pbe *pagedir_nosave;
>
> -/*
> - * Preferred image size in MB (set it to zero to get the smallest
> - * image possible)
> - */
> -#define IMAGE_SIZE 500
> +/* Preferred image size in MB (default 500) */
> +extern unsigned int image_size;
>
> extern asmlinkage int swsusp_arch_suspend(void);
> extern asmlinkage int swsusp_arch_resume(void);
> Index: linux-2.6.15-rc5-mm1/kernel/power/swsusp.c
> ===================================================================
> --- linux-2.6.15-rc5-mm1.orig/kernel/power/swsusp.c 2005-12-10 20:59:52.000000000 +0100
> +++ linux-2.6.15-rc5-mm1/kernel/power/swsusp.c 2005-12-10 22:21:25.000000000 +0100
> @@ -69,6 +69,14 @@
>
> #include "power.h"
>
> +/*
> + * Preferred image size in MB (tunable via /sys/power/image_size).
> + * When it is set to N, swsusp will do its best to ensure the image
> + * size will not exceed N MB, but if that is impossible, it will
> + * try to create the smallest image possible.
> + */
> +unsigned int image_size = 500;
> +
> #ifdef CONFIG_HIGHMEM
> unsigned int count_highmem_pages(void);
> int save_highmem(void);
> @@ -647,7 +655,7 @@
> if (!tmp)
> return -ENOMEM;
> pages += tmp;
> - } else if (size > (IMAGE_SIZE * 1024 * 1024) / PAGE_SIZE) {
> + } else if (size > (image_size * 1024 * 1024) / PAGE_SIZE) {
> tmp = shrink_all_memory(SHRINK_BITE);
> pages += tmp;
> }
> Index: linux-2.6.15-rc5-mm1/Documentation/power/swsusp.txt
> ===================================================================
> --- linux-2.6.15-rc5-mm1.orig/Documentation/power/swsusp.txt 2005-10-28 02:02:08.000000000 +0200
> +++ linux-2.6.15-rc5-mm1/Documentation/power/swsusp.txt 2005-12-10 23:49:08.000000000 +0100
> @@ -27,6 +27,11 @@
>
> echo platform > /sys/power/disk; echo disk > /sys/power/state
>
> +If you want to limit the suspend image size to N megabytes, do
> +
> +echo N > /sys/power/image_size
> +
> +before suspend (it is limited to 500 MB by default).
>
> Encrypted suspend image:
> ------------------------
> Index: linux-2.6.15-rc5-mm1/Documentation/power/interface.txt
> ===================================================================
> --- linux-2.6.15-rc5-mm1.orig/Documentation/power/interface.txt 2005-10-28 02:02:08.000000000 +0200
> +++ linux-2.6.15-rc5-mm1/Documentation/power/interface.txt 2005-12-10 23:42:39.000000000 +0100
> @@ -41,3 +41,14 @@
> It will only change to 'firmware' or 'platform' if the system supports
> it.
>
> +/sys/power/image_size controls the size of the image created by
> +the suspend-to-disk mechanism. It can be written a string
> +representing a non-negative integer that will be used as an upper
> +limit of the image size, in megabytes. The suspend-to-disk mechanism will
> +do its best to ensure the image size will not exceed that number. However,
> +if this turns out to be impossible, it will try to suspend anyway using the
> +smallest image possible. In particular, if "0" is written to this file, the
> +suspend image will be as small as possible.
> +
> +Reading from this file will display the current image size limit, which
> +is set to 500 MB by default.
--
Thanks, Sharp!
On Sat, Dec 10, 2005 at 11:56:26PM +0100, Rafael J. Wysocki wrote:
> Update:
>
> A cleaner version with comments etc. follows.
>
> Documentation/power/interface.txt | 11 +++++++++++
> Documentation/power/swsusp.txt | 7 ++++++-
> kernel/power/disk.c | 20 ++++++++++++++++++++
> kernel/power/power.h | 7 ++-----
> kernel/power/swsusp.c | 10 +++++++++-
> 5 files changed, 48 insertions(+), 7 deletions(-)
I've been testing this as present in 2.6.15-rc5-mm2 and it works like a
champ. I am currently running with image_size=150, which seems to be a
nice balance between quick suspend and still having a reasonable working
set after resume.
I did have one concerning failure during an earlier test, though - I
have only 512MB of swap with 1.25GB RAM. Now obviously if there are
>500MB user pages, swsusp must fail; but I think I had a failure even
though there was only about 400MB user pages - some of it was swapped
out, and I had image_size=500, which resulted in an image that would not
fit into the available swap space.
It sucks to fail the suspend just because the chosen image size didn't
fit into the current state of the machine. Now obviously I need to
resize my swap partition to prevent this from happening, but it would
be nice if swsusp would automatically "try harder!" if appropriate.
I can reproduce this and collect more data if needed.
-andy
On Thu, Dec 15, 2005 at 06:09:03PM -0800, Andy Isaacson wrote:
> I did have one concerning failure during an earlier test, though - I
> have only 512MB of swap with 1.25GB RAM. Now obviously if there are
> >500MB user pages, swsusp must fail; but I think I had a failure even
> though there was only about 400MB user pages - some of it was swapped
> out, and I had image_size=500, which resulted in an image that would not
> fit into the available swap space.
>
> It sucks to fail the suspend just because the chosen image size didn't
> fit into the current state of the machine. Now obviously I need to
> resize my swap partition to prevent this from happening, but it would
> be nice if swsusp would automatically "try harder!" if appropriate.
This is almost trivially solvable from userspace (not tested, beware :-):
- check the return code of your write() to /sys/power/state
- if it is ENOMEM (better look into the kernel code if this is what is
actually reported...), then write "0" to image_size and try again.
or (not as sophisticated, and i am not sure if the paths are all correct):
----
#!/bin/sh
echo 150 > /sys/power/image_size
echo disk > /sys/power/state
if [ $? -ne 0 ]; then
echo 0 > /sys/power/image_size
echo disk > /sys/power/state
fi
----
this will retry on any error (e.g. process not stopped, no swap space
at all, device refused to suspend...) not only on ENOMEM, but echo
unfortunately does not return the error code, only success or failure.
Easy solution would be a small perl or C program.
I am not convinced that this should be handled in the kernel.
--
Stefan Seyfried \ "I didn't want to write for pay. I
QA / R&D Team Mobile Devices \ wanted to be paid for what I write."
SUSE LINUX Products GmbH, N?rnberg \ -- Leonard Cohen
On Fri, Dec 16, 2005 at 11:16:23AM +0100, Stefan Seyfried wrote:
> This is almost trivially solvable from userspace (not tested, beware :-):
> - check the return code of your write() to /sys/power/state
> - if it is ENOMEM (better look into the kernel code if this is what is
> actually reported...), then write "0" to image_size and try again.
>
> or (not as sophisticated, and i am not sure if the paths are all correct):
> ----
> #!/bin/sh
> echo 150 > /sys/power/image_size
> echo disk > /sys/power/state
> if [ $? -ne 0 ]; then
> echo 0 > /sys/power/image_size
> echo disk > /sys/power/state
> fi
> ----
> this will retry on any error (e.g. process not stopped, no swap space
> at all, device refused to suspend...) not only on ENOMEM, but echo
> unfortunately does not return the error code, only success or failure.
> Easy solution would be a small perl or C program.
>
> I am not convinced that this should be handled in the kernel.
I do not see a horrific logical problem here, given that the maximum
desired image size is the minimum of max_image_size and free swap space
available. The main question is the one of implementation, though.
Yours,
Chris
Hi,
On Friday, 16 December 2005 15:26, Christian Trefzer wrote:
> On Fri, Dec 16, 2005 at 11:16:23AM +0100, Stefan Seyfried wrote:
> > This is almost trivially solvable from userspace (not tested, beware :-):
> > - check the return code of your write() to /sys/power/state
> > - if it is ENOMEM (better look into the kernel code if this is what is
> > actually reported...), then write "0" to image_size and try again.
> >
> > or (not as sophisticated, and i am not sure if the paths are all correct):
> > ----
> > #!/bin/sh
> > echo 150 > /sys/power/image_size
> > echo disk > /sys/power/state
> > if [ $? -ne 0 ]; then
> > echo 0 > /sys/power/image_size
> > echo disk > /sys/power/state
> > fi
> > ----
> > this will retry on any error (e.g. process not stopped, no swap space
> > at all, device refused to suspend...) not only on ENOMEM, but echo
> > unfortunately does not return the error code, only success or failure.
> > Easy solution would be a small perl or C program.
> >
> > I am not convinced that this should be handled in the kernel.
>
> I do not see a horrific logical problem here, given that the maximum
> desired image size is the minimum of max_image_size and free swap space
> available. The main question is the one of implementation, though.
The problem is the free swap space is not constant as long as you are
trying to free more RAM, because some pages can get swapped out in the
process at any time.
To handle this properly we would have to count the amount of free swap in
every iteration of the loop in swsusp_shrink_memory(), but I wouldn't like
to make this function swap-dependent.
We are going to move the image-writing and reading functionality of swsusp
to the user space anyway and the userspace process controlling the suspend
will solve this problem. For now, please use workarounds like the Stefan's
one.
Greetings,
Rafael
--
Beer is proof that God loves us and wants us to be happy - Benjamin Franklin
On Fri, Dec 16, 2005 at 07:08:56PM +0100, Rafael J. Wysocki wrote:
> Hi,
>
> On Friday, 16 December 2005 15:26, Christian Trefzer wrote:
> > The problem is the free swap space is not constant as long as you are
> > trying to free more RAM, because some pages can get swapped out in the
> > process at any time.
>
> To handle this properly we would have to count the amount of free swap in
> every iteration of the loop in swsusp_shrink_memory(), but I wouldn't like
> to make this function swap-dependent.
Well, I did not quite see that. Renders the problem a lot less trivial.
More on that later.
> We are going to move the image-writing and reading functionality of swsusp
> to the user space anyway and the userspace process controlling the suspend
> will solve this problem. For now, please use workarounds like the Stefan's
> one.
In the long term, if I am correctly informed, everything should be
controlled by userspace, so it seems to me that some current problems
will be solved along the way. I am not looking for a workaround, though.
Instead I wanted to ask if there was no simple way to keep swsusp from
failing in a self-contained manner, i.e. without sneaky workarounds.
What occured to me while writing my first mail was that maybe it is not
trivial at all to determine the remaining swap space, as you already
confirmed.
My point is, that the generic algorithm to determine the maximum image
size desired by the user will, as a whole, remain the same, whether
implemented in kernel or user space. Unfortunately, I am very unfamiliar
with swsusp code and all of its implications wrt. drivers etc. - this
still holds true after looking at swsusp_shrink_memory().
But generally speaking: at some point, memory is freed for the image
creation process. Recent efforts towards tunable max_image_size make it
seem to me that we likely know beforehand, how much we are going to
free. Now say we have max_image_size of 500MB, but we know that active
swaps will hold 400MB at max. So we start out freeing enough memory for
a 400MB image, and once we're done, we notice we can only save 350MB.
Does anything keep us from reducing the image size to 350MB right now?
Apologies for wasting your time, I am trying to get rid of my ignorance.
Yours,
Chris
Hi,
On Friday, 16 December 2005 20:29, Christian Trefzer wrote:
> On Fri, Dec 16, 2005 at 07:08:56PM +0100, Rafael J. Wysocki wrote:
> > On Friday, 16 December 2005 15:26, Christian Trefzer wrote:
> > > The problem is the free swap space is not constant as long as you are
> > > trying to free more RAM, because some pages can get swapped out in the
> > > process at any time.
> >
> > To handle this properly we would have to count the amount of free swap in
> > every iteration of the loop in swsusp_shrink_memory(), but I wouldn't like
> > to make this function swap-dependent.
>
> Well, I did not quite see that. Renders the problem a lot less trivial.
> More on that later.
>
>
> > We are going to move the image-writing and reading functionality of swsusp
> > to the user space anyway and the userspace process controlling the suspend
> > will solve this problem. For now, please use workarounds like the Stefan's
> > one.
>
> In the long term, if I am correctly informed, everything should be
> controlled by userspace, so it seems to me that some current problems
> will be solved along the way. I am not looking for a workaround, though.
> Instead I wanted to ask if there was no simple way to keep swsusp from
> failing in a self-contained manner, i.e. without sneaky workarounds.
> What occured to me while writing my first mail was that maybe it is not
> trivial at all to determine the remaining swap space, as you already
> confirmed.
>
> My point is, that the generic algorithm to determine the maximum image
> size desired by the user will, as a whole, remain the same, whether
> implemented in kernel or user space. Unfortunately, I am very unfamiliar
> with swsusp code and all of its implications wrt. drivers etc. - this
> still holds true after looking at swsusp_shrink_memory().
>
> But generally speaking: at some point, memory is freed for the image
> creation process. Recent efforts towards tunable max_image_size make it
> seem to me that we likely know beforehand, how much we are going to
> free. Now say we have max_image_size of 500MB, but we know that active
> swaps will hold 400MB at max. So we start out freeing enough memory for
> a 400MB image, and once we're done, we notice we can only save 350MB.
> Does anything keep us from reducing the image size to 350MB right now?
Basically nothing and it could be implemented, but IMO it would be a step
in a wrong direction.
Namely, we want the image-writing code to go to the user space in the long
run. Then, the whole suspend will be handled by a userspace process and the
kernel will not know what kind of storage this process is using (it may or
may not be a swap partition). Consequently, the kernel will not know how
much space there is for the image, _unless_ the user space process tells it.
The user space process can do this _once_, using /sys/power/image_size
before it asks the kernel to create the image. Next, the kernel creates
the image and swsusp_shrink_memory() is called in the process. Of course
at this point the amount of available storage can change, but to know it
the kernel would have to ask the userspace process if that happened.
Generally speaking it can't do this, so it "returns" the image to the
userspace process which should verify if the image is small enough
and if not, repeat from writing /sys/power/image_size with a smaller
number or just fail (if so configured).
> Apologies for wasting your time, I am trying to get rid of my ignorance.
It's not a waste of time at all. :-) IMO the issue is definitely worth
discussing.
Greetings,
Rafael
--
Beer is proof that God loves us and wants us to be happy - Benjamin Franklin
Hello there,
On Fri, Dec 16, 2005 at 10:09:52PM +0100, Rafael J. Wysocki wrote:
> On Friday, 16 December 2005 20:29, Christian Trefzer wrote:
> > My point is, that the generic algorithm to determine the maximum image
> > size desired by the user will, as a whole, remain the same, whether
> > implemented in kernel or user space. Unfortunately, I am very unfamiliar
> > with swsusp code and all of its implications wrt. drivers etc. - this
> > still holds true after looking at swsusp_shrink_memory().
> >
> > But generally speaking: at some point, memory is freed for the image
> > creation process. Recent efforts towards tunable max_image_size make it
> > seem to me that we likely know beforehand, how much we are going to
> > free. Now say we have max_image_size of 500MB, but we know that active
> > swaps will hold 400MB at max. So we start out freeing enough memory for
> > a 400MB image, and once we're done, we notice we can only save 350MB.
> > Does anything keep us from reducing the image size to 350MB right now?
>
> Basically nothing and it could be implemented, but IMO it would be a step
> in a wrong direction.
>
> Namely, we want the image-writing code to go to the user space in the long
> run. Then, the whole suspend will be handled by a userspace process and the
> kernel will not know what kind of storage this process is using (it may or
> may not be a swap partition). Consequently, the kernel will not know how
> much space there is for the image, _unless_ the user space process tells it.
So far everything seems clear to me.
> The user space process can do this _once_, using /sys/power/image_size
> before it asks the kernel to create the image. Next, the kernel creates
> the image and swsusp_shrink_memory() is called in the process. Of course
> at this point the amount of available storage can change, but to know it
> the kernel would have to ask the userspace process if that happened.
> Generally speaking it can't do this, so it "returns" the image to the
> userspace process which should verify if the image is small enough
> and if not, repeat from writing /sys/power/image_size with a smaller
> number or just fail (if so configured).
I see now - when I was thinking about uswsusp, I wrongly guessed that
image creation will happen in the same region (user vs. kernel space) as
the checks for available storage. We can't really let image_size
converge towards free_storage with an upper limit of max_image_size
anyway, yet the silent retry with a smaller image size will eventually
be an option, although configurable via the userland utility. The latter
was what I intended, for checking available storage in every loop
iteration would indeed not be very efficient. Nice one!
Just a spontaneous idea: before we smash memory free with a sledge
hammer (no offense intended), would it be sensible to create sort of a
map of actual memory usage, determining the importance (kernel/user
apps/buffers) and relevance (frequently/recently/not used) of pages?
With a long-term target of a responsive post-resume system in mind, it
seems desirable to me to restore as much information as possible.
Putting pressure on the VM will (afaik) free buffers first, but swap out
applications later. If i am not misunderstood once more, that means
swsusp will mostly trash some cache, at least while freeing memory, yet
not push applications out to swaps. Obviously, the process of suspending
and resuming will require some memory by itself, and this amount will
grow the more functionality is added.
I hereby assume that the "underlying" functionality of reserving space
for suspend and resume routines, as well as "freezing everything" for
image creation and "restoring everything", basically "just works". That
means, from my naive point of view, that "time has stopped" wrt. the
"running" kernel and applications, leaving alive only the means to write
the image and swaps. Now it is "only" a matter of what to write to the
image, what to swap out, and what to forget.
Basically, importance of pages is pretty boolean - there are pages we
can throw away without harm, and there are those we can't. Then again,
some of the latter can be swapped if necessary, but a reluctance to do
so would improve system responsiveness post-resume. Unused application
data will eventually be swapped out already by the time we want to
suspend, yet with today's common memory size and current kernel
versions, this assumption becomes less and less valid. Therefore, I
assume again that application data is mostly paged in by the time
suspend is triggered.
So far my picture of the situation - any misunderstandings in there
might flaw the following suggestions.
My basic idea was to create the image incrementally, using a variable
slice size, in which important pages come before unimportant ones and
hot pages before cold ones. I bet the kernel/uswsusp interface for that
mechanism can become arbitrarily complex, but to keep it simple, would
it be possible to request one image slice at a time? Say, request and
write to the image all pages related to kernel space, then user apps,
then caches, ordered by relevance ("temperature").
* Looking at a case of extreme storage constraints:
If not even kernel and userland can be saved during susped, we fail.
If the image is stored anywhere but in swaps, we can try to swap out
pages [1] - once all the necessary areas are safe, we are as well, else
we fail. Caches can be forgotten if necessary, but are nice to have
- keep as much as possible/desired.
* In case of huge storage amounts for suspend:
Save the kernel first, then userland, then caches until we reach a
user-defined limit. The ordering of the latter two could be mixed
slice-wise according to page relevance, forcing some LRU/LFU
application pages out to swap in favour of keeping larger caches
[1].
* Anywhere inbetween / basic algorithm:
- Create a map of memory usage.
- Group data by importance: kernel | userland | buffers/caches
- Group equivalently relevant ("hot") pages in slices. Use
histogram-like maths to determine slice boundaries and stepping. [2]
- Determine the ordering of slices by weight, with a threshold of
importance vs. relevance. [3]
- Request and store one slice at a time:
- Slice is unswappable and thus must be stored in the suspend
image. Lack of storage implies suspend failure.
- Slice does not fit into suspend image, but is swappable [1].
Lack of swap space leads to suspend failure. Otherwise, we
instruct the kernel to swap out the pages contained within that
specific slice. The fact that these pages are swapped out must
be known to the VM before resuming the frozen kernel [4].
- Slice contains buffer data which can be silently dropped, yet
the fact that these buffers are gone must also make its way into
the VM state before we let the frozen kernel continue where it
has left off [4]
[1] Aspects of storing pages in swap:
A basic approach to guarantee integrity would be to make all
important pages backed by swap. This is contraproductive wrt. swap
as storage for images which can be compressed in the long run, so
that is a no-go.
It would also require swap to be at least as large as the amount of
memory currently occupied by applications, which won't necessarily
be the case, esp. if the suspend image will be written to a
dedicated partition or even an ordinary file. My conclusion is that
swapping out a lot of pages is only feasible in case the suspend
image is stored "out-of-swap".
[2] Thoughts on histograms:
Small slices induce more overhead, large slices are more likely to
exceed the maximum possible/desired image size. It therefore appears
sensible to me to stick to dynamically sized slices, where
horizontal size (amount of memory) and vertical size (relevance
range) should represent a "significant" histogram.
That said, we would end up having a suitable number of slices, with
each bearing a sane amount of data, esp. wrt. cache slices being
stored or dropped. I yet have to draw some draft charts and graphs
to visualize some relations between different sizes and
thresholds...
[3] Page weight / coloring:
Looking at [1], we can skip coloring of important pages if the image
is written to swaps as they will all need to be stored. In this
case, we can as well treat kernel space and userland as one slice
each, or even squash them together into one.
In case of external suspend image and buffer pages, we could color
pages according to their relevance, and after sorting them we have a
gradient from, say, red (hot) to blue (cold) pages. Using some sane
histograph techniques [2] we can now slice that up, ending up with,
well, slices we can treat "atomically".
[4] Changing VM state after freezing it:
This occured to me pretty late during this draft and is, of course,
crucial to all of it ; ) Murphy rules. So anyway, if we kind of
"tar" our memory up, one slice at a time, how can we tell the frozen
kernel's VM, "Hey, by the way, there were some pages of cache over
there, but we decided to drop them during suspend!" Even more
complex at first sight appears to me the treatment of pages swapped
out due to lack of suspend storage, external suspend image presumed.
This would require the already written kernel slices to be re-made.
After all, the one-slice-at-a-time approach is a great
simplification for illustrating the line of thought, but not for
writing a consistent image. The whole point of slicing the image up
is a calculation of what pages to store where (or at all), given an
arbitrary configuration wrt. memory, available/desired storage/swap
and throughput constraints. So instead of writing slices
immediately, we take everything into consideration to achieve an
optimum result, and once it is clear what shall be done, it is done.
The one big problem I see here is the eventual increase in memory
requirements to do all the maths, and potentially the need for free
space to create an atomic copy of some special pages. Also, "late
swapping" of slices stands in opposition to a totally frozen state
of the previously running environment. We therefore need to
determine which pages to swap out, order the VM system to do so
(without clearing out buffers), take into account all changes caused
by these actions and re-iterate/backtrack as few steps as necessary
to act appropriately. If we can rely on a partially frozen kernel to
only touch the pages it is told to swap out, the only further
changes should be visible within the page tables. Complexity results
only if such changes can shift slice boundaries, leaving at least
one question open to the VM gods: is this possible, and how
predictable is the resulting beaviour?
Well, so much for "quick" brainstorming on the issue... Don't bother
flaming me for any misunderstanding or misconception : )
> > Apologies for wasting your time, I am trying to get rid of my ignorance.
>
> It's not a waste of time at all. :-) IMO the issue is definitely worth
> discussing.
Thanks for clearing my conscience, yet I am afraid I just stained it
even more : )
Yours,
Chris
> Well, so much for "quick" brainstorming on the issue... Don't bother
> flaming me for any misunderstanding or misconception : )
That's empty reply for you, then.
Pavel
[mm layer already discards most usefull pages last, so Rafael's
patches do the right thing]
--
Thanks, Sharp!
On Sun, Dec 18, 2005 at 06:44:53PM +0100, Pavel Machek wrote:
> > Well, so much for "quick" brainstorming on the issue... Don't bother
> > flaming me for any misunderstanding or misconception : )
>
> That's empty reply for you, then.
> Pavel
Argh. Looks like I said the opposite of what I was trying to express.
Corrections are of course warmly welcome, even if the tone might not
exactly be nice.
> [mm layer already discards most usefull pages last, so Rafael's
> patches do the right thing]
So I wasted my time and will now quit wasting yours. Thanks anyway : )
Chris