2006-08-09 10:16:55

by Rafael J. Wysocki

[permalink] [raw]
Subject: [RFC][PATCH -mm 0/5] swsusp: Fix handling of highmem

Hi,

Currently swsusp handles highmem in a simplistic way, by trying to store a
copy of each saveable highmem page in the "normal" memory before creating
the suspend image. These copies are then copied once again before saving,
because they are treated as any other non-highmem pages with data. For this
reason, to save one highmem page swsusp needs two free pages in the "normal"
memory, so on a system with high memory it is practically impossible to create
a suspend image above 400 kilobytes. Moreover, if there's much more highmem
than the "normal" memory in the system, it may be impossible to suspend at all
due to the lack of space for saving non-freeable highmem pages.

This limitation may be overcome in a satisfactory way if swsusp does its best
to store the copies of saveable highmem pages in the highmem itself. However,
for this purpose swsusp has to be taught to use pfns, or (struct page *)
pointers, instead of kernel virtual addresses to identify memory pages.
Yet, if this is to be implemented, we can also attack the minor problem that
the current swsusp's internal data structure, the list of page backup entries
(aka PBEs), is not very efficient as far as the memory usage is concerned.

This issue can be addressed by replacing the list of PBEs with a pair of
memory bitmaps. Still, to remove the list of PBEs completely, we would
have to make some complicated modifications to the architecture-dependent
parts of the code which would be quite undesirable. However, we can use
the observation that memory is only a limiting resource during the suspend
phase of the suspend-resume cycle, because during the resume phase
many image pages may be loaded directly to their "original" page frames, so
we don't need to keep a copy of each of them in the memory. Thus the list of
PBEs may be used safely in the last part of the resume phase without limitting
the amount of memory we can use.

The following series of patches introduces the memory bitmap data structure,
makes swsusp use it to store its internal information and implements the
improved handling of saveable highmem pages.

Comments welcome.

Greetings,
Rafael


2006-08-09 10:16:27

by Rafael J. Wysocki

[permalink] [raw]
Subject: [RFC][PATCH -mm 4/5] swsusp: Change the name of pagedir_nosave

The name of the pagedir_nosave variable does not make sense any more, so it
seems reasonable to change it to something more meaningful.

Signed-off-by: Rafael J. Wysocki <[email protected]>
---
arch/i386/power/swsusp.S | 2 +-
arch/powerpc/kernel/swsusp_32.S | 4 ++--
arch/x86_64/kernel/suspend_asm.S | 2 +-
kernel/power/power.h | 2 --
kernel/power/snapshot.c | 15 +++++++++------
5 files changed, 13 insertions(+), 12 deletions(-)

Index: linux-2.6.18-rc3-mm2/arch/i386/power/swsusp.S
===================================================================
--- linux-2.6.18-rc3-mm2.orig/arch/i386/power/swsusp.S
+++ linux-2.6.18-rc3-mm2/arch/i386/power/swsusp.S
@@ -32,7 +32,7 @@ ENTRY(swsusp_arch_resume)
movl $swsusp_pg_dir-__PAGE_OFFSET, %ecx
movl %ecx, %cr3

- movl pagedir_nosave, %edx
+ movl restore_pblist, %edx
.p2align 4,,7

copy_loop:
Index: linux-2.6.18-rc3-mm2/arch/powerpc/kernel/swsusp_32.S
===================================================================
--- linux-2.6.18-rc3-mm2.orig/arch/powerpc/kernel/swsusp_32.S
+++ linux-2.6.18-rc3-mm2/arch/powerpc/kernel/swsusp_32.S
@@ -159,8 +159,8 @@ END_FTR_SECTION_IFSET(CPU_FTR_ALTIVEC)
isync

/* Load ptr the list of pages to copy in r3 */
- lis r11,(pagedir_nosave - KERNELBASE)@h
- ori r11,r11,pagedir_nosave@l
+ lis r11,(restore_pblist - KERNELBASE)@h
+ ori r11,r11,restore_pblist@l
lwz r10,0(r11)

/* Copy the pages. This is a very basic implementation, to
Index: linux-2.6.18-rc3-mm2/arch/x86_64/kernel/suspend_asm.S
===================================================================
--- linux-2.6.18-rc3-mm2.orig/arch/x86_64/kernel/suspend_asm.S
+++ linux-2.6.18-rc3-mm2/arch/x86_64/kernel/suspend_asm.S
@@ -54,7 +54,7 @@ ENTRY(restore_image)
movq %rcx, %cr3;
movq %rax, %cr4; # turn PGE back on

- movq pagedir_nosave(%rip), %rdx
+ movq restore_pblist(%rip), %rdx
loop:
testq %rdx, %rdx
jz done
Index: linux-2.6.18-rc3-mm2/kernel/power/power.h
===================================================================
--- linux-2.6.18-rc3-mm2.orig/kernel/power/power.h
+++ linux-2.6.18-rc3-mm2/kernel/power/power.h
@@ -38,8 +38,6 @@ extern struct subsystem power_subsys;
/* References to section boundaries */
extern const void __nosave_begin, __nosave_end;

-extern struct pbe *pagedir_nosave;
-
/* Preferred image size in bytes (default 500 MB) */
extern unsigned long image_size;
extern int in_suspend;
Index: linux-2.6.18-rc3-mm2/kernel/power/snapshot.c
===================================================================
--- linux-2.6.18-rc3-mm2.orig/kernel/power/snapshot.c
+++ linux-2.6.18-rc3-mm2/kernel/power/snapshot.c
@@ -38,8 +38,11 @@
* the suspend and included in the suspend image, but have also been
* allocated by the "resume" kernel, so their contents cannot be written
* directly to their "original" page frames.
+ *
+ * NOTE: It cannot be static, because it is used by the architecture-dependent
+ * atomic restore code.
*/
-struct pbe *pagedir_nosave;
+struct pbe *restore_pblist;

/* Pointer to an auxiliary buffer (1 page) */
static void *buffer;
@@ -824,7 +827,7 @@ void swsusp_free(void)
}
nr_copy_pages = 0;
nr_meta_pages = 0;
- pagedir_nosave = NULL;
+ restore_pblist = NULL;
buffer = NULL;
}

@@ -1203,7 +1206,7 @@ load_header(struct swsusp_info *info)
{
int error;

- pagedir_nosave = NULL;
+ restore_pblist = NULL;
error = check_header(info);
if (!error) {
nr_copy_pages = info->image_pages;
@@ -1562,8 +1565,8 @@ static void *get_buffer(struct memory_bi
pbe->orig_address = page_address(page);
pbe->address = safe_pages_list;
safe_pages_list = safe_pages_list->next;
- pbe->next = pagedir_nosave;
- pagedir_nosave = pbe;
+ pbe->next = restore_pblist;
+ restore_pblist = pbe;
return (void *)pbe->address;
}

@@ -1628,7 +1631,7 @@ int snapshot_write_next(struct snapshot_

chain_init(&ca, GFP_ATOMIC, 1);
bm_position_reset(&orig_bm);
- pagedir_nosave = NULL;
+ restore_pblist = NULL;
handle->sync_read = 0;
handle->buffer = get_buffer(&orig_bm, &ca);
if (!handle->buffer)

2006-08-09 10:16:26

by Rafael J. Wysocki

[permalink] [raw]
Subject: [RFC][PATCH -mm 2/5] swsusp: Use memory bitmaps during resume

Make swsusp use memory bitmaps to store its internal information during the
resume phase of the suspend-resume cycle.

If the pfns of saveable pages are saved during the suspend phase instead of
the kernel virtual addresses of these pages, we can use them during the resume
phase directly to set the corresponding bits in a memory bitmap. Then, this
bitmap is used to mark the page frames corresponding to the pages that were
saveable before the suspend (aka "unsafe" page frames).

Next, we allocate as many page frames as needed to store the entire suspend
image and make sure that there will be some extra free "safe" page frames for
the list of PBEs constructed later. Subsequently, the image is loaded and,
if possible, the data loaded from it are written into their "original" page frames
(ie. the ones they had occupied before the suspend). The image data that
cannot be written into their "original" page frames are loaded into "safe" page
frames and their "original" kernel virtual addresses, as well as the addresses
of the "safe" pages containing their copies, are stored in a list of PBEs.
Finally, the list of PBEs is used to copy the remaining image data into their
"original" page frames (this is done atomically, by the architecture-dependent
parts of swsusp).

Signed-off-by: Rafael J. Wysocki <[email protected]>
---
include/linux/suspend.h | 27 ---
kernel/power/power.h | 11 -
kernel/power/snapshot.c | 421 ++++++++++++++++++++----------------------------
kernel/power/swap.c | 4
kernel/power/user.c | 1
5 files changed, 195 insertions(+), 269 deletions(-)

Index: linux-2.6.18-rc3-mm2/kernel/power/snapshot.c
===================================================================
--- linux-2.6.18-rc3-mm2.orig/kernel/power/snapshot.c
+++ linux-2.6.18-rc3-mm2/kernel/power/snapshot.c
@@ -37,7 +37,7 @@
struct pbe *pagedir_nosave;
static unsigned int nr_copy_pages;
static unsigned int nr_meta_pages;
-static unsigned long *buffer;
+static void *buffer;

#ifdef CONFIG_HIGHMEM
unsigned int count_highmem_pages(void)
@@ -165,7 +165,7 @@ static inline int restore_highmem(void)
* and we count them using unsafe_pages
*/

-static unsigned int unsafe_pages;
+static unsigned int allocated_unsafe_pages;

static void *alloc_image_page(gfp_t gfp_mask, int safe_needed)
{
@@ -176,7 +176,7 @@ static void *alloc_image_page(gfp_t gfp_
while (res && PageNosaveFree(virt_to_page(res))) {
/* The page is unsafe, mark it for swsusp_free() */
SetPageNosave(virt_to_page(res));
- unsafe_pages++;
+ allocated_unsafe_pages++;
res = (void *)get_zeroed_page(gfp_mask);
}
if (res) {
@@ -765,101 +765,10 @@ copy_data_pages(struct memory_bitmap *co
}

/**
- * free_pagedir - free pages allocated with alloc_pagedir()
- */
-
-static void free_pagedir(struct pbe *pblist, int clear_nosave_free)
-{
- struct pbe *pbe;
-
- while (pblist) {
- pbe = (pblist + PB_PAGE_SKIP)->next;
- free_image_page(pblist, clear_nosave_free);
- pblist = pbe;
- }
-}
-
-/**
- * fill_pb_page - Create a list of PBEs on a given memory page
- */
-
-static inline void fill_pb_page(struct pbe *pbpage, unsigned int n)
-{
- struct pbe *p;
-
- p = pbpage;
- pbpage += n - 1;
- do
- p->next = p + 1;
- while (++p < pbpage);
-}
-
-/**
- * create_pbe_list - Create a list of PBEs on top of a given chain
- * of memory pages allocated with alloc_pagedir()
- *
- * This function assumes that pages allocated by alloc_image_page() will
- * always be zeroed.
- */
-
-static inline void create_pbe_list(struct pbe *pblist, unsigned int nr_pages)
-{
- struct pbe *pbpage;
- unsigned int num = PBES_PER_PAGE;
-
- for_each_pb_page (pbpage, pblist) {
- if (num >= nr_pages)
- break;
-
- fill_pb_page(pbpage, PBES_PER_PAGE);
- num += PBES_PER_PAGE;
- }
- if (pbpage) {
- num -= PBES_PER_PAGE;
- fill_pb_page(pbpage, nr_pages - num);
- }
-}
-
-/**
- * alloc_pagedir - Allocate the page directory.
- *
- * First, determine exactly how many pages we need and
- * allocate them.
+ * swsusp_free - free pages allocated for the suspend.
*
- * We arrange the pages in a chain: each page is an array of PBES_PER_PAGE
- * struct pbe elements (pbes) and the last element in the page points
- * to the next page.
- *
- * On each page we set up a list of struct_pbe elements.
- */
-
-static struct pbe *alloc_pagedir(unsigned int nr_pages, gfp_t gfp_mask,
- int safe_needed)
-{
- unsigned int num;
- struct pbe *pblist, *pbe;
-
- if (!nr_pages)
- return NULL;
-
- pblist = alloc_image_page(gfp_mask, safe_needed);
- pbe = pblist;
- for (num = PBES_PER_PAGE; num < nr_pages; num += PBES_PER_PAGE) {
- if (!pbe) {
- free_pagedir(pblist, 1);
- return NULL;
- }
- pbe += PB_PAGE_SKIP;
- pbe->next = alloc_image_page(gfp_mask, safe_needed);
- pbe = pbe->next;
- }
- create_pbe_list(pblist, nr_pages);
- return pblist;
-}
-
-/**
- * Free pages we allocated for suspend. Suspend pages are alocated
- * before atomic copy, so we need to free them after resume.
+ * Suspend pages are alocated before the atomic copy is made, so we
+ * need to release them after the resume.
*/

void swsusp_free(void)
@@ -897,14 +806,18 @@ void swsusp_free(void)
static int enough_free_mem(unsigned int nr_pages)
{
struct zone *zone;
- unsigned int n = 0;
+ unsigned int free = 0, meta = 0;

for_each_zone (zone)
- if (!is_highmem(zone))
- n += zone->free_pages;
- pr_debug("swsusp: available memory: %u pages\n", n);
- return n > (nr_pages + PAGES_FOR_IO +
- (nr_pages + PBES_PER_PAGE - 1) / PBES_PER_PAGE);
+ if (!is_highmem(zone)) {
+ free += zone->free_pages;
+ meta += snapshot_additional_pages(zone);
+ }
+
+ pr_debug("swsusp: pages needed: %u + %u + %u, available pages: %u\n",
+ nr_pages, PAGES_FOR_IO, meta, free);
+
+ return free > nr_pages + PAGES_FOR_IO + meta;
}

static int
@@ -954,11 +867,6 @@ asmlinkage int swsusp_save(void)
nr_pages = count_data_pages();
printk("swsusp: Need to copy %u pages\n", nr_pages);

- pr_debug("swsusp: pages needed: %u + %lu + %u, free: %u\n",
- nr_pages,
- (nr_pages + PBES_PER_PAGE - 1) / PBES_PER_PAGE,
- PAGES_FOR_IO, nr_free_pages());
-
if (!enough_free_mem(nr_pages)) {
printk(KERN_ERR "swsusp: Not enough free memory\n");
return -ENOMEM;
@@ -1000,22 +908,19 @@ static void init_header(struct swsusp_in
}

/**
- * pack_addresses - the addresses corresponding to pfns found in the
- * bitmap @bm are stored in the array @buf[] (1 page)
+ * pack_pfns - pfns corresponding to the set bits found in the bitmap @bm
+ * are stored in the array @buf[] (1 page at a time)
*/

static inline void
-pack_addresses(unsigned long *buf, struct memory_bitmap *bm)
+pack_pfns(unsigned long *buf, struct memory_bitmap *bm)
{
int j;

for (j = 0; j < PAGE_SIZE / sizeof(long); j++) {
- unsigned long pfn = memory_bm_next_pfn(bm);
-
- if (unlikely(pfn == BM_END_OF_MAP))
+ buf[j] = memory_bm_next_pfn(bm);
+ if (unlikely(buf[j] == BM_END_OF_MAP))
break;
-
- buf[j] = (unsigned long)page_address(pfn_to_page(pfn));
}
}

@@ -1061,7 +966,7 @@ int snapshot_read_next(struct snapshot_h
if (handle->prev < handle->cur) {
if (handle->cur <= nr_meta_pages) {
memset(buffer, 0, PAGE_SIZE);
- pack_addresses(buffer, &orig_bm);
+ pack_pfns(buffer, &orig_bm);
} else {
unsigned long pfn = memory_bm_next_pfn(&copy_bm);

@@ -1087,14 +992,10 @@ int snapshot_read_next(struct snapshot_h
* had been used before suspend
*/

-static int mark_unsafe_pages(struct pbe *pblist)
+static int mark_unsafe_pages(struct memory_bitmap *bm)
{
struct zone *zone;
unsigned long pfn, max_zone_pfn;
- struct pbe *p;
-
- if (!pblist) /* a sanity check */
- return -EINVAL;

/* Clear page flags */
for_each_zone (zone) {
@@ -1104,30 +1005,37 @@ static int mark_unsafe_pages(struct pbe
ClearPageNosaveFree(pfn_to_page(pfn));
}

- /* Mark orig addresses */
- for_each_pbe (p, pblist) {
- if (virt_addr_valid(p->orig_address))
- SetPageNosaveFree(virt_to_page(p->orig_address));
- else
- return -EFAULT;
- }
+ /* Mark pages that correspond to the "original" pfns as "unsafe" */
+ bm_position_reset(bm);
+ do {
+ pfn = memory_bm_next_pfn(bm);
+ if (likely(pfn != BM_END_OF_MAP)) {
+ if (likely(pfn_valid(pfn)))
+ SetPageNosaveFree(pfn_to_page(pfn));
+ else
+ return -EFAULT;
+ }
+ } while (pfn != BM_END_OF_MAP);

- unsafe_pages = 0;
+ allocated_unsafe_pages = 0;

return 0;
}

-static void copy_page_backup_list(struct pbe *dst, struct pbe *src)
+static void
+duplicate_memory_bitmap(struct memory_bitmap *dst, struct memory_bitmap *src)
{
- /* We assume both lists contain the same number of elements */
- while (src) {
- dst->orig_address = src->orig_address;
- dst = dst->next;
- src = src->next;
+ unsigned long pfn;
+
+ bm_position_reset(src);
+ pfn = memory_bm_next_pfn(src);
+ while (pfn != BM_END_OF_MAP) {
+ memory_bm_set_bit(dst, pfn);
+ pfn = memory_bm_next_pfn(src);
}
}

-static int check_header(struct swsusp_info *info)
+static inline int check_header(struct swsusp_info *info)
{
char *reason = NULL;

@@ -1154,19 +1062,14 @@ static int check_header(struct swsusp_in
* load header - check the image header and copy data from it
*/

-static int load_header(struct snapshot_handle *handle,
- struct swsusp_info *info)
+static int
+load_header(struct swsusp_info *info)
{
int error;
- struct pbe *pblist;

+ pagedir_nosave = NULL;
error = check_header(info);
if (!error) {
- pblist = alloc_pagedir(info->image_pages, GFP_ATOMIC, 0);
- if (!pblist)
- return -ENOMEM;
- pagedir_nosave = pblist;
- handle->pbe = pblist;
nr_copy_pages = info->image_pages;
nr_meta_pages = info->pages - info->image_pages - 1;
}
@@ -1174,108 +1077,137 @@ static int load_header(struct snapshot_h
}

/**
- * unpack_orig_addresses - copy the elements of @buf[] (1 page) to
- * the PBEs in the list starting at @pbe
+ * unpack_orig_pfns - for each element of @buf[] (1 page at a time) set
+ * the corresponding bit in the memory bitmap @bm
*/

-static inline struct pbe *unpack_orig_addresses(unsigned long *buf,
- struct pbe *pbe)
+static inline void
+unpack_orig_pfns(unsigned long *buf, struct memory_bitmap *bm)
{
int j;

- for (j = 0; j < PAGE_SIZE / sizeof(long) && pbe; j++) {
- pbe->orig_address = buf[j];
- pbe = pbe->next;
+ for (j = 0; j < PAGE_SIZE / sizeof(long); j++) {
+ if (unlikely(buf[j] == BM_END_OF_MAP))
+ break;
+
+ memory_bm_set_bit(bm, buf[j]);
}
- return pbe;
}

/**
- * prepare_image - use metadata contained in the PBE list
- * pointed to by pagedir_nosave to mark the pages that will
- * be overwritten in the process of restoring the system
- * memory state from the image ("unsafe" pages) and allocate
- * memory for the image
+ * prepare_image - use the memory bitmap @bm to mark the pages that will
+ * be overwritten in the process of restoring the system memory state
+ * from the suspend image ("unsafe" pages) and allocate memory for the
+ * image.
*
- * The idea is to allocate the PBE list first and then
- * allocate as many pages as it's needed for the image data,
- * but not to assign these pages to the PBEs initially.
- * Instead, we just mark them as allocated and create a list
- * of "safe" which will be used later
+ * The idea is to allocate a new memory bitmap first and then allocate
+ * as many pages as needed for the image data, but not to assign these
+ * pages to specific tasks initially. Instead, we just mark them as
+ * allocated and create a list of "safe" pages that will be used later.
*/

-static struct linked_page *safe_pages;
+#define PBES_PER_LINKED_PAGE (LINKED_PAGE_DATA_SIZE / sizeof(struct pbe))

-static int prepare_image(struct snapshot_handle *handle)
+static struct linked_page *safe_pages_list;
+
+static int
+prepare_image(struct memory_bitmap *new_bm, struct memory_bitmap *bm)
{
- int error = 0;
- unsigned int nr_pages = nr_copy_pages;
- struct pbe *p, *pblist = NULL;
+ unsigned int nr_pages;
+ struct linked_page *sp_list, *lp;
+ int error;

- p = pagedir_nosave;
- error = mark_unsafe_pages(p);
- if (!error) {
- pblist = alloc_pagedir(nr_pages, GFP_ATOMIC, 1);
- if (pblist)
- copy_page_backup_list(pblist, p);
- free_pagedir(p, 0);
- if (!pblist)
+ error = mark_unsafe_pages(bm);
+ if (error)
+ goto Free;
+
+ error = create_memory_bm(new_bm, GFP_ATOMIC, 1);
+ if (error)
+ goto Free;
+
+ duplicate_memory_bitmap(new_bm, bm);
+ free_memory_bm(bm, 0);
+ /* Reserve some safe pages for potential later use.
+ *
+ * NOTE: This way we make sure there will be enough safe pages for the
+ * chain_alloc() in get_buffer(). It is a bit wasteful, but
+ * nr_copy_pages cannot be greater than 50% of the memory anyway.
+ */
+ sp_list = NULL;
+ /* nr_copy_pages cannot be lesser than allocated_unsafe_pages */
+ nr_pages = nr_copy_pages - allocated_unsafe_pages;
+ nr_pages = DIV_ROUND_UP(nr_pages, PBES_PER_LINKED_PAGE);
+ while (nr_pages > 0) {
+ lp = alloc_image_page(GFP_ATOMIC, 1);
+ if (!lp) {
error = -ENOMEM;
- }
- safe_pages = NULL;
- if (!error && nr_pages > unsafe_pages) {
- nr_pages -= unsafe_pages;
- while (nr_pages--) {
- struct linked_page *ptr;
-
- ptr = (void *)get_zeroed_page(GFP_ATOMIC);
- if (!ptr) {
- error = -ENOMEM;
- break;
- }
- if (!PageNosaveFree(virt_to_page(ptr))) {
- /* The page is "safe", add it to the list */
- ptr->next = safe_pages;
- safe_pages = ptr;
- }
- /* Mark the page as allocated */
- SetPageNosave(virt_to_page(ptr));
- SetPageNosaveFree(virt_to_page(ptr));
+ goto Free;
}
+ lp->next = sp_list;
+ sp_list = lp;
+ nr_pages--;
+ }
+ /* Preallocate memory for the image */
+ safe_pages_list = NULL;
+ nr_pages = nr_copy_pages - allocated_unsafe_pages;
+ while (nr_pages > 0) {
+ lp = (struct linked_page *)get_zeroed_page(GFP_ATOMIC);
+ if (!lp) {
+ error = -ENOMEM;
+ goto Free;
+ }
+ if (!PageNosaveFree(virt_to_page(lp))) {
+ /* The page is "safe", add it to the list */
+ lp->next = safe_pages_list;
+ safe_pages_list = lp;
+ }
+ /* Mark the page as allocated */
+ SetPageNosave(virt_to_page(lp));
+ SetPageNosaveFree(virt_to_page(lp));
+ nr_pages--;
+ }
+ /* Free the reserved safe pages so that chain_alloc() can use them */
+ while (sp_list) {
+ lp = sp_list->next;
+ free_image_page(sp_list, 1);
+ sp_list = lp;
}
- if (!error) {
- pagedir_nosave = pblist;
- } else {
- handle->pbe = NULL;
- swsusp_free();
- }
+ return 0;
+
+Free:
+ swsusp_free();
return error;
}

-static void *get_buffer(struct snapshot_handle *handle)
+/**
+ * get_buffer - compute the address that snapshot_write_next() should
+ * set for its caller to write to.
+ */
+
+static void *get_buffer(struct memory_bitmap *bm, struct chain_allocator *ca)
{
- struct pbe *pbe = handle->pbe, *last = handle->last_pbe;
- struct page *page = virt_to_page(pbe->orig_address);
+ struct pbe *pbe;
+ struct page *page = pfn_to_page(memory_bm_next_pfn(bm));

- if (PageNosave(page) && PageNosaveFree(page)) {
- /*
- * We have allocated the "original" page frame and we can
- * use it directly to store the read page
+ if (PageNosave(page) && PageNosaveFree(page))
+ /* We have allocated the "original" page frame and we can
+ * use it directly to store the loaded page.
*/
- pbe->address = 0;
- if (last && last->next)
- last->next = NULL;
- return (void *)pbe->orig_address;
- }
- /*
- * The "original" page frame has not been allocated and we have to
- * use a "safe" page frame to store the read page
+ return page_address(page);
+
+ /* The "original" page frame has not been allocated and we have to
+ * use a "safe" page frame to store the loaded page.
*/
- pbe->address = (unsigned long)safe_pages;
- safe_pages = safe_pages->next;
- if (last)
- last->next = pbe;
- handle->last_pbe = pbe;
+ pbe = chain_alloc(ca, sizeof(struct pbe));
+ if (!pbe) {
+ swsusp_free();
+ return NULL;
+ }
+ pbe->orig_address = (unsigned long)page_address(page);
+ pbe->address = (unsigned long)safe_pages_list;
+ safe_pages_list = safe_pages_list->next;
+ pbe->next = pagedir_nosave;
+ pagedir_nosave = pbe;
return (void *)pbe->address;
}

@@ -1303,10 +1235,13 @@ static void *get_buffer(struct snapshot_

int snapshot_write_next(struct snapshot_handle *handle, size_t count)
{
+ static struct chain_allocator ca;
int error = 0;

+ /* Check if we have already loaded the entire image */
if (handle->prev && handle->cur > nr_meta_pages + nr_copy_pages)
return 0;
+
if (!buffer) {
/* This makes the buffer be freed by swsusp_free() */
buffer = alloc_image_page(GFP_ATOMIC, 0);
@@ -1317,27 +1252,32 @@ int snapshot_write_next(struct snapshot_
handle->buffer = buffer;
handle->sync_read = 1;
if (handle->prev < handle->cur) {
- if (!handle->prev) {
- error = load_header(handle,
- (struct swsusp_info *)buffer);
+ if (handle->prev == 0) {
+ error = load_header(buffer);
+ if (error)
+ return error;
+
+ error = create_memory_bm(&copy_bm, GFP_ATOMIC, 0);
if (error)
return error;
+
} else if (handle->prev <= nr_meta_pages) {
- handle->pbe = unpack_orig_addresses(buffer,
- handle->pbe);
- if (!handle->pbe) {
- error = prepare_image(handle);
+ unpack_orig_pfns(buffer, &copy_bm);
+ if (handle->prev == nr_meta_pages) {
+ error = prepare_image(&orig_bm, &copy_bm);
if (error)
return error;
- handle->pbe = pagedir_nosave;
- handle->last_pbe = NULL;
- handle->buffer = get_buffer(handle);
+
+ chain_init(&ca, GFP_ATOMIC, 1);
+ bm_position_reset(&orig_bm);
+ pagedir_nosave = NULL;
handle->sync_read = 0;
+ handle->buffer = get_buffer(&orig_bm, &ca);
+ if (!handle->buffer)
+ return -ENOMEM;
}
} else {
- handle->pbe = handle->pbe->next;
- handle->buffer = get_buffer(handle);
- handle->sync_read = 0;
+ handle->buffer = get_buffer(&orig_bm, &ca);
}
handle->prev = handle->cur;
}
@@ -1355,6 +1295,13 @@ int snapshot_write_next(struct snapshot_

int snapshot_image_loaded(struct snapshot_handle *handle)
{
- return !(!handle->pbe || handle->pbe->next || !nr_copy_pages ||
- handle->cur <= nr_meta_pages + nr_copy_pages);
+ return !(!nr_copy_pages ||
+ handle->cur <= nr_meta_pages + nr_copy_pages);
+}
+
+void snapshot_free_unused_memory(struct snapshot_handle *handle)
+{
+ /* Free only if we have loaded the image entirely */
+ if (handle->prev && handle->cur > nr_meta_pages + nr_copy_pages)
+ free_memory_bm(&orig_bm, 1);
}
Index: linux-2.6.18-rc3-mm2/kernel/power/power.h
===================================================================
--- linux-2.6.18-rc3-mm2.orig/kernel/power/power.h
+++ linux-2.6.18-rc3-mm2/kernel/power/power.h
@@ -83,16 +83,6 @@ struct snapshot_handle {
unsigned int prev; /* number of the block of PAGE_SIZE bytes that
* was the current one previously
*/
- struct pbe *pbe; /* PBE that corresponds to 'buffer' */
- struct pbe *last_pbe; /* When the image is restored (eg. read
- * from disk) we can store some image
- * data directly in the page frames
- * in which they were before suspend.
- * In such a case the PBEs that
- * correspond to them will be unused.
- * This is the last PBE, so far, that
- * does not correspond to such data.
- */
void *buffer; /* address of the block to read from
* or write to
*/
@@ -115,6 +105,7 @@ extern unsigned int snapshot_additional_
extern int snapshot_read_next(struct snapshot_handle *handle, size_t count);
extern int snapshot_write_next(struct snapshot_handle *handle, size_t count);
extern int snapshot_image_loaded(struct snapshot_handle *handle);
+extern void snapshot_free_unused_memory(struct snapshot_handle *handle);

#define SNAPSHOT_IOC_MAGIC '3'
#define SNAPSHOT_FREEZE _IO(SNAPSHOT_IOC_MAGIC, 1)
Index: linux-2.6.18-rc3-mm2/kernel/power/swap.c
===================================================================
--- linux-2.6.18-rc3-mm2.orig/kernel/power/swap.c
+++ linux-2.6.18-rc3-mm2/kernel/power/swap.c
@@ -332,8 +332,7 @@ static int enough_swap(unsigned int nr_p
unsigned int free_swap = count_swap_pages(root_swap, 1);

pr_debug("swsusp: free swap pages: %u\n", free_swap);
- return free_swap > (nr_pages + PAGES_FOR_IO +
- (nr_pages + PBES_PER_PAGE - 1) / PBES_PER_PAGE);
+ return free_swap > nr_pages + PAGES_FOR_IO;
}

/**
@@ -548,6 +547,7 @@ static int load_image(struct swap_map_ha
error = err2;
if (!error) {
printk("\b\b\b\bdone\n");
+ snapshot_free_unused_memory(snapshot);
if (!snapshot_image_loaded(snapshot))
error = -ENODATA;
}
Index: linux-2.6.18-rc3-mm2/kernel/power/user.c
===================================================================
--- linux-2.6.18-rc3-mm2.orig/kernel/power/user.c
+++ linux-2.6.18-rc3-mm2/kernel/power/user.c
@@ -193,6 +193,7 @@ static int snapshot_ioctl(struct inode *
error = -EPERM;
break;
}
+ snapshot_free_unused_memory(&data->handle);
down(&pm_sem);
pm_prepare_console();
error = device_suspend(PMSG_PRETHAW);
Index: linux-2.6.18-rc3-mm2/include/linux/suspend.h
===================================================================
--- linux-2.6.18-rc3-mm2.orig/include/linux/suspend.h
+++ linux-2.6.18-rc3-mm2/include/linux/suspend.h
@@ -9,30 +9,17 @@
#include <linux/init.h>
#include <linux/pm.h>

-/* page backup entry */
-typedef struct pbe {
+/* struct pbe is used for creating the list of pages that should be restored
+ * atomically during the resume from disk, because the page frames they have
+ * occupied before the suspend are in use.
+ */
+struct pbe {
unsigned long address; /* address of the copy */
unsigned long orig_address; /* original address of page */
struct pbe *next;
-} suspend_pagedir_t;
-
-#define for_each_pbe(pbe, pblist) \
- for (pbe = pblist ; pbe ; pbe = pbe->next)
-
-#define PBES_PER_PAGE (PAGE_SIZE/sizeof(struct pbe))
-#define PB_PAGE_SKIP (PBES_PER_PAGE-1)
-
-#define for_each_pb_page(pbe, pblist) \
- for (pbe = pblist ; pbe ; pbe = (pbe+PB_PAGE_SKIP)->next)
-
-
-#define SWAP_FILENAME_MAXLENGTH 32
-
+};

extern dev_t swsusp_resume_device;
-
-/* mm/vmscan.c */
-extern int shrink_mem(void);

/* mm/page_alloc.c */
extern void drain_local_pages(void);
@@ -53,7 +40,7 @@ static inline void pm_restore_console(vo
static inline int software_suspend(void)
{
printk("Warning: fake suspend called\n");
- return -EPERM;
+ return -ENOSYS;
}
#endif /* CONFIG_PM */


2006-08-09 10:16:50

by Rafael J. Wysocki

[permalink] [raw]
Subject: [RFC][PATCH -mm 5/5] swsusp: Introduce some helpful constants

Introduce some constants that hopefully will help improve the readability kernel/power/snapshot.c | 45 +++++++++++++++++++++++++--------------------
1 files changed, 25 insertions(+), 20 deletions(-)

Index: linux-2.6.18-rc3-mm2/kernel/power/snapshot.c
===================================================================
--- linux-2.6.18-rc3-mm2.orig/kernel/power/snapshot.c
+++ linux-2.6.18-rc3-mm2/kernel/power/snapshot.c
@@ -57,6 +57,11 @@ static void *buffer;
* so that swsusp_free() can release it.
*/

+#define PG_ANY 0
+#define PG_SAFE 1
+#define PG_UNSAFE_CLEAR 1
+#define PG_UNSAFE_KEEP 0
+
static unsigned int allocated_unsafe_pages;

static void *get_image_page(gfp_t gfp_mask, int safe_needed)
@@ -357,7 +362,7 @@ create_memory_bm(struct memory_bitmap *b
zone_bm = create_zone_bm_list(nr, &ca);
bm->zone_bm_list = zone_bm;
if (!zone_bm) {
- chain_free(&ca, 1);
+ chain_free(&ca, PG_UNSAFE_CLEAR);
return -ENOMEM;
}

@@ -410,7 +415,7 @@ create_memory_bm(struct memory_bitmap *b

Free:
bm->p_list = ca.chain;
- free_memory_bm(bm, 1);
+ free_memory_bm(bm, PG_UNSAFE_CLEAR);
return -ENOMEM;
}

@@ -936,15 +941,15 @@ swsusp_alloc(struct memory_bitmap *orig_
{
int error;

- error = create_memory_bm(orig_bm, GFP_ATOMIC, 0);
+ error = create_memory_bm(orig_bm, GFP_ATOMIC, PG_ANY);
if (error)
goto Free;

- error = create_memory_bm(copy_bm, GFP_ATOMIC, 0);
+ error = create_memory_bm(copy_bm, GFP_ATOMIC, PG_ANY);
if (error)
goto Free;

- error = get_highmem_buffer(0);
+ error = get_highmem_buffer(PG_ANY);
if (error)
goto Free;

@@ -1078,7 +1083,7 @@ int snapshot_read_next(struct snapshot_h

if (!buffer) {
/* This makes the buffer be freed by swsusp_free() */
- buffer = get_image_page(GFP_ATOMIC, 0);
+ buffer = get_image_page(GFP_ATOMIC, PG_ANY);
if (!buffer)
return -ENOMEM;
}
@@ -1299,10 +1304,10 @@ prepare_highmem_image(struct memory_bitm
{
unsigned int to_alloc;

- if (create_memory_bm(bm, GFP_ATOMIC, 1))
+ if (create_memory_bm(bm, GFP_ATOMIC, PG_SAFE))
return -ENOMEM;

- if (get_highmem_buffer(1))
+ if (get_highmem_buffer(PG_SAFE))
return -ENOMEM;

to_alloc = count_free_highmem_pages();
@@ -1415,8 +1420,8 @@ static inline int last_highmem_page_copi

static inline void free_highmem_data(void)
{
- free_memory_bm(safe_highmem_bm, 1);
- free_image_page(buffer, 1);
+ free_memory_bm(safe_highmem_bm, PG_UNSAFE_CLEAR);
+ free_image_page(buffer, PG_UNSAFE_CLEAR);
}
#else
static inline int get_safe_write_buffer(void) { return 0; }
@@ -1465,19 +1470,19 @@ prepare_image(struct memory_bitmap *new_
int error;

/* If there is no highmem, the buffer will not be necessary */
- free_image_page(buffer, 1);
+ free_image_page(buffer, PG_UNSAFE_CLEAR);

nr_highmem = count_highmem_image_pages(bm);
error = mark_unsafe_pages(bm);
if (error)
goto Free;

- error = create_memory_bm(new_bm, GFP_ATOMIC, 1);
+ error = create_memory_bm(new_bm, GFP_ATOMIC, PG_SAFE);
if (error)
goto Free;

duplicate_memory_bitmap(new_bm, bm);
- free_memory_bm(bm, 0);
+ free_memory_bm(bm, PG_UNSAFE_KEEP);
if (nr_highmem > 0) {
error = prepare_highmem_image(bm, &nr_highmem);
if (error)
@@ -1494,7 +1499,7 @@ prepare_image(struct memory_bitmap *new_
nr_pages = nr_copy_pages - nr_highmem - allocated_unsafe_pages;
nr_pages = DIV_ROUND_UP(nr_pages, PBES_PER_LINKED_PAGE);
while (nr_pages > 0) {
- lp = get_image_page(GFP_ATOMIC, 1);
+ lp = get_image_page(GFP_ATOMIC, PG_SAFE);
if (!lp) {
error = -ENOMEM;
goto Free;
@@ -1525,7 +1530,7 @@ prepare_image(struct memory_bitmap *new_
/* Free the reserved safe pages so that chain_alloc() can use them */
while (sp_list) {
lp = sp_list->next;
- free_image_page(sp_list, 1);
+ free_image_page(sp_list, PG_UNSAFE_CLEAR);
sp_list = lp;
}
return 0;
@@ -1604,7 +1609,7 @@ int snapshot_write_next(struct snapshot_
if (handle->offset == 0) {
if (!buffer)
/* This makes the buffer be freed by swsusp_free() */
- buffer = get_image_page(GFP_ATOMIC, 0);
+ buffer = get_image_page(GFP_ATOMIC, PG_ANY);

if (!buffer)
return -ENOMEM;
@@ -1618,7 +1623,7 @@ int snapshot_write_next(struct snapshot_
if (error)
return error;

- error = create_memory_bm(&copy_bm, GFP_ATOMIC, 0);
+ error = create_memory_bm(&copy_bm, GFP_ATOMIC, PG_ANY);
if (error)
return error;

@@ -1668,7 +1673,7 @@ void snapshot_write_finalize(struct snap
copy_last_highmem_page();
/* Free only if we have loaded the image entirely */
if (handle->prev && handle->cur > nr_meta_pages + nr_copy_pages) {
- free_memory_bm(&orig_bm, 1);
+ free_memory_bm(&orig_bm, PG_UNSAFE_CLEAR);
free_highmem_data();
}
}
@@ -1710,7 +1715,7 @@ int restore_highmem(void)
struct highmem_pbe *pbe = highmem_pblist;
void *buf;

- buf = get_image_page(GFP_ATOMIC, 1);
+ buf = get_image_page(GFP_ATOMIC, PG_SAFE);
if (!buf)
return -ENOMEM;

@@ -1718,7 +1723,7 @@ int restore_highmem(void)
swap_two_pages_data(pbe->copy_page, pbe->orig_page, buf);
pbe = pbe->next;
}
- free_image_page(buf, 1);
+ free_image_page(buf, PG_UNSAFE_CLEAR);
return 0;
}
#endif /* CONFIG_HIGHMEM */

of the code.

Signed-off-by: Rafael J. Wysocki <[email protected]>
---

2006-08-09 10:16:55

by Rafael J. Wysocki

[permalink] [raw]
Subject: [RFC][PATCH -mm 1/5] swsusp: Introduce memory bitmaps

Introduce the memory bitmap data structure and make swsusp use in the suspend
phase.

The current swsusp's internal data structure is not very efficient from the
memory usage point of view, so it seems reasonable to replace it with a data
structure that will require less memory, such as a pair of bitmaps.

The idea is to use bitmaps that may be allocated as sets of individual pages,
so that we can avoid making allocations of order greater than 0. For this
reason the memory bitmap structure consists of several linked lists of objects
that contain pointers to memory pages with the actual bitmap data. Still, for
a typical system all of these lists fit in a single page, so it's reasonable
to introduce an additional mechanism allowing us to allocate all of them
efficiently without sacrificing the generality of the design. This is done
with the help of the chain_allocator structure and associated functions.

We need to use two memory bitmaps during the suspend phase of the
suspend-resume cycle. One of them is necessary for marking the saveable
pages, and the second is used to mark the pages in which to store the copies
of them (aka image pages).

First, the bitmaps are created and we allocate as many image pages as needed
(the corresponding bits in the second bitmap are set as soon as the pages are
allocated). Second, the bits corresponding to the saveable pages are set in
the first bitmap and the saveable pages are copied to the image pages.
Finally, the first bitmap is used to save the kernel virtual addresses of the
saveable pages and the second one is used to save the contents of the image
pages.

Signed-off-by: Rafael J. Wysocki <[email protected]>
---
kernel/power/power.h | 3
kernel/power/snapshot.c | 605 ++++++++++++++++++++++++++++++++++++++++++------
kernel/power/swsusp.c | 5
3 files changed, 542 insertions(+), 71 deletions(-)

Index: linux-2.6.18-rc3-mm2/kernel/power/snapshot.c
===================================================================
--- linux-2.6.18-rc3-mm2.orig/kernel/power/snapshot.c
+++ linux-2.6.18-rc3-mm2/kernel/power/snapshot.c
@@ -204,6 +204,467 @@ static inline void free_image_page(void
free_page((unsigned long)addr);
}

+/* struct linked_page is used to build chains of pages */
+
+#define LINKED_PAGE_DATA_SIZE (PAGE_SIZE - sizeof(void *))
+
+struct linked_page {
+ struct linked_page *next;
+ char data[LINKED_PAGE_DATA_SIZE];
+} __attribute__((packed));
+
+static inline void
+free_list_of_pages(struct linked_page *list, int clear_page_nosave)
+{
+ while (list) {
+ struct linked_page *lp = list->next;
+
+ free_image_page(list, clear_page_nosave);
+ list = lp;
+ }
+}
+
+/**
+ * struct chain_allocator is used for allocating small objects out of
+ * a linked list of pages called 'the chain'.
+ *
+ * The chain grows each time when there is no room for a new object in
+ * the current page. The allocated objects cannot be freed individually.
+ * It is only possible to free them all at once, by freeing the entire
+ * chain.
+ *
+ * NOTE: The chain allocator may be inefficient if the allocated objects
+ * are not much smaller than PAGE_SIZE.
+ */
+
+struct chain_allocator {
+ struct linked_page *chain; /* the chain */
+ unsigned int used_space; /* total size of objects allocated out
+ * of the current page
+ */
+ gfp_t gfp_mask; /* mask for allocating pages */
+ int safe_needed; /* if set, only "safe" pages are allocated */
+};
+
+static void
+chain_init(struct chain_allocator *ca, gfp_t gfp_mask, int safe_needed)
+{
+ ca->chain = NULL;
+ ca->used_space = LINKED_PAGE_DATA_SIZE;
+ ca->gfp_mask = gfp_mask;
+ ca->safe_needed = safe_needed;
+}
+
+static void *chain_alloc(struct chain_allocator *ca, unsigned int size)
+{
+ void *ret;
+
+ if (LINKED_PAGE_DATA_SIZE - ca->used_space < size) {
+ struct linked_page *lp;
+
+ lp = alloc_image_page(ca->gfp_mask, ca->safe_needed);
+ if (!lp)
+ return NULL;
+
+ lp->next = ca->chain;
+ ca->chain = lp;
+ ca->used_space = 0;
+ }
+ ret = ca->chain->data + ca->used_space;
+ ca->used_space += size;
+ return ret;
+}
+
+static void chain_free(struct chain_allocator *ca, int clear_page_nosave)
+{
+ free_list_of_pages(ca->chain, clear_page_nosave);
+ memset(ca, 0, sizeof(struct chain_allocator));
+}
+
+/**
+ * Data types related to memory bitmaps.
+ *
+ * Memory bitmap is a structure consiting of many linked lists of
+ * objects. The main list's elements are of type struct zone_bitmap
+ * and each of them corresonds to one zone. For each zone bitmap
+ * object there is a list of objects of type struct bm_block that
+ * represent each blocks of bit chunks in which information is
+ * stored.
+ *
+ * struct memory_bitmap contains a pointer to the main list of zone
+ * bitmap objects, a struct bm_position used for browsing the bitmap,
+ * and a pointer to the list of pages used for allocating all of the
+ * zone bitmap objects and bitmap block objects.
+ *
+ * NOTE: It has to be possible to lay out the bitmap in memory
+ * using only allocations of order 0. Additionally, the bitmap is
+ * designed to work with arbitrary number of zones (this is over the
+ * top for now, but let's avoid making unnecessary assumptions ;-).
+ *
+ * struct zone_bitmap contains a pointer to a list of bitmap block
+ * objects and a pointer to the bitmap block object that has been
+ * most recently used for setting bits. Additionally, it contains the
+ * pfns that correspond to the start and end of the represented zone.
+ *
+ * struct bm_block contains a pointer to the memory page in which
+ * information is stored (in the form of a block of bit chunks
+ * of type unsigned long each). It also contains the pfns that
+ * correspond to the start and end of the represented memory area and
+ * the number of bit chunks in the block.
+ *
+ * NOTE: Memory bitmaps are used for two types of operations only:
+ * "set a bit" and "find the next bit set". Moreover, the searching
+ * is always carried out after all of the "set a bit" operations
+ * on given bitmap.
+ */
+
+#define BM_END_OF_MAP (~0UL)
+
+#define BM_CHUNKS_PER_BLOCK (PAGE_SIZE / sizeof(long))
+#define BM_BITS_PER_CHUNK (sizeof(long) << 3)
+#define BM_BITS_PER_BLOCK (PAGE_SIZE << 3)
+
+struct bm_block {
+ struct bm_block *next; /* next element of the list */
+ unsigned long start_pfn; /* pfn represented by the first bit */
+ unsigned long end_pfn; /* pfn represented by the last bit plus 1 */
+ unsigned int size; /* number of bit chunks */
+ unsigned long *data; /* chunks of bits representing pages */
+};
+
+struct zone_bitmap {
+ struct zone_bitmap *next; /* next element of the list */
+ unsigned long start_pfn; /* minimal pfn in this zone */
+ unsigned long end_pfn; /* maximal pfn in this zone plus 1 */
+ struct bm_block *bm_blocks; /* list of bitmap blocks */
+ struct bm_block *cur_block; /* recently used bitmap block */
+};
+
+/* strcut bm_position is used for browsing memory bitmaps */
+
+struct bm_position {
+ struct zone_bitmap *zone_bm;
+ struct bm_block *block;
+ int chunk;
+ int bit;
+};
+
+struct memory_bitmap {
+ struct zone_bitmap *zone_bm_list; /* list of zone bitmaps */
+ struct linked_page *p_list; /* list of pages used to store zone
+ * bitmap objects and bitmap block
+ * objects
+ */
+ struct bm_position cur; /* most recently used bit position */
+};
+
+/* Functions that operate on memory bitmaps */
+
+static inline void bm_position_reset_chunk(struct memory_bitmap *bm)
+{
+ bm->cur.chunk = 0;
+ bm->cur.bit = -1;
+}
+
+static void bm_position_reset(struct memory_bitmap *bm)
+{
+ struct zone_bitmap *zone_bm;
+
+ zone_bm = bm->zone_bm_list;
+ bm->cur.zone_bm = zone_bm;
+ bm->cur.block = zone_bm->bm_blocks;
+ bm_position_reset_chunk(bm);
+}
+
+static void free_memory_bm(struct memory_bitmap *bm, int clear_nosave_free);
+
+/**
+ * create_bm_block_list - create a list of block bitmap objects
+ */
+
+static inline struct bm_block *
+create_bm_block_list(unsigned int nr_blocks, struct chain_allocator *ca)
+{
+ struct bm_block *bblist = NULL;
+
+ while (nr_blocks-- > 0) {
+ struct bm_block *bb;
+
+ bb = chain_alloc(ca, sizeof(struct bm_block));
+ if (!bb)
+ return NULL;
+
+ bb->next = bblist;
+ bblist = bb;
+ }
+ return bblist;
+}
+
+/**
+ * create_zone_bm_list - create a list of zone bitmap objects
+ */
+
+static inline struct zone_bitmap *
+create_zone_bm_list(unsigned int nr_zones, struct chain_allocator *ca)
+{
+ struct zone_bitmap *zbmlist = NULL;
+
+ while (nr_zones-- > 0) {
+ struct zone_bitmap *zbm;
+
+ zbm = chain_alloc(ca, sizeof(struct zone_bitmap));
+ if (!zbm)
+ return NULL;
+
+ zbm->next = zbmlist;
+ zbmlist = zbm;
+ }
+ return zbmlist;
+}
+
+/**
+ * create_memory_bm - allocate memory for a memory bitmap
+ */
+
+static int
+create_memory_bm(struct memory_bitmap *bm, gfp_t gfp_mask, int safe_needed)
+{
+ struct chain_allocator ca;
+ struct zone *zone;
+ struct zone_bitmap *zone_bm;
+ struct bm_block *bb;
+ unsigned int nr;
+
+ chain_init(&ca, gfp_mask, safe_needed);
+
+ /* Compute the number of zones */
+ nr = 0;
+ for_each_zone (zone)
+ if (populated_zone(zone) && !is_highmem(zone))
+ nr++;
+
+ /* Allocate the list of zones bitmap objects */
+ zone_bm = create_zone_bm_list(nr, &ca);
+ bm->zone_bm_list = zone_bm;
+ if (!zone_bm) {
+ chain_free(&ca, 1);
+ return -ENOMEM;
+ }
+
+ /* Initialize the zone bitmap objects */
+ for_each_zone (zone) {
+ unsigned long pfn;
+
+ if (!populated_zone(zone) || is_highmem(zone))
+ continue;
+
+ zone_bm->start_pfn = zone->zone_start_pfn;
+ zone_bm->end_pfn = zone->zone_start_pfn + zone->spanned_pages;
+ /* Allocate the list of bitmap block objects */
+ nr = DIV_ROUND_UP(zone->spanned_pages, BM_BITS_PER_BLOCK);
+ bb = create_bm_block_list(nr, &ca);
+ zone_bm->bm_blocks = bb;
+ zone_bm->cur_block = bb;
+ if (!bb)
+ goto Free;
+
+ nr = zone->spanned_pages;
+ pfn = zone->zone_start_pfn;
+ /* Initialize the bitmap block objects */
+ while (bb) {
+ unsigned long *ptr;
+
+ ptr = alloc_image_page(gfp_mask, safe_needed);
+ bb->data = ptr;
+ if (!ptr)
+ goto Free;
+
+ bb->start_pfn = pfn;
+ if (nr >= BM_BITS_PER_BLOCK) {
+ pfn += BM_BITS_PER_BLOCK;
+ bb->size = BM_CHUNKS_PER_BLOCK;
+ nr -= BM_BITS_PER_BLOCK;
+ } else {
+ /* This is executed only once in the loop */
+ pfn += nr;
+ bb->size = DIV_ROUND_UP(nr, BM_BITS_PER_CHUNK);
+ }
+ bb->end_pfn = pfn;
+ bb = bb->next;
+ }
+ zone_bm = zone_bm->next;
+ }
+ bm->p_list = ca.chain;
+ bm_position_reset(bm);
+ return 0;
+
+Free:
+ bm->p_list = ca.chain;
+ free_memory_bm(bm, 1);
+ return -ENOMEM;
+}
+
+/**
+ * free_memory_bm - free memory occupied by the memory bitmap @bm
+ */
+
+static void free_memory_bm(struct memory_bitmap *bm, int clear_nosave_free)
+{
+ struct zone_bitmap *zone_bm;
+
+ /* Free the list of bit blocks for each zone_bitmap object */
+ zone_bm = bm->zone_bm_list;
+ while (zone_bm) {
+ struct bm_block *bb;
+
+ bb = zone_bm->bm_blocks;
+ while (bb) {
+ if (bb->data)
+ free_image_page(bb->data, clear_nosave_free);
+ bb = bb->next;
+ }
+ zone_bm = zone_bm->next;
+ }
+ free_list_of_pages(bm->p_list, clear_nosave_free);
+ bm->zone_bm_list = NULL;
+}
+
+/**
+ * memory_bm_set_bit - set the bit in the bitmap @bm that corresponds
+ * to given pfn. The cur_zone_bm member of @bm and the cur_block member
+ * of @bm->cur_zone_bm are updated.
+ *
+ * If the bit cannot be set, the function returns -EINVAL .
+ */
+
+static int
+memory_bm_set_bit(struct memory_bitmap *bm, unsigned long pfn)
+{
+ struct zone_bitmap *zone_bm;
+ struct bm_block *bb;
+
+ /* Check if the pfn is from the current zone */
+ zone_bm = bm->cur.zone_bm;
+ if (pfn < zone_bm->start_pfn || pfn >= zone_bm->end_pfn) {
+ zone_bm = bm->zone_bm_list;
+ /* We don't assume that the zones are sorted by pfns */
+ while (pfn < zone_bm->start_pfn || pfn >= zone_bm->end_pfn) {
+ zone_bm = zone_bm->next;
+ if (unlikely(!zone_bm))
+ return -EINVAL;
+ }
+ bm->cur.zone_bm = zone_bm;
+ }
+ /* Check if the pfn corresponds to the current bitmap block */
+ bb = zone_bm->cur_block;
+ if (pfn < bb->start_pfn)
+ bb = zone_bm->bm_blocks;
+
+ while (pfn >= bb->end_pfn) {
+ bb = bb->next;
+ if (unlikely(!bb))
+ return -EINVAL;
+ }
+ zone_bm->cur_block = bb;
+ pfn -= bb->start_pfn;
+ set_bit(pfn % BM_BITS_PER_CHUNK, bb->data + pfn / BM_BITS_PER_CHUNK);
+ return 0;
+}
+
+/* Two auxiliary functions for memory_bm_next_pfn */
+
+/* Find the first set bit in the given chunk, if there is one */
+
+static inline int next_bit_in_chunk(int bit, unsigned long *chunk_p)
+{
+ bit++;
+ while (bit < BM_BITS_PER_CHUNK) {
+ if (test_bit(bit, chunk_p))
+ return bit;
+
+ bit++;
+ }
+ return -1;
+}
+
+/* Find a chunk containing some bits set in given block of bits */
+
+static inline int next_chunk_in_block(int n, struct bm_block *bb)
+{
+ n++;
+ while (n < bb->size) {
+ if (bb->data[n])
+ return n;
+
+ n++;
+ }
+ return -1;
+}
+
+/**
+ * memory_bm_next_pfn - find the pfn that corresponds to the next set bit
+ * in the bitmap @bm. If the pfn cannot be found, BM_END_OF_MAP is
+ * returned.
+ *
+ * It is required to run bm_position_reset() before the first call to
+ * this function.
+ */
+
+static unsigned long memory_bm_next_pfn(struct memory_bitmap *bm)
+{
+ struct zone_bitmap *zone_bm;
+ struct bm_block *bb;
+ int chunk;
+ int bit;
+
+ do {
+ bb = bm->cur.block;
+ do {
+ chunk = bm->cur.chunk;
+ bit = bm->cur.bit;
+ do {
+ bit = next_bit_in_chunk(bit, bb->data + chunk);
+ if (bit >= 0)
+ goto Return_pfn;
+
+ chunk = next_chunk_in_block(chunk, bb);
+ bit = -1;
+ } while (chunk >= 0);
+ bb = bb->next;
+ bm->cur.block = bb;
+ bm_position_reset_chunk(bm);
+ } while (bb);
+ zone_bm = bm->cur.zone_bm->next;
+ if (zone_bm) {
+ bm->cur.zone_bm = zone_bm;
+ bm->cur.block = zone_bm->bm_blocks;
+ bm_position_reset_chunk(bm);
+ }
+ } while (zone_bm);
+ bm_position_reset(bm);
+ return BM_END_OF_MAP;
+
+Return_pfn:
+ bm->cur.chunk = chunk;
+ bm->cur.bit = bit;
+ return bb->start_pfn + chunk * BM_BITS_PER_CHUNK + bit;
+}
+
+/**
+ * snapshot_additional_pages - estimate the number of additional pages
+ * be needed for setting up the suspend image data structures for given
+ * zone (usually the returned value is greater than the exact number)
+ */
+
+unsigned int snapshot_additional_pages(struct zone *zone)
+{
+ unsigned int res;
+
+ res = DIV_ROUND_UP(zone->spanned_pages, BM_BITS_PER_BLOCK);
+ res += DIV_ROUND_UP(res * sizeof(struct bm_block), PAGE_SIZE);
+ return res;
+}
+
/**
* pfn_is_nosave - check if given pfn is in the 'nosave' section
*/
@@ -269,32 +730,38 @@ static inline void copy_data_page(long *
*dst++ = *src++;
}

-static void copy_data_pages(struct pbe *pblist)
+static void
+copy_data_pages(struct memory_bitmap *copy_bm, struct memory_bitmap *orig_bm)
{
struct zone *zone;
- unsigned long pfn, max_zone_pfn;
- struct pbe *pbe;
+ unsigned long pfn;

- pbe = pblist;
for_each_zone (zone) {
+ unsigned long max_zone_pfn;
+
if (is_highmem(zone))
continue;
+
mark_free_pages(zone);
max_zone_pfn = zone->zone_start_pfn + zone->spanned_pages;
- for (pfn = zone->zone_start_pfn; pfn < max_zone_pfn; pfn++) {
- struct page *page = saveable_page(pfn);
-
- if (page) {
- void *ptr = page_address(page);
-
- BUG_ON(!pbe);
- copy_data_page((void *)pbe->address, ptr);
- pbe->orig_address = (unsigned long)ptr;
- pbe = pbe->next;
- }
- }
+ for (pfn = zone->zone_start_pfn; pfn < max_zone_pfn; pfn++)
+ if (saveable_page(pfn))
+ memory_bm_set_bit(orig_bm, pfn);
}
- BUG_ON(pbe);
+ bm_position_reset(orig_bm);
+ bm_position_reset(copy_bm);
+ do {
+ pfn = memory_bm_next_pfn(orig_bm);
+ if (likely(pfn != BM_END_OF_MAP)) {
+ struct page *page;
+ void *src;
+
+ page = pfn_to_page(pfn);
+ src = page_address(page);
+ page = pfn_to_page(memory_bm_next_pfn(copy_bm));
+ copy_data_page(page_address(page), src);
+ }
+ } while (pfn != BM_END_OF_MAP);
}

/**
@@ -440,36 +907,43 @@ static int enough_free_mem(unsigned int
(nr_pages + PBES_PER_PAGE - 1) / PBES_PER_PAGE);
}

-static int alloc_data_pages(struct pbe *pblist, gfp_t gfp_mask, int safe_needed)
+static int
+swsusp_alloc(struct memory_bitmap *orig_bm, struct memory_bitmap *copy_bm,
+ unsigned int nr_pages)
{
- struct pbe *p;
+ int error;

- for_each_pbe (p, pblist) {
- p->address = (unsigned long)alloc_image_page(gfp_mask, safe_needed);
- if (!p->address)
- return -ENOMEM;
+ error = create_memory_bm(orig_bm, GFP_ATOMIC, 0);
+ if (error)
+ goto Free;
+
+ error = create_memory_bm(copy_bm, GFP_ATOMIC, 0);
+ if (error)
+ goto Free;
+
+ while (nr_pages-- > 0) {
+ struct page *page = alloc_page(GFP_ATOMIC);
+ if (!page)
+ goto Free;
+
+ SetPageNosave(page);
+ SetPageNosaveFree(page);
+ memory_bm_set_bit(copy_bm, page_to_pfn(page));
}
return 0;
-}
-
-static struct pbe *swsusp_alloc(unsigned int nr_pages)
-{
- struct pbe *pblist;
-
- if (!(pblist = alloc_pagedir(nr_pages, GFP_ATOMIC | __GFP_COLD, 0))) {
- printk(KERN_ERR "suspend: Allocating pagedir failed.\n");
- return NULL;
- }
-
- if (alloc_data_pages(pblist, GFP_ATOMIC | __GFP_COLD, 0)) {
- printk(KERN_ERR "suspend: Allocating image pages failed.\n");
- swsusp_free();
- return NULL;
- }

- return pblist;
+Free:
+ swsusp_free();
+ return -ENOMEM;
}

+/* Memory bitmap used for marking saveable pages */
+static struct memory_bitmap orig_bm;
+/* Memory bitmap used for marking allocated pages that will contain the copies
+ * of saveable pages
+ */
+static struct memory_bitmap copy_bm;
+
asmlinkage int swsusp_save(void)
{
unsigned int nr_pages;
@@ -490,15 +964,14 @@ asmlinkage int swsusp_save(void)
return -ENOMEM;
}

- pagedir_nosave = swsusp_alloc(nr_pages);
- if (!pagedir_nosave)
+ if (swsusp_alloc(&orig_bm, &copy_bm, nr_pages))
return -ENOMEM;

/* During allocating of suspend pagedir, new cold pages may appear.
* Kill them.
*/
drain_local_pages();
- copy_data_pages(pagedir_nosave);
+ copy_data_pages(&copy_bm, &orig_bm);

/*
* End of critical section. From now on, we can write to memory,
@@ -527,22 +1000,23 @@ static void init_header(struct swsusp_in
}

/**
- * pack_orig_addresses - the .orig_address fields of the PBEs from the
- * list starting at @pbe are stored in the array @buf[] (1 page)
+ * pack_addresses - the addresses corresponding to pfns found in the
+ * bitmap @bm are stored in the array @buf[] (1 page)
*/

-static inline struct pbe *pack_orig_addresses(unsigned long *buf, struct pbe *pbe)
+static inline void
+pack_addresses(unsigned long *buf, struct memory_bitmap *bm)
{
int j;

- for (j = 0; j < PAGE_SIZE / sizeof(long) && pbe; j++) {
- buf[j] = pbe->orig_address;
- pbe = pbe->next;
+ for (j = 0; j < PAGE_SIZE / sizeof(long); j++) {
+ unsigned long pfn = memory_bm_next_pfn(bm);
+
+ if (unlikely(pfn == BM_END_OF_MAP))
+ break;
+
+ buf[j] = (unsigned long)page_address(pfn_to_page(pfn));
}
- if (!pbe)
- for (; j < PAGE_SIZE / sizeof(long); j++)
- buf[j] = 0;
- return pbe;
}

/**
@@ -571,6 +1045,7 @@ int snapshot_read_next(struct snapshot_h
{
if (handle->cur > nr_meta_pages + nr_copy_pages)
return 0;
+
if (!buffer) {
/* This makes the buffer be freed by swsusp_free() */
buffer = alloc_image_page(GFP_ATOMIC, 0);
@@ -580,16 +1055,17 @@ int snapshot_read_next(struct snapshot_h
if (!handle->offset) {
init_header((struct swsusp_info *)buffer);
handle->buffer = buffer;
- handle->pbe = pagedir_nosave;
+ bm_position_reset(&orig_bm);
+ bm_position_reset(&copy_bm);
}
if (handle->prev < handle->cur) {
if (handle->cur <= nr_meta_pages) {
- handle->pbe = pack_orig_addresses(buffer, handle->pbe);
- if (!handle->pbe)
- handle->pbe = pagedir_nosave;
+ memset(buffer, 0, PAGE_SIZE);
+ pack_addresses(buffer, &orig_bm);
} else {
- handle->buffer = (void *)handle->pbe->address;
- handle->pbe = handle->pbe->next;
+ unsigned long pfn = memory_bm_next_pfn(&copy_bm);
+
+ handle->buffer = page_address(pfn_to_page(pfn));
}
handle->prev = handle->cur;
}
@@ -728,12 +1204,7 @@ static inline struct pbe *unpack_orig_ad
* of "safe" which will be used later
*/

-struct safe_page {
- struct safe_page *next;
- char padding[PAGE_SIZE - sizeof(void *)];
-};
-
-static struct safe_page *safe_pages;
+static struct linked_page *safe_pages;

static int prepare_image(struct snapshot_handle *handle)
{
@@ -755,9 +1226,9 @@ static int prepare_image(struct snapshot
if (!error && nr_pages > unsafe_pages) {
nr_pages -= unsafe_pages;
while (nr_pages--) {
- struct safe_page *ptr;
+ struct linked_page *ptr;

- ptr = (struct safe_page *)get_zeroed_page(GFP_ATOMIC);
+ ptr = (void *)get_zeroed_page(GFP_ATOMIC);
if (!ptr) {
error = -ENOMEM;
break;
Index: linux-2.6.18-rc3-mm2/kernel/power/power.h
===================================================================
--- linux-2.6.18-rc3-mm2.orig/kernel/power/power.h
+++ linux-2.6.18-rc3-mm2/kernel/power/power.h
@@ -111,9 +111,10 @@ struct snapshot_handle {
*/
#define data_of(handle) ((handle).buffer + (handle).buf_offset)

+extern unsigned int snapshot_additional_pages(struct zone *zone);
extern int snapshot_read_next(struct snapshot_handle *handle, size_t count);
extern int snapshot_write_next(struct snapshot_handle *handle, size_t count);
-int snapshot_image_loaded(struct snapshot_handle *handle);
+extern int snapshot_image_loaded(struct snapshot_handle *handle);

#define SNAPSHOT_IOC_MAGIC '3'
#define SNAPSHOT_FREEZE _IO(SNAPSHOT_IOC_MAGIC, 1)
Index: linux-2.6.18-rc3-mm2/kernel/power/swsusp.c
===================================================================
--- linux-2.6.18-rc3-mm2.orig/kernel/power/swsusp.c
+++ linux-2.6.18-rc3-mm2/kernel/power/swsusp.c
@@ -193,14 +193,13 @@ int swsusp_shrink_memory(void)
printk("Shrinking memory... ");
do {
size = 2 * count_highmem_pages();
- size += size / 50 + count_data_pages();
- size += (size + PBES_PER_PAGE - 1) / PBES_PER_PAGE +
- PAGES_FOR_IO;
+ size += size / 50 + count_data_pages() + PAGES_FOR_IO;
tmp = size;
for_each_zone (zone)
if (!is_highmem(zone) && populated_zone(zone)) {
tmp -= zone->free_pages;
tmp += zone->lowmem_reserve[ZONE_NORMAL];
+ tmp += snapshot_additional_pages(zone);
}
if (tmp > 0) {
tmp = __shrink_memory(tmp);

2006-08-09 10:17:35

by Rafael J. Wysocki

[permalink] [raw]
Subject: [RFC][PATCH -mm 3/5] swsusp: Fix handling of highmem

Make swsusp handle highmem pages similarly to the pages of the "normal"
memory.

During the suspend phase of the suspend-resume cycle we try to allocate as
many free highmem pages as there are saveable highmem pages. If there
are not enough highmem image pages to store the contents of all of the
saveable highmem pages, some of them will be stored in image pages located
in the "normal" memory. Next, we allocate as many free non-highmem pages
as needed to store the image data that will not be stored in the highmem.
We use a memory bitmap to mark the allocated free pages (ie. highmem and
"normal" image pages).

Now, we use another memory bitmap to mark all of the saveable pages (highmem
as well as "normal") and the saveable pages are copied into the image pages.
Then, the second bitmap is used to save the pfns corresponding to the saveable
pages and the first one is used to save their data.

During the resume phase the pfns of the pages that were saveable during the
suspend are loaded from the image and used to mark the "unsafe" page frames.
Next, we try to allocate as many free highmem page frames as to load all of
the image data that had been in the highmem before the suspend and we
allocate so many free "normal" page frames that the total number of allocated
free pages (highmem and "normal") is equal to the size of the image. While
doing this we have to make sure that there will be some extra free "normal"
and "safe" page frames for two lists of PBEs constructed later.

Now, the image data are loaded, if possible, into their "original" page
frames. The image data that cannot be written into their "original" page
frames are loaded into "safe" page frames and their "original" kernel virtual
addresses, as well as the addresses of the "safe" pages containing their
copies, are stored in one of two lists of PBEs.

One list of PBEs is for the copies of "normal" suspend pages (ie. "normal"
pages that were saveable during the suspend) and it is used in the same
way as previously (ie. by the architecture-dependent parts of swsusp). The
other list of PBEs is for the copies of highmem suspend pages. The pages in
this list are restored (in a reversible way) right before the arch-dependent
code is called.

Signed-off-by: Rafael J. Wysocki <[email protected]>
---
include/linux/suspend.h | 6
kernel/power/power.h | 2
kernel/power/snapshot.c | 822 ++++++++++++++++++++++++++++++++++++------------
kernel/power/swap.c | 2
kernel/power/swsusp.c | 53 +--
kernel/power/user.c | 2
mm/vmscan.c | 3
7 files changed, 658 insertions(+), 232 deletions(-)

Index: linux-2.6.18-rc3-mm2/kernel/power/snapshot.c
===================================================================
--- linux-2.6.18-rc3-mm2.orig/kernel/power/snapshot.c
+++ linux-2.6.18-rc3-mm2/kernel/power/snapshot.c
@@ -1,15 +1,15 @@
/*
* linux/kernel/power/snapshot.c
*
- * This file provide system snapshot/restore functionality.
+ * This file provides system snapshot/restore functionality for swsusp.
*
* Copyright (C) 1998-2005 Pavel Machek <[email protected]>
+ * Copyright (C) 2006 Rafael J. Wysocki <[email protected]>
*
- * This file is released under the GPLv2, and is based on swsusp.c.
+ * This file is released under the GPLv2.
*
*/

-
#include <linux/version.h>
#include <linux/module.h>
#include <linux/mm.h>
@@ -34,140 +34,29 @@

#include "power.h"

+/* List of PBEs needed for restoring the pages that were allocated before
+ * the suspend and included in the suspend image, but have also been
+ * allocated by the "resume" kernel, so their contents cannot be written
+ * directly to their "original" page frames.
+ */
struct pbe *pagedir_nosave;
-static unsigned int nr_copy_pages;
-static unsigned int nr_meta_pages;
-static void *buffer;
-
-#ifdef CONFIG_HIGHMEM
-unsigned int count_highmem_pages(void)
-{
- struct zone *zone;
- unsigned long zone_pfn;
- unsigned int n = 0;
-
- for_each_zone (zone)
- if (is_highmem(zone)) {
- mark_free_pages(zone);
- for (zone_pfn = 0; zone_pfn < zone->spanned_pages; zone_pfn++) {
- struct page *page;
- unsigned long pfn = zone_pfn + zone->zone_start_pfn;
- if (!pfn_valid(pfn))
- continue;
- page = pfn_to_page(pfn);
- if (PageReserved(page))
- continue;
- if (PageNosaveFree(page))
- continue;
- n++;
- }
- }
- return n;
-}
-
-struct highmem_page {
- char *data;
- struct page *page;
- struct highmem_page *next;
-};
-
-static struct highmem_page *highmem_copy;
-
-static int save_highmem_zone(struct zone *zone)
-{
- unsigned long zone_pfn;
- mark_free_pages(zone);
- for (zone_pfn = 0; zone_pfn < zone->spanned_pages; ++zone_pfn) {
- struct page *page;
- struct highmem_page *save;
- void *kaddr;
- unsigned long pfn = zone_pfn + zone->zone_start_pfn;
-
- if (!(pfn%10000))
- printk(".");
- if (!pfn_valid(pfn))
- continue;
- page = pfn_to_page(pfn);
- /*
- * This condition results from rvmalloc() sans vmalloc_32()
- * and architectural memory reservations. This should be
- * corrected eventually when the cases giving rise to this
- * are better understood.
- */
- if (PageReserved(page))
- continue;
- BUG_ON(PageNosave(page));
- if (PageNosaveFree(page))
- continue;
- save = kmalloc(sizeof(struct highmem_page), GFP_ATOMIC);
- if (!save)
- return -ENOMEM;
- save->next = highmem_copy;
- save->page = page;
- save->data = (void *) get_zeroed_page(GFP_ATOMIC);
- if (!save->data) {
- kfree(save);
- return -ENOMEM;
- }
- kaddr = kmap_atomic(page, KM_USER0);
- memcpy(save->data, kaddr, PAGE_SIZE);
- kunmap_atomic(kaddr, KM_USER0);
- highmem_copy = save;
- }
- return 0;
-}
-
-int save_highmem(void)
-{
- struct zone *zone;
- int res = 0;
-
- pr_debug("swsusp: Saving Highmem");
- drain_local_pages();
- for_each_zone (zone) {
- if (is_highmem(zone))
- res = save_highmem_zone(zone);
- if (res)
- return res;
- }
- printk("\n");
- return 0;
-}

-int restore_highmem(void)
-{
- printk("swsusp: Restoring Highmem\n");
- while (highmem_copy) {
- struct highmem_page *save = highmem_copy;
- void *kaddr;
- highmem_copy = save->next;
-
- kaddr = kmap_atomic(save->page, KM_USER0);
- memcpy(kaddr, save->data, PAGE_SIZE);
- kunmap_atomic(kaddr, KM_USER0);
- free_page((long) save->data);
- kfree(save);
- }
- return 0;
-}
-#else
-static inline unsigned int count_highmem_pages(void) {return 0;}
-static inline int save_highmem(void) {return 0;}
-static inline int restore_highmem(void) {return 0;}
-#endif
+/* Pointer to an auxiliary buffer (1 page) */
+static void *buffer;

/**
* @safe_needed - on resume, for storing the PBE list and the image,
* we can only use memory pages that do not conflict with the pages
- * used before suspend.
+ * used before suspend. The unsafe pages have PageNosaveFree set
+ * and we count them using unsafe_pages.
*
- * The unsafe pages are marked with the PG_nosave_free flag
- * and we count them using unsafe_pages
+ * Each allocated image page is marked as PageNosave and PageNosaveFree
+ * so that swsusp_free() can release it.
*/

static unsigned int allocated_unsafe_pages;

-static void *alloc_image_page(gfp_t gfp_mask, int safe_needed)
+static void *get_image_page(gfp_t gfp_mask, int safe_needed)
{
void *res;

@@ -188,20 +77,38 @@ static void *alloc_image_page(gfp_t gfp_

unsigned long get_safe_page(gfp_t gfp_mask)
{
- return (unsigned long)alloc_image_page(gfp_mask, 1);
+ return (unsigned long)get_image_page(gfp_mask, 1);
+}
+
+static struct page *alloc_image_page(gfp_t gfp_mask) {
+ struct page *page;
+
+ page = alloc_page(gfp_mask);
+ if (page) {
+ SetPageNosave(page);
+ SetPageNosaveFree(page);
+ }
+ return page;
}

/**
* free_image_page - free page represented by @addr, allocated with
- * alloc_image_page (page flags set by it must be cleared)
+ * get_image_page (page flags set by it must be cleared)
*/

static inline void free_image_page(void *addr, int clear_nosave_free)
{
- ClearPageNosave(virt_to_page(addr));
+ struct page *page;
+
+ BUG_ON(!virt_addr_valid(addr));
+
+ page = virt_to_page(addr);
+
+ ClearPageNosave(page);
if (clear_nosave_free)
- ClearPageNosaveFree(virt_to_page(addr));
- free_page((unsigned long)addr);
+ ClearPageNosaveFree(page);
+
+ __free_page(page);
}

/* struct linked_page is used to build chains of pages */
@@ -262,7 +169,7 @@ static void *chain_alloc(struct chain_al
if (LINKED_PAGE_DATA_SIZE - ca->used_space < size) {
struct linked_page *lp;

- lp = alloc_image_page(ca->gfp_mask, ca->safe_needed);
+ lp = get_image_page(ca->gfp_mask, ca->safe_needed);
if (!lp)
return NULL;

@@ -440,7 +347,7 @@ create_memory_bm(struct memory_bitmap *b
/* Compute the number of zones */
nr = 0;
for_each_zone (zone)
- if (populated_zone(zone) && !is_highmem(zone))
+ if (populated_zone(zone))
nr++;

/* Allocate the list of zones bitmap objects */
@@ -455,7 +362,7 @@ create_memory_bm(struct memory_bitmap *b
for_each_zone (zone) {
unsigned long pfn;

- if (!populated_zone(zone) || is_highmem(zone))
+ if (!populated_zone(zone))
continue;

zone_bm->start_pfn = zone->zone_start_pfn;
@@ -474,7 +381,7 @@ create_memory_bm(struct memory_bitmap *b
while (bb) {
unsigned long *ptr;

- ptr = alloc_image_page(gfp_mask, safe_needed);
+ ptr = get_image_page(gfp_mask, safe_needed);
bb->data = ptr;
if (!ptr)
goto Free;
@@ -665,6 +572,78 @@ unsigned int snapshot_additional_pages(s
return res;
}

+#ifdef CONFIG_HIGHMEM
+/**
+ * count_free_highmem_pages - compute the total number of free highmem
+ * pages, system-wide.
+ */
+
+static unsigned int count_free_highmem_pages(void)
+{
+ struct zone *zone;
+ unsigned int cnt = 0;
+
+ for_each_zone (zone)
+ if (populated_zone(zone) && is_highmem(zone))
+ cnt += zone->free_pages;
+
+ return cnt;
+}
+
+/**
+ * saveable_highmem_page - Determine whether a highmem page should be
+ * included in the suspend image.
+ *
+ * We should save the page if it isn't Nosave or NosaveFree, or Reserved,
+ * and it isn't a part of a free chunk of pages.
+ */
+
+static struct page *saveable_highmem_page(unsigned long pfn)
+{
+ struct page *page;
+
+ if (!pfn_valid(pfn))
+ return NULL;
+
+ page = pfn_to_page(pfn);
+
+ BUG_ON(!PageHighMem(page));
+
+ if (PageNosave(page) || PageReserved(page) || PageNosaveFree(page))
+ return NULL;
+
+ return page;
+}
+
+/**
+ * count_highmem_pages - compute the total number of saveable highmem
+ * pages.
+ */
+
+unsigned int count_highmem_pages(void)
+{
+ struct zone *zone;
+ unsigned int n = 0;
+
+ for_each_zone (zone) {
+ unsigned long pfn, max_zone_pfn;
+
+ if (!is_highmem(zone))
+ continue;
+
+ mark_free_pages(zone);
+ max_zone_pfn = zone->zone_start_pfn + zone->spanned_pages;
+ for (pfn = zone->zone_start_pfn; pfn < max_zone_pfn; pfn++)
+ if (saveable_highmem_page(pfn))
+ n++;
+ }
+ return n;
+}
+#else
+static inline void *saveable_highmem_page(unsigned long pfn) { return NULL; }
+static inline unsigned int count_highmem_pages(void) { return 0; }
+#endif /* CONFIG_HIGHMEM */
+
/**
* pfn_is_nosave - check if given pfn is in the 'nosave' section
*/
@@ -677,12 +656,12 @@ static inline int pfn_is_nosave(unsigned
}

/**
- * saveable - Determine whether a page should be cloned or not.
- * @pfn: The page
+ * saveable - Determine whether a non-highmem page should be included in
+ * the suspend image.
*
- * We save a page if it isn't Nosave, and is not in the range of pages
- * statically defined as 'unsaveable', and it
- * isn't a part of a free chunk of pages.
+ * We should save the page if it isn't Nosave, and is not in the range
+ * of pages statically defined as 'unsaveable', and it isn't a part of
+ * a free chunk of pages.
*/

static struct page *saveable_page(unsigned long pfn)
@@ -694,16 +673,22 @@ static struct page *saveable_page(unsign

page = pfn_to_page(pfn);

- if (PageNosave(page))
+ BUG_ON(PageHighMem(page));
+
+ if (PageNosave(page) || PageNosaveFree(page))
return NULL;
+
if (PageReserved(page) && pfn_is_nosave(pfn))
return NULL;
- if (PageNosaveFree(page))
- return NULL;

return page;
}

+/**
+ * count_data_pages - compute the total number of saveable non-highmem
+ * pages.
+ */
+
unsigned int count_data_pages(void)
{
struct zone *zone;
@@ -713,23 +698,76 @@ unsigned int count_data_pages(void)
for_each_zone (zone) {
if (is_highmem(zone))
continue;
+
mark_free_pages(zone);
max_zone_pfn = zone->zone_start_pfn + zone->spanned_pages;
for (pfn = zone->zone_start_pfn; pfn < max_zone_pfn; pfn++)
- n += !!saveable_page(pfn);
+ if(saveable_page(pfn))
+ n++;
}
return n;
}

-static inline void copy_data_page(long *dst, long *src)
+/* This is needed, because copy_page and memcpy are not usable for copying
+ * task structs.
+ */
+static inline void do_copy_page(long *dst, long *src)
{
int n;

- /* copy_page and memcpy are not usable for copying task structs. */
for (n = PAGE_SIZE / sizeof(long); n; n--)
*dst++ = *src++;
}

+#ifdef CONFIG_HIGHMEM
+static inline struct page *
+page_is_saveable(struct zone *zone, unsigned long pfn)
+{
+ return is_highmem(zone) ?
+ saveable_highmem_page(pfn) : saveable_page(pfn);
+}
+
+static inline void
+copy_data_page(unsigned long dst_pfn, unsigned long src_pfn)
+{
+ struct page *s_page, *d_page;
+ void *src, *dst;
+
+ s_page = pfn_to_page(src_pfn);
+ d_page = pfn_to_page(dst_pfn);
+ if (PageHighMem(s_page)) {
+ src = kmap_atomic(s_page, KM_USER0);
+ dst = kmap_atomic(d_page, KM_USER1);
+ do_copy_page(dst, src);
+ kunmap_atomic(src, KM_USER0);
+ kunmap_atomic(dst, KM_USER1);
+ } else {
+ src = page_address(s_page);
+ if (PageHighMem(d_page)) {
+ /* Page pointed to by src may contain some kernel
+ * data modified by kmap_atomic()
+ */
+ do_copy_page(buffer, src);
+ dst = kmap_atomic(pfn_to_page(dst_pfn), KM_USER0);
+ memcpy(dst, buffer, PAGE_SIZE);
+ kunmap_atomic(dst, KM_USER0);
+ } else {
+ dst = page_address(d_page);
+ do_copy_page(dst, src);
+ }
+ }
+}
+#else
+#define page_is_saveable(zone, pfn) saveable_page(pfn)
+
+static inline void
+copy_data_page(unsigned long dst_pfn, unsigned long src_pfn)
+{
+ do_copy_page(page_address(pfn_to_page(dst_pfn)),
+ page_address(pfn_to_page(src_pfn)));
+}
+#endif /* CONFIG_HIGHMEM */
+
static void
copy_data_pages(struct memory_bitmap *copy_bm, struct memory_bitmap *orig_bm)
{
@@ -739,31 +777,26 @@ copy_data_pages(struct memory_bitmap *co
for_each_zone (zone) {
unsigned long max_zone_pfn;

- if (is_highmem(zone))
- continue;
-
mark_free_pages(zone);
max_zone_pfn = zone->zone_start_pfn + zone->spanned_pages;
for (pfn = zone->zone_start_pfn; pfn < max_zone_pfn; pfn++)
- if (saveable_page(pfn))
+ if (page_is_saveable(zone, pfn))
memory_bm_set_bit(orig_bm, pfn);
}
bm_position_reset(orig_bm);
bm_position_reset(copy_bm);
do {
pfn = memory_bm_next_pfn(orig_bm);
- if (likely(pfn != BM_END_OF_MAP)) {
- struct page *page;
- void *src;
-
- page = pfn_to_page(pfn);
- src = page_address(page);
- page = pfn_to_page(memory_bm_next_pfn(copy_bm));
- copy_data_page(page_address(page), src);
- }
+ if (likely(pfn != BM_END_OF_MAP))
+ copy_data_page(memory_bm_next_pfn(copy_bm), pfn);
} while (pfn != BM_END_OF_MAP);
}

+/* Total number of image pages */
+static unsigned int nr_copy_pages;
+/* Number of pages needed for saving the original pfns of the image pages */
+static unsigned int nr_meta_pages;
+
/**
* swsusp_free - free pages allocated for the suspend.
*
@@ -785,7 +818,7 @@ void swsusp_free(void)
if (PageNosave(page) && PageNosaveFree(page)) {
ClearPageNosave(page);
ClearPageNosaveFree(page);
- free_page((long) page_address(page));
+ __free_page(page);
}
}
}
@@ -795,34 +828,108 @@ void swsusp_free(void)
buffer = NULL;
}

+#ifdef CONFIG_HIGHMEM
+/**
+ * count_pages_for_highmem - compute the number of non-highmem pages
+ * that will be necessary for creating copies of highmem pages.
+ */
+
+static unsigned int count_pages_for_highmem(unsigned int nr_highmem)
+{
+ unsigned int free_highmem = count_free_highmem_pages();
+
+ if (free_highmem >= nr_highmem)
+ nr_highmem = 0;
+ else
+ nr_highmem -= free_highmem;
+
+ return nr_highmem;
+}
+#else
+static unsigned int
+count_pages_for_highmem(unsigned int nr_highmem) { return 0; }
+#endif /* CONFIG_HIGHMEM */

/**
- * enough_free_mem - Make sure we enough free memory to snapshot.
- *
- * Returns TRUE or FALSE after checking the number of available
- * free pages.
+ * enough_free_mem - Make sure we have enough free memory for the
+ * snapshot image.
*/

-static int enough_free_mem(unsigned int nr_pages)
+static int enough_free_mem(unsigned int nr_pages, unsigned int nr_highmem)
{
struct zone *zone;
unsigned int free = 0, meta = 0;

- for_each_zone (zone)
- if (!is_highmem(zone)) {
+ for_each_zone (zone) {
+ meta += snapshot_additional_pages(zone);
+ if (!is_highmem(zone))
free += zone->free_pages;
- meta += snapshot_additional_pages(zone);
- }
+ }

+ nr_pages += count_pages_for_highmem(nr_highmem);
pr_debug("swsusp: pages needed: %u + %u + %u, available pages: %u\n",
nr_pages, PAGES_FOR_IO, meta, free);

return free > nr_pages + PAGES_FOR_IO + meta;
}

+#ifdef CONFIG_HIGHMEM
+/**
+ * get_highmem_buffer - if there are some highmem pages in the suspend
+ * image, we may need the buffer to copy them and/or load their data.
+ */
+
+static inline int get_highmem_buffer(int safe_needed)
+{
+ buffer = get_image_page(GFP_ATOMIC, safe_needed);
+ return buffer ? 0 : -ENOMEM;
+}
+
+/**
+ * alloc_highmem_image_pages - allocate some highmem pages for the image.
+ * Try to allocate as many pages as needed, but if the number of free
+ * highmem pages is lesser than that, allocate them all.
+ */
+
+static inline unsigned int
+alloc_highmem_image_pages(struct memory_bitmap *bm, unsigned int nr_highmem)
+{
+ unsigned int to_alloc = count_free_highmem_pages();
+
+ if (to_alloc > nr_highmem)
+ to_alloc = nr_highmem;
+
+ nr_highmem -= to_alloc;
+ while (to_alloc-- > 0) {
+ struct page *page;
+
+ page = alloc_image_page(__GFP_HIGHMEM);
+ memory_bm_set_bit(bm, page_to_pfn(page));
+ }
+ return nr_highmem;
+}
+#else
+static inline int get_highmem_buffer(int safe_needed) { return 0; }
+
+static inline unsigned int
+alloc_highmem_image_pages(struct memory_bitmap *bm, unsigned int n) { return 0; }
+#endif /* CONFIG_HIGHMEM */
+
+/**
+ * swsusp_alloc - allocate memory for the suspend image
+ *
+ * We first try to allocate as many highmem pages as there are
+ * saveable highmem pages in the system. If that fails, we allocate
+ * non-highmem pages for the copies of the remaining highmem ones.
+ *
+ * In this approach it is likely that the copies of highmem pages will
+ * also be located in the high memory, because of the way in which
+ * copy_data_pages() works.
+ */
+
static int
swsusp_alloc(struct memory_bitmap *orig_bm, struct memory_bitmap *copy_bm,
- unsigned int nr_pages)
+ unsigned int nr_pages, unsigned int nr_highmem)
{
int error;

@@ -834,13 +941,17 @@ swsusp_alloc(struct memory_bitmap *orig_
if (error)
goto Free;

+ error = get_highmem_buffer(0);
+ if (error)
+ goto Free;
+
+ nr_pages += alloc_highmem_image_pages(copy_bm, nr_highmem);
while (nr_pages-- > 0) {
- struct page *page = alloc_page(GFP_ATOMIC);
+ struct page *page = alloc_image_page(GFP_ATOMIC);
+
if (!page)
goto Free;

- SetPageNosave(page);
- SetPageNosaveFree(page);
memory_bm_set_bit(copy_bm, page_to_pfn(page));
}
return 0;
@@ -850,30 +961,39 @@ Free:
return -ENOMEM;
}

-/* Memory bitmap used for marking saveable pages */
+/* Memory bitmap used for marking saveable pages (during suspend) or the
+ * suspend image pages (during resume)
+ */
static struct memory_bitmap orig_bm;
-/* Memory bitmap used for marking allocated pages that will contain the copies
- * of saveable pages
+/* Memory bitmap used on suspend for marking allocated pages that will contain
+ * the copies of saveable pages. During resume it is initially used for
+ * marking the suspend image pages, but then its set bits are duplicated in
+ * @orig_bm and it is released. Next, on systems with high memory, it may be
+ * used for marking "safe" highmem pages, but it has to be reinitialized for
+ * this purpose.
*/
static struct memory_bitmap copy_bm;

asmlinkage int swsusp_save(void)
{
- unsigned int nr_pages;
+ unsigned int nr_pages, nr_highmem;

- pr_debug("swsusp: critical section: \n");
+ printk("swsusp: critical section: \n");

drain_local_pages();
nr_pages = count_data_pages();
- printk("swsusp: Need to copy %u pages\n", nr_pages);
+ nr_highmem = count_highmem_pages();
+ printk("swsusp: Need to copy %u pages\n", nr_pages + nr_highmem);

- if (!enough_free_mem(nr_pages)) {
+ if (!enough_free_mem(nr_pages, nr_highmem)) {
printk(KERN_ERR "swsusp: Not enough free memory\n");
return -ENOMEM;
}

- if (swsusp_alloc(&orig_bm, &copy_bm, nr_pages))
+ if (swsusp_alloc(&orig_bm, &copy_bm, nr_pages, nr_highmem)) {
+ printk(KERN_ERR "swsusp: Memory allocation failed\n");
return -ENOMEM;
+ }

/* During allocating of suspend pagedir, new cold pages may appear.
* Kill them.
@@ -887,10 +1007,12 @@ asmlinkage int swsusp_save(void)
* touch swap space! Except we must write out our image of course.
*/

+ nr_pages += nr_highmem;
nr_copy_pages = nr_pages;
- nr_meta_pages = (nr_pages * sizeof(long) + PAGE_SIZE - 1) >> PAGE_SHIFT;
+ nr_meta_pages = DIV_ROUND_UP(nr_pages * sizeof(long), PAGE_SIZE);

printk("swsusp: critical section/: done (%d pages copied)\n", nr_pages);
+
return 0;
}

@@ -953,7 +1075,7 @@ int snapshot_read_next(struct snapshot_h

if (!buffer) {
/* This makes the buffer be freed by swsusp_free() */
- buffer = alloc_image_page(GFP_ATOMIC, 0);
+ buffer = get_image_page(GFP_ATOMIC, 0);
if (!buffer)
return -ENOMEM;
}
@@ -968,9 +1090,23 @@ int snapshot_read_next(struct snapshot_h
memset(buffer, 0, PAGE_SIZE);
pack_pfns(buffer, &orig_bm);
} else {
- unsigned long pfn = memory_bm_next_pfn(&copy_bm);
+ struct page *page;

- handle->buffer = page_address(pfn_to_page(pfn));
+ page = pfn_to_page(memory_bm_next_pfn(&copy_bm));
+ if (PageHighMem(page)) {
+ /* Highmem pages are copied to the buffer,
+ * because we can't return with a kmapped
+ * highmem page (we may not be called again).
+ */
+ void *kaddr;
+
+ kaddr = kmap_atomic(page, KM_USER0);
+ memcpy(buffer, kaddr, PAGE_SIZE);
+ kunmap_atomic(kaddr, KM_USER0);
+ handle->buffer = buffer;
+ } else {
+ handle->buffer = page_address(page);
+ }
}
handle->prev = handle->cur;
}
@@ -1094,6 +1230,214 @@ unpack_orig_pfns(unsigned long *buf, str
}
}

+/* List of "safe" pages that may be used to store data loaded from the suspend
+ * image
+ */
+static struct linked_page *safe_pages_list;
+
+#ifdef CONFIG_HIGHMEM
+/* struct highmem_pbe is used for creating the list of highmem pages that
+ * should be restored atomically during the resume from disk, because the page
+ * frames they have occupied before the suspend are in use.
+ */
+struct highmem_pbe {
+ struct page *copy_page; /* data is here now */
+ struct page *orig_page; /* data was here before the suspend */
+ struct highmem_pbe *next;
+};
+
+/* List of highmem PBEs needed for restoring the highmem pages that were
+ * allocated before the suspend and included in the suspend image, but have
+ * also been allocated by the "resume" kernel, so their contents cannot be
+ * written directly to their "original" page frames.
+ */
+static struct highmem_pbe *highmem_pblist;
+
+/**
+ * count_highmem_image_pages - compute the number of highmem pages in the
+ * suspend image. The bits in the memory bitmap @bm that correspond to the
+ * image pages are assumed to be set.
+ */
+
+static unsigned int count_highmem_image_pages(struct memory_bitmap *bm)
+{
+ unsigned long pfn;
+ unsigned int cnt = 0;
+
+ bm_position_reset(bm);
+ pfn = memory_bm_next_pfn(bm);
+ while (pfn != BM_END_OF_MAP) {
+ if (PageHighMem(pfn_to_page(pfn)))
+ cnt++;
+
+ pfn = memory_bm_next_pfn(bm);
+ }
+ return cnt;
+}
+
+/**
+ * prepare_highmem_image - try to allocate as many highmem pages as
+ * there are highmem image pages (@nr_highmem_p points to the variable
+ * containing the number of highmem image pages). The pages that are
+ * "safe" (ie. will not be overwritten when the suspend image is
+ * restored) have the corresponding bits set in @bm (it must be
+ * unitialized).
+ *
+ * NOTE: This function should not be called if there are no highmem
+ * image pages.
+ */
+
+static unsigned int safe_highmem_pages;
+
+static struct memory_bitmap *safe_highmem_bm;
+
+static int
+prepare_highmem_image(struct memory_bitmap *bm, unsigned int *nr_highmem_p)
+{
+ unsigned int to_alloc;
+
+ if (create_memory_bm(bm, GFP_ATOMIC, 1))
+ return -ENOMEM;
+
+ if (get_highmem_buffer(1))
+ return -ENOMEM;
+
+ to_alloc = count_free_highmem_pages();
+ if (to_alloc > *nr_highmem_p)
+ to_alloc = *nr_highmem_p;
+
+ *nr_highmem_p = to_alloc;
+ safe_highmem_pages = 0;
+ while (to_alloc-- > 0) {
+ struct page *page;
+
+ page = alloc_page(__GFP_HIGHMEM);
+ if (!PageNosaveFree(page)) {
+ /* The page is "safe", set its bit the bitmap */
+ memory_bm_set_bit(bm, page_to_pfn(page));
+ safe_highmem_pages++;
+ }
+ /* Mark the page as allocated */
+ SetPageNosave(page);
+ SetPageNosaveFree(page);
+ }
+ bm_position_reset(bm);
+ safe_highmem_bm = bm;
+ return 0;
+}
+
+/**
+ * get_highmem_page_buffer - for given highmem image page find the buffer
+ * that suspend_write_next() should set for its caller to write to.
+ *
+ * If the page is to be saved to its "original" page frame or a copy of
+ * the page is to be made in the highmem, @buffer is returned. Otherwise,
+ * the copy of the page is to be made in normal memory, so the address of
+ * the copy is returned.
+ *
+ * If @buffer is returned, the caller of suspend_write_next() will write
+ * the page's contents to @buffer, so they will have to be copied to the
+ * right location on the next call to suspend_write_next() and it is done
+ * with the help of copy_last_highmem_page(). For this purpose, if
+ * @buffer is returned, @last_highmem page is set to the page to which
+ * the data will have to be copied from @buffer.
+ */
+
+static struct page *last_highmem_page;
+
+static void *
+get_highmem_page_buffer(struct page *page, struct chain_allocator *ca)
+{
+ struct highmem_pbe *pbe;
+ void *kaddr;
+
+ if (PageNosave(page) && PageNosaveFree(page)) {
+ /* We have allocated the "original" page frame and we can
+ * use it directly to store the loaded page.
+ */
+ last_highmem_page = page;
+ return buffer;
+ }
+ /* The "original" page frame has not been allocated and we have to
+ * use a "safe" page frame to store the loaded page.
+ */
+ pbe = chain_alloc(ca, sizeof(struct highmem_pbe));
+ if (!pbe) {
+ swsusp_free();
+ return NULL;
+ }
+ pbe->orig_page = page;
+ if (safe_highmem_pages > 0) {
+ struct page *tmp;
+
+ /* Copy of the page will be stored in high memory */
+ kaddr = buffer;
+ tmp = pfn_to_page(memory_bm_next_pfn(safe_highmem_bm));
+ safe_highmem_pages--;
+ last_highmem_page = tmp;
+ pbe->copy_page = tmp;
+ } else {
+ /* Copy of the page will be stored in normal memory */
+ kaddr = safe_pages_list;
+ safe_pages_list = safe_pages_list->next;
+ pbe->copy_page = virt_to_page(kaddr);
+ }
+ pbe->next = highmem_pblist;
+ highmem_pblist = pbe;
+ return kaddr;
+}
+
+/**
+ * copy_last_highmem_page - copy the contents of a highmem image from
+ * @buffer, where the caller of snapshot_write_next() has place them,
+ * to the right location represented by @last_highmem_page .
+ */
+
+static void copy_last_highmem_page(void)
+{
+ if (last_highmem_page) {
+ void *dst;
+
+ dst = kmap_atomic(last_highmem_page, KM_USER0);
+ memcpy(dst, buffer, PAGE_SIZE);
+ kunmap_atomic(dst, KM_USER0);
+ last_highmem_page = NULL;
+ }
+}
+
+static inline int last_highmem_page_copied(void)
+{
+ return !last_highmem_page;
+}
+
+static inline void free_highmem_data(void)
+{
+ free_memory_bm(safe_highmem_bm, 1);
+ free_image_page(buffer, 1);
+}
+#else
+static inline int get_safe_write_buffer(void) { return 0; }
+
+static unsigned int
+count_highmem_image_pages(struct memory_bitmap *bm) { return 0; }
+
+static inline int
+prepare_highmem_image(struct memory_bitmap *bm, unsigned int *nr_highmem_p)
+{
+ return 0;
+}
+
+static inline void *
+get_highmem_page_buffer(struct page *page, struct chain_allocator *ca)
+{
+ return NULL;
+}
+
+static inline void copy_last_highmem_page(void) {}
+static inline int last_highmem_page_copied(void) { return 1; }
+static inline void free_highmem_data(void) {}
+#endif /* CONFIG_HIGHMEM */
+
/**
* prepare_image - use the memory bitmap @bm to mark the pages that will
* be overwritten in the process of restoring the system memory state
@@ -1103,20 +1447,24 @@ unpack_orig_pfns(unsigned long *buf, str
* The idea is to allocate a new memory bitmap first and then allocate
* as many pages as needed for the image data, but not to assign these
* pages to specific tasks initially. Instead, we just mark them as
- * allocated and create a list of "safe" pages that will be used later.
+ * allocated and create a lists of "safe" pages that will be used
+ * later. On systems with high memory a list of "safe" highmem pages is
+ * also created.
*/

#define PBES_PER_LINKED_PAGE (LINKED_PAGE_DATA_SIZE / sizeof(struct pbe))

-static struct linked_page *safe_pages_list;
-
static int
prepare_image(struct memory_bitmap *new_bm, struct memory_bitmap *bm)
{
- unsigned int nr_pages;
+ unsigned int nr_pages, nr_highmem;
struct linked_page *sp_list, *lp;
int error;

+ /* If there is no highmem, the buffer will not be necessary */
+ free_image_page(buffer, 1);
+
+ nr_highmem = count_highmem_image_pages(bm);
error = mark_unsafe_pages(bm);
if (error)
goto Free;
@@ -1127,6 +1475,11 @@ prepare_image(struct memory_bitmap *new_

duplicate_memory_bitmap(new_bm, bm);
free_memory_bm(bm, 0);
+ if (nr_highmem > 0) {
+ error = prepare_highmem_image(bm, &nr_highmem);
+ if (error)
+ goto Free;
+ }
/* Reserve some safe pages for potential later use.
*
* NOTE: This way we make sure there will be enough safe pages for the
@@ -1135,10 +1488,10 @@ prepare_image(struct memory_bitmap *new_
*/
sp_list = NULL;
/* nr_copy_pages cannot be lesser than allocated_unsafe_pages */
- nr_pages = nr_copy_pages - allocated_unsafe_pages;
+ nr_pages = nr_copy_pages - nr_highmem - allocated_unsafe_pages;
nr_pages = DIV_ROUND_UP(nr_pages, PBES_PER_LINKED_PAGE);
while (nr_pages > 0) {
- lp = alloc_image_page(GFP_ATOMIC, 1);
+ lp = get_image_page(GFP_ATOMIC, 1);
if (!lp) {
error = -ENOMEM;
goto Free;
@@ -1149,7 +1502,7 @@ prepare_image(struct memory_bitmap *new_
}
/* Preallocate memory for the image */
safe_pages_list = NULL;
- nr_pages = nr_copy_pages - allocated_unsafe_pages;
+ nr_pages = nr_copy_pages - nr_highmem - allocated_unsafe_pages;
while (nr_pages > 0) {
lp = (struct linked_page *)get_zeroed_page(GFP_ATOMIC);
if (!lp) {
@@ -1189,6 +1542,9 @@ static void *get_buffer(struct memory_bi
struct pbe *pbe;
struct page *page = pfn_to_page(memory_bm_next_pfn(bm));

+ if (PageHighMem(page))
+ return get_highmem_page_buffer(page, ca);
+
if (PageNosave(page) && PageNosaveFree(page))
/* We have allocated the "original" page frame and we can
* use it directly to store the loaded page.
@@ -1203,8 +1559,8 @@ static void *get_buffer(struct memory_bi
swsusp_free();
return NULL;
}
- pbe->orig_address = (unsigned long)page_address(page);
- pbe->address = (unsigned long)safe_pages_list;
+ pbe->orig_address = page_address(page);
+ pbe->address = safe_pages_list;
safe_pages_list = safe_pages_list->next;
pbe->next = pagedir_nosave;
pagedir_nosave = pbe;
@@ -1242,14 +1598,16 @@ int snapshot_write_next(struct snapshot_
if (handle->prev && handle->cur > nr_meta_pages + nr_copy_pages)
return 0;

- if (!buffer) {
- /* This makes the buffer be freed by swsusp_free() */
- buffer = alloc_image_page(GFP_ATOMIC, 0);
+ if (handle->offset == 0) {
+ if (!buffer)
+ /* This makes the buffer be freed by swsusp_free() */
+ buffer = get_image_page(GFP_ATOMIC, 0);
+
if (!buffer)
return -ENOMEM;
- }
- if (!handle->offset)
+
handle->buffer = buffer;
+ }
handle->sync_read = 1;
if (handle->prev < handle->cur) {
if (handle->prev == 0) {
@@ -1277,6 +1635,7 @@ int snapshot_write_next(struct snapshot_
return -ENOMEM;
}
} else {
+ copy_last_highmem_page();
handle->buffer = get_buffer(&orig_bm, &ca);
}
handle->prev = handle->cur;
@@ -1293,15 +1652,70 @@ int snapshot_write_next(struct snapshot_
return count;
}

+/**
+ * snapshot_write_finalize - must be called after the last call to
+ * snapshot_write_next() in case the last page in the image happens
+ * to be a highmem page and its contents should be stored in the
+ * highmem. Additionally, it releases the memory that will not be
+ * used any more.
+ */
+
+void snapshot_write_finalize(struct snapshot_handle *handle)
+{
+ copy_last_highmem_page();
+ /* Free only if we have loaded the image entirely */
+ if (handle->prev && handle->cur > nr_meta_pages + nr_copy_pages) {
+ free_memory_bm(&orig_bm, 1);
+ free_highmem_data();
+ }
+}
+
int snapshot_image_loaded(struct snapshot_handle *handle)
{
- return !(!nr_copy_pages ||
+ return !(!nr_copy_pages || !last_highmem_page_copied() ||
handle->cur <= nr_meta_pages + nr_copy_pages);
}

-void snapshot_free_unused_memory(struct snapshot_handle *handle)
+#ifdef CONFIG_HIGHMEM
+/* Assumes that @buf is ready and points to a "safe" page */
+static inline void
+swap_two_pages_data(struct page *p1, struct page *p2, void *buf)
{
- /* Free only if we have loaded the image entirely */
- if (handle->prev && handle->cur > nr_meta_pages + nr_copy_pages)
- free_memory_bm(&orig_bm, 1);
+ void *kaddr1, *kaddr2;
+
+ kaddr1 = kmap_atomic(p1, KM_USER0);
+ kaddr2 = kmap_atomic(p2, KM_USER1);
+ memcpy(buf, kaddr1, PAGE_SIZE);
+ memcpy(kaddr1, kaddr2, PAGE_SIZE);
+ memcpy(kaddr2, buf, PAGE_SIZE);
+ kunmap_atomic(kaddr1, KM_USER0);
+ kunmap_atomic(kaddr2, KM_USER1);
+}
+
+/**
+ * restore_highmem - for each highmem page that was allocated before
+ * the suspend and included in the suspend image, and also has been
+ * allocated by the "resume" kernel swap its current (ie. "before
+ * resume") contents with the previous (ie. "before suspend") one.
+ *
+ * If the resume eventually fails, we can call this function once
+ * again and restore the "before resume" highmem state.
+ */
+
+int restore_highmem(void)
+{
+ struct highmem_pbe *pbe = highmem_pblist;
+ void *buf;
+
+ buf = get_image_page(GFP_ATOMIC, 1);
+ if (!buf)
+ return -ENOMEM;
+
+ while (pbe) {
+ swap_two_pages_data(pbe->copy_page, pbe->orig_page, buf);
+ pbe = pbe->next;
+ }
+ free_image_page(buf, 1);
+ return 0;
}
+#endif /* CONFIG_HIGHMEM */
Index: linux-2.6.18-rc3-mm2/kernel/power/power.h
===================================================================
--- linux-2.6.18-rc3-mm2.orig/kernel/power/power.h
+++ linux-2.6.18-rc3-mm2/kernel/power/power.h
@@ -104,8 +104,8 @@ struct snapshot_handle {
extern unsigned int snapshot_additional_pages(struct zone *zone);
extern int snapshot_read_next(struct snapshot_handle *handle, size_t count);
extern int snapshot_write_next(struct snapshot_handle *handle, size_t count);
+extern void snapshot_write_finalize(struct snapshot_handle *handle);
extern int snapshot_image_loaded(struct snapshot_handle *handle);
-extern void snapshot_free_unused_memory(struct snapshot_handle *handle);

#define SNAPSHOT_IOC_MAGIC '3'
#define SNAPSHOT_FREEZE _IO(SNAPSHOT_IOC_MAGIC, 1)
Index: linux-2.6.18-rc3-mm2/kernel/power/swap.c
===================================================================
--- linux-2.6.18-rc3-mm2.orig/kernel/power/swap.c
+++ linux-2.6.18-rc3-mm2/kernel/power/swap.c
@@ -547,7 +547,7 @@ static int load_image(struct swap_map_ha
error = err2;
if (!error) {
printk("\b\b\b\bdone\n");
- snapshot_free_unused_memory(snapshot);
+ snapshot_write_finalize(snapshot);
if (!snapshot_image_loaded(snapshot))
error = -ENODATA;
}
Index: linux-2.6.18-rc3-mm2/kernel/power/user.c
===================================================================
--- linux-2.6.18-rc3-mm2.orig/kernel/power/user.c
+++ linux-2.6.18-rc3-mm2/kernel/power/user.c
@@ -188,12 +188,12 @@ static int snapshot_ioctl(struct inode *
break;

case SNAPSHOT_ATOMIC_RESTORE:
+ snapshot_write_finalize(&data->handle);
if (data->mode != O_WRONLY || !data->frozen ||
!snapshot_image_loaded(&data->handle)) {
error = -EPERM;
break;
}
- snapshot_free_unused_memory(&data->handle);
down(&pm_sem);
pm_prepare_console();
error = device_suspend(PMSG_PRETHAW);
Index: linux-2.6.18-rc3-mm2/kernel/power/swsusp.c
===================================================================
--- linux-2.6.18-rc3-mm2.orig/kernel/power/swsusp.c
+++ linux-2.6.18-rc3-mm2/kernel/power/swsusp.c
@@ -64,10 +64,8 @@ int in_suspend __nosavedata = 0;

#ifdef CONFIG_HIGHMEM
unsigned int count_highmem_pages(void);
-int save_highmem(void);
int restore_highmem(void);
#else
-static inline int save_highmem(void) { return 0; }
static inline int restore_highmem(void) { return 0; }
static inline unsigned int count_highmem_pages(void) { return 0; }
#endif
@@ -184,7 +182,7 @@ static inline unsigned long __shrink_mem

int swsusp_shrink_memory(void)
{
- long size, tmp;
+ long tmp;
struct zone *zone;
unsigned long pages = 0;
unsigned int i = 0;
@@ -192,15 +190,27 @@ int swsusp_shrink_memory(void)

printk("Shrinking memory... ");
do {
- size = 2 * count_highmem_pages();
- size += size / 50 + count_data_pages() + PAGES_FOR_IO;
+ long size, highmem_size;
+
+ highmem_size = count_highmem_pages();
+ size = count_data_pages() + PAGES_FOR_IO;
tmp = size;
for_each_zone (zone)
- if (!is_highmem(zone) && populated_zone(zone)) {
- tmp -= zone->free_pages;
- tmp += zone->lowmem_reserve[ZONE_NORMAL];
- tmp += snapshot_additional_pages(zone);
+ if (populated_zone(zone)) {
+ if (is_highmem(zone)) {
+ highmem_size -= zone->free_pages;
+ } else {
+ tmp -= zone->free_pages;
+ tmp += zone->lowmem_reserve[ZONE_NORMAL];
+ tmp += snapshot_additional_pages(zone);
+ }
}
+
+ if (highmem_size < 0)
+ highmem_size = 0;
+
+ size += highmem_size;
+ tmp += highmem_size;
if (tmp > 0) {
tmp = __shrink_memory(tmp);
if (!tmp)
@@ -223,6 +233,7 @@ int swsusp_suspend(void)

if ((error = arch_prepare_suspend()))
return error;
+
local_irq_disable();
/* At this point, device_suspend() has been called, but *not*
* device_power_down(). We *must* device_power_down() now.
@@ -235,18 +246,11 @@ int swsusp_suspend(void)
goto Enable_irqs;
}

- if ((error = save_highmem())) {
- printk(KERN_ERR "swsusp: Not enough free pages for highmem\n");
- goto Restore_highmem;
- }
-
save_processor_state();
if ((error = swsusp_arch_suspend()))
printk(KERN_ERR "Error %d suspending\n", error);
/* Restore control flow magically appears here */
restore_processor_state();
-Restore_highmem:
- restore_highmem();
/* NOTE: device_power_up() is just a resume() for devices
* that suspended with irqs off ... no overall powerup.
*/
@@ -268,18 +272,23 @@ int swsusp_resume(void)
printk(KERN_ERR "Some devices failed to power down, very bad\n");
/* We'll ignore saved state, but this gets preempt count (etc) right */
save_processor_state();
- error = swsusp_arch_resume();
- /* Code below is only ever reached in case of failure. Otherwise
- * execution continues at place where swsusp_arch_suspend was called
- */
- BUG_ON(!error);
+ error = restore_highmem();
+ if (!error) {
+ error = swsusp_arch_resume();
+ /* The code below is only ever reached in case of a failure.
+ * Otherwise execution continues at place where
+ * swsusp_arch_suspend() was called
+ */
+ BUG_ON(!error);
+ /* This call to restore_highmem() undos the previous one */
+ restore_highmem();
+ }
/* The only reason why swsusp_arch_resume() can fail is memory being
* very tight, so we have to free it as soon as we can to avoid
* subsequent failures
*/
swsusp_free();
restore_processor_state();
- restore_highmem();
touch_softlockup_watchdog();
device_power_up();
local_irq_enable();
Index: linux-2.6.18-rc3-mm2/mm/vmscan.c
===================================================================
--- linux-2.6.18-rc3-mm2.orig/mm/vmscan.c
+++ linux-2.6.18-rc3-mm2/mm/vmscan.c
@@ -1213,6 +1213,9 @@ out:
}
if (!all_zones_ok) {
cond_resched();
+
+ try_to_freeze();
+
goto loop_again;
}

Index: linux-2.6.18-rc3-mm2/include/linux/suspend.h
===================================================================
--- linux-2.6.18-rc3-mm2.orig/include/linux/suspend.h
+++ linux-2.6.18-rc3-mm2/include/linux/suspend.h
@@ -9,13 +9,13 @@
#include <linux/init.h>
#include <linux/pm.h>

-/* struct pbe is used for creating the list of pages that should be restored
+/* struct pbe is used for creating lists of pages that should be restored
* atomically during the resume from disk, because the page frames they have
* occupied before the suspend are in use.
*/
struct pbe {
- unsigned long address; /* address of the copy */
- unsigned long orig_address; /* original address of page */
+ void *address; /* address of the copy */
+ void *orig_address; /* original address of a page */
struct pbe *next;
};

2006-08-09 10:31:34

by Pavel Machek

[permalink] [raw]
Subject: Re: [RFC][PATCH -mm 1/5] swsusp: Introduce memory bitmaps

Hi!

> Introduce the memory bitmap data structure and make swsusp use in the suspend
> phase.
>
> The current swsusp's internal data structure is not very efficient from the
> memory usage point of view, so it seems reasonable to replace it with a data
> structure that will require less memory, such as a pair of bitmaps.

Well, 500 lines of code for what... 0.25% bigger image? I see it
enables you to do some cleanups... but could we get those cleanups
without those 500 lines? :-).


> kernel/power/power.h | 3
> kernel/power/snapshot.c | 605 ++++++++++++++++++++++++++++++++++++++++++------
> kernel/power/swsusp.c | 5
> 3 files changed, 542 insertions(+), 71 deletions(-)

--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2006-08-09 10:34:58

by Pavel Machek

[permalink] [raw]
Subject: Re: [RFC][PATCH -mm 2/5] swsusp: Use memory bitmaps during resume

Hi!

Okay, I'm little out of time now, and I do not understand 2 and 3 in
the series.

> Make swsusp use memory bitmaps to store its internal information during the
> resume phase of the suspend-resume cycle.
>
> If the pfns of saveable pages are saved during the suspend phase instead of
> the kernel virtual addresses of these pages, we can use them during the resume
> phase directly to set the corresponding bits in a memory bitmap. Then, this
> bitmap is used to mark the page frames corresponding to the pages that were
> saveable before the suspend (aka "unsafe" page frames).
>
> Next, we allocate as many page frames as needed to store the entire suspend
> image and make sure that there will be some extra free "safe" page frames for
> the list of PBEs constructed later. Subsequently, the image is loaded and,
> if possible, the data loaded from it are written into their "original" page frames
> (ie. the ones they had occupied before the suspend). The image data that
> cannot be written into their "original" page frames are loaded into "safe" page
> frames and their "original" kernel virtual addresses, as well as the addresses
> of the "safe" pages containing their copies, are stored in a list of PBEs.
> Finally, the list of PBEs is used to copy the remaining image data into their
> "original" page frames (this is done atomically, by the architecture-dependent
> parts of swsusp).

So... if page in highmem is allocated during resume, you'll still need
to copy it during assembly "atomic copy", right? Unfortunately, our
assembler parts can't do it just now...?
Pavel

--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2006-08-09 10:35:38

by Pavel Machek

[permalink] [raw]
Subject: Re: [RFC][PATCH -mm 4/5] swsusp: Change the name of pagedir_nosave

On Wed 2006-08-09 12:11:34, Rafael J. Wysocki wrote:
> The name of the pagedir_nosave variable does not make sense any more, so it
> seems reasonable to change it to something more meaningful.

Yes, that name was wrong from day 0. ACK.

> Signed-off-by: Rafael J. Wysocki <[email protected]>
> ---
> arch/i386/power/swsusp.S | 2 +-
> arch/powerpc/kernel/swsusp_32.S | 4 ++--
> arch/x86_64/kernel/suspend_asm.S | 2 +-
> kernel/power/power.h | 2 --
> kernel/power/snapshot.c | 15 +++++++++------
> 5 files changed, 13 insertions(+), 12 deletions(-)
>
> Index: linux-2.6.18-rc3-mm2/arch/i386/power/swsusp.S
> ===================================================================
> --- linux-2.6.18-rc3-mm2.orig/arch/i386/power/swsusp.S
> +++ linux-2.6.18-rc3-mm2/arch/i386/power/swsusp.S
> @@ -32,7 +32,7 @@ ENTRY(swsusp_arch_resume)
> movl $swsusp_pg_dir-__PAGE_OFFSET, %ecx
> movl %ecx, %cr3
>
> - movl pagedir_nosave, %edx
> + movl restore_pblist, %edx
> .p2align 4,,7
>
> copy_loop:
> Index: linux-2.6.18-rc3-mm2/arch/powerpc/kernel/swsusp_32.S
> ===================================================================
> --- linux-2.6.18-rc3-mm2.orig/arch/powerpc/kernel/swsusp_32.S
> +++ linux-2.6.18-rc3-mm2/arch/powerpc/kernel/swsusp_32.S
> @@ -159,8 +159,8 @@ END_FTR_SECTION_IFSET(CPU_FTR_ALTIVEC)
> isync
>
> /* Load ptr the list of pages to copy in r3 */
> - lis r11,(pagedir_nosave - KERNELBASE)@h
> - ori r11,r11,pagedir_nosave@l
> + lis r11,(restore_pblist - KERNELBASE)@h
> + ori r11,r11,restore_pblist@l
> lwz r10,0(r11)
>
> /* Copy the pages. This is a very basic implementation, to
> Index: linux-2.6.18-rc3-mm2/arch/x86_64/kernel/suspend_asm.S
> ===================================================================
> --- linux-2.6.18-rc3-mm2.orig/arch/x86_64/kernel/suspend_asm.S
> +++ linux-2.6.18-rc3-mm2/arch/x86_64/kernel/suspend_asm.S
> @@ -54,7 +54,7 @@ ENTRY(restore_image)
> movq %rcx, %cr3;
> movq %rax, %cr4; # turn PGE back on
>
> - movq pagedir_nosave(%rip), %rdx
> + movq restore_pblist(%rip), %rdx
> loop:
> testq %rdx, %rdx
> jz done
> Index: linux-2.6.18-rc3-mm2/kernel/power/power.h
> ===================================================================
> --- linux-2.6.18-rc3-mm2.orig/kernel/power/power.h
> +++ linux-2.6.18-rc3-mm2/kernel/power/power.h
> @@ -38,8 +38,6 @@ extern struct subsystem power_subsys;
> /* References to section boundaries */
> extern const void __nosave_begin, __nosave_end;
>
> -extern struct pbe *pagedir_nosave;
> -
> /* Preferred image size in bytes (default 500 MB) */
> extern unsigned long image_size;
> extern int in_suspend;
> Index: linux-2.6.18-rc3-mm2/kernel/power/snapshot.c
> ===================================================================
> --- linux-2.6.18-rc3-mm2.orig/kernel/power/snapshot.c
> +++ linux-2.6.18-rc3-mm2/kernel/power/snapshot.c
> @@ -38,8 +38,11 @@
> * the suspend and included in the suspend image, but have also been
> * allocated by the "resume" kernel, so their contents cannot be written
> * directly to their "original" page frames.
> + *
> + * NOTE: It cannot be static, because it is used by the architecture-dependent
> + * atomic restore code.
> */
> -struct pbe *pagedir_nosave;
> +struct pbe *restore_pblist;
>
> /* Pointer to an auxiliary buffer (1 page) */
> static void *buffer;
> @@ -824,7 +827,7 @@ void swsusp_free(void)
> }
> nr_copy_pages = 0;
> nr_meta_pages = 0;
> - pagedir_nosave = NULL;
> + restore_pblist = NULL;
> buffer = NULL;
> }
>
> @@ -1203,7 +1206,7 @@ load_header(struct swsusp_info *info)
> {
> int error;
>
> - pagedir_nosave = NULL;
> + restore_pblist = NULL;
> error = check_header(info);
> if (!error) {
> nr_copy_pages = info->image_pages;
> @@ -1562,8 +1565,8 @@ static void *get_buffer(struct memory_bi
> pbe->orig_address = page_address(page);
> pbe->address = safe_pages_list;
> safe_pages_list = safe_pages_list->next;
> - pbe->next = pagedir_nosave;
> - pagedir_nosave = pbe;
> + pbe->next = restore_pblist;
> + restore_pblist = pbe;
> return (void *)pbe->address;
> }
>
> @@ -1628,7 +1631,7 @@ int snapshot_write_next(struct snapshot_
>
> chain_init(&ca, GFP_ATOMIC, 1);
> bm_position_reset(&orig_bm);
> - pagedir_nosave = NULL;
> + restore_pblist = NULL;
> handle->sync_read = 0;
> handle->buffer = get_buffer(&orig_bm, &ca);
> if (!handle->buffer)

--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2006-08-09 10:36:31

by Pavel Machek

[permalink] [raw]
Subject: Re: [RFC][PATCH -mm 5/5] swsusp: Introduce some helpful constants

On Wed 2006-08-09 12:13:18, Rafael J. Wysocki wrote:
> Introduce some constants that hopefully will help improve the readability kernel/power/snapshot.c | 45 +++++++++++++++++++++++++--------------------
> 1 files changed, 25 insertions(+), 20 deletions(-)
>

You probably want to add signed-off-by:... aha, it is below the
patch. Some script went crazy inserting diff at wrong place. Otherwise
ACK.

Pavel

> Index: linux-2.6.18-rc3-mm2/kernel/power/snapshot.c
> ===================================================================
> --- linux-2.6.18-rc3-mm2.orig/kernel/power/snapshot.c
> +++ linux-2.6.18-rc3-mm2/kernel/power/snapshot.c
> @@ -57,6 +57,11 @@ static void *buffer;
> * so that swsusp_free() can release it.
> */
>
> +#define PG_ANY 0
> +#define PG_SAFE 1
> +#define PG_UNSAFE_CLEAR 1
> +#define PG_UNSAFE_KEEP 0
> +
> static unsigned int allocated_unsafe_pages;
>
> static void *get_image_page(gfp_t gfp_mask, int safe_needed)
> @@ -357,7 +362,7 @@ create_memory_bm(struct memory_bitmap *b
> zone_bm = create_zone_bm_list(nr, &ca);
> bm->zone_bm_list = zone_bm;
> if (!zone_bm) {
> - chain_free(&ca, 1);
> + chain_free(&ca, PG_UNSAFE_CLEAR);
> return -ENOMEM;
> }
>
> @@ -410,7 +415,7 @@ create_memory_bm(struct memory_bitmap *b
>
> Free:
> bm->p_list = ca.chain;
> - free_memory_bm(bm, 1);
> + free_memory_bm(bm, PG_UNSAFE_CLEAR);
> return -ENOMEM;
> }
>
> @@ -936,15 +941,15 @@ swsusp_alloc(struct memory_bitmap *orig_
> {
> int error;
>
> - error = create_memory_bm(orig_bm, GFP_ATOMIC, 0);
> + error = create_memory_bm(orig_bm, GFP_ATOMIC, PG_ANY);
> if (error)
> goto Free;
>
> - error = create_memory_bm(copy_bm, GFP_ATOMIC, 0);
> + error = create_memory_bm(copy_bm, GFP_ATOMIC, PG_ANY);
> if (error)
> goto Free;
>
> - error = get_highmem_buffer(0);
> + error = get_highmem_buffer(PG_ANY);
> if (error)
> goto Free;
>
> @@ -1078,7 +1083,7 @@ int snapshot_read_next(struct snapshot_h
>
> if (!buffer) {
> /* This makes the buffer be freed by swsusp_free() */
> - buffer = get_image_page(GFP_ATOMIC, 0);
> + buffer = get_image_page(GFP_ATOMIC, PG_ANY);
> if (!buffer)
> return -ENOMEM;
> }
> @@ -1299,10 +1304,10 @@ prepare_highmem_image(struct memory_bitm
> {
> unsigned int to_alloc;
>
> - if (create_memory_bm(bm, GFP_ATOMIC, 1))
> + if (create_memory_bm(bm, GFP_ATOMIC, PG_SAFE))
> return -ENOMEM;
>
> - if (get_highmem_buffer(1))
> + if (get_highmem_buffer(PG_SAFE))
> return -ENOMEM;
>
> to_alloc = count_free_highmem_pages();
> @@ -1415,8 +1420,8 @@ static inline int last_highmem_page_copi
>
> static inline void free_highmem_data(void)
> {
> - free_memory_bm(safe_highmem_bm, 1);
> - free_image_page(buffer, 1);
> + free_memory_bm(safe_highmem_bm, PG_UNSAFE_CLEAR);
> + free_image_page(buffer, PG_UNSAFE_CLEAR);
> }
> #else
> static inline int get_safe_write_buffer(void) { return 0; }
> @@ -1465,19 +1470,19 @@ prepare_image(struct memory_bitmap *new_
> int error;
>
> /* If there is no highmem, the buffer will not be necessary */
> - free_image_page(buffer, 1);
> + free_image_page(buffer, PG_UNSAFE_CLEAR);
>
> nr_highmem = count_highmem_image_pages(bm);
> error = mark_unsafe_pages(bm);
> if (error)
> goto Free;
>
> - error = create_memory_bm(new_bm, GFP_ATOMIC, 1);
> + error = create_memory_bm(new_bm, GFP_ATOMIC, PG_SAFE);
> if (error)
> goto Free;
>
> duplicate_memory_bitmap(new_bm, bm);
> - free_memory_bm(bm, 0);
> + free_memory_bm(bm, PG_UNSAFE_KEEP);
> if (nr_highmem > 0) {
> error = prepare_highmem_image(bm, &nr_highmem);
> if (error)
> @@ -1494,7 +1499,7 @@ prepare_image(struct memory_bitmap *new_
> nr_pages = nr_copy_pages - nr_highmem - allocated_unsafe_pages;
> nr_pages = DIV_ROUND_UP(nr_pages, PBES_PER_LINKED_PAGE);
> while (nr_pages > 0) {
> - lp = get_image_page(GFP_ATOMIC, 1);
> + lp = get_image_page(GFP_ATOMIC, PG_SAFE);
> if (!lp) {
> error = -ENOMEM;
> goto Free;
> @@ -1525,7 +1530,7 @@ prepare_image(struct memory_bitmap *new_
> /* Free the reserved safe pages so that chain_alloc() can use them */
> while (sp_list) {
> lp = sp_list->next;
> - free_image_page(sp_list, 1);
> + free_image_page(sp_list, PG_UNSAFE_CLEAR);
> sp_list = lp;
> }
> return 0;
> @@ -1604,7 +1609,7 @@ int snapshot_write_next(struct snapshot_
> if (handle->offset == 0) {
> if (!buffer)
> /* This makes the buffer be freed by swsusp_free() */
> - buffer = get_image_page(GFP_ATOMIC, 0);
> + buffer = get_image_page(GFP_ATOMIC, PG_ANY);
>
> if (!buffer)
> return -ENOMEM;
> @@ -1618,7 +1623,7 @@ int snapshot_write_next(struct snapshot_
> if (error)
> return error;
>
> - error = create_memory_bm(&copy_bm, GFP_ATOMIC, 0);
> + error = create_memory_bm(&copy_bm, GFP_ATOMIC, PG_ANY);
> if (error)
> return error;
>
> @@ -1668,7 +1673,7 @@ void snapshot_write_finalize(struct snap
> copy_last_highmem_page();
> /* Free only if we have loaded the image entirely */
> if (handle->prev && handle->cur > nr_meta_pages + nr_copy_pages) {
> - free_memory_bm(&orig_bm, 1);
> + free_memory_bm(&orig_bm, PG_UNSAFE_CLEAR);
> free_highmem_data();
> }
> }
> @@ -1710,7 +1715,7 @@ int restore_highmem(void)
> struct highmem_pbe *pbe = highmem_pblist;
> void *buf;
>
> - buf = get_image_page(GFP_ATOMIC, 1);
> + buf = get_image_page(GFP_ATOMIC, PG_SAFE);
> if (!buf)
> return -ENOMEM;
>
> @@ -1718,7 +1723,7 @@ int restore_highmem(void)
> swap_two_pages_data(pbe->copy_page, pbe->orig_page, buf);
> pbe = pbe->next;
> }
> - free_image_page(buf, 1);
> + free_image_page(buf, PG_UNSAFE_CLEAR);
> return 0;
> }
> #endif /* CONFIG_HIGHMEM */
>
> of the code.
>
> Signed-off-by: Rafael J. Wysocki <[email protected]>
> ---

--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2006-08-09 11:06:55

by Nigel Cunningham

[permalink] [raw]
Subject: Re: [RFC][PATCH -mm 0/5] swsusp: Fix handling of highmem

Hi.

On Wednesday 09 August 2006 19:52, Rafael J. Wysocki wrote:
> Hi,
>
> Currently swsusp handles highmem in a simplistic way, by trying to store a
> copy of each saveable highmem page in the "normal" memory before creating
> the suspend image. These copies are then copied once again before saving,
> because they are treated as any other non-highmem pages with data. For
> this reason, to save one highmem page swsusp needs two free pages in the
> "normal" memory, so on a system with high memory it is practically
> impossible to create a suspend image above 400 kilobytes. Moreover, if
> there's much more highmem than the "normal" memory in the system, it may be
> impossible to suspend at all due to the lack of space for saving
> non-freeable highmem pages.
>
> This limitation may be overcome in a satisfactory way if swsusp does its
> best to store the copies of saveable highmem pages in the highmem itself.
> However, for this purpose swsusp has to be taught to use pfns, or (struct
> page *) pointers, instead of kernel virtual addresses to identify memory
> pages. Yet, if this is to be implemented, we can also attack the minor
> problem that the current swsusp's internal data structure, the list of page
> backup entries (aka PBEs), is not very efficient as far as the memory usage
> is concerned.
>
> This issue can be addressed by replacing the list of PBEs with a pair of
> memory bitmaps. Still, to remove the list of PBEs completely, we would
> have to make some complicated modifications to the architecture-dependent
> parts of the code which would be quite undesirable. However, we can use
> the observation that memory is only a limiting resource during the suspend
> phase of the suspend-resume cycle, because during the resume phase
> many image pages may be loaded directly to their "original" page frames, so
> we don't need to keep a copy of each of them in the memory. Thus the list
> of PBEs may be used safely in the last part of the resume phase without
> limitting the amount of memory we can use.
>
> The following series of patches introduces the memory bitmap data
> structure, makes swsusp use it to store its internal information and
> implements the improved handling of saveable highmem pages.
>
> Comments welcome.

Thanks for the reminder. I'd forgotten half the reason why I didn't want to
make Suspend2 into incremental patches! You're a brave man!

Nigel
--
while (1) {
size=$RANDOM * 65536 + 1
dd if=/dev/random bs=1 count=$size | patch -p0-b
make && break
}


Attachments:
(No filename) (2.46 kB)
(No filename) (189.00 B)
Download all attachments

2006-08-09 11:07:28

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [RFC][PATCH -mm 2/5] swsusp: Use memory bitmaps during resume

Hi,

On Wednesday 09 August 2006 12:34, Pavel Machek wrote:
> Okay, I'm little out of time now, and I do not understand 2 and 3 in
> the series.

Well ...

> > Make swsusp use memory bitmaps to store its internal information during the
> > resume phase of the suspend-resume cycle.
> >
> > If the pfns of saveable pages are saved during the suspend phase instead of
> > the kernel virtual addresses of these pages, we can use them during the resume
> > phase directly to set the corresponding bits in a memory bitmap. Then, this
> > bitmap is used to mark the page frames corresponding to the pages that were
> > saveable before the suspend (aka "unsafe" page frames).
> >
> > Next, we allocate as many page frames as needed to store the entire suspend
> > image and make sure that there will be some extra free "safe" page frames for
> > the list of PBEs constructed later. Subsequently, the image is loaded and,
> > if possible, the data loaded from it are written into their "original" page frames
> > (ie. the ones they had occupied before the suspend). The image data that
> > cannot be written into their "original" page frames are loaded into "safe" page
> > frames and their "original" kernel virtual addresses, as well as the addresses
> > of the "safe" pages containing their copies, are stored in a list of PBEs.
> > Finally, the list of PBEs is used to copy the remaining image data into their
> > "original" page frames (this is done atomically, by the architecture-dependent
> > parts of swsusp).
>
> So... if page in highmem is allocated during resume, you'll still need
> to copy it during assembly "atomic copy", right?

No. It can be copied before the assembly gets called, because we are in the
kernel at that time which certainly is not in the highmem. :-)

> Unfortunately, our assembler parts can't do it just now...?

No, they can't, but that just isn't necessary. During the resume we create
two lists of PBEs - one for "normal" pages, and one for highmem pages. The
first one is handled by the "atomic copy" code as usual, but the second one
may be handled by some C code a bit earlier.

Rafael

2006-08-09 11:07:26

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [RFC][PATCH -mm 1/5] swsusp: Introduce memory bitmaps

Hi,

On Wednesday 09 August 2006 12:31, Pavel Machek wrote:
> > Introduce the memory bitmap data structure and make swsusp use in the suspend
> > phase.
> >
> > The current swsusp's internal data structure is not very efficient from the
> > memory usage point of view, so it seems reasonable to replace it with a data
> > structure that will require less memory, such as a pair of bitmaps.
>
> Well, 500 lines of code for what... 0.25% bigger image? I see it
> enables you to do some cleanups... but could we get those cleanups
> without those 500 lines? :-).

Out of the 500 lines, something like 100 are comments and other 50 are
definitions of structures. ;-)

Seriously speaking, I could do that without the bitmaps, but the code wouldn't
be that much shorter. Apart from this, I would need to introduce yet another
type of PBEs (for storing pfns) and try not to get lost in the resulting mess.

Instead of doing this I prefer to add some extra code to set up a decent data
structure and just use it.

Rafael

2006-08-09 11:27:20

by Pavel Machek

[permalink] [raw]
Subject: Re: [RFC][PATCH -mm 1/5] swsusp: Introduce memory bitmaps

Hi!

> > > Introduce the memory bitmap data structure and make swsusp use in the suspend
> > > phase.
> > >
> > > The current swsusp's internal data structure is not very efficient from the
> > > memory usage point of view, so it seems reasonable to replace it with a data
> > > structure that will require less memory, such as a pair of bitmaps.
> >
> > Well, 500 lines of code for what... 0.25% bigger image? I see it
> > enables you to do some cleanups... but could we get those cleanups
> > without those 500 lines? :-).
>
> Out of the 500 lines, something like 100 are comments and other 50 are
> definitions of structures. ;-)

Yes, and of the 100 lines of comments, 10 are fixmes :-).

> Seriously speaking, I could do that without the bitmaps, but the code wouldn't
> be that much shorter. Apart from this, I would need to introduce yet another
> type of PBEs (for storing pfns) and try not to get lost in the resulting mess.
>
> Instead of doing this I prefer to add some extra code to set up a decent data
> structure and just use it.

Okay, I guess that if we need to change the structure anyway, we may
well use the effective structure...

Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2006-08-09 11:34:06

by Pavel Machek

[permalink] [raw]
Subject: Re: [RFC][PATCH -mm 2/5] swsusp: Use memory bitmaps during resume

Hi!

> > Okay, I'm little out of time now, and I do not understand 2 and 3 in
> > the series.
>
> Well ...
>
> > > Make swsusp use memory bitmaps to store its internal information during the
> > > resume phase of the suspend-resume cycle.
> > >
> > > If the pfns of saveable pages are saved during the suspend phase instead of
> > > the kernel virtual addresses of these pages, we can use them during the resume
> > > phase directly to set the corresponding bits in a memory bitmap. Then, this
> > > bitmap is used to mark the page frames corresponding to the pages that were
> > > saveable before the suspend (aka "unsafe" page frames).
> > >
> > > Next, we allocate as many page frames as needed to store the entire suspend
> > > image and make sure that there will be some extra free "safe" page frames for
> > > the list of PBEs constructed later. Subsequently, the image is loaded and,
> > > if possible, the data loaded from it are written into their "original" page frames
> > > (ie. the ones they had occupied before the suspend). The image data that
> > > cannot be written into their "original" page frames are loaded into "safe" page
> > > frames and their "original" kernel virtual addresses, as well as the addresses
> > > of the "safe" pages containing their copies, are stored in a list of PBEs.
> > > Finally, the list of PBEs is used to copy the remaining image data into their
> > > "original" page frames (this is done atomically, by the architecture-dependent
> > > parts of swsusp).
> >
> > So... if page in highmem is allocated during resume, you'll still need
> > to copy it during assembly "atomic copy", right?
>
> No. It can be copied before the assembly gets called, because we are in the
> kernel at that time which certainly is not in the highmem. :-)

Ahha, okay, nice trick.

> > Unfortunately, our assembler parts can't do it just now...?
>
> No, they can't, but that just isn't necessary. During the resume we create
> two lists of PBEs - one for "normal" pages, and one for highmem pages. The
> first one is handled by the "atomic copy" code as usual, but the second one
> may be handled by some C code a bit earlier.

I'm still not sure if highmem support is worth the complexity -- I
hope highmem dies painful death in next 3 weeks or so.

Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2006-08-09 11:37:19

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [RFC][PATCH -mm 1/5] swsusp: Introduce memory bitmaps

Hi,

On Wednesday 09 August 2006 13:27, Pavel Machek wrote:
> > > > Introduce the memory bitmap data structure and make swsusp use in the suspend
> > > > phase.
> > > >
> > > > The current swsusp's internal data structure is not very efficient from the
> > > > memory usage point of view, so it seems reasonable to replace it with a data
> > > > structure that will require less memory, such as a pair of bitmaps.
> > >
> > > Well, 500 lines of code for what... 0.25% bigger image?

BTW, that depends on the total size of RAM. On a 1.5 GB i386 box that would
be something like 100%.

> > > I see it enables you to do some cleanups... but could we get those
> > > cleanups without those 500 lines? :-).
> >
> > Out of the 500 lines, something like 100 are comments and other 50 are
> > definitions of structures. ;-)
>
> Yes, and of the 100 lines of comments, 10 are fixmes :-).

No, no, they are just notes. ;-)

> > Seriously speaking, I could do that without the bitmaps, but the code wouldn't
> > be that much shorter. Apart from this, I would need to introduce yet another
> > type of PBEs (for storing pfns) and try not to get lost in the resulting mess.
> >
> > Instead of doing this I prefer to add some extra code to set up a decent data
> > structure and just use it.
>
> Okay, I guess that if we need to change the structure anyway, we may
> well use the effective structure...

That's exactly what I thought when I was writing this code. :-)

Rafael

2006-08-09 11:38:39

by Pavel Machek

[permalink] [raw]
Subject: Re: [RFC][PATCH -mm 0/5] swsusp: Fix handling of highmem

Hi!

> > Comments welcome.
>
> Thanks for the reminder. I'd forgotten half the reason why I didn't want to
> make Suspend2 into incremental patches! You're a brave man!

Why does this serve as a reminder? No, it is not easy to merge big
patches to mainline. But it is actually a feature.

> while (1) {
> size=$RANDOM * 65536 + 1
> dd if=/dev/random bs=1 count=$size | patch -p0-b
> make && break
>}

Is this what you use to generate suspend2 patches? :-)))))

Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2006-08-09 11:46:50

by Pavel Machek

[permalink] [raw]
Subject: Re: [RFC][PATCH -mm 1/5] swsusp: Introduce memory bitmaps

Hi!

> Introduce the memory bitmap data structure and make swsusp use in the suspend
> phase.
>
> The current swsusp's internal data structure is not very efficient from the
> memory usage point of view, so it seems reasonable to replace it with a data
> structure that will require less memory, such as a pair of bitmaps.
>
> The idea is to use bitmaps that may be allocated as sets of individual pages,
> so that we can avoid making allocations of order greater than 0. For this
> reason the memory bitmap structure consists of several linked lists of objects
> that contain pointers to memory pages with the actual bitmap data. Still, for
> a typical system all of these lists fit in a single page, so it's reasonable
> to introduce an additional mechanism allowing us to allocate all of them
> efficiently without sacrificing the generality of the design. This is done
> with the help of the chain_allocator structure and associated functions.
>
> We need to use two memory bitmaps during the suspend phase of the
> suspend-resume cycle. One of them is necessary for marking the saveable
> pages, and the second is used to mark the pages in which to store the copies
> of them (aka image pages).
>
> First, the bitmaps are created and we allocate as many image pages as needed
> (the corresponding bits in the second bitmap are set as soon as the pages are
> allocated). Second, the bits corresponding to the saveable pages are set in
> the first bitmap and the saveable pages are copied to the image pages.
> Finally, the first bitmap is used to save the kernel virtual addresses of the
> saveable pages and the second one is used to save the contents of the image
> pages.

Maybe that bitmap code should go to kernel/power/bitmaps.c or
something?

> +static inline void bm_position_reset_chunk(struct memory_bitmap *bm)
> +{
> + bm->cur.chunk = 0;
> + bm->cur.bit = -1;
> +}
> +
> +static void bm_position_reset(struct memory_bitmap *bm)
> +{
> + struct zone_bitmap *zone_bm;
> +
> + zone_bm = bm->zone_bm_list;
> + bm->cur.zone_bm = zone_bm;
> + bm->cur.block = zone_bm->bm_blocks;
> + bm_position_reset_chunk(bm);
> +}
> +
> +static void free_memory_bm(struct memory_bitmap *bm, int
> clear_nosave_free);

All your other functions use bm_XXX, this is XXX_bm. Well, you are
mixing it rather freely..

> +/**
> + * memory_bm_set_bit - set the bit in the bitmap @bm that corresponds
> + * to given pfn. The cur_zone_bm member of @bm and the cur_block member
> + * of @bm->cur_zone_bm are updated.
> + *
> + * If the bit cannot be set, the function returns -EINVAL .
> + */
> +
> +static int
> +memory_bm_set_bit(struct memory_bitmap *bm, unsigned long pfn)
> +{
> + struct zone_bitmap *zone_bm;
> + struct bm_block *bb;
> +
> + /* Check if the pfn is from the current zone */
> + zone_bm = bm->cur.zone_bm;
> + if (pfn < zone_bm->start_pfn || pfn >= zone_bm->end_pfn) {
> + zone_bm = bm->zone_bm_list;
> + /* We don't assume that the zones are sorted by pfns */
> + while (pfn < zone_bm->start_pfn || pfn >= zone_bm->end_pfn) {
> + zone_bm = zone_bm->next;
> + if (unlikely(!zone_bm))
> + return -EINVAL;
> + }
> + bm->cur.zone_bm = zone_bm;
> + }
> + /* Check if the pfn corresponds to the current bitmap block */
> + bb = zone_bm->cur_block;
> + if (pfn < bb->start_pfn)
> + bb = zone_bm->bm_blocks;
> +
> + while (pfn >= bb->end_pfn) {
> + bb = bb->next;
> + if (unlikely(!bb))
> + return -EINVAL;
> + }
> + zone_bm->cur_block = bb;
> + pfn -= bb->start_pfn;
> + set_bit(pfn % BM_BITS_PER_CHUNK, bb->data + pfn / BM_BITS_PER_CHUNK);
> + return 0;
> +}

It seems like this will not really introduce O(n^2) behaviour, since
you are starting from last position each time?

You have my ACK here, but maybe it should go _after_ 4/5 and 5/5?
Those are simple cleanups, this has break-something-potential and
should rest in -mm for a while.

Pavel

--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2006-08-09 11:50:00

by Pavel Machek

[permalink] [raw]
Subject: Re: [RFC][PATCH -mm 2/5] swsusp: Use memory bitmaps during resume

Hi!

> Make swsusp use memory bitmaps to store its internal information during the
> resume phase of the suspend-resume cycle.
>
> If the pfns of saveable pages are saved during the suspend phase instead of
> the kernel virtual addresses of these pages, we can use them during the resume
> phase directly to set the corresponding bits in a memory bitmap. Then, this
> bitmap is used to mark the page frames corresponding to the pages that were
> saveable before the suspend (aka "unsafe" page frames).
>
> Next, we allocate as many page frames as needed to store the entire suspend
> image and make sure that there will be some extra free "safe" page frames for
> the list of PBEs constructed later. Subsequently, the image is loaded and,
> if possible, the data loaded from it are written into their "original" page frames
> (ie. the ones they had occupied before the suspend). The image data that
> cannot be written into their "original" page frames are loaded into "safe" page
> frames and their "original" kernel virtual addresses, as well as the addresses
> of the "safe" pages containing their copies, are stored in a list of PBEs.
> Finally, the list of PBEs is used to copy the remaining image data into their
> "original" page frames (this is done atomically, by the architecture-dependent
> parts of swsusp).
>
> Signed-off-by: Rafael J. Wysocki <[email protected]>

ACK. If we get bitmap code we may as well use it. Should wait in -mm
for a while.

> @@ -53,7 +40,7 @@ static inline void pm_restore_console(vo
> static inline int software_suspend(void)
> {
> printk("Warning: fake suspend called\n");
> - return -EPERM;
> + return -ENOSYS;
> }
> #endif /* CONFIG_PM */
>

Heh, yes, it is right.. it is also totally unrelated and changes
userland interface ;-)))... which is probably okay here. But separate
would be nice.
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2006-08-09 11:51:57

by Nigel Cunningham

[permalink] [raw]
Subject: Re: [RFC][PATCH -mm 0/5] swsusp: Fix handling of highmem

On Wednesday 09 August 2006 21:38, Pavel Machek wrote:
> Hi!
>
> > > Comments welcome.
> >
> > Thanks for the reminder. I'd forgotten half the reason why I didn't want
> > to make Suspend2 into incremental patches! You're a brave man!
>
> Why does this serve as a reminder? No, it is not easy to merge big
> patches to mainline. But it is actually a feature.

It serves as a reminder because it shows (just the description, I mean), how
inter-related all the changes that are needed are.

I don't get the "it is actually a feature" bit.

> > while (1) {
> > size=$RANDOM * 65536 + 1
> > dd if=/dev/random bs=1 count=$size | patch -p0-b
> > make && break
> >}
>
> Is this what you use to generate suspend2 patches? :-)))))

:) Actually, given Greg's OLS keynote, I was wondering if it was what he used
to generate them.

Regards,

Nigel
--
Nigel, Michelle and Alisdair Cunningham
5 Mitchell Street
Cobden 3266
Victoria, Australia


Attachments:
(No filename) (938.00 B)
(No filename) (189.00 B)
Download all attachments

2006-08-09 11:52:09

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [RFC][PATCH -mm 2/5] swsusp: Use memory bitmaps during resume

Hi,

On Wednesday 09 August 2006 13:33, Pavel Machek wrote:
]--snip--[
>
> I'm still not sure if highmem support is worth the complexity -- I
> hope highmem dies painful death in next 3 weeks or so.

metoo, but currently quite a lot of Core Duo-based notebooks with 1 GB of RAM
and more are still being sold, let alone the Celerons, Semprons etc.

The patch is designed so that the higmem-related parts are just dropped by the
compiler if CONFIG_HIGHMEM is not set. That makes it a bit larger, but then
they don't get in the way when they are not needed.

[Well, I've been using 64-bit machines only for quite some time anyway, but
I thought it would be nice to do something for the others, too. ;-) ]

Rafael

2006-08-09 11:52:18

by Pavel Machek

[permalink] [raw]
Subject: Re: [RFC][PATCH -mm 3/5] swsusp: Fix handling of highmem

Hi!

> Make swsusp handle highmem pages similarly to the pages of the "normal"
> memory.

Is it feasible to create kernel/power/highmem.c?

> include/linux/suspend.h | 6
> kernel/power/power.h | 2
> kernel/power/snapshot.c | 822 ++++++++++++++++++++++++++++++++++++------------
> kernel/power/swap.c | 2
> kernel/power/swsusp.c | 53 +--
> kernel/power/user.c | 2
> mm/vmscan.c | 3
> 7 files changed, 658 insertions(+), 232 deletions(-)

+400 lines for highmem... I hope highmem dies, dies, dies. Anyway, I'd
hate to debug this at the same time as the bitmap code. Can we get the
bitmaps in, wait for a while, and only then change highmem handling?

Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2006-08-09 11:53:53

by Pavel Machek

[permalink] [raw]
Subject: Re: [RFC][PATCH -mm 1/5] swsusp: Introduce memory bitmaps

On Wed 2006-08-09 13:36:16, Rafael J. Wysocki wrote:
> Hi,
>
> On Wednesday 09 August 2006 13:27, Pavel Machek wrote:
> > > > > Introduce the memory bitmap data structure and make swsusp use in the suspend
> > > > > phase.
> > > > >
> > > > > The current swsusp's internal data structure is not very efficient from the
> > > > > memory usage point of view, so it seems reasonable to replace it with a data
> > > > > structure that will require less memory, such as a pair of bitmaps.
> > > >
> > > > Well, 500 lines of code for what... 0.25% bigger image?
>
> BTW, that depends on the total size of RAM. On a 1.5 GB i386 box that would
> be something like 100%.

Well, well, but 99.75% of that is from 3/5 patch, and we could still
get those 99.75% without bitmaps, right?
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2006-08-09 12:08:29

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [RFC][PATCH -mm 1/5] swsusp: Introduce memory bitmaps

Hi,

On Wednesday 09 August 2006 13:46, Pavel Machek wrote:
> > Introduce the memory bitmap data structure and make swsusp use in the suspend
> > phase.
]--snip--[
>
> Maybe that bitmap code should go to kernel/power/bitmaps.c or
> something?

Then we'll also need another header or put the definitions in power.h, but
they will only be used by the code in snapshot.c anyway. Apart from this,
the "bitmap" functions refer to alloc_image_page() etc. that are internal
to snapshot.c.

I thought it would be better to add this code to snapshot.c, because it's not
needed anywhere else and the separation of it would increase the overall
complexity for a little real gain.

> > +static inline void bm_position_reset_chunk(struct memory_bitmap *bm)
> > +{
> > + bm->cur.chunk = 0;
> > + bm->cur.bit = -1;
> > +}
> > +
> > +static void bm_position_reset(struct memory_bitmap *bm)
> > +{
> > + struct zone_bitmap *zone_bm;
> > +
> > + zone_bm = bm->zone_bm_list;
> > + bm->cur.zone_bm = zone_bm;
> > + bm->cur.block = zone_bm->bm_blocks;
> > + bm_position_reset_chunk(bm);
> > +}
> > +
> > +static void free_memory_bm(struct memory_bitmap *bm, int
> > clear_nosave_free);
>
> All your other functions use bm_XXX, this is XXX_bm. Well, you are
> mixing it rather freely..

OK, I'll change that to XXX_bm.

> > +/**
> > + * memory_bm_set_bit - set the bit in the bitmap @bm that corresponds
> > + * to given pfn. The cur_zone_bm member of @bm and the cur_block member
> > + * of @bm->cur_zone_bm are updated.
> > + *
> > + * If the bit cannot be set, the function returns -EINVAL .
> > + */
> > +
> > +static int
> > +memory_bm_set_bit(struct memory_bitmap *bm, unsigned long pfn)
> > +{
> > + struct zone_bitmap *zone_bm;
> > + struct bm_block *bb;
> > +
> > + /* Check if the pfn is from the current zone */
> > + zone_bm = bm->cur.zone_bm;
> > + if (pfn < zone_bm->start_pfn || pfn >= zone_bm->end_pfn) {
> > + zone_bm = bm->zone_bm_list;
> > + /* We don't assume that the zones are sorted by pfns */
> > + while (pfn < zone_bm->start_pfn || pfn >= zone_bm->end_pfn) {
> > + zone_bm = zone_bm->next;
> > + if (unlikely(!zone_bm))
> > + return -EINVAL;
> > + }
> > + bm->cur.zone_bm = zone_bm;
> > + }
> > + /* Check if the pfn corresponds to the current bitmap block */
> > + bb = zone_bm->cur_block;
> > + if (pfn < bb->start_pfn)
> > + bb = zone_bm->bm_blocks;
> > +
> > + while (pfn >= bb->end_pfn) {
> > + bb = bb->next;
> > + if (unlikely(!bb))
> > + return -EINVAL;
> > + }
> > + zone_bm->cur_block = bb;
> > + pfn -= bb->start_pfn;
> > + set_bit(pfn % BM_BITS_PER_CHUNK, bb->data + pfn / BM_BITS_PER_CHUNK);
> > + return 0;
> > +}
>
> It seems like this will not really introduce O(n^2) behaviour, since
> you are starting from last position each time?

Exactly.

> You have my ACK here,

Thanks. :-)

> but maybe it should go _after_ 4/5 and 5/5? Those are simple cleanups, this
> has break-something-potential and should rest in -mm for a while.

OK, I'll redo the series this way.

Rafael

2006-08-09 12:08:53

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [RFC][PATCH -mm 3/5] swsusp: Fix handling of highmem

Hi,

On Wednesday 09 August 2006 13:51, Pavel Machek wrote:
> > Make swsusp handle highmem pages similarly to the pages of the "normal"
> > memory.
>
> Is it feasible to create kernel/power/highmem.c?

Not really ...

> > include/linux/suspend.h | 6
> > kernel/power/power.h | 2
> > kernel/power/snapshot.c | 822 ++++++++++++++++++++++++++++++++++++------------
> > kernel/power/swap.c | 2
> > kernel/power/swsusp.c | 53 +--
> > kernel/power/user.c | 2
> > mm/vmscan.c | 3
> > 7 files changed, 658 insertions(+), 232 deletions(-)
>
> +400 lines for highmem... I hope highmem dies, dies, dies. Anyway, I'd
> hate to debug this at the same time as the bitmap code. Can we get the
> bitmaps in, wait for a while, and only then change highmem handling?

Sure, why not. I'll prepare a series without the highmem patch for now.

Rafael

2006-08-09 12:08:30

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [RFC][PATCH -mm 2/5] swsusp: Use memory bitmaps during resume

Hi,

On Wednesday 09 August 2006 13:49, Pavel Machek wrote:
> > Make swsusp use memory bitmaps to store its internal information during the
> > resume phase of the suspend-resume cycle.
> >
> > If the pfns of saveable pages are saved during the suspend phase instead of
> > the kernel virtual addresses of these pages, we can use them during the resume
> > phase directly to set the corresponding bits in a memory bitmap. Then, this
> > bitmap is used to mark the page frames corresponding to the pages that were
> > saveable before the suspend (aka "unsafe" page frames).
> >
> > Next, we allocate as many page frames as needed to store the entire suspend
> > image and make sure that there will be some extra free "safe" page frames for
> > the list of PBEs constructed later. Subsequently, the image is loaded and,
> > if possible, the data loaded from it are written into their "original" page frames
> > (ie. the ones they had occupied before the suspend). The image data that
> > cannot be written into their "original" page frames are loaded into "safe" page
> > frames and their "original" kernel virtual addresses, as well as the addresses
> > of the "safe" pages containing their copies, are stored in a list of PBEs.
> > Finally, the list of PBEs is used to copy the remaining image data into their
> > "original" page frames (this is done atomically, by the architecture-dependent
> > parts of swsusp).
> >
> > Signed-off-by: Rafael J. Wysocki <[email protected]>
>
> ACK. If we get bitmap code we may as well use it. Should wait in -mm
> for a while.

Sure.

> > @@ -53,7 +40,7 @@ static inline void pm_restore_console(vo
> > static inline int software_suspend(void)
> > {
> > printk("Warning: fake suspend called\n");
> > - return -EPERM;
> > + return -ENOSYS;
> > }
> > #endif /* CONFIG_PM */
> >
>
> Heh, yes, it is right.. it is also totally unrelated and changes
> userland interface ;-)))... which is probably okay here. But separate
> would be nice.

Ah, well, that's a "btw" thing. ;-) Will separate.

Rafael

2006-08-09 12:10:49

by Pavel Machek

[permalink] [raw]
Subject: Re: [RFC][PATCH -mm 2/5] swsusp: Use memory bitmaps during resume

Hi!

> > > @@ -53,7 +40,7 @@ static inline void pm_restore_console(vo
> > > static inline int software_suspend(void)
> > > {
> > > printk("Warning: fake suspend called\n");
> > > - return -EPERM;
> > > + return -ENOSYS;
> > > }
> > > #endif /* CONFIG_PM */
> > >
> >
> > Heh, yes, it is right.. it is also totally unrelated and changes
> > userland interface ;-)))... which is probably okay here. But separate
> > would be nice.
>
> Ah, well, that's a "btw" thing. ;-) Will separate.

...and probably should go _before_ the hard patches.
Pavel

--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2006-08-09 12:12:35

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [RFC][PATCH -mm 1/5] swsusp: Introduce memory bitmaps

On Wednesday 09 August 2006 13:53, Pavel Machek wrote:
> On Wed 2006-08-09 13:36:16, Rafael J. Wysocki wrote:
> > Hi,
> >
> > On Wednesday 09 August 2006 13:27, Pavel Machek wrote:
> > > > > > Introduce the memory bitmap data structure and make swsusp use in the suspend
> > > > > > phase.
> > > > > >
> > > > > > The current swsusp's internal data structure is not very efficient from the
> > > > > > memory usage point of view, so it seems reasonable to replace it with a data
> > > > > > structure that will require less memory, such as a pair of bitmaps.
> > > > >
> > > > > Well, 500 lines of code for what... 0.25% bigger image?
> >
> > BTW, that depends on the total size of RAM. On a 1.5 GB i386 box that would
> > be something like 100%.
>
> Well, well, but 99.75% of that is from 3/5 patch, and we could still
> get those 99.75% without bitmaps, right?

Yes. I meant the total gain from all of the changes. [The highmem one would
be quite difficult without bitmaps, I think, because the bitmaps give us the
right ordering of pages automatically.]

Rafael

2006-08-09 13:33:43

by Pavel Machek

[permalink] [raw]
Subject: Re: [RFC][PATCH -mm 2/5] swsusp: Use memory bitmaps during resume

Hi!

> > I'm still not sure if highmem support is worth the complexity -- I
> > hope highmem dies painful death in next 3 weeks or so.
>
> metoo, but currently quite a lot of Core Duo-based notebooks with 1 GB of RAM
> and more are still being sold, let alone the Celerons, Semprons etc.

Well, 1GB should still be reasonably well supported, even with current
code. 3GB+ machines will be different story.

> The patch is designed so that the higmem-related parts are just dropped by the
> compiler if CONFIG_HIGHMEM is not set. That makes it a bit larger, but then
> they don't get in the way when they are not needed.
>
> [Well, I've been using 64-bit machines only for quite some time anyway, but
> I thought it would be nice to do something for the others, too. ;-) ]

Okay.
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2006-08-09 13:35:24

by Pavel Machek

[permalink] [raw]
Subject: Re: [RFC][PATCH -mm 1/5] swsusp: Introduce memory bitmaps

Hi!

> > > Introduce the memory bitmap data structure and make swsusp use in the suspend
> > > phase.
> ]--snip--[
> >
> > Maybe that bitmap code should go to kernel/power/bitmaps.c or
> > something?
>
> Then we'll also need another header or put the definitions in power.h, but
> they will only be used by the code in snapshot.c anyway. Apart from this,
> the "bitmap" functions refer to alloc_image_page() etc. that are internal
> to snapshot.c.
>
> I thought it would be better to add this code to snapshot.c, because it's not
> needed anywhere else and the separation of it would increase the overall
> complexity for a little real gain.

Okay, then. (I'd like to keep files at reasonable sizes, but...)

> > but maybe it should go _after_ 4/5 and 5/5? Those are simple cleanups, this
> > has break-something-potential and should rest in -mm for a while.
>
> OK, I'll redo the series this way.

Thanks.
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2006-08-09 13:36:03

by Pavel Machek

[permalink] [raw]
Subject: Re: [RFC][PATCH -mm 3/5] swsusp: Fix handling of highmem

Hi!

> > > include/linux/suspend.h | 6
> > > kernel/power/power.h | 2
> > > kernel/power/snapshot.c | 822 ++++++++++++++++++++++++++++++++++++------------
> > > kernel/power/swap.c | 2
> > > kernel/power/swsusp.c | 53 +--
> > > kernel/power/user.c | 2
> > > mm/vmscan.c | 3
> > > 7 files changed, 658 insertions(+), 232 deletions(-)
> >
> > +400 lines for highmem... I hope highmem dies, dies, dies. Anyway, I'd
> > hate to debug this at the same time as the bitmap code. Can we get the
> > bitmaps in, wait for a while, and only then change highmem handling?
>
> Sure, why not. I'll prepare a series without the highmem patch for now.

Thanks.
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2006-08-13 22:35:09

by Pavel Machek

[permalink] [raw]
Subject: Re: [RFC][PATCH -mm 0/5] swsusp: Fix handling of highmem

On Wed 2006-08-09 21:52:10, Nigel Cunningham wrote:
> On Wednesday 09 August 2006 21:38, Pavel Machek wrote:
> > Hi!
> >
> > > > Comments welcome.
> > >
> > > Thanks for the reminder. I'd forgotten half the reason why I didn't want
> > > to make Suspend2 into incremental patches! You're a brave man!
> >
> > Why does this serve as a reminder? No, it is not easy to merge big
> > patches to mainline. But it is actually a feature.
>
> It serves as a reminder because it shows (just the description, I mean), how
> inter-related all the changes that are needed are.
>
> I don't get the "it is actually a feature" bit.

Well, it is good that submitting patches is hard ... because kernel
ends cleaner that way.
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html