2007-06-18 09:28:31

by Mel Gorman

Subject: [PATCH 0/7] Memory Compaction v2

This is V2 for the memory compaction patches. They depend on the two starting
patches from the memory hot-remove patchset which I've included here as the
first patch. All comments are welcome and the patches should be in a state
useful for wider testing.

Changelog since V1
o Bug fix when checking if a given node ID is valid or not
o Using latest patch from Kame-san to compact memory in-kernel
o Added trigger for direct compaction instead of direct reclaim
o Obey watermarks in split_pagebuddy_pages()
o Do not call lru_add_drain_all() frequently

The patchset implements memory compaction for the page allocator reducing
external fragmentation so that free memory exists as fewer, but larger
contiguous blocks. Instead of being a full defragmentation solution,
this focuses exclusively on pages that are movable via the page migration
mechanism.

The compaction mechanism operates within a zone and moves movable pages
towards the higher PFNs. Grouping pages by mobility already biases the
location of unmovable pages towards the lower addresses, so the two
strategies work in conjunction.

A full compaction run involves two scanners operating within a zone - a
migration and a free scanner. The migration scanner starts at the beginning
of a zone and finds all movable pages within one pageblock_nr_pages-sized
area and isolates them on a migratepages list. The free scanner begins at
the end of the zone and searches on a per-area basis for enough free pages to
migrate all the pages on the migratepages list. As each area is respectively
migrated or exhausted of free pages, the scanners are advanced one area.
A compaction run completes within a zone when the two scanners meet.
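
As a rough sketch, the loop looks like this (a simplified paraphrase of the
compact_zone() implementation in patch 5, with locking, watermark checks and
accounting omitted):

	cc->migrate_pfn = zone->zone_start_pfn;
	cc->free_pfn = zone->zone_start_pfn + zone->spanned_pages;

	while (cc->free_pfn > cc->migrate_pfn) {
		/* Migration scanner: isolate movable pages in one area */
		isolate_migratepages(zone, cc);
		if (!cc->nr_migratepages)
			continue;

		/* Free scanner: isolate enough free pages to receive them */
		if (cc->nr_freepages < cc->nr_migratepages)
			isolate_freepages(zone, cc);

		/* Migrate the isolated pages towards the end of the zone */
		migrate_pages(&cc->migratepages, compaction_alloc,
							(unsigned long)cc);
	}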

This is what /proc/buddyinfo looks like before and after a compaction run.

mel@arnold:~/results$ cat before-buddyinfo.txt
Node 0, zone DMA 150 33 6 4 2 1 1 1 1 0 0
Node 0, zone Normal 7901 3005 2205 1511 758 245 34 3 0 1 0

mel@arnold:~/results$ cat after-buddyinfo.txt
Node 0, zone DMA 150 33 6 4 2 1 1 1 1 0 0
Node 0, zone Normal 1900 1187 609 325 228 178 110 32 6 4 24

Memory compaction may be triggered explicitly by writing a node number to
/proc/sys/vm/compact_node. When a process fails to allocate a high-order
page, it may compact memory in an attempt to satisfy the allocation. Explicit
compaction does not finish until the two scanners meet, while direct
compaction ends as soon as a suitable page becomes available.

The first patch is a rollup from the memory hot-remove patchset. The two
patches after that are changes to page migration. The second patch allows
CONFIG_MIGRATION to be set without CONFIG_NUMA. The third patch allows
LRU pages to be isolated in batches instead of acquiring and releasing the
LRU lock for every page.

The fourth patch exports some metrics on external fragmentation which
are relevant to memory compaction. The fifth patch is what implements
memory compaction for a single zone. The sixth patch enables a node to be
compacted explicitly by writing to a special file in /proc and the final
patch implements direct compaction.

This version of the patchset should be usable on all machines and I
consider it ready for testing. It's passed tests here on x86, x86_64 and
ppc64 machines.

Here are some outstanding items on a TODO list in
no particular order.

o Have split_pagebuddy_order make blocks MOVABLE when the free page order
is greater than pageblock_order
o Avoid racing with other allocators during direct compaction by taking the page
the moment it becomes free
o Implement compaction_debug boot-time option like slub_debug
o Implement compaction_disable boot-time option just in case
o Investigate using debugfs as the manual compaction trigger instead of proc

--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab


2007-06-18 09:29:20

by Mel Gorman

Subject: [PATCH 2/7] Allow CONFIG_MIGRATION to be set without CONFIG_NUMA


CONFIG_MIGRATION currently depends on CONFIG_NUMA. move_pages() is the only
user of migration today and as this system call is only meaningful on NUMA,
it makes sense. However, memory compaction will operate within a zone and is
useful on both NUMA and non-NUMA systems. This patch allows CONFIG_MIGRATION
to be used in all memory models. To preserve existing behaviour, move_pages()
is only available when CONFIG_NUMA is set.

Signed-off-by: Mel Gorman <[email protected]>
Acked-by: Andy Whitcroft <[email protected]>
---

include/linux/migrate.h | 6 +++---
include/linux/mm.h | 2 ++
mm/Kconfig | 1 -
3 files changed, 5 insertions(+), 4 deletions(-)

diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.22-rc4-mm2-005_migrationkernel/include/linux/migrate.h linux-2.6.22-rc4-mm2-015_migration_flatmem/include/linux/migrate.h
--- linux-2.6.22-rc4-mm2-005_migrationkernel/include/linux/migrate.h 2007-06-05 01:57:25.000000000 +0100
+++ linux-2.6.22-rc4-mm2-015_migration_flatmem/include/linux/migrate.h 2007-06-15 16:25:37.000000000 +0100
@@ -7,7 +7,7 @@

typedef struct page *new_page_t(struct page *, unsigned long private, int **);

-#ifdef CONFIG_MIGRATION
+#ifdef CONFIG_NUMA
/* Check if a vma is migratable */
static inline int vma_migratable(struct vm_area_struct *vma)
{
@@ -24,7 +24,9 @@ static inline int vma_migratable(struct
return 0;
return 1;
}
+#endif

+#ifdef CONFIG_MIGRATION
extern int isolate_lru_page(struct page *p, struct list_head *pagelist);
extern int putback_lru_pages(struct list_head *l);
extern int migrate_page(struct address_space *,
@@ -39,8 +41,6 @@ extern int migrate_vmas(struct mm_struct
const nodemask_t *from, const nodemask_t *to,
unsigned long flags);
#else
-static inline int vma_migratable(struct vm_area_struct *vma)
- { return 0; }

static inline int isolate_lru_page(struct page *p, struct list_head *list)
{ return -ENOSYS; }
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.22-rc4-mm2-005_migrationkernel/include/linux/mm.h linux-2.6.22-rc4-mm2-015_migration_flatmem/include/linux/mm.h
--- linux-2.6.22-rc4-mm2-005_migrationkernel/include/linux/mm.h 2007-06-13 23:43:12.000000000 +0100
+++ linux-2.6.22-rc4-mm2-015_migration_flatmem/include/linux/mm.h 2007-06-15 16:25:37.000000000 +0100
@@ -241,6 +241,8 @@ struct vm_operations_struct {
int (*set_policy)(struct vm_area_struct *vma, struct mempolicy *new);
struct mempolicy *(*get_policy)(struct vm_area_struct *vma,
unsigned long addr);
+#endif /* CONFIG_NUMA */
+#ifdef CONFIG_MIGRATION
int (*migrate)(struct vm_area_struct *vma, const nodemask_t *from,
const nodemask_t *to, unsigned long flags);
#endif
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.22-rc4-mm2-005_migrationkernel/mm/Kconfig linux-2.6.22-rc4-mm2-015_migration_flatmem/mm/Kconfig
--- linux-2.6.22-rc4-mm2-005_migrationkernel/mm/Kconfig 2007-06-13 23:43:12.000000000 +0100
+++ linux-2.6.22-rc4-mm2-015_migration_flatmem/mm/Kconfig 2007-06-15 16:25:37.000000000 +0100
@@ -145,7 +145,6 @@ config SPLIT_PTLOCK_CPUS
config MIGRATION
bool "Page migration"
def_bool y
- depends on NUMA
help
Allows the migration of the physical location of pages of processes
while the virtual addresses are not changed. This is useful for

2007-06-18 09:28:52

by Mel Gorman

Subject: [PATCH 1/7] KAMEZAWA Hiroyuki hot-remove patches


This is a rollup of two patches from KAMEZAWA Hiroyuki. A slightly later
version exists but this is the one I tested with and it checks page_mapped()
with the RCU lock held.

Patch 1 is "page migration by kernel v5."
Patch 2 is "isolate lru page race fix."

Changelog V5->V6
- removed dummy_vma and uses rcu_read_lock().

Usually, migrate_pages(page,,) is called from a system call with mm->sem held.
(mm here is the mm_struct which maps the migration target page.)
This semaphore helps avoid some race conditions.

But if we want to migrate a page from kernel code, we have to avoid
those races ourselves. This patch adds checks for the following race conditions.

1. A page which is not mapped can be a target of migration. In that case,
we have to check page_mapped() before calling try_to_unmap().

2. An anon_vma can be freed while the page is unmapped, but page->mapping
remains as it was. Once page->mapcount drops to 0 we can no longer trust
page->mapping, so use rcu_read_lock() to prevent the anon_vma pointed to by
page->mapping from being freed during migration.

release_pages() in mm/swap.c drops page_count() to 0
without clearing the PageLRU flag first.

This means isolate_lru_page() can see a page with PageLRU() set and
page_count(page)==0. This is a BUG: get_page() would be called against a
count==0 page.
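
For illustration, the window is roughly the following:

  1. release_pages() on CPU A drops page_count() to 0 with
     put_page_testzero() but has not yet taken zone->lru_lock to clear
     PageLRU and free the page.
  2. isolate_lru_page() on CPU B holds zone->lru_lock, sees PageLRU still
     set and calls get_page() on the count==0 page.

With get_page_unless_zero(), step 2 simply fails to isolate the page
instead of taking a reference to one that is already being freed.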

Signed-off-by: KAMEZAWA Hiroyuki <[email protected]>
Signed-off-by: Mel Gorman <[email protected]>
---

migrate.c | 19 ++++++++++++++-----
1 file changed, 14 insertions(+), 5 deletions(-)

diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.22-rc4-mm2-clean/mm/migrate.c linux-2.6.22-rc4-mm2-005_migrationkernel/mm/migrate.c
--- linux-2.6.22-rc4-mm2-clean/mm/migrate.c 2007-06-13 23:43:12.000000000 +0100
+++ linux-2.6.22-rc4-mm2-005_migrationkernel/mm/migrate.c 2007-06-15 16:25:31.000000000 +0100
@@ -49,9 +49,8 @@ int isolate_lru_page(struct page *page,
struct zone *zone = page_zone(page);

spin_lock_irq(&zone->lru_lock);
- if (PageLRU(page)) {
+ if (PageLRU(page) && get_page_unless_zero(page)) {
ret = 0;
- get_page(page);
ClearPageLRU(page);
if (PageActive(page))
del_page_from_active_list(zone, page);
@@ -612,6 +611,7 @@ static int unmap_and_move(new_page_t get
int rc = 0;
int *result = NULL;
struct page *newpage = get_new_page(page, private, &result);
+ int rcu_locked = 0;

if (!newpage)
return -ENOMEM;
@@ -632,18 +632,27 @@ static int unmap_and_move(new_page_t get
goto unlock;
wait_on_page_writeback(page);
}
-
+ /* anon_vma should not be freed while migration. */
+ if (PageAnon(page)) {
+ rcu_read_lock();
+ rcu_locked = 1;
+ }
/*
* Establish migration ptes or remove ptes
*/
- try_to_unmap(page, 1);
if (!page_mapped(page))
- rc = move_to_new_page(newpage, page);
+ goto unlock;
+
+ try_to_unmap(page, 1);
+ rc = move_to_new_page(newpage, page);

if (rc)
remove_migration_ptes(page, page);

unlock:
+ if (rcu_locked)
+ rcu_read_unlock();
+
unlock_page(page);

if (rc != -EAGAIN) {

2007-06-18 09:29:33

by Mel Gorman

Subject: [PATCH 3/7] Introduce isolate_lru_page_nolock() as a lockless version of isolate_lru_page()


Migration uses isolate_lru_page() to isolate an LRU page. This acquires
the zone->lru_lock to safely remove the page and place it on a private
list. However, this prevents the caller from batching up isolation of
multiple pages. This patch introduces locked_isolate_lru_page(), a variant of
isolate_lru_page() to be called with zone->lru_lock already held, for callers
that are aware of the locking requirements.
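
For illustration, a caller aware of the locking rules could batch isolation
roughly as follows (a sketch only; start_pfn, end_pfn and the counters are
hypothetical, and the real user is the compaction migrate scanner introduced
later in this series):

	LIST_HEAD(pagelist);
	unsigned long pfn;
	int nr_isolated = 0;

	/* Take the LRU lock once for the whole batch */
	spin_lock_irq(&zone->lru_lock);
	for (pfn = start_pfn; pfn < end_pfn; pfn++) {
		struct page *page;

		if (!pfn_valid_within(pfn))
			continue;

		page = pfn_to_page(pfn);
		if (locked_isolate_lru_page(zone, page, &pagelist) == 0)
			nr_isolated++;
	}
	spin_unlock_irq(&zone->lru_lock);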

Signed-off-by: Mel Gorman <[email protected]>
Acked-by: Andy Whitcroft <[email protected]>
---

include/linux/migrate.h | 8 +++++++-
mm/migrate.c | 36 +++++++++++++++++++++++++++---------
2 files changed, 34 insertions(+), 10 deletions(-)

diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.22-rc4-mm2-015_migration_flatmem/include/linux/migrate.h linux-2.6.22-rc4-mm2-020_isolate_nolock/include/linux/migrate.h
--- linux-2.6.22-rc4-mm2-015_migration_flatmem/include/linux/migrate.h 2007-06-15 16:25:37.000000000 +0100
+++ linux-2.6.22-rc4-mm2-020_isolate_nolock/include/linux/migrate.h 2007-06-15 16:25:46.000000000 +0100
@@ -27,6 +27,8 @@ static inline int vma_migratable(struct
#endif

#ifdef CONFIG_MIGRATION
+extern int locked_isolate_lru_page(struct zone *zone, struct page *p,
+ struct list_head *pagelist);
extern int isolate_lru_page(struct page *p, struct list_head *pagelist);
extern int putback_lru_pages(struct list_head *l);
extern int migrate_page(struct address_space *,
@@ -41,7 +43,11 @@ extern int migrate_vmas(struct mm_struct
const nodemask_t *from, const nodemask_t *to,
unsigned long flags);
#else
-
+static inline int locked_isolate_lru_page(struct zone *zone, struct page *p,
+ struct list_head *list)
+{
+ return -ENOSYS;
+}
static inline int isolate_lru_page(struct page *p, struct list_head *list)
{ return -ENOSYS; }
static inline int putback_lru_pages(struct list_head *l) { return 0; }
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.22-rc4-mm2-015_migration_flatmem/mm/migrate.c linux-2.6.22-rc4-mm2-020_isolate_nolock/mm/migrate.c
--- linux-2.6.22-rc4-mm2-015_migration_flatmem/mm/migrate.c 2007-06-15 16:25:31.000000000 +0100
+++ linux-2.6.22-rc4-mm2-020_isolate_nolock/mm/migrate.c 2007-06-15 16:25:46.000000000 +0100
@@ -41,6 +41,32 @@
* -EBUSY: page not on LRU list
* 0: page removed from LRU list and added to the specified list.
*/
+int locked_isolate_lru_page(struct zone *zone, struct page *page,
+ struct list_head *pagelist)
+{
+ int ret = -EBUSY;
+
+ if (PageLRU(page) && get_page_unless_zero(page)) {
+ ret = 0;
+ ClearPageLRU(page);
+ if (PageActive(page))
+ del_page_from_active_list(zone, page);
+ else
+ del_page_from_inactive_list(zone, page);
+ list_add_tail(&page->lru, pagelist);
+ }
+
+ return ret;
+}
+
+/*
+ * Acquire the zone->lru_lock and isolate one page from the LRU lists. If
+ * successful put it onto the indicated list with elevated page count.
+ *
+ * Result:
+ * -EBUSY: page not on LRU list
+ * 0: page removed from LRU list and added to the specified list.
+ */
int isolate_lru_page(struct page *page, struct list_head *pagelist)
{
int ret = -EBUSY;
@@ -49,15 +75,7 @@ int isolate_lru_page(struct page *page,
struct zone *zone = page_zone(page);

spin_lock_irq(&zone->lru_lock);
- if (PageLRU(page) && get_page_unless_zero(page)) {
- ret = 0;
- ClearPageLRU(page);
- if (PageActive(page))
- del_page_from_active_list(zone, page);
- else
- del_page_from_inactive_list(zone, page);
- list_add_tail(&page->lru, pagelist);
- }
+ ret = locked_isolate_lru_page(zone, page, pagelist);
spin_unlock_irq(&zone->lru_lock);
}
return ret;

2007-06-18 09:29:54

by Mel Gorman

Subject: [PATCH 4/7] Provide metrics on the extent of fragmentation in zones


It is useful to know the state of external fragmentation in the system
and whether allocation failures are due to low memory or external
fragmentation. This patch introduces two metrics for evaluating the state
of fragmentation and exports the information via /proc/pagetypeinfo. The
metrics will be used later to determine if it is better to compact memory
or directly reclaim for a high-order allocation to succeed.
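
As a worked example with hypothetical numbers, consider a zone whose free
memory consists entirely of 1024 order-0 pages while an order-4 (16 page)
allocation is being attempted:

  freepages           = 1024
  areas_free          = 1024   (every free block is order-0)
  suitable_areas_free = 0      (no free block is order-4 or larger)

  unusable_free_index = (1024 - (0 << 4)) * 100 / 1024    = 100
  fragmentation_index = 100 - ((1024 >> 4) * 100) / 1024  = 94

Both values are close to 100, indicating that an order-4 failure here would
be due to external fragmentation rather than a lack of memory, which is the
case where compacting is more appropriate than reclaiming.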

Signed-off-by: Mel Gorman <[email protected]>
Acked-by: Andy Whitcroft <[email protected]>
---

vmstat.c | 131 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 131 insertions(+)

diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.22-rc4-mm2-020_isolate_nolock/mm/vmstat.c linux-2.6.22-rc4-mm2-105_measure_fragmentation/mm/vmstat.c
--- linux-2.6.22-rc4-mm2-020_isolate_nolock/mm/vmstat.c 2007-06-13 23:43:12.000000000 +0100
+++ linux-2.6.22-rc4-mm2-105_measure_fragmentation/mm/vmstat.c 2007-06-15 16:25:55.000000000 +0100
@@ -625,6 +625,135 @@ static void pagetypeinfo_showmixedcount_
#endif /* CONFIG_PAGE_OWNER */

/*
+ * Calculate the number of free pages in a zone and how many contiguous
+ * pages are free and how many are large enough to satisfy an allocation of
+ * the target size
+ */
+void calculate_freepages(struct zone *zone, unsigned int target_order,
+ unsigned long *ret_freepages,
+ unsigned long *ret_areas_free,
+ unsigned long *ret_suitable_areas_free)
+{
+ unsigned int order;
+ unsigned long freepages;
+ unsigned long areas_free;
+ unsigned long suitable_areas_free;
+
+ freepages = areas_free = suitable_areas_free = 0;
+ for (order = 0; order < MAX_ORDER; order++) {
+ unsigned long order_areas_free;
+
+ /* Count number of free blocks */
+ order_areas_free = zone->free_area[order].nr_free;
+ areas_free += order_areas_free;
+
+ /* Count free base pages */
+ freepages += order_areas_free << order;
+
+ /* Count the number of target_order sized free blocks */
+ if (order >= target_order)
+ suitable_areas_free += order_areas_free <<
+ (order - target_order);
+ }
+
+ *ret_freepages = freepages;
+ *ret_areas_free = areas_free;
+ *ret_suitable_areas_free = suitable_areas_free;
+}
+
+/*
+ * Return an index indicating how much of the available free memory is
+ * unusable for an allocation of the requested size. A value towards 100
+ * implies that the majority of free memory is unusable and compaction
+ * may be required.
+ */
+int unusable_free_index(struct zone *zone, unsigned int target_order)
+{
+ unsigned long freepages, areas_free, suitable_areas_free;
+
+ calculate_freepages(zone, target_order,
+ &freepages, &areas_free, &suitable_areas_free);
+
+ /* No free memory is interpreted as all free memory is unusable */
+ if (freepages == 0)
+ return 100;
+
+ return ((freepages - (suitable_areas_free << target_order)) * 100) /
+ freepages;
+}
+
+/*
+ * Return the external fragmentation index for a zone. Values towards 100
+ * imply the allocation failure was due to external fragmentation. Values
+ * towards 0 imply the failure was due to lack of memory. The value is only
+ * useful when an allocation of the requested order would fail and it does
+ * not take into account pages free on the pcp list.
+ */
+int fragmentation_index(struct zone *zone, unsigned int target_order)
+{
+ unsigned long freepages, areas_free, suitable_areas_free;
+
+ calculate_freepages(zone, target_order,
+ &freepages, &areas_free, &suitable_areas_free);
+
+ /* An allocation succeeding implies this index has no meaning */
+ if (suitable_areas_free)
+ return -1;
+
+ return 100 - ((freepages / (1 << target_order)) * 100) / areas_free;
+}
+
+static void pagetypeinfo_showunusable_print(struct seq_file *m,
+ pg_data_t *pgdat, struct zone *zone)
+{
+ unsigned int order;
+
+ seq_printf(m, "Node %4d, zone %8s %19s",
+ pgdat->node_id,
+ zone->name, " ");
+ for (order = 0; order < MAX_ORDER; ++order)
+ seq_printf(m, "%6d ", unusable_free_index(zone, order));
+
+ seq_putc(m, '\n');
+}
+
+/* Print out percentage of unusable free memory at each order */
+static int pagetypeinfo_showunusable(struct seq_file *m, void *arg)
+{
+ pg_data_t *pgdat = (pg_data_t *)arg;
+
+ seq_printf(m, "\nPercentage unusable free memory at order\n");
+ walk_zones_in_node(m, pgdat, pagetypeinfo_showunusable_print);
+
+ return 0;
+}
+
+static void pagetypeinfo_showfragmentation_print(struct seq_file *m,
+ pg_data_t *pgdat, struct zone *zone)
+{
+ unsigned int order;
+
+ seq_printf(m, "Node %4d, zone %8s %19s",
+ pgdat->node_id,
+ zone->name, " ");
+ for (order = 0; order < MAX_ORDER; ++order)
+ seq_printf(m, "%6d ", fragmentation_index(zone, order));
+
+ seq_putc(m, '\n');
+}
+
+/* Print the fragmentation index at each order */
+static int pagetypeinfo_showfragmentation(struct seq_file *m, void *arg)
+{
+ pg_data_t *pgdat = (pg_data_t *)arg;
+
+ seq_printf(m, "\nFragmentation index\n");
+ walk_zones_in_node(m, pgdat, pagetypeinfo_showfragmentation_print);
+
+ return 0;
+}
+
+/*
* Print out the number of pageblocks for each migratetype that contain pages
* of other types. This gives an indication of how well fallbacks are being
* contained by rmqueue_fallback(). It requires information from PAGE_OWNER
@@ -656,6 +785,8 @@ static int pagetypeinfo_show(struct seq_
seq_printf(m, "Pages per block: %lu\n", pageblock_nr_pages);
seq_putc(m, '\n');
pagetypeinfo_showfree(m, pgdat);
+ pagetypeinfo_showunusable(m, pgdat);
+ pagetypeinfo_showfragmentation(m, pgdat);
pagetypeinfo_showblockcount(m, pgdat);
pagetypeinfo_showmixedcount(m, pgdat);

2007-06-18 09:30:26

by Mel Gorman

Subject: [PATCH 5/7] Introduce a means of compacting memory within a zone


This patch is the core of the memory compaction mechanism. It compacts memory
in a zone such that movable pages are relocated towards the end of the zone.

A single compaction run involves a migration scanner and a free scanner.
Both scanners operate on pageblock-sized areas in the zone. The migration
scanner starts at the bottom of the zone and searches for all movable pages
within each area, isolating them onto a private list called migratelist.
The free scanner starts at the top of the zone and searches for suitable
areas, consuming the free pages within them and making them available to the
migration scanner. The pages isolated for migration are then migrated to
the newly isolated free pages.

Note that after this patch is applied there is still no means of triggering
a compaction run. Later patches will introduce the triggers, initially a
manual trigger.

Signed-off-by: Mel Gorman <[email protected]>
Acked-by: Andy Whitcroft <[email protected]>
---

include/linux/compaction.h | 8 +
include/linux/mm.h | 1
mm/Makefile | 2
mm/compaction.c | 297 ++++++++++++++++++++++++++++++++++++++++
mm/page_alloc.c | 38 +++++
5 files changed, 345 insertions(+), 1 deletion(-)

diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.22-rc4-mm2-105_measure_fragmentation/include/linux/compaction.h linux-2.6.22-rc4-mm2-110_compact_zone/include/linux/compaction.h
--- linux-2.6.22-rc4-mm2-105_measure_fragmentation/include/linux/compaction.h 2007-06-14 00:08:58.000000000 +0100
+++ linux-2.6.22-rc4-mm2-110_compact_zone/include/linux/compaction.h 2007-06-15 16:28:59.000000000 +0100
@@ -0,0 +1,8 @@
+#ifndef _LINUX_COMPACTION_H
+#define _LINUX_COMPACTION_H
+
+/* Return values for compact_zone() */
+#define COMPACT_INCOMPLETE 0
+#define COMPACT_COMPLETE 1
+
+#endif /* _LINUX_COMPACTION_H */
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.22-rc4-mm2-105_measure_fragmentation/include/linux/mm.h linux-2.6.22-rc4-mm2-110_compact_zone/include/linux/mm.h
--- linux-2.6.22-rc4-mm2-105_measure_fragmentation/include/linux/mm.h 2007-06-15 16:25:37.000000000 +0100
+++ linux-2.6.22-rc4-mm2-110_compact_zone/include/linux/mm.h 2007-06-15 16:28:59.000000000 +0100
@@ -336,6 +336,7 @@ void put_page(struct page *page);
void put_pages_list(struct list_head *pages);

void split_page(struct page *page, unsigned int order);
+int split_free_page(struct page *page);

/*
* Compound pages have a destructor function. Provide a
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.22-rc4-mm2-105_measure_fragmentation/mm/compaction.c linux-2.6.22-rc4-mm2-110_compact_zone/mm/compaction.c
--- linux-2.6.22-rc4-mm2-105_measure_fragmentation/mm/compaction.c 2007-06-14 00:08:58.000000000 +0100
+++ linux-2.6.22-rc4-mm2-110_compact_zone/mm/compaction.c 2007-06-15 16:28:59.000000000 +0100
@@ -0,0 +1,297 @@
+/*
+ * linux/mm/compaction.c
+ *
+ * Memory compaction for the reduction of external fragmentation
+ * Copyright IBM Corp. 2007 Mel Gorman <[email protected]>
+ */
+#include <linux/migrate.h>
+#include <linux/compaction.h>
+#include "internal.h"
+
+/*
+ * compact_control is used to track pages being migrated and the free pages
+ * they are being migrated to during memory compaction. The free_pfn starts
+ * at the end of a zone and migrate_pfn begins at the start. Movable pages
+ * are moved to the end of a zone during a compaction run and the run
+ * completes when free_pfn <= migrate_pfn
+ */
+struct compact_control {
+ struct list_head freepages; /* List of free pages to migrate to */
+ struct list_head migratepages; /* List of pages being migrated */
+ unsigned long nr_freepages; /* Number of isolated free pages */
+ unsigned long nr_migratepages; /* Number of pages to migrate */
+ unsigned long free_pfn; /* isolate_freepages search base */
+ unsigned long migrate_pfn; /* isolate_migratepages search base */
+};
+
+static int release_freepages(struct zone *zone, struct list_head *freelist)
+{
+ struct page *page, *next;
+ int count = 0;
+
+ list_for_each_entry_safe(page, next, freelist, lru) {
+ list_del(&page->lru);
+ __free_page(page);
+ count++;
+ }
+
+ return count;
+}
+
+/* Isolate free pages onto a private freelist. Must hold zone->lock */
+static int isolate_freepages_block(struct zone *zone,
+ unsigned long blockpfn,
+ struct list_head *freelist)
+{
+ unsigned long zone_end_pfn, end_pfn;
+ int total_isolated = 0;
+
+ /* Get the last PFN we should scan for free pages at */
+ zone_end_pfn = zone->zone_start_pfn + zone->spanned_pages;
+ end_pfn = blockpfn + pageblock_nr_pages;
+ if (end_pfn > zone_end_pfn)
+ end_pfn = zone_end_pfn;
+
+ /* Isolate free pages. This assumes the block is valid */
+ for (; blockpfn < end_pfn; blockpfn++) {
+ struct page *page;
+ int isolated, i;
+
+ if (!pfn_valid_within(blockpfn))
+ continue;
+
+ page = pfn_to_page(blockpfn);
+ if (!PageBuddy(page))
+ continue;
+
+ /* Found a free page, break it into order-0 pages */
+ isolated = split_free_page(page);
+ total_isolated += isolated;
+ for (i = 0; i < isolated; i++) {
+ list_add(&page->lru, freelist);
+ page++;
+ }
+ blockpfn += isolated - 1;
+ }
+
+ return total_isolated;
+}
+
+/* Returns 1 if the page is within a block suitable for migration to */
+static int pageblock_migratable(struct page *page)
+{
+ /* If the page is a large free page, then allow migration */
+ if (PageBuddy(page) && page_order(page) >= pageblock_order)
+ return 1;
+
+ /* If the block is MIGRATE_MOVABLE, allow migration */
+ if (get_pageblock_migratetype(page) == MIGRATE_MOVABLE)
+ return 1;
+
+ /* Otherwise skip the block */
+ return 0;
+}
+
+/*
+ * Based on information in the current compact_control, find blocks
+ * suitable for isolating free pages from
+ */
+static void isolate_freepages(struct zone *zone,
+ struct compact_control *cc)
+{
+ struct page *page;
+ unsigned long high_pfn, low_pfn, pfn;
+ int nr_freepages = cc->nr_freepages;
+ struct list_head *freelist = &cc->freepages;
+
+ pfn = cc->free_pfn;
+ low_pfn = cc->migrate_pfn + pageblock_nr_pages;
+ high_pfn = low_pfn;
+
+ /*
+ * Isolate free pages until enough are available to migrate the
+ * pages on cc->migratepages. We stop searching if the migrate
+ * and free page scanners meet or enough free pages are isolated.
+ */
+ spin_lock_irq(&zone->lock);
+ for (; pfn > low_pfn && cc->nr_migratepages > nr_freepages;
+ pfn -= pageblock_nr_pages) {
+ int isolated;
+
+ if (!pfn_valid(pfn))
+ continue;
+
+ /* Check for overlapping nodes/zones */
+ page = pfn_to_page(pfn);
+ if (page_zone(page) != zone)
+ continue;
+
+ /* Check the block is suitable for migration */
+ if (!pageblock_migratable(page))
+ continue;
+
+ /* Found a block suitable for isolating free pages from */
+ isolated = isolate_freepages_block(zone, pfn, freelist);
+ nr_freepages += isolated;
+
+ /*
+ * Record the highest PFN we isolated pages from. When next
+ * looking for free pages, the search will restart here as
+ * page migration may have returned some pages to the allocator
+ */
+ if (isolated)
+ high_pfn = max(high_pfn, pfn);
+ }
+ spin_unlock_irq(&zone->lock);
+
+ cc->free_pfn = high_pfn;
+ cc->nr_freepages = nr_freepages;
+}
+
+/*
+ * Isolate all pages that can be migrated from the block pointed to by
+ * the migrate scanner within compact_control. We migrate pages from
+ * all block-types as the intention is to have all movable pages towards
+ * the end of the zone.
+ */
+static int isolate_migratepages(struct zone *zone,
+ struct compact_control *cc)
+{
+ unsigned long high_pfn, low_pfn, end_pfn, start_pfn;
+ struct page *page;
+ int isolated = 0;
+ struct list_head *migratelist;
+
+ high_pfn = cc->free_pfn;
+ low_pfn = ALIGN(cc->migrate_pfn, pageblock_nr_pages);
+ migratelist = &cc->migratepages;
+
+ /* Do not scan outside zone boundaries */
+ if (low_pfn < zone->zone_start_pfn)
+ low_pfn = zone->zone_start_pfn;
+
+ /* Setup to scan one block but not past where we are migrating to */
+ end_pfn = ALIGN(low_pfn + pageblock_nr_pages, pageblock_nr_pages);
+ if (end_pfn > high_pfn)
+ end_pfn = high_pfn;
+ start_pfn = low_pfn;
+
+ /* Time to isolate some pages for migration */
+ spin_lock_irq(&zone->lru_lock);
+ for (; low_pfn < end_pfn; low_pfn++) {
+ if (!pfn_valid_within(low_pfn))
+ continue;
+
+ /* Get the page and skip if free */
+ page = pfn_to_page(low_pfn);
+ if (PageBuddy(page)) {
+ low_pfn += (1 << page_order(page)) - 1;
+ continue;
+ }
+
+ /* Try isolate the page */
+ if (locked_isolate_lru_page(zone, page, migratelist) == 0)
+ isolated++;
+ }
+ spin_unlock_irq(&zone->lru_lock);
+
+ cc->migrate_pfn = end_pfn;
+ cc->nr_migratepages += isolated;
+ return isolated;
+}
+
+/*
+ * This is a migrate-callback that "allocates" freepages by taking pages
+ * from the isolated freelists in the block we are migrating to.
+ */
+static struct page *compaction_alloc(struct page *migratepage,
+ unsigned long data,
+ int **result)
+{
+ struct compact_control *cc = (struct compact_control *)data;
+ struct page *freepage;
+
+ VM_BUG_ON(cc == NULL);
+ if (list_empty(&cc->freepages))
+ return NULL;
+
+ freepage = list_entry(cc->freepages.next, struct page, lru);
+ list_del(&freepage->lru);
+ cc->nr_freepages--;
+
+#ifdef CONFIG_PAGE_OWNER
+ freepage->order = migratepage->order;
+ freepage->gfp_mask = migratepage->gfp_mask;
+ memcpy(freepage->trace, migratepage->trace, sizeof(freepage->trace));
+#endif
+
+ return freepage;
+}
+
+/*
+ * We cannot control nr_migratepages and nr_freepages fully when migration is
+ * running as migrate_pages() has no knowledge of compact_control. When
+ * migration is complete, we count the number of pages on the lists by hand.
+ */
+static void update_nr_listpages(struct compact_control *cc)
+{
+ int nr_migratepages = 0;
+ int nr_freepages = 0;
+ struct page *page;
+ list_for_each_entry(page, &cc->migratepages, lru)
+ nr_migratepages++;
+ list_for_each_entry(page, &cc->freepages, lru)
+ nr_freepages++;
+
+ cc->nr_migratepages = nr_migratepages;
+ cc->nr_freepages = nr_freepages;
+}
+
+static inline int compact_finished(struct zone *zone,
+ struct compact_control *cc)
+{
+ /* Compaction run completes if the migrate and free scanner meet */
+ if (cc->free_pfn <= cc->migrate_pfn)
+ return COMPACT_COMPLETE;
+
+ return COMPACT_INCOMPLETE;
+}
+
+static int compact_zone(struct zone *zone, struct compact_control *cc)
+{
+ int ret = COMPACT_INCOMPLETE;
+
+ /* Setup to move all movable pages to the end of the zone */
+ cc->migrate_pfn = zone->zone_start_pfn;
+ cc->free_pfn = cc->migrate_pfn + zone->spanned_pages;
+ cc->free_pfn &= ~(pageblock_nr_pages-1);
+
+ for (; ret == COMPACT_INCOMPLETE; ret = compact_finished(zone, cc)) {
+ isolate_migratepages(zone, cc);
+
+ if (!cc->nr_migratepages)
+ continue;
+
+ /* Isolate free pages if necessary */
+ if (cc->nr_freepages < cc->nr_migratepages)
+ isolate_freepages(zone, cc);
+
+ /* Stop compacting if we cannot get enough free pages */
+ if (cc->nr_freepages < cc->nr_migratepages)
+ break;
+
+ migrate_pages(&cc->migratepages, compaction_alloc,
+ (unsigned long)cc);
+ update_nr_listpages(cc);
+ }
+
+ /* Release free pages and check accounting */
+ cc->nr_freepages -= release_freepages(zone, &cc->freepages);
+ WARN_ON(cc->nr_freepages != 0);
+
+ /* Release LRU pages not migrated */
+ if (!list_empty(&cc->migratepages))
+ putback_lru_pages(&cc->migratepages);
+
+ return ret;
+}
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.22-rc4-mm2-105_measure_fragmentation/mm/Makefile linux-2.6.22-rc4-mm2-110_compact_zone/mm/Makefile
--- linux-2.6.22-rc4-mm2-105_measure_fragmentation/mm/Makefile 2007-06-13 23:43:12.000000000 +0100
+++ linux-2.6.22-rc4-mm2-110_compact_zone/mm/Makefile 2007-06-15 16:28:59.000000000 +0100
@@ -27,7 +27,7 @@ obj-$(CONFIG_SLAB) += slab.o
obj-$(CONFIG_SLUB) += slub.o
obj-$(CONFIG_MEMORY_HOTPLUG) += memory_hotplug.o
obj-$(CONFIG_FS_XIP) += filemap_xip.o
-obj-$(CONFIG_MIGRATION) += migrate.o
+obj-$(CONFIG_MIGRATION) += migrate.o compaction.o
obj-$(CONFIG_SMP) += allocpercpu.o
obj-$(CONFIG_QUICKLIST) += quicklist.o

diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.22-rc4-mm2-105_measure_fragmentation/mm/page_alloc.c linux-2.6.22-rc4-mm2-110_compact_zone/mm/page_alloc.c
--- linux-2.6.22-rc4-mm2-105_measure_fragmentation/mm/page_alloc.c 2007-06-13 23:43:12.000000000 +0100
+++ linux-2.6.22-rc4-mm2-110_compact_zone/mm/page_alloc.c 2007-06-15 16:28:59.000000000 +0100
@@ -1060,6 +1060,44 @@ void split_page(struct page *page, unsig
set_page_refcounted(page + i);
}

+/* Similar to split_page except the page is already free */
+int split_free_page(struct page *page)
+{
+ int order;
+ struct zone *zone;
+
+ /* Should never happen but handle it anyway */
+ if (!page || !PageBuddy(page))
+ return 0;
+
+ zone = page_zone(page);
+ order = page_order(page);
+
+ /* Obey watermarks or the system could deadlock */
+ if (!zone_watermark_ok(zone, 0, zone->pages_low + (1 << order), 0, 0))
+ return 0;
+
+ /* Remove page from free list */
+ list_del(&page->lru);
+ zone->free_area[order].nr_free--;
+ rmv_page_order(page);
+ __mod_zone_page_state(zone, NR_FREE_PAGES, -(1UL << order));
+
+ /* Split into individual pages */
+ set_page_refcounted(page);
+ split_page(page, order);
+
+ /* Set the migratetype of the block if necessary */
+ if (order >= pageblock_order - 1 &&
+ get_pageblock_migratetype(page) != MIGRATE_MOVABLE) {
+ struct page *endpage = page + (1 << order) - 1;
+ for (; page < endpage; page += pageblock_nr_pages)
+ set_pageblock_migratetype(page, MIGRATE_MOVABLE);
+ }
+
+ return 1 << order;
+}
+
/*
* Really, prep_compound_page() should be called from __rmqueue_bulk(). But
* we cheat by calling it from here, in the order > 0 path. Saves a branch

2007-06-18 09:30:46

by Mel Gorman

Subject: [PATCH 6/7] Add /proc/sys/vm/compact_node for the explicit compaction of a node


This patch adds a special file /proc/sys/vm/compact_node. When a node number
is written to this file, each populated zone in that node will be compacted.
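
For example, node 0 can be compacted with "echo 0 > /proc/sys/vm/compact_node"
or, equivalently, from a small test program (hypothetical snippet, not part of
the patch):

	#include <fcntl.h>
	#include <unistd.h>

	int main(void)
	{
		/* Write a node id to the sysctl file to compact that node */
		int fd = open("/proc/sys/vm/compact_node", O_WRONLY);

		if (fd < 0)
			return 1;
		write(fd, "0", 1);
		close(fd);
		return 0;
	}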

Signed-off-by: Mel Gorman <[email protected]>
Acked-by: Andy Whitcroft <[email protected]>
---

include/linux/compaction.h | 7 +++++
kernel/sysctl.c | 13 +++++++++
mm/compaction.c | 54 ++++++++++++++++++++++++++++++++++++++++
3 files changed, 74 insertions(+)

diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.22-rc4-mm2-110_compact_zone/include/linux/compaction.h linux-2.6.22-rc4-mm2-115_compact_viaproc/include/linux/compaction.h
--- linux-2.6.22-rc4-mm2-110_compact_zone/include/linux/compaction.h 2007-06-15 16:28:59.000000000 +0100
+++ linux-2.6.22-rc4-mm2-115_compact_viaproc/include/linux/compaction.h 2007-06-15 16:29:08.000000000 +0100
@@ -5,4 +5,11 @@
#define COMPACT_INCOMPLETE 0
#define COMPACT_COMPLETE 1

+#ifdef CONFIG_MIGRATION
+
+extern int sysctl_compaction_handler(struct ctl_table *table, int write,
+ struct file *file, void __user *buffer,
+ size_t *length, loff_t *ppos);
+
+#endif /* CONFIG_MIGRATION */
#endif /* _LINUX_COMPACTION_H */
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.22-rc4-mm2-110_compact_zone/kernel/sysctl.c linux-2.6.22-rc4-mm2-115_compact_viaproc/kernel/sysctl.c
--- linux-2.6.22-rc4-mm2-110_compact_zone/kernel/sysctl.c 2007-06-13 23:43:12.000000000 +0100
+++ linux-2.6.22-rc4-mm2-115_compact_viaproc/kernel/sysctl.c 2007-06-15 16:29:08.000000000 +0100
@@ -47,6 +47,7 @@
#include <linux/nfs_fs.h>
#include <linux/acpi.h>
#include <linux/reboot.h>
+#include <linux/compaction.h>

#include <asm/uaccess.h>
#include <asm/processor.h>
@@ -77,6 +78,7 @@ extern int printk_ratelimit_jiffies;
extern int printk_ratelimit_burst;
extern int pid_max_min, pid_max_max;
extern int sysctl_drop_caches;
+extern int sysctl_compact_node;
extern int percpu_pagelist_fraction;
extern int compat_log;
extern int maps_protect;
@@ -858,6 +860,17 @@ static ctl_table vm_table[] = {
.proc_handler = drop_caches_sysctl_handler,
.strategy = &sysctl_intvec,
},
+#ifdef CONFIG_MIGRATION
+ {
+ .ctl_name = CTL_UNNUMBERED,
+ .procname = "compact_node",
+ .data = &sysctl_compact_node,
+ .maxlen = sizeof(int),
+ .mode = 0644,
+ .proc_handler = sysctl_compaction_handler,
+ .strategy = &sysctl_intvec,
+ },
+#endif /* CONFIG_MIGRATION */
{
.ctl_name = VM_MIN_FREE_KBYTES,
.procname = "min_free_kbytes",
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.22-rc4-mm2-110_compact_zone/mm/compaction.c linux-2.6.22-rc4-mm2-115_compact_viaproc/mm/compaction.c
--- linux-2.6.22-rc4-mm2-110_compact_zone/mm/compaction.c 2007-06-15 16:28:59.000000000 +0100
+++ linux-2.6.22-rc4-mm2-115_compact_viaproc/mm/compaction.c 2007-06-15 16:29:08.000000000 +0100
@@ -6,6 +6,8 @@
*/
#include <linux/migrate.h>
#include <linux/compaction.h>
+#include <linux/swap.h>
+#include <linux/sysctl.h>
#include "internal.h"

/*
@@ -295,3 +297,55 @@ static int compact_zone(struct zone *zon

return ret;
}
+
+/* Compact all zones within a node */
+int compact_node(int nodeid)
+{
+ int zoneid;
+ pg_data_t *pgdat;
+ struct zone *zone;
+
+ if (nodeid < 0 || nodeid > nr_node_ids || !node_online(nodeid))
+ return -EINVAL;
+ pgdat = NODE_DATA(nodeid);
+
+ /* Flush pending updates to the LRU lists */
+ lru_add_drain_all();
+
+ printk(KERN_INFO "Compacting memory in node %d\n", nodeid);
+ for (zoneid = 0; zoneid < MAX_NR_ZONES; zoneid++) {
+ struct compact_control cc;
+
+ zone = &pgdat->node_zones[zoneid];
+ if (!populated_zone(zone))
+ continue;
+
+ cc.nr_freepages = 0;
+ cc.nr_migratepages = 0;
+ INIT_LIST_HEAD(&cc.freepages);
+ INIT_LIST_HEAD(&cc.migratepages);
+
+ compact_zone(zone, &cc);
+
+ VM_BUG_ON(!list_empty(&cc.freepages));
+ VM_BUG_ON(!list_empty(&cc.migratepages));
+ }
+ printk(KERN_INFO "Compaction of node %d complete\n", nodeid);
+
+ return 0;
+}
+
+/* This is global and fierce ugly but it's straight-forward */
+int sysctl_compact_node;
+
+/* This is the entry point for compacting nodes via /proc/sys/vm */
+int sysctl_compaction_handler(struct ctl_table *table, int write,
+ struct file *file, void __user *buffer,
+ size_t *length, loff_t *ppos)
+{
+ proc_dointvec(table, write, file, buffer, length, ppos);
+ if (write)
+ return compact_node(sysctl_compact_node);
+
+ return 0;
+}

2007-06-18 09:31:26

by Mel Gorman

Subject: [PATCH 7/7] Compact memory directly by a process when a high-order allocation fails


Ordinarily when a high-order allocation fails, direct reclaim is entered to
free pages to satisfy the allocation. With this patch, it is determined
whether an allocation failed due to external fragmentation rather than low
memory and, if so, the calling process will compact memory until a suitable
page is freed. Compaction by moving pages in memory is considerably cheaper than
paging out to disk and works where there are locked pages or no swap. If
compaction fails to free a page of a suitable size, then reclaim will
still occur.
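
The per-zone decision ties in with the fragmentation index from patch 4 and
is roughly the following (a simplified paraphrase of try_to_compact_pages()
below, not additional code):

	/* Skip compaction when free memory is too low for it to help */
	if (!zone_watermark_ok(zone, 0, zone->pages_low + (1 << order), 0, 0))
		continue;

	/*
	 * An index of -1 means the allocation would succeed and below 50
	 * means the failure is due to a lack of memory, so fall back to
	 * reclaim in both cases.
	 */
	if (fragmentation_index(zone, order) < 50)
		continue;

	status = compact_zone_order(zone, order, gfp_mask);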

Direct compaction returns as soon as possible. After each block is compacted,
a check is made for a suitably sized free page and, if one exists, direct
compaction returns.

Signed-off-by: Mel Gorman <[email protected]>
---

include/linux/compaction.h | 12 ++++
include/linux/vmstat.h | 1
mm/compaction.c | 103 ++++++++++++++++++++++++++++++++++++++++
mm/page_alloc.c | 21 ++++++++
mm/vmstat.c | 4 +
5 files changed, 140 insertions(+), 1 deletion(-)

diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.22-rc4-mm2-115_compact_viaproc/include/linux/compaction.h linux-2.6.22-rc4-mm2-120_compact_direct/include/linux/compaction.h
--- linux-2.6.22-rc4-mm2-115_compact_viaproc/include/linux/compaction.h 2007-06-15 16:29:08.000000000 +0100
+++ linux-2.6.22-rc4-mm2-120_compact_direct/include/linux/compaction.h 2007-06-15 16:29:20.000000000 +0100
@@ -1,15 +1,25 @@
#ifndef _LINUX_COMPACTION_H
#define _LINUX_COMPACTION_H

-/* Return values for compact_zone() */
+/* Return values for compact_zone() and try_to_compact_pages() */
#define COMPACT_INCOMPLETE 0
#define COMPACT_COMPLETE 1
+#define COMPACT_PARTIAL 2

#ifdef CONFIG_MIGRATION

+extern int fragmentation_index(struct zone *zone, unsigned int target_order);
extern int sysctl_compaction_handler(struct ctl_table *table, int write,
struct file *file, void __user *buffer,
size_t *length, loff_t *ppos);
+extern unsigned long try_to_compact_pages(struct zone **zones,
+ int order, gfp_t gfp_mask);

+#else
+static inline unsigned long try_to_compact_pages(struct zone **zones,
+ int order, gfp_t gfp_mask)
+{
+ return COMPACT_COMPLETE;
+}
#endif /* CONFIG_MIGRATION */
#endif /* _LINUX_COMPACTION_H */
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.22-rc4-mm2-115_compact_viaproc/include/linux/vmstat.h linux-2.6.22-rc4-mm2-120_compact_direct/include/linux/vmstat.h
--- linux-2.6.22-rc4-mm2-115_compact_viaproc/include/linux/vmstat.h 2007-06-13 23:43:12.000000000 +0100
+++ linux-2.6.22-rc4-mm2-120_compact_direct/include/linux/vmstat.h 2007-06-15 16:29:20.000000000 +0100
@@ -37,6 +37,7 @@ enum vm_event_item { PGPGIN, PGPGOUT, PS
FOR_ALL_ZONES(PGSCAN_DIRECT),
PGINODESTEAL, SLABS_SCANNED, KSWAPD_STEAL, KSWAPD_INODESTEAL,
PAGEOUTRUN, ALLOCSTALL, PGROTATED,
+ COMPACTSTALL, COMPACTSUCCESS, COMPACTRACE,
NR_VM_EVENT_ITEMS
};

diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.22-rc4-mm2-115_compact_viaproc/mm/compaction.c linux-2.6.22-rc4-mm2-120_compact_direct/mm/compaction.c
--- linux-2.6.22-rc4-mm2-115_compact_viaproc/mm/compaction.c 2007-06-15 16:29:08.000000000 +0100
+++ linux-2.6.22-rc4-mm2-120_compact_direct/mm/compaction.c 2007-06-15 16:32:27.000000000 +0100
@@ -24,6 +24,8 @@ struct compact_control {
unsigned long nr_migratepages; /* Number of pages to migrate */
unsigned long free_pfn; /* isolate_freepages search base */
unsigned long migrate_pfn; /* isolate_migratepages search base */
+ int required_order; /* order a direct compactor needs */
+ int mtype; /* type of high-order page required */
};

static int release_freepages(struct zone *zone, struct list_head *freelist)
@@ -252,10 +254,29 @@ static void update_nr_listpages(struct c
static inline int compact_finished(struct zone *zone,
struct compact_control *cc)
{
+ int order;
+
/* Compaction run completes if the migrate and free scanner meet */
if (cc->free_pfn <= cc->migrate_pfn)
return COMPACT_COMPLETE;

+ if (cc->required_order == -1)
+ return COMPACT_INCOMPLETE;
+
+ /* Check for page of the appropriate type when direct compacting */
+ for (order = cc->required_order; order < MAX_ORDER; order++) {
+ /*
+ * If the current order is greater than pageblock_order, then
+ * the block is eligible for allocation
+ */
+ if (order >= pageblock_order && zone->free_area[order].nr_free)
+ return COMPACT_PARTIAL;
+
+ /* Otherwise check whether a page of the right type is free */
+ if (!list_empty(&zone->free_area[order].free_list[cc->mtype]))
+ return COMPACT_PARTIAL;
+ }
+
return COMPACT_INCOMPLETE;
}

@@ -298,6 +319,87 @@ static int compact_zone(struct zone *zon
return ret;
}

+static inline unsigned long compact_zone_order(struct zone *zone,
+ int order, gfp_t gfp_mask)
+{
+ struct compact_control cc = {
+ .nr_freepages = 0,
+ .nr_migratepages = 0,
+ .required_order = order,
+ .mtype = allocflags_to_migratetype(gfp_mask),
+ };
+ INIT_LIST_HEAD(&cc.freepages);
+ INIT_LIST_HEAD(&cc.migratepages);
+
+ return compact_zone(zone, &cc);
+}
+
+/**
+ * try_to_compact_pages - Compact memory directly to satisfy a high-order allocation
+ * @zones: The zonelist used for the current allocation
+ * @order: The order of the current allocation
+ * @gfp_mask: The GFP mask of the current allocation
+ *
+ * This is the main entry point for direct page compaction.
+ *
+ * Returns 0 if compaction fails to free a page of the required size and type
+ * Returns non-zero on success
+ */
+unsigned long try_to_compact_pages(struct zone **zones,
+ int order, gfp_t gfp_mask)
+{
+ unsigned long watermark;
+ int may_enter_fs = gfp_mask & __GFP_FS;
+ int may_perform_io = gfp_mask & __GFP_IO;
+ int i;
+ int status = COMPACT_INCOMPLETE;
+
+ /* Check whether it is worth even starting compaction */
+ if (order == 0 || !may_enter_fs || !may_perform_io)
+ return status;
+
+ /* Flush pending updates to the LRU lists on the local CPU */
+ lru_add_drain();
+
+ /* Compact each zone in the list */
+ for (i = 0; zones[i] != NULL; i++) {
+ struct zone *zone = zones[i];
+ int fragindex;
+
+ /*
+ * If watermarks are not met, compaction will not help.
+ * Note that we check the watermarks at order-0 as we
+ * are assuming some free pages will coalesce
+ */
+ watermark = zone->pages_low + (1 << order);
+ if (!zone_watermark_ok(zone, 0, watermark, 0, 0))
+ continue;
+
+ /*
+ * fragmentation index determines if allocation failures are
+ * due to low memory or external fragmentation
+ *
+ * index of -1 implies allocations would succeed
+ * index < 50 implies alloc failure is due to lack of memory
+ */
+ fragindex = fragmentation_index(zone, order);
+ if (fragindex < 50)
+ continue;
+
+ status = compact_zone_order(zone, order, gfp_mask);
+ if (status == COMPACT_PARTIAL) {
+ count_vm_event(COMPACTSUCCESS);
+ break;
+ }
+ }
+
+ /* Account for it if we stalled due to compaction */
+ if (status != COMPACT_INCOMPLETE)
+ count_vm_event(COMPACTSTALL);
+
+ return status;
+}
+
/* Compact all zones within a node */
int compact_node(int nodeid)
{
@@ -322,6 +424,7 @@ int compact_node(int nodeid)

cc.nr_freepages = 0;
cc.nr_migratepages = 0;
+ cc.required_order = -1;
INIT_LIST_HEAD(&cc.freepages);
INIT_LIST_HEAD(&cc.migratepages);

diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.22-rc4-mm2-115_compact_viaproc/mm/page_alloc.c linux-2.6.22-rc4-mm2-120_compact_direct/mm/page_alloc.c
--- linux-2.6.22-rc4-mm2-115_compact_viaproc/mm/page_alloc.c 2007-06-15 16:28:59.000000000 +0100
+++ linux-2.6.22-rc4-mm2-120_compact_direct/mm/page_alloc.c 2007-06-15 16:29:20.000000000 +0100
@@ -41,6 +41,7 @@
#include <linux/pfn.h>
#include <linux/backing-dev.h>
#include <linux/fault-inject.h>
+#include <linux/compaction.h>

#include <asm/tlbflush.h>
#include <asm/div64.h>
@@ -1670,6 +1671,26 @@ nofail_alloc:

cond_resched();

+ /* Try memory compaction for high-order allocations before reclaim */
+ if (order != 0) {
+ drain_all_local_pages();
+ did_some_progress = try_to_compact_pages(zonelist->zones,
+ order, gfp_mask);
+ if (did_some_progress == COMPACT_PARTIAL) {
+ page = get_page_from_freelist(gfp_mask, order,
+ zonelist, alloc_flags);
+
+ if (page)
+ goto got_pg;
+
+ /*
+ * It's a race if compaction frees a suitable page but
+ * someone else allocates it
+ */
+ count_vm_event(COMPACTRACE);
+ }
+ }
+
/* We now go into synchronous reclaim */
cpuset_memory_pressure_bump();
p->flags |= PF_MEMALLOC;
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.22-rc4-mm2-115_compact_viaproc/mm/vmstat.c linux-2.6.22-rc4-mm2-120_compact_direct/mm/vmstat.c
--- linux-2.6.22-rc4-mm2-115_compact_viaproc/mm/vmstat.c 2007-06-15 16:25:55.000000000 +0100
+++ linux-2.6.22-rc4-mm2-120_compact_direct/mm/vmstat.c 2007-06-15 16:29:20.000000000 +0100
@@ -882,6 +882,10 @@ static const char * const vmstat_text[]
"allocstall",

"pgrotated",
+
+ "compact_stall",
+ "compact_success",
+ "compact_race",
#endif
};

2007-06-18 16:56:28

by Christoph Lameter

Subject: Re: [PATCH 1/7] KAMEZAWA Hiroyuki hot-remove patches

On Mon, 18 Jun 2007, Mel Gorman wrote:

> @@ -632,18 +632,27 @@ static int unmap_and_move(new_page_t get
> goto unlock;
> wait_on_page_writeback(page);
> }
> -
> + /* anon_vma should not be freed while migration. */
> + if (PageAnon(page)) {
> + rcu_read_lock();
> + rcu_locked = 1;
> + }

We agreed on doing rcu_read_lock removing the status variable
and checking for PageAnon(). Doing so deuglifies the
function.

2007-06-18 17:04:23

by Christoph Lameter

Subject: Re: [PATCH 2/7] Allow CONFIG_MIGRATION to be set without CONFIG_NUMA

On Mon, 18 Jun 2007, Mel Gorman wrote:

>
> CONFIG_MIGRATION currently depends on CONFIG_NUMA. move_pages() is the only
> user of migration today and as this system call is only meaningful on NUMA,
> it makes sense. However, memory compaction will operate within a zone and is

There are more users of migration. move_pages is one of them, then there is
cpuset process migration, MPOL_BIND page migration and sys_migrate_pages
for explicit process migration.

> useful on both NUMA and non-NUMA systems. This patch allows CONFIG_MIGRATION
> to be used in all memory models. To preserve existing behaviour, move_pages()
> is only available when CONFIG_NUMA is set.

What does this have to do with memory models? A bit unclear.

Otherwise

Acked-by: Christoph Lameter <[email protected]>

2007-06-18 17:18:57

by Christoph Lameter

Subject: Re: [PATCH 5/7] Introduce a means of compacting memory within a zone

On Mon, 18 Jun 2007, Mel Gorman wrote:

> + /* Isolate free pages. This assumes the block is valid */
> + for (; blockpfn < end_pfn; blockpfn++) {
> + struct page *page;
> + int isolated, i;
> +
> + if (!pfn_valid_within(blockpfn))
> + continue;
> +
> + page = pfn_to_page(blockpfn);
> + if (!PageBuddy(page))
> + continue;

The name PageBuddy is getting to be misleading. Maybe rename this to
PageFree or so?

> +
> + /* Found a free page, break it into order-0 pages */
> + isolated = split_free_page(page);
> + total_isolated += isolated;
> + for (i = 0; i < isolated; i++) {
> + list_add(&page->lru, freelist);
> + page++;
> + }

Why do you need to break them all up? Easier to coalesce later?

> +/* Returns 1 if the page is within a block suitable for migration to */
> +static int pageblock_migratable(struct page *page)
> +{
> + /* If the page is a large free page, then allow migration */
> + if (PageBuddy(page) && page_order(page) >= pageblock_order)
> + return 1;

if (PageSlab(page) && page->slab->ops->kick) {
migratable slab
}

if (page table page) {
migratable page table page?
}

etc?

> + /* Try isolate the page */
> + if (locked_isolate_lru_page(zone, page, migratelist) == 0)
> + isolated++;

Support for other ways of migrating a page?

> +static int compact_zone(struct zone *zone, struct compact_control *cc)
> +{
> + int ret = COMPACT_INCOMPLETE;
> +
> + /* Setup to move all movable pages to the end of the zone */
> + cc->migrate_pfn = zone->zone_start_pfn;
> + cc->free_pfn = cc->migrate_pfn + zone->spanned_pages;
> + cc->free_pfn &= ~(pageblock_nr_pages-1);
> +
> + for (; ret == COMPACT_INCOMPLETE; ret = compact_finished(zone, cc)) {
> + isolate_migratepages(zone, cc);
> +
> + if (!cc->nr_migratepages)
> + continue;
> +
> + /* Isolate free pages if necessary */
> + if (cc->nr_freepages < cc->nr_migratepages)
> + isolate_freepages(zone, cc);
> +
> + /* Stop compacting if we cannot get enough free pages */
> + if (cc->nr_freepages < cc->nr_migratepages)
> + break;
> +
> + migrate_pages(&cc->migratepages, compaction_alloc,
> + (unsigned long)cc);

You do not need to check the result of migration? Page migration is a best
effort that may fail.

Looks good otherwise.

Acked-by: Christoph Lameter <[email protected]>

2007-06-18 17:24:53

by Christoph Lameter

Subject: Re: [PATCH 0/7] Memory Compaction v2

On Mon, 18 Jun 2007, Mel Gorman wrote:

> The patchset implements memory compaction for the page allocator reducing
> external fragmentation so that free memory exists as fewer, but larger
> contiguous blocks. Instead of being a full defragmentation solution,
> this focuses exclusively on pages that are movable via the page migration
> mechanism.

We need an additional facility at some point that allows the moving of
pages that are not on the LRU. Such support seems to be possible
for page table pages and slab pages.

2007-06-19 12:56:04

by Yasunori Goto

Subject: Re: [PATCH 5/7] Introduce a means of compacting memory within a zone

Hi Mel-san.
This is a very interesting feature.

Now, I'm testing your patches.

> +static int isolate_migratepages(struct zone *zone,
> + struct compact_control *cc)
> +{
> + unsigned long high_pfn, low_pfn, end_pfn, start_pfn;

(snip)

> + /* Time to isolate some pages for migration */
> + spin_lock_irq(&zone->lru_lock);
> + for (; low_pfn < end_pfn; low_pfn++) {
> + if (!pfn_valid_within(low_pfn))
> + continue;
> +
> + /* Get the page and skip if free */
> + page = pfn_to_page(low_pfn);

I met a panic here on my tiger4.

I compiled with CONFIG_SPARSEMEM. So, CONFIG_HOLES_IN_ZONE is not set.
pfn_valid_within() returns 1 every time on this configuration.
(This config is only for virtual memmap)
But, my tiger4 box has memory holes in normal zone.

When it is changed to normal pfn_valid(), no panic occurs.

Hmmm.

Bye.
--
Yasunori Goto


2007-06-19 15:52:48

by mel

Subject: Re: [PATCH 1/7] KAMEZAWA Hiroyuki hot-remove patches

On (18/06/07 09:56), Christoph Lameter didst pronounce:
> On Mon, 18 Jun 2007, Mel Gorman wrote:
>
> > @@ -632,18 +632,27 @@ static int unmap_and_move(new_page_t get
> > goto unlock;
> > wait_on_page_writeback(page);
> > }
> > -
> > + /* anon_vma should not be freed while migration. */
> > + if (PageAnon(page)) {
> > + rcu_read_lock();
> > + rcu_locked = 1;
> > + }
>
> We agreed on doing rcu_read_lock removing the status variable
> and checking for PageAnon(). Doing so deuglifies the
> function.

It makes it less ugly, but while improving the retry-logic for migration I
was also routinely locking up my test-box hard. I intend to run this inside
a simulator so I can use gdb to figure out what is going wrong but for the
moment I've actually gone back to using a slightly modified anon_vma patch.

--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab

2007-06-19 15:59:49

by mel

Subject: Re: [PATCH 2/7] Allow CONFIG_MIGRATION to be set without CONFIG_NUMA

On (18/06/07 10:04), Christoph Lameter didst pronounce:
> On Mon, 18 Jun 2007, Mel Gorman wrote:
>
> >
> > CONFIG_MIGRATION currently depends on CONFIG_NUMA. move_pages() is the only
> > user of migration today and as this system call is only meaningful on NUMA,
> > it makes sense. However, memory compaction will operate within a zone and is
>
> There are more users of migration. move_pages is one of them, then there is
> cpuset process migration, MPOL_BIND page migration and sys_migrate_pages
> for explicit process migration.

Ok, this was poor phrasing. Each of those features is NUMA-related even
though the core migration mechanism is not dependent on NUMA.

>
> > useful on both NUMA and non-NUMA systems. This patch allows CONFIG_MIGRATION
> > to be used in all memory models. To preserve existing behaviour, move_pages()
> > is only available when CONFIG_NUMA is set.
>
> What does this have to do with memory models? A bit unclear.
>

More poor phrasing. It would have been clearer to simply say that the
patch allows CONFIG_MIGRATION to be used without NUMA.

> Otherwise
>
> Acked-by: Christoph Lameter <[email protected]>

Thanks

--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab

2007-06-19 16:36:21

by mel

Subject: Re: [PATCH 5/7] Introduce a means of compacting memory within a zone

On (18/06/07 10:18), Christoph Lameter didst pronounce:
> On Mon, 18 Jun 2007, Mel Gorman wrote:
>
> > + /* Isolate free pages. This assumes the block is valid */
> > + for (; blockpfn < end_pfn; blockpfn++) {
> > + struct page *page;
> > + int isolated, i;
> > +
> > + if (!pfn_valid_within(blockpfn))
> > + continue;
> > +
> > + page = pfn_to_page(blockpfn);
> > + if (!PageBuddy(page))
> > + continue;
>
> The name PageBuddy is getting to be misleading. Maybe rename this to
> PageFree or so?
>

That would be surprisingly ambiguous. Per-cpu pages are free pages but are not
PageBuddy pages. In this case, I really mean a PageBuddy page, not a free page.

> > +
> > + /* Found a free page, break it into order-0 pages */
> > + isolated = split_free_page(page);
> > + total_isolated += isolated;
> > + for (i = 0; i < isolated; i++) {
> > + list_add(&page->lru, freelist);
> > + page++;
> > + }
>
> Why do you need to break them all up? Easier to coalesce later?
>

They are broken up because migration currently works on order-0 pages.
It is easier to break them up now for compaction_alloc() to give out one
at a time than trying to figure out how to split them up later.

> > +/* Returns 1 if the page is within a block suitable for migration to */
> > +static int pageblock_migratable(struct page *page)
> > +{
> > + /* If the page is a large free page, then allow migration */
> > + if (PageBuddy(page) && page_order(page) >= pageblock_order)
> > + return 1;
>
> if (PageSlab(page) && page->slab->ops->kick) {
> migratable slab
> }
>
> if (page table page) {
> migratable page table page?
> }
>
> etc?
>

Not quite. pageblock_migratable() tells whether this block is suitable for
taking free pages from so that movable pages can be migrated there. Right now
that means checking if there are enough free pages that the whole block
becomes MOVABLE or if the block is already being used for movable pages.

The block could become movable if the decision was made to kick out slab
pages that are located towards the end of the zone. If page tables
become movable, then they would need to be identified here but that is
not the case.

The pageblock_migratable() function is named so that this decision can
be easily revisited in one place.

> > + /* Try isolate the page */
> > + if (locked_isolate_lru_page(zone, page, migratelist) == 0)
> > + isolated++;
>
> Support for other ways of migrating a page?
>

When other mechanisms exist, they would be added here. Right now,
isolate_lru_page() is the only one I am aware of.

> > +static int compact_zone(struct zone *zone, struct compact_control *cc)
> > +{
> > + int ret = COMPACT_INCOMPLETE;
> > +
> > + /* Setup to move all movable pages to the end of the zone */
> > + cc->migrate_pfn = zone->zone_start_pfn;
> > + cc->free_pfn = cc->migrate_pfn + zone->spanned_pages;
> > + cc->free_pfn &= ~(pageblock_nr_pages-1);
> > +
> > + for (; ret == COMPACT_INCOMPLETE; ret = compact_finished(zone, cc)) {
> > + isolate_migratepages(zone, cc);
> > +
> > + if (!cc->nr_migratepages)
> > + continue;
> > +
> > + /* Isolate free pages if necessary */
> > + if (cc->nr_freepages < cc->nr_migratepages)
> > + isolate_freepages(zone, cc);
> > +
> > + /* Stop compacting if we cannot get enough free pages */
> > + if (cc->nr_freepages < cc->nr_migratepages)
> > + break;
> > +
> > + migrate_pages(&cc->migratepages, compaction_alloc,
> > + (unsigned long)cc);
>
> You do not need to check the result of migration? Page migration is a best
> effort that may fail.
>

You're right. I used to check it for debugging purposes to make sure migration
was actually occurring. It is still not unusual for a fair number of pages
to fail to migrate. Migration already uses retry logic and I shouldn't
be replicating it.

More importantly, by leaving the pages on the migratelist, I potentially
retry the same migrations over and over again, wasting time and effort, not
to mention that I keep pages isolated for much longer than necessary, which
could cause stalling problems. I should be calling putback_lru_pages()
when migrate_pages() tells me it failed to migrate pages.

I'll revisit this one. Thanks
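
Roughly what I have in mind when revisiting it, as a sketch only; the exact
putback behaviour of migrate_pages() is what the follow-up below clarifies,
so the comment here hedges on it.

	int nr_failed;

	/*
	 * Sketch only: act on the migrate_pages() return value instead
	 * of leaving failed pages on cc->migratepages indefinitely. If
	 * migrate_pages() already puts failed pages back on the LRU,
	 * the caller only needs to reset its own book-keeping.
	 */
	nr_failed = migrate_pages(&cc->migratepages, compaction_alloc,
					(unsigned long)cc);
	if (nr_failed) {
		cc->nr_migratepages = 0;
		INIT_LIST_HEAD(&cc->migratepages);
	}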

> Looks good otherwise.
>
> Acked-by: Christoph Lameter <[email protected]>

--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab

2007-06-19 16:49:38

by mel

[permalink] [raw]
Subject: Re: [PATCH 5/7] Introduce a means of compacting memory within a zone

On (19/06/07 21:54), Yasunori Goto didst pronounce:
> Hi Mel-san.
> This is very interesting feature.
>
> Now, I'm testing your patches.
>
> > +static int isolate_migratepages(struct zone *zone,
> > + struct compact_control *cc)
> > +{
> > + unsigned long high_pfn, low_pfn, end_pfn, start_pfn;
>
> (snip)
>
> > + /* Time to isolate some pages for migration */
> > + spin_lock_irq(&zone->lru_lock);
> > + for (; low_pfn < end_pfn; low_pfn++) {
> > + if (!pfn_valid_within(low_pfn))
> > + continue;
> > +
> > + /* Get the page and skip if free */
> > + page = pfn_to_page(low_pfn);
>
> I met panic at here on my tiger4.
>

How annoying.

> I compiled with CONFIG_SPARSEMEM. So, CONFIG_HOLES_IN_ZONE is not set.
> pfn_valid_within() returns 1 every time on this configuration.

As it should.

> (This config is for only virtual memmap)
> But, my tiger4 box has memory holes in normal zone.
>
> When it is changed to normal pfn_valid(), no panic occurs.
>

It's because I never check if the MAX_ORDER block is valid before
isolating. This needs to be implemented just like isolate_freepages()
and isolate_freepages_block() do. Change it to pfn_valid() for the
moment and I'll have this one fixed up properly in the next version.
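
Something along these lines is what I mean, mirroring how the free scanner
validates a block before touching it. This is a sketch only; the helper name
is made up here and the proper fix will be in the next version.

	/*
	 * Sketch only: validate a whole block before the migrate scanner
	 * walks it. With SPARSEMEM and !CONFIG_HOLES_IN_ZONE,
	 * pfn_valid_within() is a no-op, so the block itself must be
	 * checked with pfn_valid() as isolate_freepages() does.
	 */
	static int migrate_block_valid(struct zone *zone,
			unsigned long start_pfn, unsigned long end_pfn)
	{
		if (!pfn_valid(start_pfn) || !pfn_valid(end_pfn - 1))
			return 0;

		/* Both ends must be within the zone being compacted */
		if (page_zone(pfn_to_page(start_pfn)) != zone ||
				page_zone(pfn_to_page(end_pfn - 1)) != zone)
			return 0;

		return 1;
	}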

> Hmmm.
>
> Bye.

Thanks for testing.

--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab

2007-06-19 16:51:12

by mel

[permalink] [raw]
Subject: Re: [PATCH 7/7] Compact memory directly by a process when a high-order allocation fails

On (18/06/07 10:22), Christoph Lameter didst pronounce:
> You are amazing.
>

Thanks!

There are still wrinkles that need ironing out, but I believe the core idea
is solid and can be built into something useful.

Thanks for reviewing.

> Acked-by: Christoph Lameter <[email protected]>
>

--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab

2007-06-19 16:58:52

by mel

[permalink] [raw]
Subject: Re: [PATCH 0/7] Memory Compaction v2

On (18/06/07 10:24), Christoph Lameter didst pronounce:
> On Mon, 18 Jun 2007, Mel Gorman wrote:
>
> > The patchset implements memory compaction for the page allocator reducing
> > external fragmentation so that free memory exists as fewer, but larger
> > contiguous blocks. Instead of being a full defragmentation solution,
> > this focuses exclusively on pages that are movable via the page migration
> > mechanism.
>
> We need an additional facility at some point that allows the moving of
> pages that are not on the LRU. Such support seems to be possible
> for page table pages and slab pages.

Agreed. When I first put this together, I thought I would be able to isolate
pages of different types on the migratelist, but that is not the case as
migration would not be able to tell the difference between an LRU page and a
pagetable page. I'll rename cc->migratelist to cc->migratelist_lru with a view
to potentially adding cc->migratelist_pagetable or cc->migratelist_slab later.
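
Roughly what the structure would look like afterwards. Only the counters and
the two scanner positions appear in the posted hunks, so treat the list names
and the commented-out additions as tentative.

	/*
	 * Tentative sketch of compact_control after the rename. The
	 * *_lru list name and the commented-out additions are
	 * speculative; the counters and scanner positions are from
	 * the posted patch.
	 */
	struct compact_control {
		struct list_head freepages;		/* Isolated free pages */
		struct list_head migratelist_lru;	/* LRU pages to be migrated */
		unsigned long nr_freepages;
		unsigned long nr_migratepages;
		unsigned long free_pfn;			/* Free scanner position */
		unsigned long migrate_pfn;		/* Migrate scanner position */

		/* Possible later additions:
		 *	struct list_head migratelist_pagetable;
		 *	struct list_head migratelist_slab;
		 */
	};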

--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab

2007-06-19 19:21:00

by Christoph Lameter

[permalink] [raw]
Subject: Re: [PATCH 5/7] Introduce a means of compacting memory within a zone

On Tue, 19 Jun 2007, Mel Gorman wrote:

> When other mechanisms exist, they would be added here. Right now,
> isolate_lru_page() is the only one I am aware of.

Did you have a look at kmem_cache_vacate in the slab defrag patchset?

> > You do not need to check the result of migration? Page migration is a best
> > effort that may fail.

> You're right. I used to check it for debugging purposes to make sure migration
> was actually occurring. It is still not unusual for a fair number of pages
> to fail to migrate. Migration already uses retry logic and I shouldn't
> be replicating it.
>
> More importantly, by leaving the pages on the migratelist, I potentially
> retry the same migrations over and over again, wasting time and effort, not
> to mention that I keep pages isolated for much longer than necessary, which
> could cause stalling problems. I should be calling putback_lru_pages()
> when migrate_pages() tells me it failed to migrate pages.

No, the putback_lru_pages() is done for you.

> I'll revisit this one. Thanks

You could simply ignore it if you do not care whether it was migrated or not.

2007-06-19 19:22:43

by Christoph Lameter

[permalink] [raw]
Subject: Re: [PATCH 0/7] Memory Compaction v2

On Tue, 19 Jun 2007, Mel Gorman wrote:

> Agreed. When I first put this together, I thought I would be able to isolate
> pages of different types on the migratelist, but that is not the case as
> migration would not be able to tell the difference between an LRU page and a
> pagetable page. I'll rename cc->migratelist to cc->migratelist_lru with a view
> to potentially adding cc->migratelist_pagetable or cc->migratelist_slab later.

Right. The particular issue with moving page table pages or slab pages is
that you do not have an LRU. The page state needs to be established in a
different way and there needs to be a mechanism to ensure that the page is
not currently being set up or torn down. For slab pages I have relied
on page->inuse > 0 to signify a page in use. I am not sure how one would
determine that for page table pages.
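
For illustration only, the kind of check this boils down to; page->inuse is
how SLUB tracks live objects, and the page table half remains the open
question, so this is a hypothetical sketch rather than anything posted.

	/*
	 * Hypothetical sketch: deciding whether a non-LRU page is a
	 * candidate for migration. Slab pages can use page->inuse as
	 * the "in use" signal; there is no equivalent, race-free state
	 * for page table pages yet, so they are simply rejected here.
	 */
	static int nonlru_page_migratable(struct page *page)
	{
		if (PageSlab(page))
			return page->inuse > 0;

		/* Page table pages: no way yet to tell setup/teardown apart */
		return 0;
	}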

2007-06-21 12:28:52

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH 7/7] Compact memory directly by a process when a high-order allocation fails

> On Mon, 18 Jun 2007 10:30:42 +0100 (IST) Mel Gorman <[email protected]> wrote:
> +
> + /*
> + * It's a race if compaction frees a suitable page but
> + * someone else allocates it
> + */
> + count_vm_event(COMPACTRACE);
> + }

Could perhaps cause arbitrarily long starvation. A fix would be to free
the synchronously-compacted higher-order page into somewhere which is
private to this task (a new field in task_struct would be one such place).

2007-06-21 13:26:42

by Mel Gorman

[permalink] [raw]
Subject: Re: [PATCH 7/7] Compact memory directly by a process when a high-order allocation fails

Andrew Morton wrote:
>> On Mon, 18 Jun 2007 10:30:42 +0100 (IST) Mel Gorman <[email protected]> wrote:
>> +
>> + /*
>> + * It's a race if compaction frees a suitable page but
>> + * someone else allocates it
>> + */
>> + count_vm_event(COMPACTRACE);
>> + }
>
> Could perhaps cause arbitrarily long starvation.

More likely it will just fail allocations that could have succeeded.
I knew the situation would occur, so I thought I would count how often it
happens before doing anything about it.

> A fix would be to free
> the synchronously-compacted higher-order page into somewhere which is
> private to this task (a new field in task_struct would be one such place).

There used to be such fields and a process flag PF_FREE_PAGES for a
similar purpose. I'll look into reintroducing it. Thanks
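
Roughly the shape of it, purely as a sketch; the field and flag names below
are guesses at what the older patches used rather than anything in this
series.

	/*
	 * Sketch only: divert a freed page to the task doing synchronous
	 * compaction instead of returning it to the buddy lists, so
	 * nobody else can race us to it. The capture_page/capture_order
	 * fields in task_struct and the PF_FREE_PAGES flag are
	 * illustrative names only.
	 */
	static inline int capture_free_page(struct page *page, unsigned int order)
	{
		struct task_struct *p = current;

		if (!(p->flags & PF_FREE_PAGES))
			return 0;
		if (p->capture_page || order < p->capture_order)
			return 0;

		p->capture_page = page;
		p->capture_order = order;
		return 1;	/* caller skips freeing to the buddy allocator */
	}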

--
Mel Gorman