2011-06-30 14:55:52

by Minchan Kim

[permalink] [raw]
Subject: [PATCH v4 00/10] Prevent LRU churning

Changelog since v3
o Patch reordering - suggested by Mel and Michal
o Bug fix of trace-vmscan-postprocess - pointed out by Mel
o Clean up (function naming, mistakes in previous version)
o bitwise type usage for isolate_mode_t - suggested by Mel
o Add comment about ilru handling in migrate.c - suggested by Mel
o Reduce zone->lru_lock - pointed out by Mel

Changelog since v2
o Remove ISOLATE_BOTH - suggested by Johannes
o change description slightly
o Clean up unmap_and_move
o Add Reviewed-by and Acked-by

Changelog since v1
o Rebase on 2.6.39
o change description slightly

There are several places that isolate pages from the LRU and later put them back.
For example, compaction does it to get contiguous pages.
The problem is that if we isolate a page from the middle of the LRU and put it back,
we lose its LRU history, because putback_lru_page inserts the page at the head of the LRU list.

LRU history is an important parameter for selecting victim pages in current page reclaim
when memory pressure is heavy. Unfortunately, if someone wants to allocate a high-order page
while memory pressure is heavy, compaction is triggered and we end up losing LRU history.
That means we can evict working-set pages, and system latency becomes high.

This patch series solves the problem with two methods.

* Anti-churning
when we isolate pages from the LRU list, let's not isolate pages we can't handle
* De-churning
when we put a page back onto the LRU list during migration, let's insert the new page at the old page's LRU position (see the sketch below)
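
The de-churning case boils down to the following check (a condensed sketch of the
same_lru()/putback_page_to_lru() logic added in patches 6-8; the real code also
sets the page flags, updates reclaim statistics and handles the unevictable LRU):

	struct zone *zone = page_zone(page);

	/*
	 * prev_page is the LRU neighbor that was recorded at isolation time.
	 * If it is still on the same LRU list, reinsert the page right after
	 * it; otherwise fall back to the usual head-of-LRU putback.
	 */
	if (prev_page && PageLRU(prev_page) &&
	    page_lru_base_type(page) == page_lru_base_type(prev_page))
		__add_page_to_lru_list(zone, page, page_lru(prev_page),
					&prev_page->lru);
	else
		putback_lru_page(page);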

[1,2,5/10] are just clean-ups.
[3,4/10] are related to anti-churning.
[6,7,8/10] are related to de-churning.
[9/10] adds new tracepoints; it is not meant for merging but just shows the effect.
[10/10] is an enhancement of ilru.

I tested it on my machine (2G DRAM, Intel Core 2 Duo); the test scenario is as follows.

1) Boot up
2) Decompress a 10GB compressed file
3) Run many applications, switching between them with the attached script (made by Wu)
4) Kernel compile

I think this is a worst-case scenario since there are many contiguous pages when the machine boots up.
It means system memory hasn't aged, so many pages are in contiguous LRU order. That could have a
bad effect on inorder-lru, but I solved the problem; please see the description of [6/10].

The test results are as follows.

1) Elapsed time to decompress the 10GB file
Old             inorder         inorder + pagevec flush [10/10]
01:47:50.88     01:43:16.16     01:40:27.18

2) Failure rate of in-order putback
During the test, 375756 pages were isolated. Only 45875 pages (12%) were put back
out of order (i.e., at the head of the LRU); the other 329963 pages (88%) were put back
in order (i.e., at the old page's position in the LRU).

Any comments are welcome.

You can see Wu's test script and all-at-once patch in following URL.
http://www.kernel.org/pub/linux/kernel/people/minchan/inorder_putback/v4/

Minchan Kim (10):
[1/10] compaction: trivial clean up acct_isolated
[2/10] Change isolate mode from #define to bitwise type
[3/10] compaction: make isolate_lru_page with filter aware
[4/10] zone_reclaim: make isolate_lru_page with filter aware
[5/10] migration: clean up unmap_and_move
[6/10] migration: introduce migrate_ilru_pages
[7/10] compaction: make compaction use in-order putback
[8/10] ilru: reduce zone->lru_lock
[9/10] add inorder-lru tracepoints for just measurement
[10/10] compaction: add drain ilru of pagevec

.../trace/postprocess/trace-vmscan-postprocess.pl | 8 +-
include/linux/memcontrol.h | 3 +-
include/linux/migrate.h | 87 ++++++++
include/linux/mm_types.h | 22 ++-
include/linux/mmzone.h | 12 +
include/linux/pagevec.h | 1 +
include/linux/swap.h | 15 +-
include/trace/events/inorder_putback.h | 88 ++++++++
include/trace/events/vmscan.h | 8 +-
mm/compaction.c | 47 ++---
mm/internal.h | 1 +
mm/memcontrol.c | 3 +-
mm/migrate.c | 213 ++++++++++++++++---
mm/swap.c | 217 +++++++++++++++++++-
mm/vmscan.c | 122 ++++++++---
15 files changed, 738 insertions(+), 109 deletions(-)
create mode 100644 include/trace/events/inorder_putback.h

--
1.7.4.1


2011-06-30 14:55:57

by Minchan Kim

[permalink] [raw]
Subject: [PATCH v4 01/10] compaction: trivial clean up acct_isolated

acct_isolated() of compaction uses page_lru_base_type(), which returns only the
base type of the LRU list, so it never returns LRU_ACTIVE_ANON or LRU_ACTIVE_FILE.
In addition, cc->nr_[anon|file] is used only in acct_isolated(), so the counters
don't need to be fields in compact_control.
This patch removes the fields from compact_control and clarifies the role of
acct_isolated(), which counts the number of anon|file pages isolated.

Acked-by: Johannes Weiner <[email protected]>
Reviewed-by: KAMEZAWA Hiroyuki <[email protected]>
Reviewed-by: KOSAKI Motohiro <[email protected]>
Acked-by: Mel Gorman <[email protected]>
Acked-by: Rik van Riel <[email protected]>
Reviewed-by: Michal Hocko <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Signed-off-by: Minchan Kim <[email protected]>
---
mm/compaction.c | 18 +++++-------------
1 files changed, 5 insertions(+), 13 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index 6cc604b..b2977a5 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -35,10 +35,6 @@ struct compact_control {
unsigned long migrate_pfn; /* isolate_migratepages search base */
bool sync; /* Synchronous migration */

- /* Account for isolated anon and file pages */
- unsigned long nr_anon;
- unsigned long nr_file;
-
unsigned int order; /* order a direct compactor needs */
int migratetype; /* MOVABLE, RECLAIMABLE etc */
struct zone *zone;
@@ -223,17 +219,13 @@ static void isolate_freepages(struct zone *zone,
static void acct_isolated(struct zone *zone, struct compact_control *cc)
{
struct page *page;
- unsigned int count[NR_LRU_LISTS] = { 0, };
+ unsigned int count[2] = { 0, };

- list_for_each_entry(page, &cc->migratepages, lru) {
- int lru = page_lru_base_type(page);
- count[lru]++;
- }
+ list_for_each_entry(page, &cc->migratepages, lru)
+ count[!!page_is_file_cache(page)]++;

- cc->nr_anon = count[LRU_ACTIVE_ANON] + count[LRU_INACTIVE_ANON];
- cc->nr_file = count[LRU_ACTIVE_FILE] + count[LRU_INACTIVE_FILE];
- __mod_zone_page_state(zone, NR_ISOLATED_ANON, cc->nr_anon);
- __mod_zone_page_state(zone, NR_ISOLATED_FILE, cc->nr_file);
+ __mod_zone_page_state(zone, NR_ISOLATED_ANON, count[0]);
+ __mod_zone_page_state(zone, NR_ISOLATED_FILE, count[1]);
}

/* Similar to reclaim, but different enough that they don't share logic */
--
1.7.4.1

2011-06-30 14:56:05

by Minchan Kim

[permalink] [raw]
Subject: [PATCH v4 02/10] Change isolate mode from #define to bitwise type

This patch replaces the plain ISOLATE_XXX macros with a bitwise isolate_mode_t type.
Plain integer macros normally aren't recommended, as they are type-unsafe and make
debugging harder because the symbols cannot be passed through to the debugger.

Quote from Johannes
" Hmm, it would probably be cleaner to fully convert the isolation mode
into independent flags. INACTIVE, ACTIVE, BOTH is currently a
tri-state among flags, which is a bit ugly."

This patch also moves the isolate mode definitions from swap.h to mmzone.h so that memcontrol.h can use isolate_mode_t.
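
For illustration, the lumpy-reclaim case in shrink_inactive_list() (see the vmscan.c
hunk below) now composes the mode from independent bits instead of choosing one of
three exclusive values:

	isolate_mode_t reclaim_mode = ISOLATE_INACTIVE;

	if (sc->reclaim_mode & RECLAIM_MODE_LUMPYRECLAIM)
		reclaim_mode |= ISOLATE_ACTIVE;	/* previously ISOLATE_BOTH */

	nr_taken = isolate_pages_global(nr_to_scan, &page_list, &nr_scanned,
					sc->order, reclaim_mode, zone, 0, file);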

Changelog since V3
o use isolate_mode_t bitwise type - suggested by Mel
o fix trace-vmscan-postprocess.pl - pointed out by Mel

Changelog since V2
o Remove ISOLATE_BOTH - suggested by Johannes Weiner

Cc: Johannes Weiner <[email protected]>
Cc: KAMEZAWA Hiroyuki <[email protected]>
Cc: KOSAKI Motohiro <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Signed-off-by: Minchan Kim <[email protected]>
---
.../trace/postprocess/trace-vmscan-postprocess.pl | 8 ++--
include/linux/memcontrol.h | 3 +-
include/linux/mmzone.h | 8 ++++
include/linux/swap.h | 7 +---
include/trace/events/vmscan.h | 8 ++--
mm/compaction.c | 3 +-
mm/memcontrol.c | 3 +-
mm/vmscan.c | 37 +++++++++++---------
8 files changed, 43 insertions(+), 34 deletions(-)

diff --git a/Documentation/trace/postprocess/trace-vmscan-postprocess.pl b/Documentation/trace/postprocess/trace-vmscan-postprocess.pl
index 12cecc8..4a37c47 100644
--- a/Documentation/trace/postprocess/trace-vmscan-postprocess.pl
+++ b/Documentation/trace/postprocess/trace-vmscan-postprocess.pl
@@ -379,10 +379,10 @@ EVENT_PROCESS:

# To closer match vmstat scanning statistics, only count isolate_both
# and isolate_inactive as scanning. isolate_active is rotation
- # isolate_inactive == 0
- # isolate_active == 1
- # isolate_both == 2
- if ($isolate_mode != 1) {
+ # isolate_inactive == 1
+ # isolate_active == 2
+ # isolate_both == 3
+ if ($isolate_mode != 2) {
$perprocesspid{$process_pid}->{HIGH_NR_SCANNED} += $nr_scanned;
}
$perprocesspid{$process_pid}->{HIGH_NR_CONTIG_DIRTY} += $nr_contig_dirty;
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 970f32c..815d16e 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -35,7 +35,8 @@ enum mem_cgroup_page_stat_item {
extern unsigned long mem_cgroup_isolate_pages(unsigned long nr_to_scan,
struct list_head *dst,
unsigned long *scanned, int order,
- int mode, struct zone *z,
+ isolate_mode_t mode,
+ struct zone *z,
struct mem_cgroup *mem_cont,
int active, int file);

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 795ec6c..0efdd6e 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -158,6 +158,14 @@ static inline int is_unevictable_lru(enum lru_list l)
return (l == LRU_UNEVICTABLE);
}

+/* Isolate inactive pages */
+#define ISOLATE_INACTIVE ((__force isolate_mode_t)0x1)
+/* Isolate active pages */
+#define ISOLATE_ACTIVE ((__force isolate_mode_t)0x2)
+
+/* LRU Isolation modes. */
+typedef unsigned __bitwise__ isolate_mode_t;
+
enum zone_watermarks {
WMARK_MIN,
WMARK_LOW,
diff --git a/include/linux/swap.h b/include/linux/swap.h
index 808690a..03727bf 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -245,11 +245,6 @@ static inline void lru_cache_add_file(struct page *page)
__lru_cache_add(page, LRU_INACTIVE_FILE);
}

-/* LRU Isolation modes. */
-#define ISOLATE_INACTIVE 0 /* Isolate inactive pages. */
-#define ISOLATE_ACTIVE 1 /* Isolate active pages. */
-#define ISOLATE_BOTH 2 /* Isolate both active and inactive pages. */
-
/* linux/mm/vmscan.c */
extern unsigned long try_to_free_pages(struct zonelist *zonelist, int order,
gfp_t gfp_mask, nodemask_t *mask);
@@ -261,7 +256,7 @@ extern unsigned long mem_cgroup_shrink_node_zone(struct mem_cgroup *mem,
unsigned int swappiness,
struct zone *zone,
unsigned long *nr_scanned);
-extern int __isolate_lru_page(struct page *page, int mode, int file);
+extern int __isolate_lru_page(struct page *page, isolate_mode_t mode, int file);
extern unsigned long shrink_all_memory(unsigned long nr_pages);
extern int vm_swappiness;
extern int remove_mapping(struct address_space *mapping, struct page *page);
diff --git a/include/trace/events/vmscan.h b/include/trace/events/vmscan.h
index b2c33bd..04203b8 100644
--- a/include/trace/events/vmscan.h
+++ b/include/trace/events/vmscan.h
@@ -189,7 +189,7 @@ DECLARE_EVENT_CLASS(mm_vmscan_lru_isolate_template,
unsigned long nr_lumpy_taken,
unsigned long nr_lumpy_dirty,
unsigned long nr_lumpy_failed,
- int isolate_mode),
+ isolate_mode_t isolate_mode),

TP_ARGS(order, nr_requested, nr_scanned, nr_taken, nr_lumpy_taken, nr_lumpy_dirty, nr_lumpy_failed, isolate_mode),

@@ -201,7 +201,7 @@ DECLARE_EVENT_CLASS(mm_vmscan_lru_isolate_template,
__field(unsigned long, nr_lumpy_taken)
__field(unsigned long, nr_lumpy_dirty)
__field(unsigned long, nr_lumpy_failed)
- __field(int, isolate_mode)
+ __field(isolate_mode_t, isolate_mode)
),

TP_fast_assign(
@@ -235,7 +235,7 @@ DEFINE_EVENT(mm_vmscan_lru_isolate_template, mm_vmscan_lru_isolate,
unsigned long nr_lumpy_taken,
unsigned long nr_lumpy_dirty,
unsigned long nr_lumpy_failed,
- int isolate_mode),
+ isolate_mode_t isolate_mode),

TP_ARGS(order, nr_requested, nr_scanned, nr_taken, nr_lumpy_taken, nr_lumpy_dirty, nr_lumpy_failed, isolate_mode)

@@ -250,7 +250,7 @@ DEFINE_EVENT(mm_vmscan_lru_isolate_template, mm_vmscan_memcg_isolate,
unsigned long nr_lumpy_taken,
unsigned long nr_lumpy_dirty,
unsigned long nr_lumpy_failed,
- int isolate_mode),
+ isolate_mode_t isolate_mode),

TP_ARGS(order, nr_requested, nr_scanned, nr_taken, nr_lumpy_taken, nr_lumpy_dirty, nr_lumpy_failed, isolate_mode)

diff --git a/mm/compaction.c b/mm/compaction.c
index b2977a5..47f717f 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -349,7 +349,8 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone,
}

/* Try isolate the page */
- if (__isolate_lru_page(page, ISOLATE_BOTH, 0) != 0)
+ if (__isolate_lru_page(page,
+ ISOLATE_ACTIVE|ISOLATE_INACTIVE, 0) != 0)
continue;

VM_BUG_ON(PageTransCompound(page));
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 88890b4..2105673 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1246,7 +1246,8 @@ mem_cgroup_get_reclaim_stat_from_page(struct page *page)
unsigned long mem_cgroup_isolate_pages(unsigned long nr_to_scan,
struct list_head *dst,
unsigned long *scanned, int order,
- int mode, struct zone *z,
+ isolate_mode_t mode,
+ struct zone *z,
struct mem_cgroup *mem_cont,
int active, int file)
{
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 8ff834e..70bcf21 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -972,23 +972,27 @@ keep_lumpy:
*
* returns 0 on success, -ve errno on failure.
*/
-int __isolate_lru_page(struct page *page, int mode, int file)
+int __isolate_lru_page(struct page *page, isolate_mode_t mode, int file)
{
+ bool all_lru_mode;
int ret = -EINVAL;

/* Only take pages on the LRU. */
if (!PageLRU(page))
return ret;

+ all_lru_mode = (mode & (ISOLATE_ACTIVE|ISOLATE_INACTIVE)) ==
+ (ISOLATE_ACTIVE|ISOLATE_INACTIVE);
+
/*
* When checking the active state, we need to be sure we are
* dealing with comparible boolean values. Take the logical not
* of each.
*/
- if (mode != ISOLATE_BOTH && (!PageActive(page) != !mode))
+ if (!all_lru_mode && !PageActive(page) != !(mode & ISOLATE_ACTIVE))
return ret;

- if (mode != ISOLATE_BOTH && page_is_file_cache(page) != file)
+ if (!all_lru_mode && !!page_is_file_cache(page) != file)
return ret;

/*
@@ -1036,7 +1040,8 @@ int __isolate_lru_page(struct page *page, int mode, int file)
*/
static unsigned long isolate_lru_pages(unsigned long nr_to_scan,
struct list_head *src, struct list_head *dst,
- unsigned long *scanned, int order, int mode, int file)
+ unsigned long *scanned, int order, isolate_mode_t mode,
+ int file)
{
unsigned long nr_taken = 0;
unsigned long nr_lumpy_taken = 0;
@@ -1161,8 +1166,8 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan,
static unsigned long isolate_pages_global(unsigned long nr,
struct list_head *dst,
unsigned long *scanned, int order,
- int mode, struct zone *z,
- int active, int file)
+ isolate_mode_t mode,
+ struct zone *z, int active, int file)
{
int lru = LRU_BASE;
if (active)
@@ -1408,6 +1413,7 @@ shrink_inactive_list(unsigned long nr_to_scan, struct zone *zone,
unsigned long nr_taken;
unsigned long nr_anon;
unsigned long nr_file;
+ isolate_mode_t reclaim_mode = ISOLATE_INACTIVE;

while (unlikely(too_many_isolated(zone, file, sc))) {
congestion_wait(BLK_RW_ASYNC, HZ/10);
@@ -1418,15 +1424,15 @@ shrink_inactive_list(unsigned long nr_to_scan, struct zone *zone,
}

set_reclaim_mode(priority, sc, false);
+ if (sc->reclaim_mode & RECLAIM_MODE_LUMPYRECLAIM)
+ reclaim_mode |= ISOLATE_ACTIVE;
+
lru_add_drain();
spin_lock_irq(&zone->lru_lock);

if (scanning_global_lru(sc)) {
- nr_taken = isolate_pages_global(nr_to_scan,
- &page_list, &nr_scanned, sc->order,
- sc->reclaim_mode & RECLAIM_MODE_LUMPYRECLAIM ?
- ISOLATE_BOTH : ISOLATE_INACTIVE,
- zone, 0, file);
+ nr_taken = isolate_pages_global(nr_to_scan, &page_list,
+ &nr_scanned, sc->order, reclaim_mode, zone, 0, file);
zone->pages_scanned += nr_scanned;
if (current_is_kswapd())
__count_zone_vm_events(PGSCAN_KSWAPD, zone,
@@ -1435,12 +1441,9 @@ shrink_inactive_list(unsigned long nr_to_scan, struct zone *zone,
__count_zone_vm_events(PGSCAN_DIRECT, zone,
nr_scanned);
} else {
- nr_taken = mem_cgroup_isolate_pages(nr_to_scan,
- &page_list, &nr_scanned, sc->order,
- sc->reclaim_mode & RECLAIM_MODE_LUMPYRECLAIM ?
- ISOLATE_BOTH : ISOLATE_INACTIVE,
- zone, sc->mem_cgroup,
- 0, file);
+ nr_taken = mem_cgroup_isolate_pages(nr_to_scan, &page_list,
+ &nr_scanned, sc->order, reclaim_mode, zone,
+ sc->mem_cgroup, 0, file);
/*
* mem_cgroup_isolate_pages() keeps track of
* scanned pages on its own.
--
1.7.4.1

2011-06-30 14:56:15

by Minchan Kim

[permalink] [raw]
Subject: [PATCH v4 03/10] compaction: make isolate_lru_page with filter aware

In async mode, compaction doesn't migrate dirty or writeback pages,
so it's pointless to pick such a page only to re-add it to the LRU list.

Of course, when we isolate a page in compaction it might be dirty or under
writeback, but by the time we try to migrate it the page might no longer be
dirty or under writeback, so it could be migrated. But that is very unlikely,
as the isolate-and-migrate cycle is much faster than writeout.

So this patch saves CPU overhead and prevents unnecessary LRU churning.
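
Condensed, the change amounts to the following (a sketch of the two hunks below):

	/* compaction, async mode: only isolate clean pages */
	isolate_mode_t mode = ISOLATE_ACTIVE|ISOLATE_INACTIVE;
	if (!cc->sync)
		mode |= ISOLATE_CLEAN;

	/* __isolate_lru_page(): honour the new filter */
	if ((mode & ISOLATE_CLEAN) && (PageDirty(page) || PageWriteback(page)))
		return -EBUSY;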

Acked-by: Johannes Weiner <[email protected]>
Reviewed-by: KAMEZAWA Hiroyuki <[email protected]>
Reviewed-by: KOSAKI Motohiro <[email protected]>
Acked-by: Mel Gorman <[email protected]>
Acked-by: Rik van Riel <[email protected]>
Reviewed-by: Michal Hocko <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Signed-off-by: Minchan Kim <[email protected]>
---
include/linux/mmzone.h | 3 ++-
mm/compaction.c | 7 +++++--
mm/vmscan.c | 3 +++
3 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 0efdd6e..84819b5 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -162,7 +162,8 @@ static inline int is_unevictable_lru(enum lru_list l)
#define ISOLATE_INACTIVE ((__force isolate_mode_t)0x1)
/* Isolate active pages */
#define ISOLATE_ACTIVE ((__force isolate_mode_t)0x2)
-
+/* Isolate clean file */
+#define ISOLATE_CLEAN ((__force isolate_mode_t)0x4)
/* LRU Isolation modes. */
typedef unsigned __bitwise__ isolate_mode_t;

diff --git a/mm/compaction.c b/mm/compaction.c
index 47f717f..a0e4202 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -261,6 +261,7 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone,
unsigned long last_pageblock_nr = 0, pageblock_nr;
unsigned long nr_scanned = 0, nr_isolated = 0;
struct list_head *migratelist = &cc->migratepages;
+ isolate_mode_t mode = ISOLATE_ACTIVE|ISOLATE_INACTIVE;

/* Do not scan outside zone boundaries */
low_pfn = max(cc->migrate_pfn, zone->zone_start_pfn);
@@ -348,9 +349,11 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone,
continue;
}

+ if (!cc->sync)
+ mode |= ISOLATE_CLEAN;
+
/* Try isolate the page */
- if (__isolate_lru_page(page,
- ISOLATE_ACTIVE|ISOLATE_INACTIVE, 0) != 0)
+ if (__isolate_lru_page(page, mode, 0) != 0)
continue;

VM_BUG_ON(PageTransCompound(page));
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 70bcf21..6f6d443 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1005,6 +1005,9 @@ int __isolate_lru_page(struct page *page, isolate_mode_t mode, int file)

ret = -EBUSY;

+ if ((mode & ISOLATE_CLEAN) && (PageDirty(page) || PageWriteback(page)))
+ return ret;
+
if (likely(get_page_unless_zero(page))) {
/*
* Be careful not to clear PageLRU until after we're
--
1.7.4.1

2011-06-30 14:56:24

by Minchan Kim

[permalink] [raw]
Subject: [PATCH v4 04/10] zone_reclaim: make isolate_lru_page with filter aware

In the __zone_reclaim case, we don't want to shrink mapped pages.
Nonetheless, we isolate mapped pages and re-add them at the LRU's head.
That is unnecessary CPU overhead and causes LRU churning.

Of course, when we isolate a page it might be mapped, but by the time we try to
reclaim it the page might no longer be mapped, so it could be reclaimed.
But that race is rare, and even if it happens, it's no big deal.
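
Condensed, the change amounts to the following (a sketch of the hunks below):

	/* vmscan: translate the scan_control constraints into isolation filters */
	if (!sc->may_unmap)
		reclaim_mode |= ISOLATE_UNMAPPED;
	if (!sc->may_writepage)
		reclaim_mode |= ISOLATE_CLEAN;

	/* __isolate_lru_page(): skip mapped pages when asked to */
	if ((mode & ISOLATE_UNMAPPED) && page_mapped(page))
		return -EBUSY;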

Acked-by: Johannes Weiner <[email protected]>
Reviewed-by: KAMEZAWA Hiroyuki <[email protected]>
Reviewed-by: KOSAKI Motohiro <[email protected]>
Reviewed-by: Michal Hocko <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Signed-off-by: Minchan Kim <[email protected]>
---
include/linux/mmzone.h | 3 +++
mm/vmscan.c | 20 ++++++++++++++++++--
2 files changed, 21 insertions(+), 2 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 84819b5..1d1791f 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -164,6 +164,9 @@ static inline int is_unevictable_lru(enum lru_list l)
#define ISOLATE_ACTIVE ((__force isolate_mode_t)0x2)
/* Isolate clean file */
#define ISOLATE_CLEAN ((__force isolate_mode_t)0x4)
+/* Isolate unmapped file */
+#define ISOLATE_UNMAPPED ((__force isolate_mode_t)0x8)
+
/* LRU Isolation modes. */
typedef unsigned __bitwise__ isolate_mode_t;

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 6f6d443..132d2d7 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1008,6 +1008,9 @@ int __isolate_lru_page(struct page *page, isolate_mode_t mode, int file)
if ((mode & ISOLATE_CLEAN) && (PageDirty(page) || PageWriteback(page)))
return ret;

+ if ((mode & ISOLATE_UNMAPPED) && page_mapped(page))
+ return ret;
+
if (likely(get_page_unless_zero(page))) {
/*
* Be careful not to clear PageLRU until after we're
@@ -1431,6 +1434,12 @@ shrink_inactive_list(unsigned long nr_to_scan, struct zone *zone,
reclaim_mode |= ISOLATE_ACTIVE;

lru_add_drain();
+
+ if (!sc->may_unmap)
+ reclaim_mode |= ISOLATE_UNMAPPED;
+ if (!sc->may_writepage)
+ reclaim_mode |= ISOLATE_CLEAN;
+
spin_lock_irq(&zone->lru_lock);

if (scanning_global_lru(sc)) {
@@ -1548,19 +1557,26 @@ static void shrink_active_list(unsigned long nr_pages, struct zone *zone,
struct page *page;
struct zone_reclaim_stat *reclaim_stat = get_reclaim_stat(zone, sc);
unsigned long nr_rotated = 0;
+ isolate_mode_t reclaim_mode = ISOLATE_ACTIVE;

lru_add_drain();
+
+ if (!sc->may_unmap)
+ reclaim_mode |= ISOLATE_UNMAPPED;
+ if (!sc->may_writepage)
+ reclaim_mode |= ISOLATE_CLEAN;
+
spin_lock_irq(&zone->lru_lock);
if (scanning_global_lru(sc)) {
nr_taken = isolate_pages_global(nr_pages, &l_hold,
&pgscanned, sc->order,
- ISOLATE_ACTIVE, zone,
+ reclaim_mode, zone,
1, file);
zone->pages_scanned += pgscanned;
} else {
nr_taken = mem_cgroup_isolate_pages(nr_pages, &l_hold,
&pgscanned, sc->order,
- ISOLATE_ACTIVE, zone,
+ reclaim_mode, zone,
sc->mem_cgroup, 1, file);
/*
* mem_cgroup_isolate_pages() keeps track of
--
1.7.4.1

2011-06-30 14:57:31

by Minchan Kim

[permalink] [raw]
Subject: [PATCH v4 05/10] migration: clean up unmap_and_move

unmap_and_move() is one of the big, messy functions.
This patch tries to clean it up.

It helps readability and keeps unmap_and_move_ilru() simple;
unmap_and_move_ilru() will be introduced by the next patch.

Reviewed-by: KOSAKI Motohiro <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: KAMEZAWA Hiroyuki <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Signed-off-by: Minchan Kim <[email protected]>
---
mm/migrate.c | 75 +++++++++++++++++++++++++++++++---------------------------
1 files changed, 40 insertions(+), 35 deletions(-)

diff --git a/mm/migrate.c b/mm/migrate.c
index 666e4e6..71713fc 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -621,38 +621,18 @@ static int move_to_new_page(struct page *newpage, struct page *page,
return rc;
}

-/*
- * Obtain the lock on page, remove all ptes and migrate the page
- * to the newly allocated page in newpage.
- */
-static int unmap_and_move(new_page_t get_new_page, unsigned long private,
- struct page *page, int force, bool offlining, bool sync)
+static int __unmap_and_move(struct page *page, struct page *newpage,
+ int force, bool offlining, bool sync)
{
- int rc = 0;
- int *result = NULL;
- struct page *newpage = get_new_page(page, private, &result);
+ int rc = -EAGAIN;
int remap_swapcache = 1;
int charge = 0;
struct mem_cgroup *mem;
struct anon_vma *anon_vma = NULL;

- if (!newpage)
- return -ENOMEM;
-
- if (page_count(page) == 1) {
- /* page was freed from under us. So we are done. */
- goto move_newpage;
- }
- if (unlikely(PageTransHuge(page)))
- if (unlikely(split_huge_page(page)))
- goto move_newpage;
-
- /* prepare cgroup just returns 0 or -ENOMEM */
- rc = -EAGAIN;
-
if (!trylock_page(page)) {
if (!force || !sync)
- goto move_newpage;
+ goto out;

/*
* It's not safe for direct compaction to call lock_page.
@@ -668,7 +648,7 @@ static int unmap_and_move(new_page_t get_new_page, unsigned long private,
* altogether.
*/
if (current->flags & PF_MEMALLOC)
- goto move_newpage;
+ goto out;

lock_page(page);
}
@@ -785,27 +765,52 @@ uncharge:
mem_cgroup_end_migration(mem, page, newpage, rc == 0);
unlock:
unlock_page(page);
+out:
+ return rc;
+}

-move_newpage:
+/*
+ * Obtain the lock on page, remove all ptes and migrate the page
+ * to the newly allocated page in newpage.
+ */
+static int unmap_and_move(new_page_t get_new_page, unsigned long private,
+ struct page *page, int force, bool offlining, bool sync)
+{
+ int rc = 0;
+ int *result = NULL;
+ struct page *newpage = get_new_page(page, private, &result);
+
+ if (!newpage)
+ return -ENOMEM;
+
+ if (page_count(page) == 1) {
+ /* page was freed from under us. So we are done. */
+ goto out;
+ }
+
+ if (unlikely(PageTransHuge(page)))
+ if (unlikely(split_huge_page(page)))
+ goto out;
+
+ rc = __unmap_and_move(page, newpage, force, offlining, sync);
+out:
if (rc != -EAGAIN) {
- /*
- * A page that has been migrated has all references
- * removed and will be freed. A page that has not been
- * migrated will have kepts its references and be
- * restored.
- */
- list_del(&page->lru);
+ /*
+ * A page that has been migrated has all references
+ * removed and will be freed. A page that has not been
+ * migrated will have kepts its references and be
+ * restored.
+ */
+ list_del(&page->lru);
dec_zone_page_state(page, NR_ISOLATED_ANON +
page_is_file_cache(page));
putback_lru_page(page);
}
-
/*
* Move the new page to the LRU. If migration was not successful
* then this will free the page.
*/
putback_lru_page(newpage);
-
if (result) {
if (rc)
*result = rc;
--
1.7.4.1

2011-06-30 14:56:38

by Minchan Kim

[permalink] [raw]
Subject: [PATCH v4 06/10] migration: introduce migrate_ilru_pages

This patch defines new APIs to put a new page back into the old page's position in
LRU order, to avoid the LRU churning caused by compaction.

The idea I suggested at LSF/MM is simple.

When we try to put a page back into the LRU list, if the page's friends (prev, next)
are still its nearest neighbors, we can insert the isolated page right after prev
instead of at the head of the LRU list. That way we keep the LRU history without
losing the LRU information.

Before :
LRU POV : H - P1 - P2 - P3 - P4 - T

Isolate P3 :
LRU POV : H - P1 - P2 - P4 - T

Putback P3 :
if (P2->next == P4)
	putback(P3, P2);
So,
LRU POV : H - P1 - P2 - P3 - P4 - T

I implemented this idea in the RFC, but it had two problems.

1)
For the implementation, I defined a new structure, pages_lru, which remembers
both LRU friend pages of the isolated one as well as handling functions.
I allocated the space for pages_lru dynamically with kmalloc(GFP_ATOMIC), but as we
know, compaction is a reclaim path, so it's not a good idea to allocate memory
dynamically on that path. The space needed to store pages_lru is small enough that
a single page suffices, as current compaction migrates in chunks of 32 pages.
In addition, compaction makes sure there are lots of order-0 free pages before
starting, so I don't think it would be a big problem. But I admit it can pin some
pages, so the migration success ratio might drop if concurrent compaction happens.

So I changed my mind: I don't use dynamically allocated memory any more.
As far as migration is concerned, we don't need the doubly linked list of page->lru.
The whole operation is performed by enumeration, so I think a singly linked list is
enough. And if we can use a singly linked list, we can reuse one of its pointers as
another buffer. Here, we use it to store the prev LRU page of the isolated page.
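
Concretely, the patch overlays page->lru with a small singly linked node (this is
the mm_types.h hunk below, shown here condensed):

	/*
	 * Used by compaction to keep LRU order during migration;
	 * it shares storage with page->lru via a union in struct page.
	 */
	struct inorder_lru {
		struct page *prev_page;		/* prev LRU page of isolated page */
		struct inorder_lru *next;	/* next entry, singly linked */
	};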

2)
The page-relation approach had a problem with contiguous pages.
The idea cannot work when the friend pages are isolated, too: prev_page->next ==
next_page is then always _false_, and both pages are no longer on the LRU at that
time. This was pointed out by Rik at the LSF/MM summit.
So to solve the problem, I changed the idea: we don't need the relation to both
friend (prev, next) pages; it's enough to consider either the prev or the next page,
as long as it is still on the same LRU.

In the worst case with this approach, the prev or next page has been freed and newly
allocated, so it sits at the head of the LRU and our isolated page ends up right
after the head. But that's almost the same situation as the current problem, so it
doesn't make things worse than they are now.

The new idea works as below.

How inorder_lru works
Assumption : we isolate pages P3-P7 and we consider only the prev LRU pointer.
Notation : (P3,P2) = (isolated page, previous LRU page of the isolated page)

H - P1 - P2 - P3 - P4 - P5 - P6 - P7 - P8 - P9 - P10 - T

If we isolate P3,
H - P1 - P2 - P4 - P5 - P6 - P7 - P8 - P9 - P10 - T
Isolated page list - (P3,P2)

If we isolate P4,

H - P1 - P2 - P5 - P6 - P7 - P8 - P9 - P10 - T
Isolated page list - (P4,P2) - (P3,P2)

If we isolate P5,

H - P1 - P2 - P6 - P7 - P8 - P9 - P10 - T
Isolated page list - (P5,P2) - (P4,P2) - (P3,P2)

..

If we isolate P7,
H - P1 - P2 - P8 - P9 - P10 - T
Isolated page list - (P7,P2) - (P6,P2) - (P5,P2) - (P4,P2) - (P3,P2)

Let's start putback from P7

P7.
H - P1 - P2 - P8 - P9 - P10 - T
prev P2 is still on the LRU, so P7 is placed right after P2.
H - P1 - P2 - P7 - P8 - P9 - P10 - T

P6.
H - P1 - P2 - P7 - P8 - P9 - P10 - T
prev P2 is still on the LRU, so P6 is placed right after P2.
H - P1 - P2 - P6 - P7 - P8 - P9 - P10 - T

P5.
..

P3.
H - P1 - P2 - P4 - P5 - P6 - P7 - P8 - P9 - P10 - T
prev P2 is still on the LRU, so P3 is placed right after P2.
H - P1 - P2 - P3 - P4 - P5 - P6 - P7 - P8 - P9 - P10 - T

In addition, this patch introduces a new API, *migrate_ilru_pages*, which is aware
of inorder_lru putback, so the new page ends up at the old page's LRU position.
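
A caller (compaction, in patch 7) is expected to use the new API roughly like this
(condensed from the compaction hunks later in the series; mode, zone, cc, sync and
the per-page loop come from the caller's context):

	struct inorder_lru migratelist;
	struct page *prev_page;
	int err;

	INIT_ILRU_LIST(&migratelist);

	/* for each candidate page: remember its previous LRU neighbor */
	if (isolate_ilru_page(page, mode, 0, &prev_page) == 0) {
		del_page_from_lru_list(zone, page, page_lru(page));
		ilru_list_add(page, prev_page, &migratelist);
	}

	/* migrate: new pages inherit the old pages' LRU positions */
	err = migrate_ilru_pages(&migratelist, compaction_alloc,
				(unsigned long)cc, false, sync);
	if (err)
		putback_ilru_pages(&migratelist);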

[migrate_pages vs migrate_ilru_pages]

1) We need to handle a singly linked list.
page->lru is no longer a doubly linked list in inorder_lru handling, so
migrate_ilru_pages has to handle a singly linked list instead of a doubly linked one.

2) We need to defer the old page's putback.
At present, during migration the old page is freed through unmap_and_move's
putback_lru_page. That is a problem for the inorder-putback logic: same_lru in
migrate_ilru_pages checks the old page's PageLRU (among other things) to decide
whether the new page can be placed at the old page's position. If the old page is
freed before the inorder_lru list is handled, it ends up !PageLRU, same_lru returns
'false', and the in-order putback becomes a no-op.

3) We need to adjust prev_page of the pages remaining in the inorder_lru list when
we put back the new page and free the old one. For example:

Notation)
PHY : page physical layout on memory
LRU : page logical layout as LRU order
ilru : inorder_lru list
PN : old page(ie, source page of migration)
PN' : new page(ie, destination page of migration)

Let's assume the layout below.
PHY : H - P1 - P2 - P3 - P4 - P5 - T
LRU : H - P5 - P4 - P3 - P2 - P1 - T
ilru :

We isolate P2, P3 and P4, so the inorder_lru list looks as follows.

PHY : H - P1 - P2 - P3 - P4 - P5 - T
LRU : H - P5 - P1 - T
ilru : (P4,P5) - (P3,P4) - (P2,P3)

After 1st putback happens,

PHY : H - P1 - P2 - P3 - P4 - P5 - T
LRU : H - P5 - P4' - P1 - T
ilru : (P3,P4) - (P2,P3)
P4' is the new page and P4 (i.e., the old page) will be freed.

In the 2nd putback, P3 would try to find P4, but P4 has been freed,
so same_lru returns 'false' and inorder_lru no longer works.
The bad effect continues until P2. That's too bad.
To fix this, the patch defines adjust_ilru_prev_page. It works as follows.

After 1st putback,
PHY : H - P1 - P2 - P3 - P4 - P5 - T
LRU : H - P5 - P4' - P1 - T
ilru : (P3,P4') - (P2,P3)
It replaces the prev pointer of the pages remaining in the inorder_lru list with
the new page, so in the 2nd putback,

PHY : H - P1 - P2 - P3 - P4 - P5 - T
LRU : H - P5 - P4' - P3' - P1 - T
ilru : (P2,P3')

In 3rd putback,

PHY : H - P1 - P2 - P3 - P4 - P5 - T
LRU : H - P5 - P4' - P3' - P2' - P1 - T
ilru :

Cc: Johannes Weiner <[email protected]>
Cc: KAMEZAWA Hiroyuki <[email protected]>
Cc: KOSAKI Motohiro <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Signed-off-by: Minchan Kim <[email protected]>
---
include/linux/migrate.h | 87 +++++++++++++++++
include/linux/mm_types.h | 18 +++-
include/linux/swap.h | 4 +
mm/internal.h | 1 +
mm/migrate.c | 242 +++++++++++++++++++++++++++++++++++++++++++++-
mm/swap.c | 2 +-
mm/vmscan.c | 51 ++++++++++
7 files changed, 402 insertions(+), 3 deletions(-)

diff --git a/include/linux/migrate.h b/include/linux/migrate.h
index e39aeec..62724e1 100644
--- a/include/linux/migrate.h
+++ b/include/linux/migrate.h
@@ -9,12 +9,99 @@ typedef struct page *new_page_t(struct page *, unsigned long private, int **);
#ifdef CONFIG_MIGRATION
#define PAGE_MIGRATION 1

+/* How to work inorder_lru
+ * Assumption : we isolate pages P3-P7 and we consider only prev LRU pointer.
+ * Notation : (P3,P2) = (isolated page, previous LRU page of isolated page)
+ *
+ * H - P1 - P2 - P3 - P4 - P5 - P6 - P7 - P8 - P9 - P10 - T
+ *
+ * If we isolate P3,
+ *
+ * H - P1 - P2 - P4 - P5 - P6 - P7 - P8 - P9 - P10 - T
+ * Isolated page list - (P3,P2)
+ *
+ * If we isolate P4,
+ *
+ * H - P1 - P2 - P5 - P6 - P7 - P8 - P9 - P10 - T
+ * Isolated page list - (P4,P2) - (P3,P2)
+ *
+ * If we isolate P5,
+ *
+ * H - P1 - P2 - P6 - P7 - P8 - P9 - P10 - T
+ * Isolated page list - (P5,P2) - (P4,P2) - (P3,P2)
+ *
+ * ..
+ *
+ * If we isolate P7, following as
+ * H - P1 - P2 - P8 - P9 - P10 - T
+ * Isolated page list - (P7,P2) - (P6,P2) - (P5,P2) - (P4,P2) - (P3,P2)
+ *
+ * Let's start putback from P7
+ *
+ * P7.
+ * H - P1 - P2 - P8 - P9 - P10 - T
+ * prev P2 is on still LRU so P7 would be located at P2's next.
+ * H - P1 - P2 - P7 - P8 - P9 - P10 - T
+ *
+ * P6.
+ * H - P1 - P2 - P7 - P8 - P9 - P10 - T
+ * prev P2 is on still LRU so P6 would be located at P2's next.
+ * H - P1 - P2 - P6 - P7 - P8 - P9 - P10 - T
+ *
+ * P5.
+ * ..
+ *
+ * P3.
+ * H - P1 - P2 - P4 - P5 - P6 - P7 - P8 - P9 - P10 - T
+ * prev P2 is on still LRU so P3 would be located at P2's next.
+ * H - P1 - P2 - P3 - P4 - P5 - P6 - P7 - P8 - P9 - P10 - T
+ */
+
+/*
+ * ilru_list is singly linked list and used for compaction
+ * for keeping LRU ordering.
+ */
+static inline void INIT_ILRU_LIST(struct inorder_lru *list)
+{
+ list->prev_page = NULL;
+ list->next = list;
+}
+
+static inline int ilru_list_empty(const struct inorder_lru *head)
+{
+ return head->next == head;
+}
+
+static inline void ilru_list_add(struct page *page, struct page *prev_page,
+ struct inorder_lru *head)
+{
+ VM_BUG_ON(PageLRU(page));
+
+ page->ilru.prev_page = prev_page;
+ page->ilru.next = head->next;
+ head->next = &page->ilru;
+}
+
+static inline void ilru_list_del(struct page *page, struct inorder_lru *head)
+{
+ head->next = page->ilru.next;
+}
+
+#define list_for_each_ilru_entry list_for_each_entry
+#define list_for_each_ilru_entry_safe list_for_each_entry_safe
+
+extern void putback_ilru_pages(struct inorder_lru *l);
extern void putback_lru_pages(struct list_head *l);
extern int migrate_page(struct address_space *,
struct page *, struct page *);
extern int migrate_pages(struct list_head *l, new_page_t x,
unsigned long private, bool offlining,
bool sync);
+
+extern int migrate_ilru_pages(struct inorder_lru *l, new_page_t x,
+ unsigned long private, bool offlining,
+ bool sync);
+
extern int migrate_huge_pages(struct list_head *l, new_page_t x,
unsigned long private, bool offlining,
bool sync);
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 027935c..3634c04 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -24,6 +24,19 @@ struct address_space;

#define USE_SPLIT_PTLOCKS (NR_CPUS >= CONFIG_SPLIT_PTLOCK_CPUS)

+struct page;
+
+/*
+ * The inorder_lru is used by compaction for keeping LRU order
+ * during migration.
+ */
+struct inorder_lru {
+ /* prev LRU page of isolated page */
+ struct page *prev_page;
+ /* next for singly linked list*/
+ struct inorder_lru *next;
+};
+
/*
* Each physical page in the system has a struct page associated with
* it to keep track of whatever it is we are using the page for at the
@@ -72,9 +85,12 @@ struct page {
pgoff_t index; /* Our offset within mapping. */
void *freelist; /* SLUB: freelist req. slab lock */
};
- struct list_head lru; /* Pageout list, eg. active_list
+ union {
+ struct inorder_lru ilru;/* compaction: migrated page list */
+ struct list_head lru; /* Pageout list, eg. active_list
* protected by zone->lru_lock !
*/
+ };
/*
* On machines where all RAM is mapped into kernel address space,
* we can simply calculate the virtual address. On machines with
diff --git a/include/linux/swap.h b/include/linux/swap.h
index 03727bf..2208412 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -228,6 +228,8 @@ extern int lru_add_drain_all(void);
extern void rotate_reclaimable_page(struct page *page);
extern void deactivate_page(struct page *page);
extern void swap_setup(void);
+extern void update_page_reclaim_stat(struct zone *zone, struct page *page,
+ int file, int rotated);

extern void add_page_to_unevictable_list(struct page *page);

@@ -257,6 +259,8 @@ extern unsigned long mem_cgroup_shrink_node_zone(struct mem_cgroup *mem,
struct zone *zone,
unsigned long *nr_scanned);
extern int __isolate_lru_page(struct page *page, isolate_mode_t mode, int file);
+extern int isolate_ilru_page(struct page *page, isolate_mode_t mode, int file,
+ struct page **prev_page);
extern unsigned long shrink_all_memory(unsigned long nr_pages);
extern int vm_swappiness;
extern int remove_mapping(struct address_space *mapping, struct page *page);
diff --git a/mm/internal.h b/mm/internal.h
index d071d38..8a919c7 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -42,6 +42,7 @@ extern unsigned long highest_memmap_pfn;
/*
* in mm/vmscan.c:
*/
+extern void putback_page_to_lru(struct page *page, struct page *head_page);
extern int isolate_lru_page(struct page *page);
extern void putback_lru_page(struct page *page);

diff --git a/mm/migrate.c b/mm/migrate.c
index 71713fc..b997de5 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -85,6 +85,50 @@ void putback_lru_pages(struct list_head *l)
}

/*
+ * Check if page and prev are on same LRU.
+ * zone->lru_lock must be hold.
+ */
+static bool same_lru(struct page *page, struct page *prev)
+{
+ bool ret = false;
+ if (!prev || !PageLRU(prev))
+ goto out;
+
+ if (unlikely(PageUnevictable(prev)))
+ goto out;
+
+ if (page_lru_base_type(page) != page_lru_base_type(prev))
+ goto out;
+
+ ret = true;
+out:
+ return ret;
+}
+
+
+void putback_ilru_pages(struct inorder_lru *l)
+{
+ struct zone *zone;
+ struct page *page, *page2, *prev;
+
+ list_for_each_ilru_entry_safe(page, page2, l, ilru) {
+ ilru_list_del(page, l);
+ dec_zone_page_state(page, NR_ISOLATED_ANON +
+ page_is_file_cache(page));
+ zone = page_zone(page);
+ spin_lock_irq(&zone->lru_lock);
+ prev = page->ilru.prev_page;
+ if (same_lru(page, prev)) {
+ putback_page_to_lru(page, prev);
+ spin_unlock_irq(&zone->lru_lock);
+ put_page(page);
+ } else {
+ spin_unlock_irq(&zone->lru_lock);
+ putback_lru_page(page);
+ }
+ }
+}
+/*
* Restore a potential migration pte to a working pte entry
*/
static int remove_migration_pte(struct page *new, struct vm_area_struct *vma,
@@ -821,6 +865,151 @@ out:
}

/*
+ * We need adjust prev_page of ilru_list when we putback newpage
+ * and free old page. Let's think about it.
+ * For example,
+ *
+ * Notation)
+ * PHY : page physical layout on memory
+ * LRU : page logical layout as LRU order
+ * ilru : inorder_lru list
+ * PN : old page(ie, source page of migration)
+ * PN' : new page(ie, destination page of migration)
+ *
+ * Let's assume there is below layout.
+ * PHY : H - P1 - P2 - P3 - P4 - P5 - T
+ * LRU : H - P5 - P4 - P3 - P2 - P1 - T
+ * ilru :
+ *
+ * We isolate P2,P3,P4 so inorder_lru has following as.
+ *
+ * PHY : H - P1 - P2 - P3 - P4 - P5 - T
+ * LRU : H - P5 - P1 - T
+ * ilru : (P4,P5) - (P3,P4) - (P2,P3)
+ *
+ * After 1st putback happens,
+ *
+ * PHY : H - P1 - P2 - P3 - P4 - P5 - T
+ * LRU : H - P5 - P4' - P1 - T
+ * ilru : (P3,P4) - (P2,P3)
+ * P4' is a newpage and P4(ie, old page) would freed
+ *
+ * In 2nd putback, P3 would try findding P4 but P4 would be freed.
+ * so same_lru returns 'false' so that inorder_lru doesn't work any more.
+ * The bad effect continues until P2. That's too bad.
+ * For fixing, we define adjust_ilru_prev_page. It works following as.
+ *
+ * After 1st putback,
+ *
+ * PHY : H - P1 - P2 - P3 - P4 - P5 - T
+ * LRU : H - P5 - P4' - P1 - T
+ * ilru : (P3,P4') - (P2,P3)
+ * It replaces prev pointer of pages remained in inorder_lru list with
+ * new one's so in 2nd putback,
+ *
+ * PHY : H - P1 - P2 - P3 - P4 - P5 - T
+ * LRU : H - P5 - P4' - P3' - P1 - T
+ * ilru : (P2,P3')
+ *
+ * In 3rd putback,
+ *
+ * PHY : H - P1 - P2 - P3 - P4 - P5 - T
+ * LRU : H - P5 - P4' - P3' - P2' - P1 - T
+ * ilru :
+ */
+static inline void adjust_ilru_prev_page(struct inorder_lru *head,
+ struct page *prev_page, struct page *new_page)
+{
+ struct page *page;
+ list_for_each_ilru_entry(page, head, ilru)
+ if (page->ilru.prev_page == prev_page)
+ page->ilru.prev_page = new_page;
+}
+
+void __put_ilru_pages(struct page *page, struct page *newpage,
+ struct inorder_lru *prev_lru, struct inorder_lru *ihead)
+{
+ struct page *prev_page;
+ struct zone *zone;
+ prev_page = page->ilru.prev_page;
+ /*
+ * A page that has been migrated has all references
+ * removed and will be freed. A page that has not been
+ * migrated will have kepts its references and be
+ * restored.
+ */
+ ilru_list_del(page, prev_lru);
+ dec_zone_page_state(page, NR_ISOLATED_ANON +
+ page_is_file_cache(page));
+
+ /*
+ * Move the new page to the LRU. If migration was not successful
+ * then this will free the page.
+ */
+ zone = page_zone(newpage);
+ spin_lock_irq(&zone->lru_lock);
+ if (same_lru(page, prev_page)) {
+ putback_page_to_lru(newpage, prev_page);
+ spin_unlock_irq(&zone->lru_lock);
+ /*
+ * The newpage replaced LRU position of old page and
+ * old one would be freed. So let's adjust prev_page of pages
+ * remained in inorder_lru list.
+ */
+ adjust_ilru_prev_page(ihead, page, newpage);
+ put_page(newpage);
+ } else {
+ spin_unlock_irq(&zone->lru_lock);
+ putback_lru_page(newpage);
+ }
+
+ putback_lru_page(page);
+}
+
+/*
+ * Counterpart of unmap_and_move() for compaction.
+ * The logic is almost same with unmap_and_move. The difference is that
+ * this function handles inorder_lru for locating new page into old pages's
+ * LRU position.
+ */
+static int unmap_and_move_ilru(new_page_t get_new_page, unsigned long private,
+ struct page *page, int force, bool offlining, bool sync,
+ struct inorder_lru *prev_lru, struct inorder_lru *ihead)
+{
+ int rc = 0;
+ int *result = NULL;
+ struct page *newpage = get_new_page(page, private, &result);
+
+ if (!newpage)
+ return -ENOMEM;
+
+ if (page_count(page) == 1) {
+ /* page was freed from under us. So we are done. */
+ goto out;
+ }
+
+ if (unlikely(PageTransHuge(page)))
+ if (unlikely(split_huge_page(page)))
+ goto out;
+
+ rc = __unmap_and_move(page, newpage, force, offlining, sync);
+out:
+ if (rc != -EAGAIN)
+ __put_ilru_pages(page, newpage, prev_lru, ihead);
+ else
+ putback_lru_page(newpage);
+
+ if (result) {
+ if (rc)
+ *result = rc;
+ else
+ *result = page_to_nid(newpage);
+ }
+
+ return rc;
+}
+
+/*
* Counterpart of unmap_and_move_page() for hugepage migration.
*
* This function doesn't wait the completion of hugepage I/O
@@ -920,7 +1109,7 @@ int migrate_pages(struct list_head *from,
if (!swapwrite)
current->flags |= PF_SWAPWRITE;

- for(pass = 0; pass < 10 && retry; pass++) {
+ for (pass = 0; pass < 10 && retry; pass++) {
retry = 0;

list_for_each_entry_safe(page, page2, from, lru) {
@@ -956,6 +1145,57 @@ out:
return nr_failed + retry;
}

+int migrate_ilru_pages(struct inorder_lru *ihead, new_page_t get_new_page,
+ unsigned long private, bool offlining, bool sync)
+{
+ int retry = 1;
+ int nr_failed = 0;
+ int pass = 0;
+ struct page *page, *page2;
+ struct inorder_lru *prev;
+ int swapwrite = current->flags & PF_SWAPWRITE;
+ int rc;
+
+ if (!swapwrite)
+ current->flags |= PF_SWAPWRITE;
+
+ for (pass = 0; pass < 10 && retry; pass++) {
+ retry = 0;
+ prev = ihead;
+ list_for_each_ilru_entry_safe(page, page2, ihead, ilru) {
+ cond_resched();
+
+ rc = unmap_and_move_ilru(get_new_page, private,
+ page, pass > 2, offlining,
+ sync, prev, ihead);
+
+ switch (rc) {
+ case -ENOMEM:
+ goto out;
+ case -EAGAIN:
+ retry++;
+ prev = &page->ilru;
+ break;
+ case 0:
+ break;
+ default:
+ /* Permanent failure */
+ nr_failed++;
+ break;
+ }
+ }
+ }
+ rc = 0;
+out:
+ if (!swapwrite)
+ current->flags &= ~PF_SWAPWRITE;
+
+ if (rc)
+ return rc;
+
+ return nr_failed + retry;
+}
+
int migrate_huge_pages(struct list_head *from,
new_page_t get_new_page, unsigned long private, bool offlining,
bool sync)
diff --git a/mm/swap.c b/mm/swap.c
index 3a442f1..bdaf329 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -252,7 +252,7 @@ void rotate_reclaimable_page(struct page *page)
}
}

-static void update_page_reclaim_stat(struct zone *zone, struct page *page,
+void update_page_reclaim_stat(struct zone *zone, struct page *page,
int file, int rotated)
{
struct zone_reclaim_stat *reclaim_stat = &zone->reclaim_stat;
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 132d2d7..45eadee 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -566,6 +566,35 @@ int remove_mapping(struct address_space *mapping, struct page *page)
}

/**
+ * putback_page_to_lru - put isolated @page onto @head
+ * @page: page to be put back to appropriate lru list
+ * @head_page: lru position to be put back
+ *
+ * Insert previously isolated @page to appropriate position of lru list
+ * zone->lru_lock must be hold.
+ */
+void putback_page_to_lru(struct page *page, struct page *head_page)
+{
+ int lru, active, file;
+ struct zone *zone = page_zone(page);
+
+ VM_BUG_ON(PageLRU(page));
+
+ lru = page_lru(head_page);
+ active = is_active_lru(lru);
+ file = is_file_lru(lru);
+
+ if (active)
+ SetPageActive(page);
+ else
+ ClearPageActive(page);
+
+ update_page_reclaim_stat(zone, page, file, active);
+ SetPageLRU(page);
+ __add_page_to_lru_list(zone, page, lru, &head_page->lru);
+}
+
+/**
* putback_lru_page - put previously isolated page onto appropriate LRU list
* @page: page to be put back to appropriate lru list
*
@@ -1025,6 +1054,28 @@ int __isolate_lru_page(struct page *page, isolate_mode_t mode, int file)
}

/*
+ * It's same with __isolate_lru_page except that it returns previous page
+ * of page isolated as LRU order if isolation is successful.
+ */
+int isolate_ilru_page(struct page *page, isolate_mode_t mode, int file,
+ struct page **prev_page)
+{
+ int ret = __isolate_lru_page(page, mode, file);
+ if (!ret) {
+ struct zone *zone = page_zone(page);
+ enum lru_list l = page_lru(page);
+ if (&zone->lru[l].list == page->lru.prev) {
+ *prev_page = NULL;
+ return ret;
+ }
+
+ *prev_page = lru_to_page(&page->lru);
+ }
+
+ return ret;
+}
+
+/*
* zone->lru_lock is heavily contended. Some of the functions that
* shrink the lists perform better by taking out a batch of pages
* and working on them outside the LRU lock.
--
1.7.4.1

2011-06-30 14:57:28

by Minchan Kim

[permalink] [raw]
Subject: [PATCH v4 07/10] compaction: make compaction use in-order putback

Compaction is a good solution for getting contiguous pages, but it causes LRU
churning, which is not good. Moreover, LRU order is important for selecting the
right victim pages when the VM is under memory pressure.

This patch makes the compaction code use in-order putback, so after compaction
completes the migrated pages keep their LRU ordering.

Cc: Johannes Weiner <[email protected]>
Cc: KAMEZAWA Hiroyuki <[email protected]>
Cc: KOSAKI Motohiro <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Signed-off-by: Minchan Kim <[email protected]>
---
mm/compaction.c | 25 +++++++++++++------------
1 files changed, 13 insertions(+), 12 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index a0e4202..7bc784a 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -28,7 +28,7 @@
*/
struct compact_control {
struct list_head freepages; /* List of free pages to migrate to */
- struct list_head migratepages; /* List of pages being migrated */
+ struct inorder_lru migratepages;/* List of pages being migrated */
unsigned long nr_freepages; /* Number of isolated free pages */
unsigned long nr_migratepages; /* Number of pages to migrate */
unsigned long free_pfn; /* isolate_freepages search base */
@@ -221,7 +221,7 @@ static void acct_isolated(struct zone *zone, struct compact_control *cc)
struct page *page;
unsigned int count[2] = { 0, };

- list_for_each_entry(page, &cc->migratepages, lru)
+ list_for_each_ilru_entry(page, &cc->migratepages, ilru)
count[!!page_is_file_cache(page)]++;

__mod_zone_page_state(zone, NR_ISOLATED_ANON, count[0]);
@@ -260,7 +260,7 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone,
unsigned long low_pfn, end_pfn;
unsigned long last_pageblock_nr = 0, pageblock_nr;
unsigned long nr_scanned = 0, nr_isolated = 0;
- struct list_head *migratelist = &cc->migratepages;
+ struct inorder_lru *migratelist = &cc->migratepages;
isolate_mode_t mode = ISOLATE_ACTIVE|ISOLATE_INACTIVE;

/* Do not scan outside zone boundaries */
@@ -295,7 +295,7 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone,
cond_resched();
spin_lock_irq(&zone->lru_lock);
for (; low_pfn < end_pfn; low_pfn++) {
- struct page *page;
+ struct page *page, *prev_page;
bool locked = true;

/* give a chance to irqs before checking need_resched() */
@@ -353,14 +353,14 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone,
mode |= ISOLATE_CLEAN;

/* Try isolate the page */
- if (__isolate_lru_page(page, mode, 0) != 0)
+ if (isolate_ilru_page(page, mode, 0, &prev_page) != 0)
continue;

VM_BUG_ON(PageTransCompound(page));

/* Successfully isolated */
del_page_from_lru_list(zone, page, page_lru(page));
- list_add(&page->lru, migratelist);
+ ilru_list_add(page, prev_page, migratelist);
cc->nr_migratepages++;
nr_isolated++;

@@ -416,7 +416,7 @@ static void update_nr_listpages(struct compact_control *cc)
int nr_freepages = 0;
struct page *page;

- list_for_each_entry(page, &cc->migratepages, lru)
+ list_for_each_ilru_entry(page, &cc->migratepages, ilru)
nr_migratepages++;
list_for_each_entry(page, &cc->freepages, lru)
nr_freepages++;
@@ -553,7 +553,8 @@ static int compact_zone(struct zone *zone, struct compact_control *cc)
}

nr_migrate = cc->nr_migratepages;
- err = migrate_pages(&cc->migratepages, compaction_alloc,
+ err = migrate_ilru_pages(&cc->migratepages,
+ compaction_alloc,
(unsigned long)cc, false,
cc->sync);
update_nr_listpages(cc);
@@ -568,7 +569,7 @@ static int compact_zone(struct zone *zone, struct compact_control *cc)

/* Release LRU pages not migrated */
if (err) {
- putback_lru_pages(&cc->migratepages);
+ putback_ilru_pages(&cc->migratepages);
cc->nr_migratepages = 0;
}

@@ -595,7 +596,7 @@ unsigned long compact_zone_order(struct zone *zone,
.sync = sync,
};
INIT_LIST_HEAD(&cc.freepages);
- INIT_LIST_HEAD(&cc.migratepages);
+ INIT_ILRU_LIST(&cc.migratepages);

return compact_zone(zone, &cc);
}
@@ -677,12 +678,12 @@ static int compact_node(int nid)

cc.zone = zone;
INIT_LIST_HEAD(&cc.freepages);
- INIT_LIST_HEAD(&cc.migratepages);
+ INIT_ILRU_LIST(&cc.migratepages);

compact_zone(zone, &cc);

VM_BUG_ON(!list_empty(&cc.freepages));
- VM_BUG_ON(!list_empty(&cc.migratepages));
+ VM_BUG_ON(!ilru_list_empty(&cc.migratepages));
}

return 0;
--
1.7.4.1

2011-06-30 14:56:48

by Minchan Kim

[permalink] [raw]
Subject: [PATCH v4 08/10] ilru: reduce zone->lru_lock

inorder_lru increases zone->lru_lock overhead (pointed out by Mel)
because it doesn't support pagevecs.
This patch introduces ilru_add_pvecs and its APIs.

The problem with this approach is that we lose the information about the old page
(i.e., the source of the migration) when the pagevec drain happens.
To solve this, I introduce old_page in inorder_lru.
It can share a union with next of struct inorder_lru, because the new page (i.e.,
the destination of the migration) is always detached from the ilru list, so we can
reuse the next pointer to keep the old page.
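
With this change the node looks as follows (see the mm_types.h hunk below, shown
here condensed):

	struct inorder_lru {
		struct page *prev_page;			/* prev LRU page of isolated page */
		union {
			struct inorder_lru *next;	/* next for singly linked list */
			struct page *old_page;		/* the source page of migration */
		};
	};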

Cc: Johannes Weiner <[email protected]>
Cc: KAMEZAWA Hiroyuki <[email protected]>
Cc: KOSAKI Motohiro <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Signed-off-by: Minchan Kim <[email protected]>
---
include/linux/mm_types.h | 8 ++-
include/linux/pagevec.h | 1 +
include/linux/swap.h | 2 +
mm/internal.h | 2 +-
mm/migrate.c | 139 ++++----------------------------
mm/swap.c | 199 ++++++++++++++++++++++++++++++++++++++++++++++
mm/vmscan.c | 65 ++++++---------
7 files changed, 251 insertions(+), 165 deletions(-)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 3634c04..db192c7 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -33,8 +33,12 @@ struct page;
struct inorder_lru {
/* prev LRU page of isolated page */
struct page *prev_page;
- /* next for singly linked list*/
- struct inorder_lru *next;
+ union {
+ /* next for singly linked list*/
+ struct inorder_lru *next;
+ /* the source page of migration */
+ struct page *old_page;
+ };
};

/*
diff --git a/include/linux/pagevec.h b/include/linux/pagevec.h
index bab82f4..8f609ea 100644
--- a/include/linux/pagevec.h
+++ b/include/linux/pagevec.h
@@ -23,6 +23,7 @@ struct pagevec {
void __pagevec_release(struct pagevec *pvec);
void __pagevec_free(struct pagevec *pvec);
void ____pagevec_lru_add(struct pagevec *pvec, enum lru_list lru);
+void ____pagevec_ilru_add(struct pagevec *pvec, enum lru_list lru);
void pagevec_strip(struct pagevec *pvec);
unsigned pagevec_lookup(struct pagevec *pvec, struct address_space *mapping,
pgoff_t start, unsigned nr_pages);
diff --git a/include/linux/swap.h b/include/linux/swap.h
index 2208412..78f5249 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -217,7 +217,9 @@ extern unsigned int nr_free_pagecache_pages(void);


/* linux/mm/swap.c */
+extern void __ilru_cache_add(struct page *, enum lru_list lru);
extern void __lru_cache_add(struct page *, enum lru_list lru);
+extern void lru_cache_add_ilru(struct page *, enum lru_list lru);
extern void lru_cache_add_lru(struct page *, enum lru_list lru);
extern void lru_add_page_tail(struct zone* zone,
struct page *page, struct page *page_tail);
diff --git a/mm/internal.h b/mm/internal.h
index 8a919c7..cb969e0 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -42,8 +42,8 @@ extern unsigned long highest_memmap_pfn;
/*
* in mm/vmscan.c:
*/
-extern void putback_page_to_lru(struct page *page, struct page *head_page);
extern int isolate_lru_page(struct page *page);
+extern void putback_ilru_page(struct page *page);
extern void putback_lru_page(struct page *page);

/*
diff --git a/mm/migrate.c b/mm/migrate.c
index b997de5..cf73477 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -84,48 +84,15 @@ void putback_lru_pages(struct list_head *l)
}
}

-/*
- * Check if page and prev are on same LRU.
- * zone->lru_lock must be hold.
- */
-static bool same_lru(struct page *page, struct page *prev)
-{
- bool ret = false;
- if (!prev || !PageLRU(prev))
- goto out;
-
- if (unlikely(PageUnevictable(prev)))
- goto out;
-
- if (page_lru_base_type(page) != page_lru_base_type(prev))
- goto out;
-
- ret = true;
-out:
- return ret;
-}
-
-
void putback_ilru_pages(struct inorder_lru *l)
{
- struct zone *zone;
- struct page *page, *page2, *prev;
-
+ struct page *page, *page2;
list_for_each_ilru_entry_safe(page, page2, l, ilru) {
ilru_list_del(page, l);
dec_zone_page_state(page, NR_ISOLATED_ANON +
page_is_file_cache(page));
- zone = page_zone(page);
- spin_lock_irq(&zone->lru_lock);
- prev = page->ilru.prev_page;
- if (same_lru(page, prev)) {
- putback_page_to_lru(page, prev);
- spin_unlock_irq(&zone->lru_lock);
- put_page(page);
- } else {
- spin_unlock_irq(&zone->lru_lock);
- putback_lru_page(page);
- }
+ page->ilru.old_page = page;
+ putback_ilru_page(page);
}
}
/*
@@ -864,74 +831,19 @@ out:
return rc;
}

-/*
- * We need adjust prev_page of ilru_list when we putback newpage
- * and free old page. Let's think about it.
- * For example,
- *
- * Notation)
- * PHY : page physical layout on memory
- * LRU : page logical layout as LRU order
- * ilru : inorder_lru list
- * PN : old page(ie, source page of migration)
- * PN' : new page(ie, destination page of migration)
- *
- * Let's assume there is below layout.
- * PHY : H - P1 - P2 - P3 - P4 - P5 - T
- * LRU : H - P5 - P4 - P3 - P2 - P1 - T
- * ilru :
- *
- * We isolate P2,P3,P4 so inorder_lru has following as.
- *
- * PHY : H - P1 - P2 - P3 - P4 - P5 - T
- * LRU : H - P5 - P1 - T
- * ilru : (P4,P5) - (P3,P4) - (P2,P3)
- *
- * After 1st putback happens,
- *
- * PHY : H - P1 - P2 - P3 - P4 - P5 - T
- * LRU : H - P5 - P4' - P1 - T
- * ilru : (P3,P4) - (P2,P3)
- * P4' is a newpage and P4(ie, old page) would freed
- *
- * In 2nd putback, P3 would try findding P4 but P4 would be freed.
- * so same_lru returns 'false' so that inorder_lru doesn't work any more.
- * The bad effect continues until P2. That's too bad.
- * For fixing, we define adjust_ilru_prev_page. It works following as.
- *
- * After 1st putback,
- *
- * PHY : H - P1 - P2 - P3 - P4 - P5 - T
- * LRU : H - P5 - P4' - P1 - T
- * ilru : (P3,P4') - (P2,P3)
- * It replaces prev pointer of pages remained in inorder_lru list with
- * new one's so in 2nd putback,
- *
- * PHY : H - P1 - P2 - P3 - P4 - P5 - T
- * LRU : H - P5 - P4' - P3' - P1 - T
- * ilru : (P2,P3')
- *
- * In 3rd putback,
- *
- * PHY : H - P1 - P2 - P3 - P4 - P5 - T
- * LRU : H - P5 - P4' - P3' - P2' - P1 - T
- * ilru :
- */
-static inline void adjust_ilru_prev_page(struct inorder_lru *head,
- struct page *prev_page, struct page *new_page)
-{
- struct page *page;
- list_for_each_ilru_entry(page, head, ilru)
- if (page->ilru.prev_page == prev_page)
- page->ilru.prev_page = new_page;
-}
-
void __put_ilru_pages(struct page *page, struct page *newpage,
- struct inorder_lru *prev_lru, struct inorder_lru *ihead)
+ struct inorder_lru *prev_lru)
{
struct page *prev_page;
- struct zone *zone;
prev_page = page->ilru.prev_page;
+
+ newpage->ilru.prev_page = prev_page;
+ /*
+ * We need to keep the old page, which is the source page
+ * of migration, for adjusting prev_page of pages in the pagevec.
+ * See adjust_ilru_list.
+ */
+ newpage->ilru.old_page = page;
/*
* A page that has been migrated has all references
* removed and will be freed. A page that has not been
@@ -941,29 +853,13 @@ void __put_ilru_pages(struct page *page, struct page *newpage,
ilru_list_del(page, prev_lru);
dec_zone_page_state(page, NR_ISOLATED_ANON +
page_is_file_cache(page));
+ putback_lru_page(page);

/*
* Move the new page to the LRU. If migration was not successful
* then this will free the page.
*/
- zone = page_zone(newpage);
- spin_lock_irq(&zone->lru_lock);
- if (same_lru(page, prev_page)) {
- putback_page_to_lru(newpage, prev_page);
- spin_unlock_irq(&zone->lru_lock);
- /*
- * The newpage replaced LRU position of old page and
- * old one would be freed. So let's adjust prev_page of pages
- * remained in inorder_lru list.
- */
- adjust_ilru_prev_page(ihead, page, newpage);
- put_page(newpage);
- } else {
- spin_unlock_irq(&zone->lru_lock);
- putback_lru_page(newpage);
- }
-
- putback_lru_page(page);
+ putback_ilru_page(newpage);
}

/*
@@ -974,7 +870,7 @@ void __put_ilru_pages(struct page *page, struct page *newpage,
*/
static int unmap_and_move_ilru(new_page_t get_new_page, unsigned long private,
struct page *page, int force, bool offlining, bool sync,
- struct inorder_lru *prev_lru, struct inorder_lru *ihead)
+ struct inorder_lru *prev_lru)
{
int rc = 0;
int *result = NULL;
@@ -995,7 +891,7 @@ static int unmap_and_move_ilru(new_page_t get_new_page, unsigned long private,
rc = __unmap_and_move(page, newpage, force, offlining, sync);
out:
if (rc != -EAGAIN)
- __put_ilru_pages(page, newpage, prev_lru, ihead);
+ __put_ilru_pages(page, newpage, prev_lru);
else
putback_lru_page(newpage);

@@ -1166,8 +1062,7 @@ int migrate_ilru_pages(struct inorder_lru *ihead, new_page_t get_new_page,
cond_resched();

rc = unmap_and_move_ilru(get_new_page, private,
- page, pass > 2, offlining,
- sync, prev, ihead);
+ page, pass > 2, offlining, sync, prev);

switch (rc) {
case -ENOMEM:
diff --git a/mm/swap.c b/mm/swap.c
index bdaf329..611013d 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -37,6 +37,7 @@
/* How many pages do we try to swap or page in/out together? */
int page_cluster;

+static DEFINE_PER_CPU(struct pagevec[NR_LRU_LISTS], ilru_add_pvecs);
static DEFINE_PER_CPU(struct pagevec[NR_LRU_LISTS], lru_add_pvecs);
static DEFINE_PER_CPU(struct pagevec, lru_rotate_pvecs);
static DEFINE_PER_CPU(struct pagevec, lru_deactivate_pvecs);
@@ -179,6 +180,33 @@ void put_pages_list(struct list_head *pages)
}
EXPORT_SYMBOL(put_pages_list);

+static void pagevec_ilru_move_fn(struct pagevec *pvec,
+ void (*move_fn)(struct page *page, void *arg, int idx),
+ void *arg)
+{
+ int i;
+ struct zone *zone = NULL;
+ unsigned long flags = 0;
+
+ for (i = 0; i < pagevec_count(pvec); i++) {
+ struct page *page = pvec->pages[i];
+ struct zone *pagezone = page_zone(page);
+
+ if (pagezone != zone) {
+ if (zone)
+ spin_unlock_irqrestore(&zone->lru_lock, flags);
+ zone = pagezone;
+ spin_lock_irqsave(&zone->lru_lock, flags);
+ }
+
+ (*move_fn)(page, arg, i);
+ }
+ if (zone)
+ spin_unlock_irqrestore(&zone->lru_lock, flags);
+ release_pages(pvec->pages, pvec->nr, pvec->cold);
+ pagevec_reinit(pvec);
+}
+
static void pagevec_lru_move_fn(struct pagevec *pvec,
void (*move_fn)(struct page *page, void *arg),
void *arg)
@@ -348,6 +376,16 @@ void mark_page_accessed(struct page *page)

EXPORT_SYMBOL(mark_page_accessed);

+void __ilru_cache_add(struct page *page, enum lru_list lru)
+{
+ struct pagevec *pvec = &get_cpu_var(ilru_add_pvecs)[lru];
+
+ page_cache_get(page);
+ if (!pagevec_add(pvec, page))
+ ____pagevec_ilru_add(pvec, lru);
+ put_cpu_var(ilru_add_pvecs);
+}
+
void __lru_cache_add(struct page *page, enum lru_list lru)
{
struct pagevec *pvec = &get_cpu_var(lru_add_pvecs)[lru];
@@ -360,6 +398,25 @@ void __lru_cache_add(struct page *page, enum lru_list lru)
EXPORT_SYMBOL(__lru_cache_add);

/**
+ * lru_cache_add_ilru - add a page to a page list
+ * @page: the page to be added to the LRU.
+ * @lru: the LRU list to which the page is added.
+ */
+void lru_cache_add_ilru(struct page *page, enum lru_list lru)
+{
+ if (PageActive(page)) {
+ VM_BUG_ON(PageUnevictable(page));
+ ClearPageActive(page);
+ } else if (PageUnevictable(page)) {
+ VM_BUG_ON(PageActive(page));
+ ClearPageUnevictable(page);
+ }
+
+ VM_BUG_ON(PageLRU(page) || PageActive(page) || PageUnevictable(page));
+ __ilru_cache_add(page, lru);
+}
+
+/**
* lru_cache_add_lru - add a page to a page list
* @page: the page to be added to the LRU.
* @lru: the LRU list to which the page is added.
@@ -484,6 +541,13 @@ static void drain_cpu_pagevecs(int cpu)
____pagevec_lru_add(pvec, lru);
}

+ pvecs = per_cpu(ilru_add_pvecs, cpu);
+ for_each_lru(lru) {
+ pvec = &pvecs[lru - LRU_BASE];
+ if (pagevec_count(pvec))
+ ____pagevec_ilru_add(pvec, lru);
+ }
+
pvec = &per_cpu(lru_rotate_pvecs, cpu);
if (pagevec_count(pvec)) {
unsigned long flags;
@@ -669,6 +733,130 @@ void lru_add_page_tail(struct zone* zone,
}
}

+/*
+ * We need to adjust prev_page of the ilru list when we put back a newpage
+ * and free the old page. Let's think about it.
+ * For example,
+ *
+ * Notation)
+ * PHY : page physical layout on memory
+ * LRU : page logical layout as LRU order
+ * ilru : inorder_lru list
+ * PN : old page(ie, source page of migration)
+ * PN' : new page(ie, destination page of migration)
+ *
+ * Let's assume there is below layout.
+ * PHY : H - P1 - P2 - P3 - P4 - P5 - T
+ * LRU : H - P5 - P4 - P3 - P2 - P1 - T
+ * ilru :
+ *
+ * We isolate P2,P3,P4 so inorder_lru looks like the following.
+ *
+ * PHY : H - P1 - P2 - P3 - P4 - P5 - T
+ * LRU : H - P5 - P1 - T
+ * ilru : (P4,P5) - (P3,P4) - (P2,P3)
+ *
+ * After 1st putback happens,
+ *
+ * PHY : H - P1 - P2 - P3 - P4 - P5 - T
+ * LRU : H - P5 - P4' - P1 - T
+ * ilru : (P3,P4) - (P2,P3)
+ * P4' is a newpage and P4(ie, the old page) would be freed.
+ *
+ * In the 2nd putback, P3 would try finding P4, but P4 is already freed,
+ * so same_lru returns 'false' and inorder_lru doesn't work any more.
+ * The bad effect continues until P2. That's too bad.
+ * To fix this, we define adjust_ilru_list. It works as follows.
+ *
+ * After 1st putback,
+ *
+ * PHY : H - P1 - P2 - P3 - P4 - P5 - T
+ * LRU : H - P5 - P4' - P1 - T
+ * ilru : (P3,P4') - (P2,P3)
+ * It replaces the prev pointer of pages remaining in the inorder_lru list
+ * with the new one's, so in the 2nd putback,
+ *
+ * PHY : H - P1 - P2 - P3 - P4 - P5 - T
+ * LRU : H - P5 - P4' - P3' - P1 - T
+ * ilru : (P2,P3')
+ *
+ * In 3rd putback,
+ *
+ * PHY : H - P1 - P2 - P3 - P4 - P5 - T
+ * LRU : H - P5 - P4' - P3' - P2' - P1 - T
+ * ilru :
+ */
+static inline void adjust_ilru_list(enum lru_list lru,
+ struct page *old_page, struct page *new_page, int idx)
+{
+ int i;
+ struct pagevec *pvec = &get_cpu_var(ilru_add_pvecs)[lru];
+ for (i = idx + 1; i < pagevec_count(pvec); i++) {
+ struct page *page = pvec->pages[i];
+ if (page->ilru.prev_page == old_page)
+ page->ilru.prev_page = new_page;
+ }
+}
+
+/*
+ * Check if page and prev are on same LRU.
+ * zone->lru_lock must be held.
+ */
+static bool same_lru(struct page *page, struct page *prev)
+{
+ bool ret = false;
+ if (!prev || !PageLRU(prev))
+ goto out;
+
+ if (unlikely(PageUnevictable(prev)))
+ goto out;
+
+ if (page_lru_base_type(page) != page_lru_base_type(prev))
+ goto out;
+
+ ret = true;
+out:
+ return ret;
+}
+
+static void ____pagevec_ilru_add_fn(struct page *page, void *arg, int idx)
+{
+ enum lru_list lru = (enum lru_list)arg;
+ struct zone *zone = page_zone(page);
+ int file, active;
+
+ struct page *prev_page = page->ilru.prev_page;
+ struct page *old_page = page->ilru.old_page;
+
+ VM_BUG_ON(PageActive(page));
+ VM_BUG_ON(PageUnevictable(page));
+ VM_BUG_ON(PageLRU(page));
+
+ SetPageLRU(page);
+
+ if (same_lru(page, prev_page)) {
+ active = PageActive(prev_page);
+ file = page_is_file_cache(page);
+ if (active)
+ SetPageActive(page);
+ /*
+ * The newpage will replace the LRU position of the old page,
+ * so let's adjust prev_page of the pages remaining
+ * in ilru_add_pvecs so that same_lru keeps working.
+ */
+ adjust_ilru_list(lru, old_page, page, idx);
+ __add_page_to_lru_list(zone, page, lru, &prev_page->lru);
+ } else {
+ file = is_file_lru(lru);
+ active = is_active_lru(lru);
+ if (active)
+ SetPageActive(page);
+ add_page_to_lru_list(zone, page, lru);
+ }
+
+ update_page_reclaim_stat(zone, page, file, active);
+}
+
static void ____pagevec_lru_add_fn(struct page *page, void *arg)
{
enum lru_list lru = (enum lru_list)arg;
@@ -691,6 +879,17 @@ static void ____pagevec_lru_add_fn(struct page *page, void *arg)
* Add the passed pages to the LRU, then drop the caller's refcount
* on them. Reinitialises the caller's pagevec.
*/
+void ____pagevec_ilru_add(struct pagevec *pvec, enum lru_list lru)
+{
+ VM_BUG_ON(is_unevictable_lru(lru));
+
+ pagevec_ilru_move_fn(pvec, ____pagevec_ilru_add_fn, (void *)lru);
+}
+
+/*
+ * Add the passed pages to the LRU, then drop the caller's refcount
+ * on them. Reinitialises the caller's pagevec.
+ */
void ____pagevec_lru_add(struct pagevec *pvec, enum lru_list lru)
{
VM_BUG_ON(is_unevictable_lru(lru));
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 45eadee..f0e7789 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -565,45 +565,7 @@ int remove_mapping(struct address_space *mapping, struct page *page)
return 0;
}

-/**
- * putback_page_to_lru - put isolated @page onto @head
- * @page: page to be put back to appropriate lru list
- * @head_page: lru position to be put back
- *
- * Insert previously isolated @page to appropriate position of lru list
- * zone->lru_lock must be hold.
- */
-void putback_page_to_lru(struct page *page, struct page *head_page)
-{
- int lru, active, file;
- struct zone *zone = page_zone(page);
-
- VM_BUG_ON(PageLRU(page));
-
- lru = page_lru(head_page);
- active = is_active_lru(lru);
- file = is_file_lru(lru);
-
- if (active)
- SetPageActive(page);
- else
- ClearPageActive(page);
-
- update_page_reclaim_stat(zone, page, file, active);
- SetPageLRU(page);
- __add_page_to_lru_list(zone, page, lru, &head_page->lru);
-}
-
-/**
- * putback_lru_page - put previously isolated page onto appropriate LRU list
- * @page: page to be put back to appropriate lru list
- *
- * Add previously isolated @page to appropriate LRU list.
- * Page may still be unevictable for other reasons.
- *
- * lru_lock must not be held, interrupts must be enabled.
- */
-void putback_lru_page(struct page *page)
+static void __putback_lru_core(struct page *page, bool inorder)
{
int lru;
int active = !!TestClearPageActive(page);
@@ -622,7 +584,10 @@ redo:
* We know how to handle that.
*/
lru = active + page_lru_base_type(page);
- lru_cache_add_lru(page, lru);
+ if (inorder)
+ lru_cache_add_ilru(page, lru);
+ else
+ lru_cache_add_lru(page, lru);
} else {
/*
* Put unevictable pages directly on zone's unevictable
@@ -650,6 +615,7 @@ redo:
if (lru == LRU_UNEVICTABLE && page_evictable(page, NULL)) {
if (!isolate_lru_page(page)) {
put_page(page);
+ inorder = false;
goto redo;
}
/* This means someone else dropped this page from LRU
@@ -666,6 +632,25 @@ redo:
put_page(page); /* drop ref from isolate */
}

+/**
+ * putback_lru_page - put a previously isolated page onto the head of the appropriate LRU list
+ * @page: page to be put back to the appropriate lru list
+ *
+ * Add a previously isolated @page to the head of the appropriate LRU list.
+ * Page may still be unevictable for other reasons.
+ *
+ * lru_lock must not be held, interrupts must be enabled.
+ */
+void putback_lru_page(struct page *page)
+{
+ __putback_lru_core(page, false);
+}
+
+void putback_ilru_page(struct page *page)
+{
+ __putback_lru_core(page, true);
+}
+
enum page_references {
PAGEREF_RECLAIM,
PAGEREF_RECLAIM_CLEAN,
--
1.7.4.1
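
To make the flow in the patch above easier to follow, here is a minimal
userspace model of the in-order putback idea. It is only a sketch, not
kernel code: the names (lru_add_head, lru_add_after, putback_inorder,
ilru_prev) are invented for the illustration, and the per-CPU ilru
pagevec batching, the zone->lru_lock handling and the page-flag checks
done by same_lru are all reduced to the single question of whether the
recorded neighbour is still on the LRU.

#include <stdbool.h>
#include <stdio.h>

struct page {
	int pfn;
	bool on_lru;
	struct page *prev, *next;	/* LRU linkage; head->next is hottest */
	struct page *ilru_prev;		/* neighbour recorded at isolation time */
};

static struct page lru_head = { .prev = &lru_head, .next = &lru_head };

static void lru_add_head(struct page *p)
{
	p->next = lru_head.next;
	p->prev = &lru_head;
	lru_head.next->prev = p;
	lru_head.next = p;
	p->on_lru = true;
}

static void lru_add_after(struct page *pos, struct page *p)
{
	p->next = pos->next;
	p->prev = pos;
	pos->next->prev = p;
	pos->next = p;
	p->on_lru = true;
}

static void isolate(struct page *p)
{
	p->ilru_prev = p->prev;		/* remember the neighbour we sat behind */
	p->prev->next = p->next;
	p->next->prev = p->prev;
	p->on_lru = false;
}

/* the in-order idea: reuse the old slot if the neighbour is still there */
static void putback_inorder(struct page *p)
{
	struct page *prev = p->ilru_prev;

	if (prev && prev != &lru_head && prev->on_lru)
		lru_add_after(prev, p);	/* de-churned: old position kept */
	else
		lru_add_head(p);	/* fallback: head insertion as before */
}

int main(void)
{
	struct page pages[5] = { 0 };
	int i;

	for (i = 0; i < 5; i++) {
		pages[i].pfn = i + 1;
		lru_add_head(&pages[i]);	/* LRU: 5 4 3 2 1 */
	}

	isolate(&pages[2]);			/* take pfn 3 off the list */
	putback_inorder(&pages[2]);		/* it returns between 4 and 2 */

	for (struct page *p = lru_head.next; p != &lru_head; p = p->next)
		printf("%d ", p->pfn);
	printf("\n");				/* prints: 5 4 3 2 1 */
	return 0;
}

The point of the fallback branch is the same as in the patch: when the
remembered neighbour has already left the LRU, in-order insertion is
abandoned and the page goes back to the head, exactly as putback_lru_page
would do.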

2011-06-30 14:57:10

by Minchan Kim

[permalink] [raw]
Subject: [PATCH v4 09/10] add inorder-lru tracepoints for just measurement

This patch adds some tracepoints to show the effect of this patch
series. These tracepoints are not intended for merging; they exist
only to measure the effect.
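
For reference, the events can be watched from userspace once the kernel
carries this patch. The reader below is only a sketch and not part of
the patch; it assumes debugfs is mounted at /sys/kernel/debug, the
tracing directory layout of this kernel generation, and root privileges.

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define TRACE_DIR "/sys/kernel/debug/tracing"

static int write_str(const char *path, const char *val)
{
	int fd = open(path, O_WRONLY);
	ssize_t ret;

	if (fd < 0)
		return -1;
	ret = write(fd, val, strlen(val));
	close(fd);
	return ret < 0 ? -1 : 0;
}

int main(void)
{
	char buf[4096];
	ssize_t n;
	int fd;

	/* enable every event in the inorder_putback trace system */
	if (write_str(TRACE_DIR "/events/inorder_putback/enable", "1")) {
		perror("enable inorder_putback events");
		return 1;
	}

	fd = open(TRACE_DIR "/trace_pipe", O_RDONLY);
	if (fd < 0) {
		perror("open trace_pipe");
		return 1;
	}

	/* trace_pipe blocks until events arrive, then streams them */
	while ((n = read(fd, buf, sizeof(buf) - 1)) > 0) {
		buf[n] = '\0';
		fputs(buf, stdout);
	}
	close(fd);
	return 0;
}

Writing 1 to the per-system enable file turns on all three mm_inorder_*
events declared in the diff below; trace_pipe then streams the TP_printk
output, including the pfn of the page being put back, its old page and
the recorded prev page.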

Cc: Johannes Weiner <[email protected]>
Cc: KAMEZAWA Hiroyuki <[email protected]>
Cc: KOSAKI Motohiro <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Cc: Michal Hocko <[email protected]>
Signed-off-by: Minchan Kim <[email protected]>
---
include/trace/events/inorder_putback.h | 88 ++++++++++++++++++++++++++++++++
mm/migrate.c | 3 +
mm/swap.c | 3 +
mm/vmscan.c | 4 +-
4 files changed, 96 insertions(+), 2 deletions(-)
create mode 100644 include/trace/events/inorder_putback.h

diff --git a/include/trace/events/inorder_putback.h b/include/trace/events/inorder_putback.h
new file mode 100644
index 0000000..fe81742
--- /dev/null
+++ b/include/trace/events/inorder_putback.h
@@ -0,0 +1,88 @@
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM inorder_putback
+
+#if !defined(_TRACE_INP_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_INP_H
+
+#include <linux/types.h>
+#include <linux/tracepoint.h>
+
+TRACE_EVENT(mm_inorder_inorder,
+
+ TP_PROTO(struct page *page,
+ struct page *old_page,
+ struct page *prev_page),
+
+ TP_ARGS(page, old_page, prev_page),
+
+ TP_STRUCT__entry(
+ __field(struct page *, page)
+ __field(struct page *, old_page)
+ __field(struct page *, prev_page)
+ ),
+
+ TP_fast_assign(
+ __entry->page = page;
+ __entry->old_page = old_page;
+ __entry->prev_page = prev_page;
+ ),
+
+ TP_printk("pfn=%lu old pfn=%lu prev_pfn=%lu active=%d",
+ page_to_pfn(__entry->page),
+ page_to_pfn(__entry->old_page),
+ page_to_pfn(__entry->prev_page),
+ PageActive(__entry->prev_page))
+);
+
+TRACE_EVENT(mm_inorder_outoforder,
+ TP_PROTO(struct page *page,
+ struct page *old_page,
+ struct page *prev_page),
+
+ TP_ARGS(page, old_page, prev_page),
+
+ TP_STRUCT__entry(
+ __field(struct page *, page)
+ __field(struct page *, old_page)
+ __field(struct page *, prev_page)
+ ),
+
+ TP_fast_assign(
+ __entry->page = page;
+ __entry->old_page = old_page;
+ __entry->prev_page = prev_page;
+ ),
+
+ TP_printk("pfn=%lu old pfn=%lu prev_pfn=%lu active=%d",
+ page_to_pfn(__entry->page),
+ page_to_pfn(__entry->old_page),
+ __entry->prev_page ? page_to_pfn(__entry->prev_page) : 0,
+ __entry->prev_page ? PageActive(__entry->prev_page) : 0)
+);
+
+TRACE_EVENT(mm_inorder_isolate,
+
+ TP_PROTO(struct page *prev_page,
+ struct page *page),
+
+ TP_ARGS(prev_page, page),
+
+ TP_STRUCT__entry(
+ __field(struct page *, prev_page)
+ __field(struct page *, page)
+ ),
+
+ TP_fast_assign(
+ __entry->prev_page = prev_page;
+ __entry->page = page;
+ ),
+
+ TP_printk("prev_pfn=%lu pfn=%lu active=%d",
+ page_to_pfn(__entry->prev_page),
+ page_to_pfn(__entry->page), PageActive(__entry->prev_page))
+);
+
+#endif /* _TRACE_INP_H */
+
+/* This part must be outside protection */
+#include <trace/define_trace.h>
diff --git a/mm/migrate.c b/mm/migrate.c
index cf73477..1267c45 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -39,6 +39,9 @@

#include "internal.h"

+#define CREATE_TRACE_POINTS
+#include <trace/events/inorder_putback.h>
+
#define lru_to_page(_head) (list_entry((_head)->prev, struct page, lru))

/*
diff --git a/mm/swap.c b/mm/swap.c
index 611013d..f2ccf81 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -32,6 +32,7 @@
#include <linux/memcontrol.h>
#include <linux/gfp.h>

+#include <trace/events/inorder_putback.h>
#include "internal.h"

/* How many pages do we try to swap or page in/out together? */
@@ -846,12 +847,14 @@ static void ____pagevec_ilru_add_fn(struct page *page, void *arg, int idx)
*/
adjust_ilru_list(lru, old_page, page, idx);
__add_page_to_lru_list(zone, page, lru, &prev_page->lru);
+ trace_mm_inorder_inorder(page, old_page, prev_page);
} else {
file = is_file_lru(lru);
active = is_active_lru(lru);
if (active)
SetPageActive(page);
add_page_to_lru_list(zone, page, lru);
+ trace_mm_inorder_outoforder(page, old_page, prev_page);
}

update_page_reclaim_stat(zone, page, file, active);
diff --git a/mm/vmscan.c b/mm/vmscan.c
index f0e7789..eb26f03 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -50,10 +50,9 @@
#include <linux/swapops.h>

#include "internal.h"
-
+#include <trace/events/inorder_putback.h>
#define CREATE_TRACE_POINTS
#include <trace/events/vmscan.h>
-
/*
* reclaim_mode determines how the inactive list is shrunk
* RECLAIM_MODE_SINGLE: Reclaim only order-0 pages
@@ -1055,6 +1054,7 @@ int isolate_ilru_page(struct page *page, isolate_mode_t mode, int file,
}

*prev_page = lru_to_page(&page->lru);
+ trace_mm_inorder_isolate(*prev_page, page);
}

return ret;
--
1.7.4.1

2011-06-30 14:57:00

by Minchan Kim

[permalink] [raw]
Subject: [PATCH v4 10/10] compaction: add drain ilru of pagevec

The in-order putback checks whether the previous page of a drained page
is on the LRU. If it is not, same_lru returns false and in-order putback
fails. Under heavy memory pressure this happens frequently because the
previous page is still sitting in the ilru pagevec, which is not
desirable.

In addition, returning migrated pages to the LRU quickly is important
when compaction reclaims pages while kswapd or other direct reclaim runs
in parallel. The pages sitting in the ilru pagevec may belong near the
tail of the LRU, so putting them back promptly prevents working set
pages from being evicted instead.

The elapsed time to decompress a 10GB file in my experiment is as
follows.

inorder_lru      inorder_lru + drain pagevec
01:43:16.16      01:40:27.18

Cc: Johannes Weiner <[email protected]>
Cc: KAMEZAWA Hiroyuki <[email protected]>
Cc: KOSAKI Motohiro <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Signed-off-by: Minchan Kim <[email protected]>
---
include/linux/swap.h | 1 +
mm/compaction.c | 2 ++
mm/swap.c | 13 +++++++++++++
3 files changed, 16 insertions(+), 0 deletions(-)

diff --git a/include/linux/swap.h b/include/linux/swap.h
index 78f5249..6aafb75 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -217,6 +217,7 @@ extern unsigned int nr_free_pagecache_pages(void);


/* linux/mm/swap.c */
+extern void drain_ilru_pagevecs(int cpu);
extern void __ilru_cache_add(struct page *, enum lru_list lru);
extern void __lru_cache_add(struct page *, enum lru_list lru);
extern void lru_cache_add_ilru(struct page *, enum lru_list lru);
diff --git a/mm/compaction.c b/mm/compaction.c
index 7bc784a..a515639 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -573,6 +573,8 @@ static int compact_zone(struct zone *zone, struct compact_control *cc)
cc->nr_migratepages = 0;
}

+ drain_ilru_pagevecs(get_cpu());
+ put_cpu();
}

out:
diff --git a/mm/swap.c b/mm/swap.c
index f2ccf81..c2cf0e2 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -525,6 +525,19 @@ static void lru_deactivate_fn(struct page *page, void *arg)
update_page_reclaim_stat(zone, page, file, 0);
}

+void drain_ilru_pagevecs(int cpu)
+{
+ struct pagevec *pvecs = per_cpu(ilru_add_pvecs, cpu);
+ struct pagevec *pvec;
+ int lru;
+
+ for_each_lru(lru) {
+ pvec = &pvecs[lru - LRU_BASE];
+ if (pagevec_count(pvec))
+ ____pagevec_ilru_add(pvec, lru);
+ }
+}
+
/*
* Drain pages out of the cpu's pagevecs.
* Either "cpu" is the current CPU, and preemption has already been
--
1.7.4.1

2011-06-30 15:02:56

by Minchan Kim

[permalink] [raw]
Subject: Re: [PATCH v4 00/10] Prevent LRU churning

I am resending because I missed a Cc.
Sorry for the noise.

--
Kind regards,
Minchan Kim