Changelog since V2
o Remove ISOLATE_BOTH - suggested by Johannes Weiner
o change description slightly
o Clean up unmap_and_move
o Add Reviewed-by and Acked-by
Changelog since V1
o Rebase on 2.6.39
o change description slightly
There are some places that isolate pages and put them back.
For example, compaction does it to get contiguous pages.
The problem is that if we isolate a page in the middle of the LRU and put it back,
we lose LRU history, since putback_lru_page inserts the page at the head of the LRU list.
LRU history is an important parameter for selecting a victim page in current page reclaim
when memory pressure is heavy. Unfortunately, if someone wants to allocate a high-order page
while memory pressure is heavy, it triggers compaction and we end up losing LRU history.
It means we can evict working set pages, and system latency becomes high.
This patch series solves the problem with two methods.
* Anti-churning
When we isolate a page from the LRU list, don't isolate pages we can't handle.
* De-churning
When we put a page back on the LRU list during migration, insert the new page at the old page's LRU position.
[1,2,3/10] are just clean-ups.
[4,5,6/10] are related to Anti-churning.
[7,8,9/10] are related to De-churning.
[10/10] adds new tracepoints, which are not meant for merging but just show the effect.
I tested this series and it passes all [yes|no|mod|def]config builds.
On my machine (1G DRAM, Intel Core 2 Duo), the test scenario is as follows.
1) Boot up
2) qemu ubuntu start up (1G mem)
3) Run many applications and switch among them using the attached script (which was written by Wu)
I think this is a worst-case scenario since there are many contiguous pages when the machine boots up.
It means system memory isn't aged, so many pages are in contiguous LRU order. That could have a
bad effect on inorder-lru, but I solved the problem. Please see the description of [7/10].
The test results are as follows.
Compaction isolated about 20000 pages. Only 10 pages were put back
out of order (i.e., at the head of the LRU). The others, about 19990 pages, were put back in order
(i.e., at the position of the old page when migration happened). It is exactly what I want.
Any comments are welcome.
You can see the test script and an all-at-once patch at the following URL.
http://www.kernel.org/pub/linux/kernel/people/minchan/inorder_putback/v3/
Minchan Kim (10):
[1/10] compaction: trivial clean up acct_isolated
[2/10] Change isolate mode from int type to enum type
[3/10] Add additional isolation mode
[4/10] compaction: make isolate_lru_page with filter aware
[5/10] vmscan: make isolate_lru_page with filter aware
[6/10] In order putback lru core
[7/10] migration: clean up unmap_and_move
[8/10] migration: make in-order-putback aware
[9/10] compaction: make compaction use in-order putback
[10/10] add inorder-lru tracepoints for just measurement
include/linux/memcontrol.h | 5 +-
include/linux/migrate.h | 40 +++++
include/linux/mm_types.h | 16 ++-
include/linux/swap.h | 15 ++-
include/trace/events/inorder_putback.h | 79 ++++++++++
include/trace/events/vmscan.h | 8 +-
mm/compaction.c | 47 +++---
mm/internal.h | 2 +
mm/memcontrol.c | 5 +-
mm/migrate.c | 255 ++++++++++++++++++++++++++++----
mm/swap.c | 2 +-
mm/vmscan.c | 133 ++++++++++++++---
12 files changed, 517 insertions(+), 90 deletions(-)
create mode 100644 include/trace/events/inorder_putback.h
acct_isolated in compaction uses page_lru_base_type, which returns only the
base type of the LRU list, so it never returns LRU_ACTIVE_ANON or LRU_ACTIVE_FILE.
In addition, cc->nr_[anon|file] is used only in acct_isolated, so it doesn't need
fields in compact_control.
This patch removes the fields from compact_control and clarifies the role of
acct_isolated, which counts the number of anon|file pages isolated.
Cc: KOSAKI Motohiro <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Acked-by: Rik van Riel <[email protected]>
Acked-by: Johannes Weiner <[email protected]>
Reviewed-by: KAMEZAWA Hiroyuki <[email protected]>
Signed-off-by: Minchan Kim <[email protected]>
---
mm/compaction.c | 18 +++++-------------
1 files changed, 5 insertions(+), 13 deletions(-)
diff --git a/mm/compaction.c b/mm/compaction.c
index 021a296..61eab88 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -35,10 +35,6 @@ struct compact_control {
unsigned long migrate_pfn; /* isolate_migratepages search base */
bool sync; /* Synchronous migration */
- /* Account for isolated anon and file pages */
- unsigned long nr_anon;
- unsigned long nr_file;
-
unsigned int order; /* order a direct compactor needs */
int migratetype; /* MOVABLE, RECLAIMABLE etc */
struct zone *zone;
@@ -212,17 +208,13 @@ static void isolate_freepages(struct zone *zone,
static void acct_isolated(struct zone *zone, struct compact_control *cc)
{
struct page *page;
- unsigned int count[NR_LRU_LISTS] = { 0, };
+ unsigned int count[2] = { 0, };
- list_for_each_entry(page, &cc->migratepages, lru) {
- int lru = page_lru_base_type(page);
- count[lru]++;
- }
+ list_for_each_entry(page, &cc->migratepages, lru)
+ count[!!page_is_file_cache(page)]++;
- cc->nr_anon = count[LRU_ACTIVE_ANON] + count[LRU_INACTIVE_ANON];
- cc->nr_file = count[LRU_ACTIVE_FILE] + count[LRU_INACTIVE_FILE];
- __mod_zone_page_state(zone, NR_ISOLATED_ANON, cc->nr_anon);
- __mod_zone_page_state(zone, NR_ISOLATED_FILE, cc->nr_file);
+ __mod_zone_page_state(zone, NR_ISOLATED_ANON, count[0]);
+ __mod_zone_page_state(zone, NR_ISOLATED_FILE, count[1]);
}
/* Similar to reclaim, but different enough that they don't share logic */
--
1.7.0.4
This patch replaces the macro definitions with an enum.
Normally, an enum is preferred as it's type-safe and makes debugging easier,
since the symbol can be passed through to the debugger.
This patch doesn't change the old behavior.
It is used by the next patches.
Cc: KOSAKI Motohiro <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: KAMEZAWA Hiroyuki <[email protected]>
Signed-off-by: Minchan Kim <[email protected]>
---
include/linux/memcontrol.h | 5 ++++-
include/linux/swap.h | 11 +++++++----
include/trace/events/vmscan.h | 8 ++++----
mm/compaction.c | 3 ++-
mm/memcontrol.c | 3 ++-
mm/migrate.c | 4 ++--
mm/vmscan.c | 37 ++++++++++++++++++++-----------------
7 files changed, 41 insertions(+), 30 deletions(-)
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 5e9840f..91a1162 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -30,10 +30,13 @@ enum mem_cgroup_page_stat_item {
MEMCG_NR_FILE_MAPPED, /* # of pages charged as file rss */
};
+enum ISOLATE_MODE;
+
extern unsigned long mem_cgroup_isolate_pages(unsigned long nr_to_scan,
struct list_head *dst,
unsigned long *scanned, int order,
- int mode, struct zone *z,
+ enum ISOLATE_MODE mode,
+ struct zone *z,
struct mem_cgroup *mem_cont,
int active, int file);
diff --git a/include/linux/swap.h b/include/linux/swap.h
index a5c6da5..48d50e6 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -244,9 +244,11 @@ static inline void lru_cache_add_file(struct page *page)
}
/* LRU Isolation modes. */
-#define ISOLATE_INACTIVE 0 /* Isolate inactive pages. */
-#define ISOLATE_ACTIVE 1 /* Isolate active pages. */
-#define ISOLATE_BOTH 2 /* Isolate both active and inactive pages. */
+enum ISOLATE_MODE {
+ ISOLATE_NONE,
+ ISOLATE_INACTIVE = 1, /* Isolate inactive pages */
+ ISOLATE_ACTIVE = 2, /* Isolate active pages */
+};
/* linux/mm/vmscan.c */
extern unsigned long try_to_free_pages(struct zonelist *zonelist, int order,
@@ -258,7 +260,8 @@ extern unsigned long mem_cgroup_shrink_node_zone(struct mem_cgroup *mem,
gfp_t gfp_mask, bool noswap,
unsigned int swappiness,
struct zone *zone);
-extern int __isolate_lru_page(struct page *page, int mode, int file);
+extern int __isolate_lru_page(struct page *page, enum ISOLATE_MODE mode,
+ int file);
extern unsigned long shrink_all_memory(unsigned long nr_pages);
extern int vm_swappiness;
extern int remove_mapping(struct address_space *mapping, struct page *page);
diff --git a/include/trace/events/vmscan.h b/include/trace/events/vmscan.h
index ea422aa..4f53d43 100644
--- a/include/trace/events/vmscan.h
+++ b/include/trace/events/vmscan.h
@@ -187,7 +187,7 @@ DECLARE_EVENT_CLASS(mm_vmscan_lru_isolate_template,
unsigned long nr_lumpy_taken,
unsigned long nr_lumpy_dirty,
unsigned long nr_lumpy_failed,
- int isolate_mode),
+ enum ISOLATE_MODE isolate_mode),
TP_ARGS(order, nr_requested, nr_scanned, nr_taken, nr_lumpy_taken, nr_lumpy_dirty, nr_lumpy_failed, isolate_mode),
@@ -199,7 +199,7 @@ DECLARE_EVENT_CLASS(mm_vmscan_lru_isolate_template,
__field(unsigned long, nr_lumpy_taken)
__field(unsigned long, nr_lumpy_dirty)
__field(unsigned long, nr_lumpy_failed)
- __field(int, isolate_mode)
+ __field(enum ISOLATE_MODE, isolate_mode)
),
TP_fast_assign(
@@ -233,7 +233,7 @@ DEFINE_EVENT(mm_vmscan_lru_isolate_template, mm_vmscan_lru_isolate,
unsigned long nr_lumpy_taken,
unsigned long nr_lumpy_dirty,
unsigned long nr_lumpy_failed,
- int isolate_mode),
+ enum ISOLATE_MODE isolate_mode),
TP_ARGS(order, nr_requested, nr_scanned, nr_taken, nr_lumpy_taken, nr_lumpy_dirty, nr_lumpy_failed, isolate_mode)
@@ -248,7 +248,7 @@ DEFINE_EVENT(mm_vmscan_lru_isolate_template, mm_vmscan_memcg_isolate,
unsigned long nr_lumpy_taken,
unsigned long nr_lumpy_dirty,
unsigned long nr_lumpy_failed,
- int isolate_mode),
+ enum ISOLATE_MODE isolate_mode),
TP_ARGS(order, nr_requested, nr_scanned, nr_taken, nr_lumpy_taken, nr_lumpy_dirty, nr_lumpy_failed, isolate_mode)
diff --git a/mm/compaction.c b/mm/compaction.c
index 61eab88..f0d75e9 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -327,7 +327,8 @@ static unsigned long isolate_migratepages(struct zone *zone,
}
/* Try isolate the page */
- if (__isolate_lru_page(page, ISOLATE_BOTH, 0) != 0)
+ if (__isolate_lru_page(page,
+ ISOLATE_ACTIVE|ISOLATE_INACTIVE, 0) != 0)
continue;
VM_BUG_ON(PageTransCompound(page));
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 010f916..f4c0b71 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1106,7 +1106,8 @@ mem_cgroup_get_reclaim_stat_from_page(struct page *page)
unsigned long mem_cgroup_isolate_pages(unsigned long nr_to_scan,
struct list_head *dst,
unsigned long *scanned, int order,
- int mode, struct zone *z,
+ enum ISOLATE_MODE mode,
+ struct zone *z,
struct mem_cgroup *mem_cont,
int active, int file)
{
diff --git a/mm/migrate.c b/mm/migrate.c
index 819d233..e797b5c 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1363,10 +1363,10 @@ int migrate_vmas(struct mm_struct *mm, const nodemask_t *to,
for (vma = mm->mmap; vma && !err; vma = vma->vm_next) {
if (vma->vm_ops && vma->vm_ops->migrate) {
- err = vma->vm_ops->migrate(vma, to, from, flags);
+ err = vma->vm_ops->migrate(vma, to, from, flags);
if (err)
break;
- }
+ }
}
return err;
}
diff --git a/mm/vmscan.c b/mm/vmscan.c
index a658dde..4cbe114 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -957,23 +957,27 @@ keep_lumpy:
*
* returns 0 on success, -ve errno on failure.
*/
-int __isolate_lru_page(struct page *page, int mode, int file)
+int __isolate_lru_page(struct page *page, enum ISOLATE_MODE mode, int file)
{
+ bool all_lru_mode;
int ret = -EINVAL;
/* Only take pages on the LRU. */
if (!PageLRU(page))
return ret;
+ all_lru_mode = (mode & (ISOLATE_ACTIVE|ISOLATE_INACTIVE)) ==
+ (ISOLATE_ACTIVE|ISOLATE_INACTIVE);
+
/*
* When checking the active state, we need to be sure we are
* dealing with comparible boolean values. Take the logical not
* of each.
*/
- if (mode != ISOLATE_BOTH && (!PageActive(page) != !mode))
+ if (!all_lru_mode && !PageActive(page) != !(mode & ISOLATE_ACTIVE))
return ret;
- if (mode != ISOLATE_BOTH && page_is_file_cache(page) != file)
+ if (!all_lru_mode && !!page_is_file_cache(page) != file)
return ret;
/*
@@ -1021,7 +1025,8 @@ int __isolate_lru_page(struct page *page, int mode, int file)
*/
static unsigned long isolate_lru_pages(unsigned long nr_to_scan,
struct list_head *src, struct list_head *dst,
- unsigned long *scanned, int order, int mode, int file)
+ unsigned long *scanned, int order, enum ISOLATE_MODE mode,
+ int file)
{
unsigned long nr_taken = 0;
unsigned long nr_lumpy_taken = 0;
@@ -1134,8 +1139,8 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan,
static unsigned long isolate_pages_global(unsigned long nr,
struct list_head *dst,
unsigned long *scanned, int order,
- int mode, struct zone *z,
- int active, int file)
+ enum ISOLATE_MODE mode,
+ struct zone *z, int active, int file)
{
int lru = LRU_BASE;
if (active)
@@ -1382,6 +1387,7 @@ shrink_inactive_list(unsigned long nr_to_scan, struct zone *zone,
unsigned long nr_taken;
unsigned long nr_anon;
unsigned long nr_file;
+ enum ISOLATE_MODE reclaim_mode = ISOLATE_INACTIVE;
while (unlikely(too_many_isolated(zone, file, sc))) {
congestion_wait(BLK_RW_ASYNC, HZ/10);
@@ -1392,15 +1398,15 @@ shrink_inactive_list(unsigned long nr_to_scan, struct zone *zone,
}
set_reclaim_mode(priority, sc, false);
+ if (sc->reclaim_mode & RECLAIM_MODE_LUMPYRECLAIM)
+ reclaim_mode |= ISOLATE_ACTIVE;
+
lru_add_drain();
spin_lock_irq(&zone->lru_lock);
if (scanning_global_lru(sc)) {
- nr_taken = isolate_pages_global(nr_to_scan,
- &page_list, &nr_scanned, sc->order,
- sc->reclaim_mode & RECLAIM_MODE_LUMPYRECLAIM ?
- ISOLATE_BOTH : ISOLATE_INACTIVE,
- zone, 0, file);
+ nr_taken = isolate_pages_global(nr_to_scan, &page_list,
+ &nr_scanned, sc->order, reclaim_mode, zone, 0, file);
zone->pages_scanned += nr_scanned;
if (current_is_kswapd())
__count_zone_vm_events(PGSCAN_KSWAPD, zone,
@@ -1409,12 +1415,9 @@ shrink_inactive_list(unsigned long nr_to_scan, struct zone *zone,
__count_zone_vm_events(PGSCAN_DIRECT, zone,
nr_scanned);
} else {
- nr_taken = mem_cgroup_isolate_pages(nr_to_scan,
- &page_list, &nr_scanned, sc->order,
- sc->reclaim_mode & RECLAIM_MODE_LUMPYRECLAIM ?
- ISOLATE_BOTH : ISOLATE_INACTIVE,
- zone, sc->mem_cgroup,
- 0, file);
+ nr_taken = mem_cgroup_isolate_pages(nr_to_scan, &page_list,
+ &nr_scanned, sc->order, reclaim_mode, zone,
+ sc->mem_cgroup, 0, file);
/*
* mem_cgroup_isolate_pages() keeps track of
* scanned pages on its own.
--
1.7.0.4
There are some places that isolate LRU pages, and I believe
the users of isolate_lru_page will keep growing.
Their purposes are all different, so some of the isolated pages
have to be put back on the LRU again.
The problem is that when we put a page back on the LRU,
we lose the LRU ordering, as the page is inserted at the head of the LRU list.
That causes unnecessary LRU churning, so the VM can evict working set pages
rather than idle pages.
This patch adds new modes used when we isolate pages from the LRU, so we don't
isolate pages we can't handle. That should reduce LRU churning.
This patch doesn't change the old behavior. It's just used by the next patches.
Cc: KOSAKI Motohiro <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Cc: KAMEZAWA Hiroyuki <[email protected]>
Acked-by: Rik van Riel <[email protected]>
Acked-by: Johannes Weiner <[email protected]>
Signed-off-by: Minchan Kim <[email protected]>
---
include/linux/swap.h | 2 ++
mm/vmscan.c | 6 ++++++
2 files changed, 8 insertions(+), 0 deletions(-)
diff --git a/include/linux/swap.h b/include/linux/swap.h
index 48d50e6..731f5dd 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -248,6 +248,8 @@ enum ISOLATE_MODE {
ISOLATE_NONE,
ISOLATE_INACTIVE = 1, /* Isolate inactive pages */
ISOLATE_ACTIVE = 2, /* Isolate active pages */
+ ISOLATE_CLEAN = 8, /* Isolate clean file */
+ ISOLATE_UNMAPPED = 16, /* Isolate unmapped file */
};
/* linux/mm/vmscan.c */
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 4cbe114..26aa627 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -990,6 +990,12 @@ int __isolate_lru_page(struct page *page, enum ISOLATE_MODE mode, int file)
ret = -EBUSY;
+ if (mode & ISOLATE_CLEAN && (PageDirty(page) || PageWriteback(page)))
+ return ret;
+
+ if (mode & ISOLATE_UNMAPPED && page_mapped(page))
+ return ret;
+
if (likely(get_page_unless_zero(page))) {
/*
* Be careful not to clear PageLRU until after we're
--
1.7.0.4
In async mode, compaction doesn't migrate dirty or writeback pages.
So it's pointless to pick such a page and re-add it to the LRU list.
Of course, when we isolate a page in compaction, the page might be
dirty or under writeback, and by the time we try to migrate it, the page
might no longer be dirty or under writeback, so it could be migrated. But that's
very unlikely, as the isolate and migration cycle is much faster than
writeout.
So this patch saves CPU time and prevents unnecessary LRU churning.
Cc: Andrea Arcangeli <[email protected]>
Reviewed-by: KAMEZAWA Hiroyuki <[email protected]>
Reviewed-by: KOSAKI Motohiro <[email protected]>
Acked-by: Johannes Weiner <[email protected]>
Acked-by: Mel Gorman <[email protected]>
Acked-by: Rik van Riel <[email protected]>
Signed-off-by: Minchan Kim <[email protected]>
---
mm/compaction.c | 7 +++++--
1 files changed, 5 insertions(+), 2 deletions(-)
diff --git a/mm/compaction.c b/mm/compaction.c
index f0d75e9..8079346 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -243,6 +243,7 @@ static unsigned long isolate_migratepages(struct zone *zone,
unsigned long last_pageblock_nr = 0, pageblock_nr;
unsigned long nr_scanned = 0, nr_isolated = 0;
struct list_head *migratelist = &cc->migratepages;
+ enum ISOLATE_MODE mode = ISOLATE_ACTIVE|ISOLATE_INACTIVE;
/* Do not scan outside zone boundaries */
low_pfn = max(cc->migrate_pfn, zone->zone_start_pfn);
@@ -326,9 +327,11 @@ static unsigned long isolate_migratepages(struct zone *zone,
continue;
}
+ if (!cc->sync)
+ mode |= ISOLATE_CLEAN;
+
/* Try isolate the page */
- if (__isolate_lru_page(page,
- ISOLATE_ACTIVE|ISOLATE_INACTIVE, 0) != 0)
+ if (__isolate_lru_page(page, mode, 0) != 0)
continue;
VM_BUG_ON(PageTransCompound(page));
--
1.7.0.4
In the __zone_reclaim case, we don't want to shrink mapped pages.
Nonetheless, we have been isolating mapped pages and re-adding them at
the LRU's head. That's unnecessary CPU overhead and causes LRU churning.
Of course, when we isolate a page, the page might be mapped, but
when we try to migrate it, the page might no longer be mapped,
so it could be migrated. But the race is rare, and even if it happens,
it's no big deal.
Cc: KOSAKI Motohiro <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Reviewed-by: KAMEZAWA Hiroyuki <[email protected]>
Acked-by: Johannes Weiner <[email protected]>
Signed-off-by: Minchan Kim <[email protected]>
---
mm/vmscan.c | 17 +++++++++++++++--
1 files changed, 15 insertions(+), 2 deletions(-)
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 26aa627..c08911d 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1408,6 +1408,12 @@ shrink_inactive_list(unsigned long nr_to_scan, struct zone *zone,
reclaim_mode |= ISOLATE_ACTIVE;
lru_add_drain();
+
+ if (!sc->may_unmap)
+ reclaim_mode |= ISOLATE_UNMAPPED;
+ if (!sc->may_writepage)
+ reclaim_mode |= ISOLATE_CLEAN;
+
spin_lock_irq(&zone->lru_lock);
if (scanning_global_lru(sc)) {
@@ -1525,19 +1531,26 @@ static void shrink_active_list(unsigned long nr_pages, struct zone *zone,
struct page *page;
struct zone_reclaim_stat *reclaim_stat = get_reclaim_stat(zone, sc);
unsigned long nr_rotated = 0;
+ enum ISOLATE_MODE reclaim_mode = ISOLATE_ACTIVE;
lru_add_drain();
+
+ if (!sc->may_unmap)
+ reclaim_mode |= ISOLATE_UNMAPPED;
+ if (!sc->may_writepage)
+ reclaim_mode |= ISOLATE_CLEAN;
+
spin_lock_irq(&zone->lru_lock);
if (scanning_global_lru(sc)) {
nr_taken = isolate_pages_global(nr_pages, &l_hold,
&pgscanned, sc->order,
- ISOLATE_ACTIVE, zone,
+ reclaim_mode, zone,
1, file);
zone->pages_scanned += pgscanned;
} else {
nr_taken = mem_cgroup_isolate_pages(nr_pages, &l_hold,
&pgscanned, sc->order,
- ISOLATE_ACTIVE, zone,
+ reclaim_mode, zone,
sc->mem_cgroup, 1, file);
/*
* mem_cgroup_isolate_pages() keeps track of
--
1.7.0.4
This patch defines new APIs to put a page back at its previous position on the LRU.
The idea I suggested at LSF/MM is simple.
When we try to put a page back on the LRU list, if the page's friends (prev, next)
are still its nearest neighbors, we can insert the isolated page after prev instead of
at the head of the LRU list. So it keeps the LRU history without losing the LRU information.
Before :
LRU POV : H - P1 - P2 - P3 - P4 - T
Isolate P3 :
LRU POV : H - P1 - P2 - P4 - T
Putback P3 :
if (P2->next == P4)
putback(P3, P2);
So,
LRU POV : H - P1 - P2 - P3 - P4 - T
I implemented this idea in the RFC, but it had two problems.
1)
For the implementation, I defined a new structure, pages_lru, which remembers
both LRU friend pages of the isolated one and the handling functions.
For the pages_lru space, I allocated memory dynamically with kmalloc(GFP_ATOMIC),
but as we know, compaction is a reclaim path, so it's not a good idea to allocate memory
dynamically in that path. The space needed to store the pages_lru entries is just a single page,
as current compaction migrates in chunks of 32 pages.
In addition, compaction makes sure there are lots of order-0 free pages before starting,
so it wouldn't be a big problem, I think. But I admit it can pin some pages,
so the migration success ratio might go down if concurrent compaction happens.
I changed my mind: I don't use dynamic memory any more.
As I see migration, we don't need the doubly linked list of page->lru.
The whole operation is performed by enumeration, so I think a singly linked list is enough.
If we can use a singly linked list, we can use one of its pointers as another buffer.
Here, we use it to store the prev LRU page of the isolated page.
2)
The page-relation approach had a problem with contiguous pages.
That's because the idea cannot work when the friend pages are isolated, too.
It means prev_page->next == next_page is always _false_, and both pages are no longer
on the LRU at that time. It was pointed out by Rik at the LSF/MM summit.
So to solve the problem, I changed the idea.
We don't need the relation of both friend (prev, next) pages; we just consider
either the prev or the next page, as long as it is still on the same LRU.
In the worst case with this approach, the prev or next page is freed and newly allocated,
so it's at the head of the LRU and our isolated page is located next to the head.
But that's almost the same situation as the current problem, so it doesn't make things worse
than now.
The new idea works as below.
===
assume : we isolate pages P3~P7 and we consider only the prev LRU pointer.
notation : (P3,P2) = (isolated page, prev LRU page of the isolated page)
H - P1 - P2 - P3 - P4 - P5 - P6 - P7 - P8 - P9 - P10 - T
If we isolate P3 :
H - P1 - P2 - P4 - P5 - P6 - P7 - P8 - P9 - P10 - T
Isolated page list - (P3,P2)
If we isolate P4 :
H - P1 - P2 - P5 - P6 - P7 - P8 - P9 - P10 - T
Isolated page list - (P4,P2) - (P3,P2)
If we isolate P5 :
H - P1 - P2 - P6 - P7 - P8 - P9 - P10 - T
Isolated page list - (P5,P2) - (P4,P2) - (P3,P2)
..
..
If we isolate P7 :
H - P1 - P2 - P8 - P9 - P10 - T
Isolated page list - (P7,P2) - (P6,P2) - (P5,P2) - (P4,P2) - (P3,P2)
Let's start putback from P7
P7)
H - P1 - P2 - P8 - P9 - P10 - T
prev P2 is valid, too. So,
H - P1 - P2 - P7 - P8 - P9 - P10 - T
P6)
H - P1 - P2 - P7 - P8 - P9 - P10 - T
Prev P2 is valid, too. So,
H - P1 - P2 - P6 - P7 - P8 - P9 - P10 - T
..
..
P3)
H - P1 - P2 - P4 - P5 - P6 - P7 - P8 - P9 - P10 - T
Prev P2 is valid, too. So,
H - P1 - P2 - P3 - P4 - P5 - P6 - P7 - P8 - P9 - P10 - T
===
Cc: Johannes Weiner <[email protected]>
Cc: KAMEZAWA Hiroyuki <[email protected]>
Cc: KOSAKI Motohiro <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Acked-by: Rik van Riel <[email protected]>
Signed-off-by: Minchan Kim <[email protected]>
---
include/linux/migrate.h | 35 ++++++++++++++++++++++
include/linux/mm_types.h | 16 +++++++++-
include/linux/swap.h | 4 ++-
mm/compaction.c | 2 +-
mm/internal.h | 2 +
mm/memcontrol.c | 2 +-
mm/migrate.c | 24 +++++++++++++++
mm/swap.c | 2 +-
mm/vmscan.c | 72 ++++++++++++++++++++++++++++++++++++++++++++--
9 files changed, 151 insertions(+), 8 deletions(-)
diff --git a/include/linux/migrate.h b/include/linux/migrate.h
index e39aeec..5914282 100644
--- a/include/linux/migrate.h
+++ b/include/linux/migrate.h
@@ -9,7 +9,42 @@ typedef struct page *new_page_t(struct page *, unsigned long private, int **);
#ifdef CONFIG_MIGRATION
#define PAGE_MIGRATION 1
+/*
+ * The migratelist for compaction is a singly linked list instead of a doubly linked list.
+ * The current list utility is useful in some sense, but we can't guarantee compatibility.
+ * Please use the functions below instead of the common list ones.
+ */
+static inline void INIT_MIGRATE_LIST(struct inorder_lru *list)
+{
+ list->prev_page = NULL;
+ list->next = list;
+}
+
+static inline int migratelist_empty(const struct inorder_lru *head)
+{
+ return head->next == head;
+}
+
+static inline void migratelist_add(struct page *page,
+ struct page *prev_page, struct inorder_lru *head)
+{
+ VM_BUG_ON(PageLRU(page));
+
+ page->ilru.prev_page = prev_page;
+ page->ilru.next = head->next;
+ head->next = &page->ilru;
+}
+
+static inline void migratelist_del(struct page *page, struct inorder_lru *head)
+{
+ head->next = page->ilru.next;
+}
+
+#define list_for_each_migrate_entry list_for_each_entry
+#define list_for_each_migrate_entry_safe list_for_each_entry_safe
+
extern void putback_lru_pages(struct list_head *l);
+extern void putback_ilru_pages(struct inorder_lru *l);
extern int migrate_page(struct address_space *,
struct page *, struct page *);
extern int migrate_pages(struct list_head *l, new_page_t x,
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 02aa561..af46614 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -24,6 +24,17 @@ struct address_space;
#define USE_SPLIT_PTLOCKS (NR_CPUS >= CONFIG_SPLIT_PTLOCK_CPUS)
+struct page;
+
+/*
+ * The inorder_lru is used by compaction for keeping LRU order
+ * during migration.
+ */
+struct inorder_lru {
+ struct page *prev_page; /* prev LRU page of isolated page */
+ struct inorder_lru *next; /* next pointer for singly linked list*/
+};
+
/*
* Each physical page in the system has a struct page associated with
* it to keep track of whatever it is we are using the page for at the
@@ -72,9 +83,12 @@ struct page {
pgoff_t index; /* Our offset within mapping. */
void *freelist; /* SLUB: freelist req. slab lock */
};
- struct list_head lru; /* Pageout list, eg. active_list
+ union {
+ struct inorder_lru ilru;/* compaction: migrated page list */
+ struct list_head lru; /* Pageout list, eg. active_list
* protected by zone->lru_lock !
*/
+ };
/*
* On machines where all RAM is mapped into kernel address space,
* we can simply calculate the virtual address. On machines with
diff --git a/include/linux/swap.h b/include/linux/swap.h
index 731f5dd..854244a 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -226,6 +226,8 @@ extern int lru_add_drain_all(void);
extern void rotate_reclaimable_page(struct page *page);
extern void deactivate_page(struct page *page);
extern void swap_setup(void);
+extern void update_page_reclaim_stat(struct zone *zone, struct page *page,
+ int file, int rotated);
extern void add_page_to_unevictable_list(struct page *page);
@@ -263,7 +265,7 @@ extern unsigned long mem_cgroup_shrink_node_zone(struct mem_cgroup *mem,
unsigned int swappiness,
struct zone *zone);
extern int __isolate_lru_page(struct page *page, enum ISOLATE_MODE mode,
- int file);
+ int file, struct page **prev_page);
extern unsigned long shrink_all_memory(unsigned long nr_pages);
extern int vm_swappiness;
extern int remove_mapping(struct address_space *mapping, struct page *page);
diff --git a/mm/compaction.c b/mm/compaction.c
index 8079346..54dbdbe 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -331,7 +331,7 @@ static unsigned long isolate_migratepages(struct zone *zone,
mode |= ISOLATE_CLEAN;
/* Try isolate the page */
- if (__isolate_lru_page(page, mode, 0) != 0)
+ if (__isolate_lru_page(page, mode, 0, NULL) != 0)
continue;
VM_BUG_ON(PageTransCompound(page));
diff --git a/mm/internal.h b/mm/internal.h
index 9d0ced8..a08d8c6 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -42,6 +42,8 @@ extern unsigned long highest_memmap_pfn;
/*
* in mm/vmscan.c:
*/
+extern bool same_lru(struct page *page, struct page *prev);
+extern void putback_page_to_lru(struct page *page, struct page *head_page);
extern int isolate_lru_page(struct page *page);
extern void putback_lru_page(struct page *page);
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index f4c0b71..04d460d 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1141,7 +1141,7 @@ unsigned long mem_cgroup_isolate_pages(unsigned long nr_to_scan,
continue;
scan++;
- ret = __isolate_lru_page(page, mode, file);
+ ret = __isolate_lru_page(page, mode, file, NULL);
switch (ret) {
case 0:
list_move(&page->lru, dst);
diff --git a/mm/migrate.c b/mm/migrate.c
index e797b5c..874c081 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -84,6 +84,30 @@ void putback_lru_pages(struct list_head *l)
}
}
+void putback_ilru_pages(struct inorder_lru *l)
+{
+ struct zone *zone;
+ struct page *page, *page2, *prev;
+
+ list_for_each_migrate_entry_safe(page, page2, l, ilru) {
+ dec_zone_page_state(page, NR_ISOLATED_ANON +
+ page_is_file_cache(page));
+ zone = page_zone(page);
+ spin_lock_irq(&zone->lru_lock);
+ prev = page->ilru.prev_page;
+ if (same_lru(page, prev)) {
+ putback_page_to_lru(page, prev);
+ spin_unlock_irq(&zone->lru_lock);
+ }
+ else {
+ spin_unlock_irq(&zone->lru_lock);
+ putback_lru_page(page);
+ }
+
+ l->next = &page2->ilru;
+ }
+}
+
/*
* Restore a potential migration pte to a working pte entry
*/
diff --git a/mm/swap.c b/mm/swap.c
index 5602f1a..6c24a75 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -252,7 +252,7 @@ void rotate_reclaimable_page(struct page *page)
}
}
-static void update_page_reclaim_stat(struct zone *zone, struct page *page,
+void update_page_reclaim_stat(struct zone *zone, struct page *page,
int file, int rotated)
{
struct zone_reclaim_stat *reclaim_stat = &zone->reclaim_stat;
diff --git a/mm/vmscan.c b/mm/vmscan.c
index c08911d..7668e8d 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -550,6 +550,56 @@ int remove_mapping(struct address_space *mapping, struct page *page)
return 0;
}
+/*
+ * If prev and page are on same LRU still, we can keep LRU order of page.
+ * zone->lru_lock must be held.
+ */
+bool same_lru(struct page *page, struct page *prev)
+{
+ bool ret = false;
+ if (!prev || !PageLRU(prev))
+ goto out;
+
+ if (unlikely(PageUnevictable(prev)))
+ goto out;
+
+ if (page_lru_base_type(page) != page_lru_base_type(prev))
+ goto out;
+
+ ret = true;
+out:
+ return ret;
+}
+
+/**
+ * putback_page_to_lru - put isolated @page onto @head
+ * @page: page to be put back to appropriate lru list
+ * @head_page: lru position to be put back
+ *
+ * Insert previously isolated @page to appropriate position of lru list
+ * zone->lru_lock must be held.
+ */
+void putback_page_to_lru(struct page *page, struct page *head_page)
+{
+ int lru, active, file;
+ struct zone *zone = page_zone(page);
+
+ VM_BUG_ON(PageLRU(page));
+
+ lru = page_lru(head_page);
+ active = is_active_lru(lru);
+ file = is_file_lru(lru);
+
+ if (active)
+ SetPageActive(page);
+ else
+ ClearPageActive(page);
+
+ update_page_reclaim_stat(zone, page, file, active);
+ SetPageLRU(page);
+ __add_page_to_lru_list(zone, page, lru, &head_page->lru);
+}
+
/**
* putback_lru_page - put previously isolated page onto appropriate LRU list's head
* @page: page to be put back to appropriate lru list
@@ -954,10 +1004,13 @@ keep_lumpy:
*
* page: page to consider
* mode: one of the LRU isolation modes defined above
+ * file: True [1] if isolating file [!anon] pages
+ * prev_page: prev page of isolated page as LRU order
*
* returns 0 on success, -ve errno on failure.
*/
-int __isolate_lru_page(struct page *page, enum ISOLATE_MODE mode, int file)
+int __isolate_lru_page(struct page *page, enum ISOLATE_MODE mode,
+ int file, struct page **prev_page)
{
bool all_lru_mode;
int ret = -EINVAL;
@@ -1003,6 +1056,18 @@ int __isolate_lru_page(struct page *page, enum ISOLATE_MODE mode, int file)
* page release code relies on it.
*/
ClearPageLRU(page);
+ if (prev_page != NULL) {
+ struct zone *zone = page_zone(page);
+ enum lru_list l = page_lru(page);
+
+ if (&zone->lru[l].list == page->lru.prev) {
+ *prev_page = NULL;
+ goto out;
+ }
+
+ *prev_page = lru_to_page(&page->lru);
+ }
+out:
ret = 0;
}
@@ -1052,7 +1117,7 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan,
VM_BUG_ON(!PageLRU(page));
- switch (__isolate_lru_page(page, mode, file)) {
+ switch (__isolate_lru_page(page, mode, file, NULL)) {
case 0:
list_move(&page->lru, dst);
mem_cgroup_del_lru(page);
@@ -1111,7 +1176,8 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan,
!PageSwapCache(cursor_page))
break;
- if (__isolate_lru_page(cursor_page, mode, file) == 0) {
+ if (__isolate_lru_page(cursor_page,
+ mode, file, NULL) == 0) {
list_move(&cursor_page->lru, dst);
mem_cgroup_del_lru(cursor_page);
nr_taken += hpage_nr_pages(page);
--
1.7.0.4
unmap_and_move is one of the big, messy functions.
This patch tries to clean it up.
It helps readability and keeps unmap_and_move_ilru simple.
unmap_and_move_ilru will be introduced by the next patch.
Cc: Mel Gorman <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Cc: KAMEZAWA Hiroyuki <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: KOSAKI Motohiro <[email protected]>
Signed-off-by: Minchan Kim <[email protected]>
---
mm/migrate.c | 82 +++++++++++++++++++++++++++++++++------------------------
1 files changed, 47 insertions(+), 35 deletions(-)
diff --git a/mm/migrate.c b/mm/migrate.c
index 874c081..3aec310 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -645,38 +645,18 @@ static int move_to_new_page(struct page *newpage, struct page *page,
return rc;
}
-/*
- * Obtain the lock on page, remove all ptes and migrate the page
- * to the newly allocated page in newpage.
- */
-static int unmap_and_move(new_page_t get_new_page, unsigned long private,
- struct page *page, int force, bool offlining, bool sync)
+static int __unmap_and_move(struct page *page, struct page *newpage,
+ int force, bool offlining, bool sync)
{
- int rc = 0;
- int *result = NULL;
- struct page *newpage = get_new_page(page, private, &result);
+ int rc = -EAGAIN;
int remap_swapcache = 1;
int charge = 0;
struct mem_cgroup *mem;
struct anon_vma *anon_vma = NULL;
- if (!newpage)
- return -ENOMEM;
-
- if (page_count(page) == 1) {
- /* page was freed from under us. So we are done. */
- goto move_newpage;
- }
- if (unlikely(PageTransHuge(page)))
- if (unlikely(split_huge_page(page)))
- goto move_newpage;
-
- /* prepare cgroup just returns 0 or -ENOMEM */
- rc = -EAGAIN;
-
if (!trylock_page(page)) {
if (!force || !sync)
- goto move_newpage;
+ goto out;
/*
* It's not safe for direct compaction to call lock_page.
@@ -692,7 +672,7 @@ static int unmap_and_move(new_page_t get_new_page, unsigned long private,
* altogether.
*/
if (current->flags & PF_MEMALLOC)
- goto move_newpage;
+ goto out;
lock_page(page);
}
@@ -813,9 +793,13 @@ uncharge:
mem_cgroup_end_migration(mem, page, newpage, rc == 0);
unlock:
unlock_page(page);
+out:
+ return rc;
+}
-move_newpage:
- if (rc != -EAGAIN) {
+static void __put_lru_pages(struct page *page, struct page *newpage)
+{
+ if (page != NULL) {
/*
* A page that has been migrated has all references
* removed and will be freed. A page that has not been
@@ -827,20 +811,48 @@ move_newpage:
page_is_file_cache(page));
putback_lru_page(page);
}
-
/*
* Move the new page to the LRU. If migration was not successful
* then this will free the page.
*/
putback_lru_page(newpage);
+}
- if (result) {
- if (rc)
- *result = rc;
- else
- *result = page_to_nid(newpage);
- }
- return rc;
+/*
+ * Obtain the lock on page, remove all ptes and migrate the page
+ * to the newly allocated page in newpage.
+ */
+static int unmap_and_move(new_page_t get_new_page, unsigned long private,
+ struct page *page, int force, bool offlining, bool sync)
+{
+ int rc = 0;
+ int *result = NULL;
+ struct page *newpage = get_new_page(page, private, &result);
+
+ if (!newpage)
+ return -ENOMEM;
+
+ if (page_count(page) == 1) {
+ /* page was freed from under us. So we are done. */
+ goto out;
+ }
+
+ if (unlikely(PageTransHuge(page)))
+ if (unlikely(split_huge_page(page)))
+ goto out;
+
+ rc = __unmap_and_move(page, newpage, force, offlining, sync);
+ if (rc == -EAGAIN)
+ page = NULL;
+out:
+ __put_lru_pages(page, newpage);
+ if (result) {
+ if (rc)
+ *result = rc;
+ else
+ *result = page_to_nid(newpage);
+ }
+ return rc;
}
/*
--
1.7.0.4
This patch introduces a new API, migrate_ilru_pages, which is aware of in-order putback.
So the new page is placed at the old page's LRU position.
[migrate_pages vs migrate_ilru_pages]
1) We need to handle a singly linked list.
page->lru isn't a doubly linked list any more while we handle the migration list,
so migrate_ilru_pages has to handle a singly linked list instead of a doubly linked one.
2) We need to defer the old page's putback.
At present, during migration, the old page is freed through unmap_and_move's
putback_lru_page. That is a problem for inorder-putback's logic.
same_lru in migrate_ilru_pages checks the old page's PageLRU (among other things)
to determine whether the new page can be placed at the old page's position or not.
If the old page is freed before the inorder-lru list is handled, it ends up !PageLRU
and same_lru returns 'false', so the in-order putback becomes a no-op.
3) We need to adjust prev_page of the inorder_lru page list when we put back the new page and free the old one.
For example,
PHY : H - P1 - P2 - P3 - P4 - P5 - T
LRU : H - P5 - P4 - P3 - P2 - P1 - T
inorder_lru : 0
We isolate P2,P3,P4 so inorder_lru has following list
PHY : H - P1 - P2 - P3 - P4 - P5 - T
LRU : H - P5 - P1 - T
inorder_lru : (P4,P5) - (P3,P4) - (P2,P3)
After 1st putback,
PHY : H - P1 - P2 - P3 - P4 - P5 - T
LRU : H - P5 - P4' - P1 - T
inorder_lru : (P3,P4) - (P2,P3)
P4' is the new page, and P4 (ie, the old page) would be freed.
In the 2nd putback, P3 would look up P4 in same_lru, but P4 is already in the
buddy allocator, so same_lru returns 'false' and inorder_lru doesn't work any more.
The bad effect continues until P2. That's too bad.
To fix this, the patch defines adjust_ilru_prev_page.
It works as follows.
After 1st putback,
PHY : H - P1 - P2 - P3 - P4 - P5 - T
LRU : H - P5 - P4' - P1 - T
inorder_lru : (P3,P4') - (P2,P3)
It replaces the old page's pointer with the new one's, so
In 2nd putback,
PHY : H - P1 - P2 - P3 - P4 - P5 - T
LRU : H - P5 - P4' - P3' - P1 - T
inorder_lru : (P2,P3')
In 3rd putback,
PHY : H - P1 - P2 - P3 - P4 - P5 - T
LRU : H - P5 - P4' - P3' - P2' - P1 - T
inorder_lru : 0
Cc: Johannes Weiner <[email protected]>
Cc: KAMEZAWA Hiroyuki <[email protected]>
Cc: KOSAKI Motohiro <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Signed-off-by: Minchan Kim <[email protected]>
---
include/linux/migrate.h | 5 ++
mm/migrate.c | 152 +++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 157 insertions(+), 0 deletions(-)
diff --git a/include/linux/migrate.h b/include/linux/migrate.h
index 5914282..3858618 100644
--- a/include/linux/migrate.h
+++ b/include/linux/migrate.h
@@ -50,6 +50,11 @@ extern int migrate_page(struct address_space *,
extern int migrate_pages(struct list_head *l, new_page_t x,
unsigned long private, bool offlining,
bool sync);
+
+extern int migrate_ilru_pages(struct inorder_lru *l, new_page_t x,
+ unsigned long private, bool offlining,
+ bool sync);
+
extern int migrate_huge_pages(struct list_head *l, new_page_t x,
unsigned long private, bool offlining,
bool sync);
diff --git a/mm/migrate.c b/mm/migrate.c
index 3aec310..a57f60b 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -855,6 +855,108 @@ out:
return rc;
}
+static inline void adjust_ilru_prev_page(struct inorder_lru *head,
+ struct page *prev_page, struct page *new_page)
+{
+ struct page *page;
+ list_for_each_migrate_entry(page, head, ilru)
+ if (page->ilru.prev_page == prev_page)
+ page->ilru.prev_page = new_page;
+}
+
+void __put_ilru_pages(struct page *page, struct page *newpage,
+ struct inorder_lru **prev_lru, struct inorder_lru *head)
+{
+ struct zone *zone;
+ bool del = false;
+ /* page may be NULL on -EAGAIN; don't dereference it unconditionally */
+ struct page *prev_page = NULL;
+
+ if (page != NULL) {
+ prev_page = page->ilru.prev_page;
+ /*
+ * A page that has been migrated has all references
+ * removed and will be freed. A page that has not been
+ * migrated will have kept its references and be
+ * restored.
+ */
+ migratelist_del(page, *prev_lru);
+ dec_zone_page_state(page, NR_ISOLATED_ANON +
+ page_is_file_cache(page));
+ /*
+ * Unlike unmap_and_move, we defer putback_lru_page
+ * after inorder-lru list handling. If we call it,
+ * the page would be freed then and it doesn't have PG_lru.
+ * So same_lru doesn't work correctly.
+ */
+ del = true;
+ }
+ else
+ *prev_lru = &page->ilru;
+ /*
+ * Move the new page to the LRU. If migration was not successful
+ * then this will free the page.
+ */
+ zone = page_zone(newpage);
+ spin_lock_irq(&zone->lru_lock);
+ if (page && same_lru(page, prev_page)) {
+ putback_page_to_lru(newpage, prev_page);
+ spin_unlock_irq(&zone->lru_lock);
+ /*
+ * The newpage will replace the LRU position of the old page, and
+ * the old one will be freed. So let's adjust prev_page of the pages
+ * remaining on the migratelist so that same_lru keeps working.
+ */
+ adjust_ilru_prev_page(head, page, newpage);
+ put_page(newpage); /* drop ref from isolate */
+ }
+ else {
+ spin_unlock_irq(&zone->lru_lock);
+ putback_lru_page(newpage);
+ }
+
+ if (del)
+ putback_lru_page(page);
+}
+
+/*
+ * Counterpart of unmap_and_move() for compaction.
+ * The logic is almost the same as unmap_and_move's. The difference is
+ * that this function handles prev_lru. For inorder-lru compaction, we use
+ * a singly linked list, so we need prev-pointer handling to delete an entry.
+ */
+static int unmap_and_move_ilru(new_page_t get_new_page, unsigned long private,
+ struct page *page, int force, bool offlining, bool sync,
+ struct inorder_lru **prev_lru, struct inorder_lru *head)
+{
+ int rc = 0;
+ int *result = NULL;
+ struct page *newpage = get_new_page(page, private, &result);
+
+ if (!newpage)
+ return -ENOMEM;
+
+ if (page_count(page) == 1) {
+ /* page was freed from under us. So we are done. */
+ goto out;
+ }
+
+ if (unlikely(PageTransHuge(page)))
+ if (unlikely(split_huge_page(page)))
+ goto out;
+
+ rc = __unmap_and_move(page, newpage, force, offlining, sync);
+ if (rc == -EAGAIN)
+ page = NULL;
+out:
+ __put_ilru_pages(page, newpage, prev_lru, head);
+ if (result) {
+ if (rc)
+ *result = rc;
+ else
+ *result = page_to_nid(newpage);
+ }
+ return rc;
+}
+
/*
* Counterpart of unmap_and_move_page() for hugepage migration.
*
@@ -996,6 +1098,56 @@ out:
return nr_failed + retry;
}
+int migrate_ilru_pages(struct inorder_lru *head, new_page_t get_new_page,
+ unsigned long private, bool offlining, bool sync)
+{
+ int retry = 1;
+ int nr_failed = 0;
+ int pass = 0;
+ struct page *page, *page2;
+ struct inorder_lru *prev;
+ int swapwrite = current->flags & PF_SWAPWRITE;
+ int rc;
+
+ if (!swapwrite)
+ current->flags |= PF_SWAPWRITE;
+
+ for (pass = 0; pass < 10 && retry; pass++) {
+ retry = 0;
+ list_for_each_migrate_entry_safe(page, page2, head, ilru) {
+ cond_resched();
+
+ prev = head;
+ rc = unmap_and_move_ilru(get_new_page, private,
+ page, pass > 2, offlining,
+ sync, &prev, head);
+
+ switch (rc) {
+ case -ENOMEM:
+ goto out;
+ case -EAGAIN:
+ retry++;
+ break;
+ case 0:
+ break;
+ default:
+ /* Permanent failure */
+ nr_failed++;
+ break;
+ }
+ }
+ }
+ rc = 0;
+out:
+ if (!swapwrite)
+ current->flags &= ~PF_SWAPWRITE;
+
+ if (rc)
+ return rc;
+
+ return nr_failed + retry;
+}
+
int migrate_huge_pages(struct list_head *from,
new_page_t get_new_page, unsigned long private, bool offlining,
bool sync)
--
1.7.0.4
Compaction is a good solution for getting contiguous pages, but it causes
LRU churning, which is not good. Moreover, LRU order is important
when the VM is under memory pressure and must select the right victim pages.
This patch makes the compaction code use in-order putback, so that
after compaction completes, the migrated pages keep their LRU ordering.
Cc: Johannes Weiner <[email protected]>
Cc: KAMEZAWA Hiroyuki <[email protected]>
Cc: KOSAKI Motohiro <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Signed-off-by: Minchan Kim <[email protected]>
---
mm/compaction.c | 25 +++++++++++++------------
1 files changed, 13 insertions(+), 12 deletions(-)
diff --git a/mm/compaction.c b/mm/compaction.c
index 54dbdbe..29e6aa9 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -28,7 +28,7 @@
*/
struct compact_control {
struct list_head freepages; /* List of free pages to migrate to */
- struct list_head migratepages; /* List of pages being migrated */
+ struct inorder_lru migratepages;/* List of pages being migrated */
unsigned long nr_freepages; /* Number of isolated free pages */
unsigned long nr_migratepages; /* Number of pages to migrate */
unsigned long free_pfn; /* isolate_freepages search base */
@@ -210,7 +210,7 @@ static void acct_isolated(struct zone *zone, struct compact_control *cc)
struct page *page;
unsigned int count[2] = { 0, };
- list_for_each_entry(page, &cc->migratepages, lru)
+ list_for_each_migrate_entry(page, &cc->migratepages, ilru)
count[!!page_is_file_cache(page)]++;
__mod_zone_page_state(zone, NR_ISOLATED_ANON, count[0]);
@@ -242,7 +242,7 @@ static unsigned long isolate_migratepages(struct zone *zone,
unsigned long low_pfn, end_pfn;
unsigned long last_pageblock_nr = 0, pageblock_nr;
unsigned long nr_scanned = 0, nr_isolated = 0;
- struct list_head *migratelist = &cc->migratepages;
+ struct inorder_lru *migratelist = &cc->migratepages;
enum ISOLATE_MODE mode = ISOLATE_ACTIVE|ISOLATE_INACTIVE;
/* Do not scan outside zone boundaries */
@@ -273,7 +273,7 @@ static unsigned long isolate_migratepages(struct zone *zone,
cond_resched();
spin_lock_irq(&zone->lru_lock);
for (; low_pfn < end_pfn; low_pfn++) {
- struct page *page;
+ struct page *page, *prev_page;
bool locked = true;
/* give a chance to irqs before checking need_resched() */
@@ -331,14 +331,14 @@ static unsigned long isolate_migratepages(struct zone *zone,
mode |= ISOLATE_CLEAN;
/* Try isolate the page */
- if (__isolate_lru_page(page, mode, 0, NULL) != 0)
+ if (__isolate_lru_page(page, mode, 0, &prev_page) != 0)
continue;
VM_BUG_ON(PageTransCompound(page));
/* Successfully isolated */
del_page_from_lru_list(zone, page, page_lru(page));
- list_add(&page->lru, migratelist);
+ migratelist_add(page, prev_page, migratelist);
cc->nr_migratepages++;
nr_isolated++;
@@ -394,7 +394,7 @@ static void update_nr_listpages(struct compact_control *cc)
int nr_freepages = 0;
struct page *page;
- list_for_each_entry(page, &cc->migratepages, lru)
+ list_for_each_migrate_entry(page, &cc->migratepages, ilru)
nr_migratepages++;
list_for_each_entry(page, &cc->freepages, lru)
nr_freepages++;
@@ -522,7 +522,8 @@ static int compact_zone(struct zone *zone, struct compact_control *cc)
continue;
nr_migrate = cc->nr_migratepages;
- err = migrate_pages(&cc->migratepages, compaction_alloc,
+ err = migrate_ilru_pages(&cc->migratepages,
+ compaction_alloc,
(unsigned long)cc, false,
cc->sync);
update_nr_listpages(cc);
@@ -537,7 +538,7 @@ static int compact_zone(struct zone *zone, struct compact_control *cc)
/* Release LRU pages not migrated */
if (err) {
- putback_lru_pages(&cc->migratepages);
+ putback_ilru_pages(&cc->migratepages);
cc->nr_migratepages = 0;
}
@@ -563,7 +564,7 @@ unsigned long compact_zone_order(struct zone *zone,
.sync = sync,
};
INIT_LIST_HEAD(&cc.freepages);
- INIT_LIST_HEAD(&cc.migratepages);
+ INIT_MIGRATE_LIST(&cc.migratepages);
return compact_zone(zone, &cc);
}
@@ -645,12 +646,12 @@ static int compact_node(int nid)
cc.zone = zone;
INIT_LIST_HEAD(&cc.freepages);
- INIT_LIST_HEAD(&cc.migratepages);
+ INIT_MIGRATE_LIST(&cc.migratepages);
compact_zone(zone, &cc);
VM_BUG_ON(!list_empty(&cc.freepages));
- VM_BUG_ON(!list_empty(&cc.migratepages));
+ VM_BUG_ON(!migratelist_empty(&cc.migratepages));
}
return 0;
--
1.7.0.4
This patch adds some tracepoints to show the effect of this patch
series. These tracepoints aren't meant for merging; they just show the effect.
Cc: Johannes Weiner <[email protected]>
Cc: KAMEZAWA Hiroyuki <[email protected]>
Cc: KOSAKI Motohiro <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Signed-off-by: Minchan Kim <[email protected]>
---
include/trace/events/inorder_putback.h | 79 ++++++++++++++++++++++++++++++++
mm/compaction.c | 2 +
mm/migrate.c | 7 +++
mm/vmscan.c | 3 +-
4 files changed, 89 insertions(+), 2 deletions(-)
create mode 100644 include/trace/events/inorder_putback.h
diff --git a/include/trace/events/inorder_putback.h b/include/trace/events/inorder_putback.h
new file mode 100644
index 0000000..c615ed8
--- /dev/null
+++ b/include/trace/events/inorder_putback.h
@@ -0,0 +1,79 @@
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM inorder_putback
+
+#if !defined(_TRACE_INP_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_INP_H
+
+#include <linux/types.h>
+#include <linux/tracepoint.h>
+
+TRACE_EVENT(mm_compaction_inorder,
+
+ TP_PROTO(struct page *page,
+ struct page *newpage),
+
+ TP_ARGS(page, newpage),
+
+ TP_STRUCT__entry(
+ __field(struct page *, page)
+ __field(struct page *, newpage)
+ ),
+
+ TP_fast_assign(
+ __entry->page = page;
+ __entry->newpage = newpage;
+ ),
+
+ TP_printk("pfn=%lu new pfn=%lu",
+ page_to_pfn(__entry->page),
+ page_to_pfn(__entry->newpage))
+);
+
+TRACE_EVENT(mm_compaction_outoforder,
+
+ TP_PROTO(struct page *page,
+ struct page *newpage),
+
+ TP_ARGS(page, newpage),
+
+ TP_STRUCT__entry(
+ __field(struct page *, page)
+ __field(struct page *, newpage)
+ ),
+
+ TP_fast_assign(
+ __entry->page = page;
+ __entry->newpage = newpage;
+ ),
+
+ TP_printk("pfn=%lu new pfn=%lu",
+ page_to_pfn(__entry->page),
+ page_to_pfn(__entry->newpage))
+);
+
+TRACE_EVENT(mm_compact_isolate,
+
+ TP_PROTO(struct page *prev_page,
+ struct page *page),
+
+ TP_ARGS(prev_page, page),
+
+ TP_STRUCT__entry(
+ __field(struct page *, prev_page)
+ __field(struct page *, page)
+ ),
+
+ TP_fast_assign(
+ __entry->prev_page = prev_page;
+ __entry->page = page;
+ ),
+
+ TP_printk("pfn=%lu prev_pfn=%lu",
+ page_to_pfn(__entry->page),
+ page_to_pfn(__entry->prev_page))
+);
+
+#endif /* _TRACE_INP_H */
+
+/* This part must be outside protection */
+#include <trace/define_trace.h>
diff --git a/mm/compaction.c b/mm/compaction.c
index 29e6aa9..1041251 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -16,6 +16,7 @@
#include <linux/sysfs.h>
#include "internal.h"
+#include <trace/events/inorder_putback.h>
#define CREATE_TRACE_POINTS
#include <trace/events/compaction.h>
@@ -334,6 +335,7 @@ static unsigned long isolate_migratepages(struct zone *zone,
if (__isolate_lru_page(page, mode, 0, &prev_page) != 0)
continue;
+ trace_mm_compact_isolate(prev_page, page);
VM_BUG_ON(PageTransCompound(page));
/* Successfully isolated */
diff --git a/mm/migrate.c b/mm/migrate.c
index a57f60b..2a8f713 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -39,6 +39,9 @@
#include "internal.h"
+#define CREATE_TRACE_POINTS
+#include <trace/events/inorder_putback.h>
+
#define lru_to_page(_head) (list_entry((_head)->prev, struct page, lru))
/*
@@ -96,10 +99,12 @@ void putback_ilru_pages(struct inorder_lru *l)
spin_lock_irq(&zone->lru_lock);
prev = page->ilru.prev_page;
if (same_lru(page, prev)) {
+ trace_mm_compaction_inorder(page, page);
putback_page_to_lru(page, prev);
spin_unlock_irq(&zone->lru_lock);
}
else {
+ trace_mm_compaction_outoforder(page, page);
spin_unlock_irq(&zone->lru_lock);
putback_lru_page(page);
}
@@ -899,6 +904,7 @@ void __put_ilru_pages(struct page *page, struct page *newpage,
if (page && same_lru(page, prev_page)) {
putback_page_to_lru(newpage, prev_page);
spin_unlock_irq(&zone->lru_lock);
+ trace_mm_compaction_inorder(page, newpage);
/*
* The newpage will replace LRU position of old page and
* old one would be freed. So let's adjust prev_page of pages
@@ -909,6 +915,7 @@ void __put_ilru_pages(struct page *page, struct page *newpage,
}
else {
spin_unlock_irq(&zone->lru_lock);
+ trace_mm_compaction_outoforder(page, newpage);
putback_lru_page(newpage);
}
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 7668e8d..5af1ba0 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -49,10 +49,9 @@
#include <linux/swapops.h>
#include "internal.h"
-
+#include <trace/events/inorder_putback.h>
#define CREATE_TRACE_POINTS
#include <trace/events/vmscan.h>
-
/*
* reclaim_mode determines how the inactive list is shrunk
* RECLAIM_MODE_SINGLE: Reclaim only order-0 pages
--
1.7.0.4
On Tue, Jun 07, 2011 at 11:38:14PM +0900, Minchan Kim wrote:
> acct_isolated of compaction uses page_lru_base_type which returns only
> base type of LRU list so it never returns LRU_ACTIVE_ANON or LRU_ACTIVE_FILE.
> In addition, cc->nr_[anon|file] is used only in acct_isolated, so they
> don't need to be fields in compact_control.
> This patch removes the fields from compact_control and clarifies the role
> of acct_isolated, which counts the number of anon|file pages isolated.
>
> Cc: KOSAKI Motohiro <[email protected]>
> Cc: Mel Gorman <[email protected]>
> Cc: Andrea Arcangeli <[email protected]>
> Acked-by: Rik van Riel <[email protected]>
> Acked-by: Johannes Weiner <[email protected]>
> Reviewed-by: KAMEZAWA Hiroyuki <[email protected]>
> Signed-off-by: Minchan Kim <[email protected]>
> ---
> mm/compaction.c | 18 +++++-------------
> 1 files changed, 5 insertions(+), 13 deletions(-)
>
> diff --git a/mm/compaction.c b/mm/compaction.c
> index 021a296..61eab88 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -35,10 +35,6 @@ struct compact_control {
> unsigned long migrate_pfn; /* isolate_migratepages search base */
> bool sync; /* Synchronous migration */
>
> - /* Account for isolated anon and file pages */
> - unsigned long nr_anon;
> - unsigned long nr_file;
> -
> unsigned int order; /* order a direct compactor needs */
> int migratetype; /* MOVABLE, RECLAIMABLE etc */
> struct zone *zone;
> @@ -212,17 +208,13 @@ static void isolate_freepages(struct zone *zone,
> static void acct_isolated(struct zone *zone, struct compact_control *cc)
> {
> struct page *page;
> - unsigned int count[NR_LRU_LISTS] = { 0, };
> + unsigned int count[2] = { 0, };
>
> - list_for_each_entry(page, &cc->migratepages, lru) {
> - int lru = page_lru_base_type(page);
> - count[lru]++;
> - }
> + list_for_each_entry(page, &cc->migratepages, lru)
> + count[!!page_is_file_cache(page)]++;
>
> - cc->nr_anon = count[LRU_ACTIVE_ANON] + count[LRU_INACTIVE_ANON];
> - cc->nr_file = count[LRU_ACTIVE_FILE] + count[LRU_INACTIVE_FILE];
> - __mod_zone_page_state(zone, NR_ISOLATED_ANON, cc->nr_anon);
> - __mod_zone_page_state(zone, NR_ISOLATED_FILE, cc->nr_file);
> + __mod_zone_page_state(zone, NR_ISOLATED_ANON, count[0]);
> + __mod_zone_page_state(zone, NR_ISOLATED_FILE, count[1]);
You are hard-coding assumptions about the value of LRU_INACTIVE_ANON
and LRU_INACTIVE_FILE here. I have no expectation that these will
change but who knows for sure? To be robust against unexpected
changes, count should still be NR_LRU_LISTS and you should use the
LRU_INACTIVE_ANON and LRU_INACTIVE_FILE values.
I agree that summing LRU_ACTIVE_* is silly.
--
Mel Gorman
SUSE Labs
On Tue, Jun 07, 2011 at 11:38:15PM +0900, Minchan Kim wrote:
> This patch replaces the macro defines with an enum.
> Normally, an enum is preferred as it's type-safe and makes debugging
> easier, as the symbols can be passed through to the debugger.
>
> This patch doesn't change the old behavior.
> It is used by the next patches.
>
> Cc: KOSAKI Motohiro <[email protected]>
> Cc: Mel Gorman <[email protected]>
> Cc: Rik van Riel <[email protected]>
> Cc: Andrea Arcangeli <[email protected]>
> Cc: Johannes Weiner <[email protected]>
> Cc: KAMEZAWA Hiroyuki <[email protected]>
> Signed-off-by: Minchan Kim <[email protected]>
> ---
> include/linux/memcontrol.h | 5 ++++-
> include/linux/swap.h | 11 +++++++----
> include/trace/events/vmscan.h | 8 ++++----
> mm/compaction.c | 3 ++-
> mm/memcontrol.c | 3 ++-
> mm/migrate.c | 4 ++--
> mm/vmscan.c | 37 ++++++++++++++++++++-----------------
> 7 files changed, 41 insertions(+), 30 deletions(-)
>
> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> index 5e9840f..91a1162 100644
> --- a/include/linux/memcontrol.h
> +++ b/include/linux/memcontrol.h
> @@ -30,10 +30,13 @@ enum mem_cgroup_page_stat_item {
> MEMCG_NR_FILE_MAPPED, /* # of pages charged as file rss */
> };
>
> +enum ISOLATE_MODE;
> +
All caps. Really? It's NOT VERY PRETTY AND MAKES MY EYES DEAF WHICH
SHOULD BE IMPOSSIBLE.
What's wrong with
enum isolate_mode {
}
? It's reasonable that the constants themselves are in caps. It's
expected for defines and enum values.
> extern unsigned long mem_cgroup_isolate_pages(unsigned long nr_to_scan,
> struct list_head *dst,
> unsigned long *scanned, int order,
> - int mode, struct zone *z,
> + enum ISOLATE_MODE mode,
I don't mind as much now because I'm already eye-deaf.
> + struct zone *z,
> struct mem_cgroup *mem_cont,
> int active, int file);
>
> diff --git a/include/linux/swap.h b/include/linux/swap.h
> index a5c6da5..48d50e6 100644
> --- a/include/linux/swap.h
> +++ b/include/linux/swap.h
> @@ -244,9 +244,11 @@ static inline void lru_cache_add_file(struct page *page)
> }
>
> /* LRU Isolation modes. */
> -#define ISOLATE_INACTIVE 0 /* Isolate inactive pages. */
> -#define ISOLATE_ACTIVE 1 /* Isolate active pages. */
> -#define ISOLATE_BOTH 2 /* Isolate both active and inactive pages. */
> +enum ISOLATE_MODE {
> + ISOLATE_NONE,
> + ISOLATE_INACTIVE = 1, /* Isolate inactive pages */
> + ISOLATE_ACTIVE = 2, /* Isolate active pages */
> +};
No need to explicitly define like this. i.e. drop the = 1, = 2 etc.
The leader does not explain why ISOLATE_BOTH is replaced and what it
gains us.
>
> /* linux/mm/vmscan.c */
> extern unsigned long try_to_free_pages(struct zonelist *zonelist, int order,
> @@ -258,7 +260,8 @@ extern unsigned long mem_cgroup_shrink_node_zone(struct mem_cgroup *mem,
> gfp_t gfp_mask, bool noswap,
> unsigned int swappiness,
> struct zone *zone);
> -extern int __isolate_lru_page(struct page *page, int mode, int file);
> +extern int __isolate_lru_page(struct page *page, enum ISOLATE_MODE mode,
> + int file);
> extern unsigned long shrink_all_memory(unsigned long nr_pages);
> extern int vm_swappiness;
> extern int remove_mapping(struct address_space *mapping, struct page *page);
> diff --git a/include/trace/events/vmscan.h b/include/trace/events/vmscan.h
> index ea422aa..4f53d43 100644
> --- a/include/trace/events/vmscan.h
> +++ b/include/trace/events/vmscan.h
> @@ -187,7 +187,7 @@ DECLARE_EVENT_CLASS(mm_vmscan_lru_isolate_template,
> unsigned long nr_lumpy_taken,
> unsigned long nr_lumpy_dirty,
> unsigned long nr_lumpy_failed,
> - int isolate_mode),
> + enum ISOLATE_MODE isolate_mode),
>
> TP_ARGS(order, nr_requested, nr_scanned, nr_taken, nr_lumpy_taken, nr_lumpy_dirty, nr_lumpy_failed, isolate_mode),
>
> @@ -199,7 +199,7 @@ DECLARE_EVENT_CLASS(mm_vmscan_lru_isolate_template,
> __field(unsigned long, nr_lumpy_taken)
> __field(unsigned long, nr_lumpy_dirty)
> __field(unsigned long, nr_lumpy_failed)
> - __field(int, isolate_mode)
> + __field(enum ISOLATE_MODE, isolate_mode)
> ),
>
> TP_fast_assign(
> @@ -233,7 +233,7 @@ DEFINE_EVENT(mm_vmscan_lru_isolate_template, mm_vmscan_lru_isolate,
> unsigned long nr_lumpy_taken,
> unsigned long nr_lumpy_dirty,
> unsigned long nr_lumpy_failed,
> - int isolate_mode),
> + enum ISOLATE_MODE isolate_mode),
>
> TP_ARGS(order, nr_requested, nr_scanned, nr_taken, nr_lumpy_taken, nr_lumpy_dirty, nr_lumpy_failed, isolate_mode)
>
> @@ -248,7 +248,7 @@ DEFINE_EVENT(mm_vmscan_lru_isolate_template, mm_vmscan_memcg_isolate,
> unsigned long nr_lumpy_taken,
> unsigned long nr_lumpy_dirty,
> unsigned long nr_lumpy_failed,
> - int isolate_mode),
> + enum ISOLATE_MODE isolate_mode),
>
And the meaning of isolate_mode has changed. This affects users of the
tracepoint. Documentation/trace/postprocess/trace-vmscan-postprocess.pl
will consider ISOLATE_NONE to be scanning for example.
> TP_ARGS(order, nr_requested, nr_scanned, nr_taken, nr_lumpy_taken, nr_lumpy_dirty, nr_lumpy_failed, isolate_mode)
>
> diff --git a/mm/compaction.c b/mm/compaction.c
> index 61eab88..f0d75e9 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -327,7 +327,8 @@ static unsigned long isolate_migratepages(struct zone *zone,
> }
>
> /* Try isolate the page */
> - if (__isolate_lru_page(page, ISOLATE_BOTH, 0) != 0)
> + if (__isolate_lru_page(page,
> + ISOLATE_ACTIVE|ISOLATE_INACTIVE, 0) != 0)
Ahh, so you assign the enum to use it as a flag. That's very
unexpected to me. Why did you not do something like gfp_t, which is a
bitwise type?
Because mode is a bitmask, it's also impossible to check if
ISOLATE_NONE is set. As ISOLATE_NONE is not used in this patch,
it's hard to know at this point what it's for.
> continue;
>
> VM_BUG_ON(PageTransCompound(page));
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 010f916..f4c0b71 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -1106,7 +1106,8 @@ mem_cgroup_get_reclaim_stat_from_page(struct page *page)
> unsigned long mem_cgroup_isolate_pages(unsigned long nr_to_scan,
> struct list_head *dst,
> unsigned long *scanned, int order,
> - int mode, struct zone *z,
> + enum ISOLATE_MODE mode,
> + struct zone *z,
> struct mem_cgroup *mem_cont,
> int active, int file)
> {
> diff --git a/mm/migrate.c b/mm/migrate.c
> index 819d233..e797b5c 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -1363,10 +1363,10 @@ int migrate_vmas(struct mm_struct *mm, const nodemask_t *to,
>
> for (vma = mm->mmap; vma && !err; vma = vma->vm_next) {
> if (vma->vm_ops && vma->vm_ops->migrate) {
> - err = vma->vm_ops->migrate(vma, to, from, flags);
> + err = vma->vm_ops->migrate(vma, to, from, flags);
> if (err)
> break;
> - }
> + }
Only modification here is adding whitespace damage.
> }
> return err;
> }
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index a658dde..4cbe114 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -957,23 +957,27 @@ keep_lumpy:
> *
> * returns 0 on success, -ve errno on failure.
> */
> -int __isolate_lru_page(struct page *page, int mode, int file)
> +int __isolate_lru_page(struct page *page, enum ISOLATE_MODE mode, int file)
> {
> + bool all_lru_mode;
> int ret = -EINVAL;
>
> /* Only take pages on the LRU. */
> if (!PageLRU(page))
> return ret;
>
> + all_lru_mode = (mode & (ISOLATE_ACTIVE|ISOLATE_INACTIVE)) ==
> + (ISOLATE_ACTIVE|ISOLATE_INACTIVE);
> +
both_lru would be a better name because all LRU implies that UNEVICTABLE
is involved which is not the case.
> /*
> * When checking the active state, we need to be sure we are
> * dealing with comparible boolean values. Take the logical not
> * of each.
> */
> - if (mode != ISOLATE_BOTH && (!PageActive(page) != !mode))
> + if (!all_lru_mode && !PageActive(page) != !(mode & ISOLATE_ACTIVE))
> return ret;
>
> - if (mode != ISOLATE_BOTH && page_is_file_cache(page) != file)
> + if (!all_lru_mode && !!page_is_file_cache(page) != file)
> return ret;
>
> /*
> @@ -1021,7 +1025,8 @@ int __isolate_lru_page(struct page *page, int mode, int file)
> */
> static unsigned long isolate_lru_pages(unsigned long nr_to_scan,
> struct list_head *src, struct list_head *dst,
> - unsigned long *scanned, int order, int mode, int file)
> + unsigned long *scanned, int order, enum ISOLATE_MODE mode,
> + int file)
> {
> unsigned long nr_taken = 0;
> unsigned long nr_lumpy_taken = 0;
> @@ -1134,8 +1139,8 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan,
> static unsigned long isolate_pages_global(unsigned long nr,
> struct list_head *dst,
> unsigned long *scanned, int order,
> - int mode, struct zone *z,
> - int active, int file)
> + enum ISOLATE_MODE mode,
> + struct zone *z, int active, int file)
> {
> int lru = LRU_BASE;
> if (active)
> @@ -1382,6 +1387,7 @@ shrink_inactive_list(unsigned long nr_to_scan, struct zone *zone,
> unsigned long nr_taken;
> unsigned long nr_anon;
> unsigned long nr_file;
> + enum ISOLATE_MODE reclaim_mode = ISOLATE_INACTIVE;
>
> while (unlikely(too_many_isolated(zone, file, sc))) {
> congestion_wait(BLK_RW_ASYNC, HZ/10);
> @@ -1392,15 +1398,15 @@ shrink_inactive_list(unsigned long nr_to_scan, struct zone *zone,
> }
>
> set_reclaim_mode(priority, sc, false);
> + if (sc->reclaim_mode & RECLAIM_MODE_LUMPYRECLAIM)
> + reclaim_mode |= ISOLATE_ACTIVE;
> +
> lru_add_drain();
> spin_lock_irq(&zone->lru_lock);
>
> if (scanning_global_lru(sc)) {
> - nr_taken = isolate_pages_global(nr_to_scan,
> - &page_list, &nr_scanned, sc->order,
> - sc->reclaim_mode & RECLAIM_MODE_LUMPYRECLAIM ?
> - ISOLATE_BOTH : ISOLATE_INACTIVE,
> - zone, 0, file);
> + nr_taken = isolate_pages_global(nr_to_scan, &page_list,
> + &nr_scanned, sc->order, reclaim_mode, zone, 0, file);
> zone->pages_scanned += nr_scanned;
> if (current_is_kswapd())
> __count_zone_vm_events(PGSCAN_KSWAPD, zone,
While this looks ok, I do not see why it's better.
> @@ -1409,12 +1415,9 @@ shrink_inactive_list(unsigned long nr_to_scan, struct zone *zone,
> __count_zone_vm_events(PGSCAN_DIRECT, zone,
> nr_scanned);
> } else {
> - nr_taken = mem_cgroup_isolate_pages(nr_to_scan,
> - &page_list, &nr_scanned, sc->order,
> - sc->reclaim_mode & RECLAIM_MODE_LUMPYRECLAIM ?
> - ISOLATE_BOTH : ISOLATE_INACTIVE,
> - zone, sc->mem_cgroup,
> - 0, file);
> + nr_taken = mem_cgroup_isolate_pages(nr_to_scan, &page_list,
> + &nr_scanned, sc->order, reclaim_mode, zone,
> + sc->mem_cgroup, 0, file);
> /*
> * mem_cgroup_isolate_pages() keeps track of
> * scanned pages on its own.
Overall, I'm failing to see how this patch helps, but maybe it's because
I haven't read the rest of the series. My main gripe is that the meaning
of isolate_mode in the tracepoint has changed without any clear gain.
--
Mel Gorman
SUSE Labs
On Tue, Jun 07, 2011 at 11:38:16PM +0900, Minchan Kim wrote:
> There are several places that isolate LRU pages, and I believe
> the number of users of isolate_lru_page will keep growing.
> Their purposes all differ, so part of the isolated pages
> should be put back to the LRU again.
>
> The problem is that when we put a page back into the LRU,
> we lose its LRU ordering, as the page is inserted at the head of the LRU list.
> That causes unnecessary LRU churning, so the VM can evict working set pages
> rather than idle pages.
>
> This patch adds new modes for isolating pages from the LRU so that we don't
> isolate pages we can't handle. That should reduce LRU churning.
>
> This patch doesn't change the old behavior. It's just used by the next patches.
>
> Cc: KOSAKI Motohiro <[email protected]>
> Cc: Mel Gorman <[email protected]>
> Cc: Andrea Arcangeli <[email protected]>
> Cc: KAMEZAWA Hiroyuki <[email protected]>
> Acked-by: Rik van Riel <[email protected]>
> Acked-by: Johannes Weiner <[email protected]>
> Signed-off-by: Minchan Kim <[email protected]>
> ---
> include/linux/swap.h | 2 ++
> mm/vmscan.c | 6 ++++++
> 2 files changed, 8 insertions(+), 0 deletions(-)
>
> diff --git a/include/linux/swap.h b/include/linux/swap.h
> index 48d50e6..731f5dd 100644
> --- a/include/linux/swap.h
> +++ b/include/linux/swap.h
> @@ -248,6 +248,8 @@ enum ISOLATE_MODE {
> ISOLATE_NONE,
> ISOLATE_INACTIVE = 1, /* Isolate inactive pages */
> ISOLATE_ACTIVE = 2, /* Isolate active pages */
> + ISOLATE_CLEAN = 8, /* Isolate clean file */
> + ISOLATE_UNMAPPED = 16, /* Isolate unmapped file */
> };
This really should be a bitwise type like gfp_t.
>
> /* linux/mm/vmscan.c */
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 4cbe114..26aa627 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -990,6 +990,12 @@ int __isolate_lru_page(struct page *page, enum ISOLATE_MODE mode, int file)
>
> ret = -EBUSY;
>
> + if (mode & ISOLATE_CLEAN && (PageDirty(page) || PageWriteback(page)))
> + return ret;
> +
> + if (mode & ISOLATE_UNMAPPED && page_mapped(page))
> + return ret;
> +
> if (likely(get_page_unless_zero(page))) {
> /*
> * Be careful not to clear PageLRU until after we're
This patch does not use ISOLATE_CLEAN or ISOLATE_UNMAPPED anywhere. While
I can guess how they will be used, it would be easier to review if one
patch introduced ISOLATE_CLEAN and updated the call sites where it is
relevant, and the same for ISOLATE_UNMAPPED.
Also, when using & like this, I thought the compiler warned if it wasn't
in parentheses, but maybe that's wrong. The problem is that the operator
precedence of bitwise AND versus logical AND is easy to forget, as it's
so rarely an issue.
i.e. it's easy to forget if
mode & ISOLATE_UNMAPPED && page_mapped(page)
means
mode & (ISOLATE_UNMAPPED && page_mapped(page))
or
(mode & ISOLATE_UNMAPPED) && page_mapped(page)
Be nice and specific for this one.
--
Mel Gorman
SUSE Labs
On Tue, Jun 07, 2011 at 11:38:17PM +0900, Minchan Kim wrote:
> In async mode, compaction doesn't migrate dirty or writeback pages.
> So it's pointless to pick such a page and re-add it to the lru list.
>
> Of course, when we isolate the page in compaction, the page might
> be dirty or under writeback, but when we later try to migrate the page,
> it might no longer be dirty or under writeback, so it could be migrated.
> But that's very unlikely, as the isolate-and-migrate cycle is much
> faster than writeout.
>
> So this patch saves CPU and prevents unnecessary LRU churning.
>
> Cc: Andrea Arcangeli <[email protected]>
> Reviewed-by: KAMEZAWA Hiroyuki <[email protected]>
> Reviewed-by: KOSAKI Motohiro <[email protected]>
> Acked-by: Johannes Weiner <[email protected]>
> Acked-by: Mel Gorman <[email protected]>
> Acked-by: Rik van Riel <[email protected]>
> Signed-off-by: Minchan Kim <[email protected]>
> ---
> mm/compaction.c | 7 +++++--
> 1 files changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/mm/compaction.c b/mm/compaction.c
> index f0d75e9..8079346 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -243,6 +243,7 @@ static unsigned long isolate_migratepages(struct zone *zone,
> unsigned long last_pageblock_nr = 0, pageblock_nr;
> unsigned long nr_scanned = 0, nr_isolated = 0;
> struct list_head *migratelist = &cc->migratepages;
> + enum ISOLATE_MODE mode = ISOLATE_ACTIVE|ISOLATE_INACTIVE;
>
> /* Do not scan outside zone boundaries */
> low_pfn = max(cc->migrate_pfn, zone->zone_start_pfn);
> @@ -326,9 +327,11 @@ static unsigned long isolate_migratepages(struct zone *zone,
> continue;
> }
>
> + if (!cc->sync)
> + mode |= ISOLATE_CLEAN;
> +
> /* Try isolate the page */
> - if (__isolate_lru_page(page,
> - ISOLATE_ACTIVE|ISOLATE_INACTIVE, 0) != 0)
> + if (__isolate_lru_page(page, mode, 0) != 0)
> continue;
>
> VM_BUG_ON(PageTransCompound(page));
Looks good for compaction! This change makes a lot of sense.
However, I would expand the scope of this patch: introduce
ISOLATE_CLEAN in this patch and update both compaction and
__zone_reclaim. You might encounter some mess mapping RECLAIM_WRITE to
ISOLATE_CLEAN, but nothing unmanageable.
--
Mel Gorman
SUSE Labs
On Tue, Jun 07, 2011 at 11:38:18PM +0900, Minchan Kim wrote:
> In the __zone_reclaim case, we don't want to shrink mapped pages.
> Nonetheless, we have been isolating mapped pages and re-adding them at
> the head of the LRU. That's unnecessary CPU overhead and causes LRU churning.
>
Gack, I should have kept reading. I didn't cop from the subject that
__zone_reclaim would be updated. Still, it would have been easier to
review if one patch had introduced ISOLATE_CLEAN and updated its callers,
and this patch had introduced ISOLATE_UNMAPPED and updated the relevant callers.
> <SNIP>
--
Mel Gorman
SUSE Labs
On Tue, Jun 07, 2011 at 11:38:19PM +0900, Minchan Kim wrote:
> This patch defines new APIs to put back a page into its previous position in the LRU.
> The idea I suggested at LSF/MM is simple.
>
> When we try to put a page back into the LRU list, if the friends (prev, next) of the
> page are still its nearest neighbors, we can insert the isolated page after prev
> instead of at the head of the LRU list. So it keeps the LRU history without losing
> the LRU information.
>
> Before :
> LRU POV : H - P1 - P2 - P3 - P4 -T
>
> Isolate P3 :
> LRU POV : H - P1 - P2 - P4 - T
>
> Putback P3 :
> if (P2->next == P4)
> putback(P3, P2);
> So,
> LRU POV : H - P1 - P2 - P3 - P4 -T
>
> I implemented this idea in RFC but it had two problems.
>
> 1)
> For implement, I defined new structure _pages_lru_ which remembers
> both lru friend pages of isolated one and handling functions.
> For space of pages_lru, I allocated the space dynamically in kmalloc(GFP_AOTMIC)
ATOMIC
> but as we know, compaction is a reclaim path, so it's not a good idea to allocate
> memory dynamically there. The space needed to store pages_lru is just a single page,
> as current compaction migrates in chunks of 32 pages.
> In addition, compaction makes sure there are lots of order-0 free pages before starting,
> so it wouldn't be a big problem, I think. But I admit it can pin some pages,
> so the migration success ratio might drop if concurrent compaction happens.
>
> I decided to change my mind. I don't use dynamically allocated memory any more.
> As I see it, migration doesn't need the doubly linked list of page->lru.
> The whole operation is performed by enumeration, so I think a singly linked list is enough.
> If we can use a singly linked list, we can use one of the two pointers as another buffer.
> Here, we use it to store the prev LRU page of the isolated page.
>
> 2)
> The page-relation approach had a problem with contiguous pages,
> because the idea cannot work when the friend pages are isolated, too.
> It means prev_page->next == next_page is always _false_ and both pages are no
> longer on the LRU at that time. This was pointed out by Rik at the LSF/MM summit.
> So to solve the problem, I changed the idea.
> We don't need the relation of both friend (prev, next) pages; we just consider
> either the prev or next page, as long as it is still on the same LRU.
>
> In the worst case of this approach, the prev or next page is freed and reallocated,
> so it sits at the head of the LRU and our isolated page ends up right after the head.
> But that's almost the same situation as the current problem, so it doesn't make
> things worse than now.
> New idea works below.
>
> ===
>
> assume : we isolate pages P3~P7 and we consider only prev LRU pointer.
> notation : (P3,P2) = (isolated page,prev LRU page of isolated page)
>
> H - P1 - P2 - P3 - P4 - P5 - P6 - P7 - P8 - P9 - P10 - T
>
> If we isolate P3, following as
>
> H - P1 - P2 - P4 - P5 - P6 - P7 - P8 - P9 - P10 - T
> Isolated page list - (P3,P2)
>
> If we isolate P4, following as
>
> H - P1 - P2 - P5 - P6 - P7 - P8 - P9 - P10 - T
> Isolated page list - (P4,P2) - (P3,P2)
>
> If we isolate P5, following as
>
> H - P1 - P2 - P6 - P7 - P8 - P9 - P10 - T
> Isolated page list - (P5,P2) - (P4,P2) - (P3,P2)
>
> ..
> ..
>
> If we isolate P7, following as
>
> H - P1 - P2 - P8 - P9 - P10 - T
>
> Isolated page list - (P7,P2) - (P6,P2) - (P5,P2) - (P4,P2) - (P3,P2)
>
> Let's start putback from P7
>
> P7)
>
> H - P1 - P2 - P8 - P9 - P10 - T
> prev P2 is valid, too. So,
>
> H - P1 - P2 - P7 - P8 - P9 - P10 - T
>
> P6)
>
> H - P1 - P2 - P7 - P8 - P9 - P10 - T
> Prev P2 is valid, too. So,
>
> H - P1 - P2 - P6 - P7 - P8 - P9 - P10 - T
>
> ..
> ..
>
> P3)
> H - P1 - P2 - P4 - P5 - P6 - P7 - P8 - P9 - P10 - T
> Prev P2 is valid, too. So,
>
> H - P1 - P2 - P3 - P4 - P5 - P6 - P7 - P8 - P9 - P10 - T
>
Could move this explanation into migrate.h where you talk about why the
list is singly-linked instead of doubly-linked.
> ===
>
> Cc: Johannes Weiner <[email protected]>
> Cc: KAMEZAWA Hiroyuki <[email protected]>
> Cc: KOSAKI Motohiro <[email protected]>
> Cc: Mel Gorman <[email protected]>
> Cc: Andrea Arcangeli <[email protected]>
> Acked-by: Rik van Riel <[email protected]>
> Signed-off-by: Minchan Kim <[email protected]>
> ---
> include/linux/migrate.h | 35 ++++++++++++++++++++++
> include/linux/mm_types.h | 16 +++++++++-
> include/linux/swap.h | 4 ++-
> mm/compaction.c | 2 +-
> mm/internal.h | 2 +
> mm/memcontrol.c | 2 +-
> mm/migrate.c | 24 +++++++++++++++
> mm/swap.c | 2 +-
> mm/vmscan.c | 72 ++++++++++++++++++++++++++++++++++++++++++++--
> 9 files changed, 151 insertions(+), 8 deletions(-)
>
> diff --git a/include/linux/migrate.h b/include/linux/migrate.h
> index e39aeec..5914282 100644
> --- a/include/linux/migrate.h
> +++ b/include/linux/migrate.h
> @@ -9,7 +9,42 @@ typedef struct page *new_page_t(struct page *, unsigned long private, int **);
> #ifdef CONFIG_MIGRATION
> #define PAGE_MIGRATION 1
>
> +/*
> + * Migratelist for compaction is a singly linked list instead of a doubly linked list.
> + * The common list utilities are useful in some ways, but we can't guarantee compatibility.
> + * Please use the functions below instead of the common list ones.
> + */
> +static inline void INIT_MIGRATE_LIST(struct inorder_lru *list)
> +{
> + list->prev_page = NULL;
> + list->next = list;
> +}
> +
> +static inline int migratelist_empty(const struct inorder_lru *head)
> +{
> + return head->next == head;
> +}
> +
> +static inline void migratelist_add(struct page *page,
> + struct page *prev_page, struct inorder_lru *head)
> +{
> + VM_BUG_ON(PageLRU(page));
> +
> + page->ilru.prev_page = prev_page;
> + page->ilru.next = head->next;
> + head->next = &page->ilru;
> +}
> +
> +static inline void migratelist_del(struct page *page, struct inorder_lru *head)
> +{
> + head->next = page->ilru.next;
> +}
> +
Does this sort of list mangling not get upset when CONFIG_DEBUG_LIST is
set?
> +#define list_for_each_migrate_entry list_for_each_entry
> +#define list_for_each_migrate_entry_safe list_for_each_entry_safe
> +
> extern void putback_lru_pages(struct list_head *l);
> +extern void putback_ilru_pages(struct inorder_lru *l);
> extern int migrate_page(struct address_space *,
> struct page *, struct page *);
> extern int migrate_pages(struct list_head *l, new_page_t x,
> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> index 02aa561..af46614 100644
> --- a/include/linux/mm_types.h
> +++ b/include/linux/mm_types.h
> @@ -24,6 +24,17 @@ struct address_space;
>
> #define USE_SPLIT_PTLOCKS (NR_CPUS >= CONFIG_SPLIT_PTLOCK_CPUS)
>
> +struct page;
> +
> +/*
> + * The inorder_lru is used by compaction for keeping LRU order
> + * during migration.
> + */
> +struct inorder_lru {
> + struct page *prev_page; /* prev LRU page of isolated page */
> + struct inorder_lru *next; /* next pointer for singly linked list*/
> +};
> +
> /*
> * Each physical page in the system has a struct page associated with
> * it to keep track of whatever it is we are using the page for at the
> @@ -72,9 +83,12 @@ struct page {
> pgoff_t index; /* Our offset within mapping. */
> void *freelist; /* SLUB: freelist req. slab lock */
> };
> - struct list_head lru; /* Pageout list, eg. active_list
> + union {
> + struct inorder_lru ilru;/* compaction: migrated page list */
> + struct list_head lru; /* Pageout list, eg. active_list
> * protected by zone->lru_lock !
> */
> + };
> /*
> * On machines where all RAM is mapped into kernel address space,
> * we can simply calculate the virtual address. On machines with
> diff --git a/include/linux/swap.h b/include/linux/swap.h
> index 731f5dd..854244a 100644
> --- a/include/linux/swap.h
> +++ b/include/linux/swap.h
> @@ -226,6 +226,8 @@ extern int lru_add_drain_all(void);
> extern void rotate_reclaimable_page(struct page *page);
> extern void deactivate_page(struct page *page);
> extern void swap_setup(void);
> +extern void update_page_reclaim_stat(struct zone *zone, struct page *page,
> + int file, int rotated);
>
> extern void add_page_to_unevictable_list(struct page *page);
>
> @@ -263,7 +265,7 @@ extern unsigned long mem_cgroup_shrink_node_zone(struct mem_cgroup *mem,
> unsigned int swappiness,
> struct zone *zone);
> extern int __isolate_lru_page(struct page *page, enum ISOLATE_MODE mode,
> - int file);
> + int file, struct page **prev_page);
> extern unsigned long shrink_all_memory(unsigned long nr_pages);
> extern int vm_swappiness;
> extern int remove_mapping(struct address_space *mapping, struct page *page);
> diff --git a/mm/compaction.c b/mm/compaction.c
> index 8079346..54dbdbe 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -331,7 +331,7 @@ static unsigned long isolate_migratepages(struct zone *zone,
> mode |= ISOLATE_CLEAN;
>
> /* Try isolate the page */
> - if (__isolate_lru_page(page, mode, 0) != 0)
> + if (__isolate_lru_page(page, mode, 0, NULL) != 0)
> continue;
>
> VM_BUG_ON(PageTransCompound(page));
> diff --git a/mm/internal.h b/mm/internal.h
> index 9d0ced8..a08d8c6 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -42,6 +42,8 @@ extern unsigned long highest_memmap_pfn;
> /*
> * in mm/vmscan.c:
> */
> +extern bool same_lru(struct page *page, struct page *prev);
> +extern void putback_page_to_lru(struct page *page, struct page *head_page);
> extern int isolate_lru_page(struct page *page);
> extern void putback_lru_page(struct page *page);
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index f4c0b71..04d460d 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -1141,7 +1141,7 @@ unsigned long mem_cgroup_isolate_pages(unsigned long nr_to_scan,
> continue;
>
> scan++;
> - ret = __isolate_lru_page(page, mode, file);
> + ret = __isolate_lru_page(page, mode, file, NULL);
> switch (ret) {
> case 0:
> list_move(&page->lru, dst);
> diff --git a/mm/migrate.c b/mm/migrate.c
> index e797b5c..874c081 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -84,6 +84,30 @@ void putback_lru_pages(struct list_head *l)
> }
> }
>
> +void putback_ilru_pages(struct inorder_lru *l)
> +{
> + struct zone *zone;
> + struct page *page, *page2, *prev;
> +
> + list_for_each_migrate_entry_safe(page, page2, l, ilru) {
> + dec_zone_page_state(page, NR_ISOLATED_ANON +
> + page_is_file_cache(page));
> + zone = page_zone(page);
> + spin_lock_irq(&zone->lru_lock);
> + prev = page->ilru.prev_page;
> + if (same_lru(page, prev)) {
> + putback_page_to_lru(page, prev);
> + spin_unlock_irq(&zone->lru_lock);
> + }
Ouch. So with putback_lru_pages(), pages get added to a pagevec, but
now we are taking the zone LRU lock for every page we want to put
back? That is very painful. We're talking about THP promotion having to
take the zone lru lock many hundreds if not thousands of times.
Minimally, I now think this patchset needs to be split into two:
the first set, which avoids isolating pages from the LRU when it's
pointless, like dirty pages for async compaction and __zone_reclaim. I
think that has clear merit and should be without cost. The second
set would be this one, to work out how taking the lru_lock an excessive
number of times can be avoided.
> + else {
> + spin_unlock_irq(&zone->lru_lock);
> + putback_lru_page(page);
> + }
> +
> + l->next = &page2->ilru;
> + }
> +}
> +
> /*
> * Restore a potential migration pte to a working pte entry
> */
> diff --git a/mm/swap.c b/mm/swap.c
> index 5602f1a..6c24a75 100644
> --- a/mm/swap.c
> +++ b/mm/swap.c
> @@ -252,7 +252,7 @@ void rotate_reclaimable_page(struct page *page)
> }
> }
>
> -static void update_page_reclaim_stat(struct zone *zone, struct page *page,
> +void update_page_reclaim_stat(struct zone *zone, struct page *page,
> int file, int rotated)
> {
> struct zone_reclaim_stat *reclaim_stat = &zone->reclaim_stat;
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index c08911d..7668e8d 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -550,6 +550,56 @@ int remove_mapping(struct address_space *mapping, struct page *page)
> return 0;
> }
>
> +/*
> + * If prev and page are still on the same LRU, we can keep the LRU order of page.
> + * zone->lru_lock must be held.
> + */
> +bool same_lru(struct page *page, struct page *prev)
> +{
> + bool ret = false;
> + if (!prev || !PageLRU(prev))
> + goto out;
> +
> + if (unlikely(PageUnevictable(prev)))
> + goto out;
> +
> + if (page_lru_base_type(page) != page_lru_base_type(prev))
> + goto out;
> +
> + ret = true;
> +out:
> + return ret;
> +}
> +
> +/**
> + * putback_page_to_lru - put isolated @page onto @head_page
> + * @page: page to be put back to appropriate lru list
> + * @head_page: lru position to be put back
> + *
> + * Insert a previously isolated @page at the appropriate position of the lru list.
> + * zone->lru_lock must be held.
> + */
> +void putback_page_to_lru(struct page *page, struct page *head_page)
> +{
> + int lru, active, file;
> + struct zone *zone = page_zone(page);
> +
> + VM_BUG_ON(PageLRU(page));
> +
> + lru = page_lru(head_page);
> + active = is_active_lru(lru);
> + file = is_file_lru(lru);
> +
> + if (active)
> + SetPageActive(page);
> + else
> + ClearPageActive(page);
> +
> + update_page_reclaim_stat(zone, page, file, active);
> + SetPageLRU(page);
> + __add_page_to_lru_list(zone, page, lru, &head_page->lru);
> +}
> +
> /**
> * putback_lru_page - put previously isolated page onto appropriate LRU list's head
> * @page: page to be put back to appropriate lru list
> @@ -954,10 +1004,13 @@ keep_lumpy:
> *
> * page: page to consider
> * mode: one of the LRU isolation modes defined above
> + * file: True [1] if isolating file [!anon] pages
> + * prev_page: prev page of isolated page as LRU order
> *
> * returns 0 on success, -ve errno on failure.
> */
> -int __isolate_lru_page(struct page *page, enum ISOLATE_MODE mode, int file)
> +int __isolate_lru_page(struct page *page, enum ISOLATE_MODE mode,
> + int file, struct page **prev_page)
> {
> bool all_lru_mode;
> int ret = -EINVAL;
> @@ -1003,6 +1056,18 @@ int __isolate_lru_page(struct page *page, enum ISOLATE_MODE mode, int file)
> * page release code relies on it.
> */
> ClearPageLRU(page);
> + if (prev_page != NULL) {
> + struct zone *zone = page_zone(page);
> + enum lru_list l = page_lru(page);
> +
> + if (&zone->lru[l].list == page->lru.prev) {
> + *prev_page = NULL;
> + goto out;
> + }
> +
> + *prev_page = lru_to_page(&page->lru);
> + }
> +out:
> ret = 0;
> }
>
> @@ -1052,7 +1117,7 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan,
>
> VM_BUG_ON(!PageLRU(page));
>
> - switch (__isolate_lru_page(page, mode, file)) {
> + switch (__isolate_lru_page(page, mode, file, NULL)) {
> case 0:
> list_move(&page->lru, dst);
> mem_cgroup_del_lru(page);
> @@ -1111,7 +1176,8 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan,
> !PageSwapCache(cursor_page))
> break;
>
> - if (__isolate_lru_page(cursor_page, mode, file) == 0) {
> + if (__isolate_lru_page(cursor_page,
> + mode, file, NULL) == 0) {
> list_move(&cursor_page->lru, dst);
> mem_cgroup_del_lru(cursor_page);
> nr_taken += hpage_nr_pages(page);
--
Mel Gorman
SUSE Labs
Hi Mel,
On Thu, Jun 09, 2011 at 02:33:27PM +0100, Mel Gorman wrote:
> On Tue, Jun 07, 2011 at 11:38:14PM +0900, Minchan Kim wrote:
> > acct_isolated of compaction uses page_lru_base_type, which returns only
> > the base type of the LRU list, so it never returns LRU_ACTIVE_ANON or LRU_ACTIVE_FILE.
> > In addition, cc->nr_[anon|file] is used only in acct_isolated, so it doesn't need
> > fields in compact_control.
> > This patch removes the fields from compact_control and clarifies the function of
> > acct_isolated, which counts the number of anon|file pages isolated.
> >
> > Cc: KOSAKI Motohiro <[email protected]>
> > Cc: Mel Gorman <[email protected]>
> > Cc: Andrea Arcangeli <[email protected]>
> > Acked-by: Rik van Riel <[email protected]>
> > Acked-by: Johannes Weiner <[email protected]>
> > Reviewed-by: KAMEZAWA Hiroyuki <[email protected]>
> > Signed-off-by: Minchan Kim <[email protected]>
> > ---
> > mm/compaction.c | 18 +++++-------------
> > 1 files changed, 5 insertions(+), 13 deletions(-)
> >
> > diff --git a/mm/compaction.c b/mm/compaction.c
> > index 021a296..61eab88 100644
> > --- a/mm/compaction.c
> > +++ b/mm/compaction.c
> > @@ -35,10 +35,6 @@ struct compact_control {
> > unsigned long migrate_pfn; /* isolate_migratepages search base */
> > bool sync; /* Synchronous migration */
> >
> > - /* Account for isolated anon and file pages */
> > - unsigned long nr_anon;
> > - unsigned long nr_file;
> > -
> > unsigned int order; /* order a direct compactor needs */
> > int migratetype; /* MOVABLE, RECLAIMABLE etc */
> > struct zone *zone;
> > @@ -212,17 +208,13 @@ static void isolate_freepages(struct zone *zone,
> > static void acct_isolated(struct zone *zone, struct compact_control *cc)
> > {
> > struct page *page;
> > - unsigned int count[NR_LRU_LISTS] = { 0, };
> > + unsigned int count[2] = { 0, };
> >
> > - list_for_each_entry(page, &cc->migratepages, lru) {
> > - int lru = page_lru_base_type(page);
> > - count[lru]++;
> > - }
> > + list_for_each_entry(page, &cc->migratepages, lru)
> > + count[!!page_is_file_cache(page)]++;
> >
> > - cc->nr_anon = count[LRU_ACTIVE_ANON] + count[LRU_INACTIVE_ANON];
> > - cc->nr_file = count[LRU_ACTIVE_FILE] + count[LRU_INACTIVE_FILE];
> > - __mod_zone_page_state(zone, NR_ISOLATED_ANON, cc->nr_anon);
> > - __mod_zone_page_state(zone, NR_ISOLATED_FILE, cc->nr_file);
> > + __mod_zone_page_state(zone, NR_ISOLATED_ANON, count[0]);
> > + __mod_zone_page_state(zone, NR_ISOLATED_FILE, count[1]);
>
> You are hard-coding assumptions about the value of LRU_INACTIVE_ANON
> and LRU_INACTIVE_FILE here. I have no expectation that these will
I used page_is_file_cache and a double logical not (!!).
If page_is_file_cache returns zero (ie, anon), the double negation maps it to 0.
If page_is_file_cache returns non-zero (ie, file), the double negation maps it to 1.
So anon pages are counted in count[0] and file pages in count[1].
Did I miss your point?
> change but who knows for sure? To be robust against unexpected
> changes, count should still be NR_LRU_LISTS and you should use the
> LRU_INACTIVE_ANON and LRU_INACTIVE_FILE values.
>
> I agree that summing LRU_ACTIVE_* is silly.
>
> --
> Mel Gorman
> SUSE Labs
--
Kind regards
Minchan Kim
On Thu, Jun 09, 2011 at 11:41:36PM +0900, Minchan Kim wrote:
> Hi Mel,
>
> On Thu, Jun 09, 2011 at 02:33:27PM +0100, Mel Gorman wrote:
> > On Tue, Jun 07, 2011 at 11:38:14PM +0900, Minchan Kim wrote:
> > > acct_isolated of compaction uses page_lru_base_type, which returns only
> > > the base type of the LRU list, so it never returns LRU_ACTIVE_ANON or LRU_ACTIVE_FILE.
> > > In addition, cc->nr_[anon|file] is used only in acct_isolated, so it doesn't need
> > > fields in compact_control.
> > > This patch removes the fields from compact_control and clarifies the function of
> > > acct_isolated, which counts the number of anon|file pages isolated.
> > >
> > > Cc: KOSAKI Motohiro <[email protected]>
> > > Cc: Mel Gorman <[email protected]>
> > > Cc: Andrea Arcangeli <[email protected]>
> > > Acked-by: Rik van Riel <[email protected]>
> > > Acked-by: Johannes Weiner <[email protected]>
> > > Reviewed-by: KAMEZAWA Hiroyuki <[email protected]>
> > > Signed-off-by: Minchan Kim <[email protected]>
> > > ---
> > > mm/compaction.c | 18 +++++-------------
> > > 1 files changed, 5 insertions(+), 13 deletions(-)
> > >
> > > diff --git a/mm/compaction.c b/mm/compaction.c
> > > index 021a296..61eab88 100644
> > > --- a/mm/compaction.c
> > > +++ b/mm/compaction.c
> > > @@ -35,10 +35,6 @@ struct compact_control {
> > > unsigned long migrate_pfn; /* isolate_migratepages search base */
> > > bool sync; /* Synchronous migration */
> > >
> > > - /* Account for isolated anon and file pages */
> > > - unsigned long nr_anon;
> > > - unsigned long nr_file;
> > > -
> > > unsigned int order; /* order a direct compactor needs */
> > > int migratetype; /* MOVABLE, RECLAIMABLE etc */
> > > struct zone *zone;
> > > @@ -212,17 +208,13 @@ static void isolate_freepages(struct zone *zone,
> > > static void acct_isolated(struct zone *zone, struct compact_control *cc)
> > > {
> > > struct page *page;
> > > - unsigned int count[NR_LRU_LISTS] = { 0, };
> > > + unsigned int count[2] = { 0, };
> > >
> > > - list_for_each_entry(page, &cc->migratepages, lru) {
> > > - int lru = page_lru_base_type(page);
> > > - count[lru]++;
> > > - }
> > > + list_for_each_entry(page, &cc->migratepages, lru)
> > > + count[!!page_is_file_cache(page)]++;
> > >
> > > - cc->nr_anon = count[LRU_ACTIVE_ANON] + count[LRU_INACTIVE_ANON];
> > > - cc->nr_file = count[LRU_ACTIVE_FILE] + count[LRU_INACTIVE_FILE];
> > > - __mod_zone_page_state(zone, NR_ISOLATED_ANON, cc->nr_anon);
> > > - __mod_zone_page_state(zone, NR_ISOLATED_FILE, cc->nr_file);
> > > + __mod_zone_page_state(zone, NR_ISOLATED_ANON, count[0]);
> > > + __mod_zone_page_state(zone, NR_ISOLATED_FILE, count[1]);
> >
> > You are hard-coding assumptions about the value of LRU_INACTIVE_ANON
> > and LRU_INACTIVE_FILE here. I have no expectation that these will
>
> I used page_is_file_cache and a double logical not (!!).
> If page_is_file_cache returns zero (ie, anon), the double negation maps it to 0.
> If page_is_file_cache returns non-zero (ie, file), the double negation maps it to 1.
> So anon pages are counted in count[0] and file pages in count[1].
>
> Did I miss your point?
>
Nope, I simply missed the !!page_is_file_cache part and was still
seeing count[] in its old meaning - sorry, my bad. The patch now
makes sense to me.
Acked-by: Mel Gorman <[email protected]>
--
Mel Gorman
SUSE Labs
On Thu, Jun 09, 2011 at 02:51:32PM +0100, Mel Gorman wrote:
> On Tue, Jun 07, 2011 at 11:38:15PM +0900, Minchan Kim wrote:
> > This patch changes macro define with enum variable.
> > Normally, enum is preferred as it's type-safe and making debugging easier
> > as symbol can be passed throught to the debugger.
> >
> > This patch doesn't change old behavior.
> > It is used by next patches.
> >
> > Cc: KOSAKI Motohiro <[email protected]>
> > Cc: Mel Gorman <[email protected]>
> > Cc: Rik van Riel <[email protected]>
> > Cc: Andrea Arcangeli <[email protected]>
> > Cc: Johannes Weiner <[email protected]>
> > Cc: KAMEZAWA Hiroyuki <[email protected]>
> > Signed-off-by: Minchan Kim <[email protected]>
> > ---
> > include/linux/memcontrol.h | 5 ++++-
> > include/linux/swap.h | 11 +++++++----
> > include/trace/events/vmscan.h | 8 ++++----
> > mm/compaction.c | 3 ++-
> > mm/memcontrol.c | 3 ++-
> > mm/migrate.c | 4 ++--
> > mm/vmscan.c | 37 ++++++++++++++++++++-----------------
> > 7 files changed, 41 insertions(+), 30 deletions(-)
> >
> > diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> > index 5e9840f..91a1162 100644
> > --- a/include/linux/memcontrol.h
> > +++ b/include/linux/memcontrol.h
> > @@ -30,10 +30,13 @@ enum mem_cgroup_page_stat_item {
> > MEMCG_NR_FILE_MAPPED, /* # of pages charged as file rss */
> > };
> >
> > +enum ISOLATE_MODE;
> > +
>
> All caps. Really? It's NOT VERY PRETTY AND MAKES MY EYES DEAF WHICH
> SHOULD BE IMPOSSIBLE.
>
> What's wrong with
>
> enum isolate_mode {
> }
No problem.
I will change it in the next version. :)
>
> ? It's reasonable that the constants themselves are in caps. It's
> expected for defines and enum values.
>
> > extern unsigned long mem_cgroup_isolate_pages(unsigned long nr_to_scan,
> > struct list_head *dst,
> > unsigned long *scanned, int order,
> > - int mode, struct zone *z,
> > + enum ISOLATE_MODE mode,
>
> I don't mind as much now because I'm already eye-deaf.
>
> > + struct zone *z,
> > struct mem_cgroup *mem_cont,
> > int active, int file);
> >
> > diff --git a/include/linux/swap.h b/include/linux/swap.h
> > index a5c6da5..48d50e6 100644
> > --- a/include/linux/swap.h
> > +++ b/include/linux/swap.h
> > @@ -244,9 +244,11 @@ static inline void lru_cache_add_file(struct page *page)
> > }
> >
> > /* LRU Isolation modes. */
> > -#define ISOLATE_INACTIVE 0 /* Isolate inactive pages. */
> > -#define ISOLATE_ACTIVE 1 /* Isolate active pages. */
> > -#define ISOLATE_BOTH 2 /* Isolate both active and inactive pages. */
> > +enum ISOLATE_MODE {
> > + ISOLATE_NONE,
> > + ISOLATE_INACTIVE = 1, /* Isolate inactive pages */
> > + ISOLATE_ACTIVE = 2, /* Isolate active pages */
> > +};
>
> No need to explicitly define like this. i.e. drop the = 1, = 2 etc.
>
> The leader does not explain why ISOLATE_BOTH is replaced and what it
> gains us.
In the next patches I add new modes, and I plan to add ISOLATE_UNEVICTABLE too.
Each mode would be a bit representing one characteristic.
In that context, ISOLATE_BOTH is awkward, and it can be expressed as ISOLATE_ACTIVE|ISOLATE_INACTIVE.
I will add this to the description in the next version.
>
> >
> > /* linux/mm/vmscan.c */
> > extern unsigned long try_to_free_pages(struct zonelist *zonelist, int order,
> > @@ -258,7 +260,8 @@ extern unsigned long mem_cgroup_shrink_node_zone(struct mem_cgroup *mem,
> > gfp_t gfp_mask, bool noswap,
> > unsigned int swappiness,
> > struct zone *zone);
> > -extern int __isolate_lru_page(struct page *page, int mode, int file);
> > +extern int __isolate_lru_page(struct page *page, enum ISOLATE_MODE mode,
> > + int file);
> > extern unsigned long shrink_all_memory(unsigned long nr_pages);
> > extern int vm_swappiness;
> > extern int remove_mapping(struct address_space *mapping, struct page *page);
> > diff --git a/include/trace/events/vmscan.h b/include/trace/events/vmscan.h
> > index ea422aa..4f53d43 100644
> > --- a/include/trace/events/vmscan.h
> > +++ b/include/trace/events/vmscan.h
> > @@ -187,7 +187,7 @@ DECLARE_EVENT_CLASS(mm_vmscan_lru_isolate_template,
> > unsigned long nr_lumpy_taken,
> > unsigned long nr_lumpy_dirty,
> > unsigned long nr_lumpy_failed,
> > - int isolate_mode),
> > + enum ISOLATE_MODE isolate_mode),
> >
> > TP_ARGS(order, nr_requested, nr_scanned, nr_taken, nr_lumpy_taken, nr_lumpy_dirty, nr_lumpy_failed, isolate_mode),
> >
> > @@ -199,7 +199,7 @@ DECLARE_EVENT_CLASS(mm_vmscan_lru_isolate_template,
> > __field(unsigned long, nr_lumpy_taken)
> > __field(unsigned long, nr_lumpy_dirty)
> > __field(unsigned long, nr_lumpy_failed)
> > - __field(int, isolate_mode)
> > + __field(enum ISOLATE_MODE, isolate_mode)
> > ),
> >
> > TP_fast_assign(
> > @@ -233,7 +233,7 @@ DEFINE_EVENT(mm_vmscan_lru_isolate_template, mm_vmscan_lru_isolate,
> > unsigned long nr_lumpy_taken,
> > unsigned long nr_lumpy_dirty,
> > unsigned long nr_lumpy_failed,
> > - int isolate_mode),
> > + enum ISOLATE_MODE isolate_mode),
> >
> > TP_ARGS(order, nr_requested, nr_scanned, nr_taken, nr_lumpy_taken, nr_lumpy_dirty, nr_lumpy_failed, isolate_mode)
> >
> > @@ -248,7 +248,7 @@ DEFINE_EVENT(mm_vmscan_lru_isolate_template, mm_vmscan_memcg_isolate,
> > unsigned long nr_lumpy_taken,
> > unsigned long nr_lumpy_dirty,
> > unsigned long nr_lumpy_failed,
> > - int isolate_mode),
> > + enum ISOLATE_MODE isolate_mode),
> >
>
> And the meaning of isolate_mode has changed. This affects users of the
> tracepoint. Documentation/trace/postprocess/trace-vmscan-postprocess.pl
> will consider ISOLATE_NONE to be scanning for example.
I missed that part.
I will consider it.
>
> > TP_ARGS(order, nr_requested, nr_scanned, nr_taken, nr_lumpy_taken, nr_lumpy_dirty, nr_lumpy_failed, isolate_mode)
> >
> > diff --git a/mm/compaction.c b/mm/compaction.c
> > index 61eab88..f0d75e9 100644
> > --- a/mm/compaction.c
> > +++ b/mm/compaction.c
> > @@ -327,7 +327,8 @@ static unsigned long isolate_migratepages(struct zone *zone,
> > }
> >
> > /* Try isolate the page */
> > - if (__isolate_lru_page(page, ISOLATE_BOTH, 0) != 0)
> > + if (__isolate_lru_page(page,
> > + ISOLATE_ACTIVE|ISOLATE_INACTIVE, 0) != 0)
>
> Ahh, so you assign the enum to user it as a flag. That's very
> unexpected to me. Why did you not do something like gfp_t which is a
> bitwise type?
>
Okay.
> Because mode is a bitmask, it's also impossible to check if
> ISOLATE_NONE is set. As ISOLATE_NONE is not used in this patch,
> it's hard to know at this point what it's for.
I might have had a BUG_ON check against zero flags, but I can't remember exactly.
I will remove it.
>
> > continue;
> >
> > VM_BUG_ON(PageTransCompound(page));
> > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > index 010f916..f4c0b71 100644
> > --- a/mm/memcontrol.c
> > +++ b/mm/memcontrol.c
> > @@ -1106,7 +1106,8 @@ mem_cgroup_get_reclaim_stat_from_page(struct page *page)
> > unsigned long mem_cgroup_isolate_pages(unsigned long nr_to_scan,
> > struct list_head *dst,
> > unsigned long *scanned, int order,
> > - int mode, struct zone *z,
> > + enum ISOLATE_MODE mode,
> > + struct zone *z,
> > struct mem_cgroup *mem_cont,
> > int active, int file)
> > {
> > diff --git a/mm/migrate.c b/mm/migrate.c
> > index 819d233..e797b5c 100644
> > --- a/mm/migrate.c
> > +++ b/mm/migrate.c
> > @@ -1363,10 +1363,10 @@ int migrate_vmas(struct mm_struct *mm, const nodemask_t *to,
> >
> > for (vma = mm->mmap; vma && !err; vma = vma->vm_next) {
> > if (vma->vm_ops && vma->vm_ops->migrate) {
> > - err = vma->vm_ops->migrate(vma, to, from, flags);
> > + err = vma->vm_ops->migrate(vma, to, from, flags);
> > if (err)
> > break;
> > - }
> > + }
>
> Only modification here is adding whitespace damage.
Oops.
>
> > }
> > return err;
> > }
> > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > index a658dde..4cbe114 100644
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -957,23 +957,27 @@ keep_lumpy:
> > *
> > * returns 0 on success, -ve errno on failure.
> > */
> > -int __isolate_lru_page(struct page *page, int mode, int file)
> > +int __isolate_lru_page(struct page *page, enum ISOLATE_MODE mode, int file)
> > {
> > + bool all_lru_mode;
> > int ret = -EINVAL;
> >
> > /* Only take pages on the LRU. */
> > if (!PageLRU(page))
> > return ret;
> >
> > + all_lru_mode = (mode & (ISOLATE_ACTIVE|ISOLATE_INACTIVE)) ==
> > + (ISOLATE_ACTIVE|ISOLATE_INACTIVE);
> > +
>
> both_lru would be a better name because all LRU implies that UNEVICTABLE
> is involved which is not the case.
Hmm. Actually, I am considering another patch set which makes compaction possible
on unevictable pages. In that context, I want to use "all" rather than "both".
But that's a future story. If you feel strongly about it for this version, I have
no problem using "both"; otherwise I would like to keep "all".
>
> > /*
> > * When checking the active state, we need to be sure we are
> > * dealing with comparible boolean values. Take the logical not
> > * of each.
> > */
> > - if (mode != ISOLATE_BOTH && (!PageActive(page) != !mode))
> > + if (!all_lru_mode && !PageActive(page) != !(mode & ISOLATE_ACTIVE))
> > return ret;
> >
> > - if (mode != ISOLATE_BOTH && page_is_file_cache(page) != file)
> > + if (!all_lru_mode && !!page_is_file_cache(page) != file)
> > return ret;
> >
> > /*
> > @@ -1021,7 +1025,8 @@ int __isolate_lru_page(struct page *page, int mode, int file)
> > */
> > static unsigned long isolate_lru_pages(unsigned long nr_to_scan,
> > struct list_head *src, struct list_head *dst,
> > - unsigned long *scanned, int order, int mode, int file)
> > + unsigned long *scanned, int order, enum ISOLATE_MODE mode,
> > + int file)
> > {
> > unsigned long nr_taken = 0;
> > unsigned long nr_lumpy_taken = 0;
> > @@ -1134,8 +1139,8 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan,
> > static unsigned long isolate_pages_global(unsigned long nr,
> > struct list_head *dst,
> > unsigned long *scanned, int order,
> > - int mode, struct zone *z,
> > - int active, int file)
> > + enum ISOLATE_MODE mode,
> > + struct zone *z, int active, int file)
> > {
> > int lru = LRU_BASE;
> > if (active)
> > @@ -1382,6 +1387,7 @@ shrink_inactive_list(unsigned long nr_to_scan, struct zone *zone,
> > unsigned long nr_taken;
> > unsigned long nr_anon;
> > unsigned long nr_file;
> > + enum ISOLATE_MODE reclaim_mode = ISOLATE_INACTIVE;
> >
> > while (unlikely(too_many_isolated(zone, file, sc))) {
> > congestion_wait(BLK_RW_ASYNC, HZ/10);
> > @@ -1392,15 +1398,15 @@ shrink_inactive_list(unsigned long nr_to_scan, struct zone *zone,
> > }
> >
> > set_reclaim_mode(priority, sc, false);
> > + if (sc->reclaim_mode & RECLAIM_MODE_LUMPYRECLAIM)
> > + reclaim_mode |= ISOLATE_ACTIVE;
> > +
> > lru_add_drain();
> > spin_lock_irq(&zone->lru_lock);
> >
> > if (scanning_global_lru(sc)) {
> > - nr_taken = isolate_pages_global(nr_to_scan,
> > - &page_list, &nr_scanned, sc->order,
> > - sc->reclaim_mode & RECLAIM_MODE_LUMPYRECLAIM ?
> > - ISOLATE_BOTH : ISOLATE_INACTIVE,
> > - zone, 0, file);
> > + nr_taken = isolate_pages_global(nr_to_scan, &page_list,
> > + &nr_scanned, sc->order, reclaim_mode, zone, 0, file);
> > zone->pages_scanned += nr_scanned;
> > if (current_is_kswapd())
> > __count_zone_vm_events(PGSCAN_KSWAPD, zone,
>
> While this looks ok, I do not see why it's better.
>
> > @@ -1409,12 +1415,9 @@ shrink_inactive_list(unsigned long nr_to_scan, struct zone *zone,
> > __count_zone_vm_events(PGSCAN_DIRECT, zone,
> > nr_scanned);
> > } else {
> > - nr_taken = mem_cgroup_isolate_pages(nr_to_scan,
> > - &page_list, &nr_scanned, sc->order,
> > - sc->reclaim_mode & RECLAIM_MODE_LUMPYRECLAIM ?
> > - ISOLATE_BOTH : ISOLATE_INACTIVE,
> > - zone, sc->mem_cgroup,
> > - 0, file);
> > + nr_taken = mem_cgroup_isolate_pages(nr_to_scan, &page_list,
> > + &nr_scanned, sc->order, reclaim_mode, zone,
> > + sc->mem_cgroup, 0, file);
> > /*
> > * mem_cgroup_isolate_pages() keeps track of
> > * scanned pages on its own.
>
> Overall, I'm failing to see how this patch helps but maybe it's because
> I haven't read the rest of the series. My main gripe is that meaning
> of isolate_mode in the tracepoint has changed without any clear gain.
Thanks for the careful review, Mel.
>
> --
> Mel Gorman
> SUSE Labs
--
Kind regards
Minchan Kim
On Thu, Jun 09, 2011 at 02:59:02PM +0100, Mel Gorman wrote:
> On Tue, Jun 07, 2011 at 11:38:16PM +0900, Minchan Kim wrote:
> > There are several places that isolate an LRU page, and I believe
> > the users of isolate_lru_page will keep growing.
> > Their purposes differ, so some of the isolated pages
> > should be put back to the LRU again.
> >
> > The problem is that when we put a page back into the LRU,
> > we lose the LRU ordering: the page is inserted at the head of the LRU list.
> > That causes unnecessary LRU churning, so the VM can evict working set pages
> > rather than idle pages.
> >
> > This patch adds new modes for isolating a page from the LRU so that we don't
> > isolate pages we can't handle. It could reduce LRU churning.
> >
> > This patch doesn't change the old behavior; the modes are only used by the next patches.
> >
> > Cc: KOSAKI Motohiro <[email protected]>
> > Cc: Mel Gorman <[email protected]>
> > Cc: Andrea Arcangeli <[email protected]>
> > Cc: KAMEZAWA Hiroyuki <[email protected]>
> > Acked-by: Rik van Riel <[email protected]>
> > Acked-by: Johannes Weiner <[email protected]>
> > Signed-off-by: Minchan Kim <[email protected]>
> > ---
> > include/linux/swap.h | 2 ++
> > mm/vmscan.c | 6 ++++++
> > 2 files changed, 8 insertions(+), 0 deletions(-)
> >
> > diff --git a/include/linux/swap.h b/include/linux/swap.h
> > index 48d50e6..731f5dd 100644
> > --- a/include/linux/swap.h
> > +++ b/include/linux/swap.h
> > @@ -248,6 +248,8 @@ enum ISOLATE_MODE {
> > ISOLATE_NONE,
> > ISOLATE_INACTIVE = 1, /* Isolate inactive pages */
> > ISOLATE_ACTIVE = 2, /* Isolate active pages */
> > + ISOLATE_CLEAN = 8, /* Isolate clean file */
> > + ISOLATE_UNMAPPED = 16, /* Isolate unmapped file */
> > };
>
> This really should be a bitwise type like gfp_t.
Agreed. As I said, I will change it.
>
> >
> > /* linux/mm/vmscan.c */
> > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > index 4cbe114..26aa627 100644
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -990,6 +990,12 @@ int __isolate_lru_page(struct page *page, enum ISOLATE_MODE mode, int file)
> >
> > ret = -EBUSY;
> >
> > + if (mode & ISOLATE_CLEAN && (PageDirty(page) || PageWriteback(page)))
> > + return ret;
> > +
> > + if (mode & ISOLATE_UNMAPPED && page_mapped(page))
> > + return ret;
> > +
> > if (likely(get_page_unless_zero(page))) {
> > /*
> > * Be careful not to clear PageLRU until after we're
>
> This patch does not use ISOLATE_CLEAN or ISOLATE_UNMAPPED anywhere. While
> I can guess how they will be used, it would be easier to review if one
> patch introduced ISOLATE_CLEAN and updated the call sites where it was
> relevant. Same with ISOLATE_UNMAPPED.
Totally agree.
I have always wanted the same from others, too. :(
>
> Also when using & like this, I thought the compiler warned if it wasn't
> in parenthesis but maybe that's wrong. The problem is the operator
My compiler (gcc 4.4.3, Ubuntu 4.4.3-4ubuntu5) was smart.
> precedence for bitwise AND and logical AND is easy to forget as it's
> so rarely an issue.
I will update that part for readability and to avoid unexpected compiler warnings.
>
> i.e. it's easy to forget if
>
> mode & ISOLATE_UNMAPPED && page_mapped(page)
>
> means
>
> mode & (ISOLATE_UNMAPPED && page_mapped(page))
>
> or
>
> (mode & ISOLATE_UNMAPPED) && page_mapped(page)
>
> Be nice and specific for this one.
>
> --
> Mel Gorman
> SUSE Labs
--
Kind regards
Minchan Kim
On Thu, Jun 09, 2011 at 03:04:53PM +0100, Mel Gorman wrote:
> On Tue, Jun 07, 2011 at 11:38:18PM +0900, Minchan Kim wrote:
> > In the __zone_reclaim case, we don't want to reclaim mapped pages.
> > Nonetheless, we isolate mapped pages and re-add them at the LRU's head.
> > That is unnecessary CPU overhead and causes LRU churning.
> >
>
> Gack, I should have kept reading. I didn't cop from the subject that
I should have used a clearer subject.
> __zone_reclaim would be updated. Still, it would have been easier to
> review if one patch introduced ISOLATE_CLEAN and updated the callers and
> this patch introduced ISOLATE_UNMAPPED and updated the relevant callers.
Agreed. I will fix it in a later version.
>
> > <SNIP>
>
> --
> Mel Gorman
> SUSE Labs
--
Kind regards
Minchan Kim
On Thu, Jun 09, 2011 at 03:27:52PM +0100, Mel Gorman wrote:
> On Tue, Jun 07, 2011 at 11:38:19PM +0900, Minchan Kim wrote:
> > This patch defines new APIs to put a page back into its previous position in the LRU.
> > The idea I suggested at LSF/MM is simple.
> >
> > When we put a page back into the LRU list, if the page's friends (prev, next)
> > are still its nearest neighbors, we can insert the isolated page after prev instead of
> > at the head of the LRU list. So it keeps the LRU ordering without losing LRU information.
> >
> > Before :
> > LRU POV : H - P1 - P2 - P3 - P4 -T
> >
> > Isolate P3 :
> > LRU POV : H - P1 - P2 - P4 - T
> >
> > Putback P3 :
> > if (P2->next == P4)
> > putback(P3, P2);
> > So,
> > LRU POV : H - P1 - P2 - P3 - P4 -T
> >
> > I implemented this idea in the RFC, but it had two problems.
> >
> > 1)
> > For the implementation, I defined a new structure, _pages_lru_, which remembers
> > both LRU friend pages of the isolated one, plus handling functions.
> > For space of pages_lru, I allocated the space dynamically in kmalloc(GFP_AOTMIC)
>
> ATOMIC
Will fix.
>
> > but as we know, compaction is a reclaim path, so it's not a good idea to allocate
> > memory dynamically in that path. A single page is enough to store the pages_lru
> > entries, as current compaction migrates in chunks of 32 pages.
> > In addition, compaction makes sure there are lots of order-0 free pages before
> > starting, so it wouldn't be a big problem, I think. But I admit it can pin some
> > pages, so the migration success ratio might drop if concurrent compaction happens.
> >
> > I decided to change my mind: I don't use dynamically allocated memory any more.
> > Looking at migration, we don't need the doubly linked list of page->lru.
> > The whole operation is an enumeration, so I think a singly linked list is enough.
> > With a singly linked list, we can reuse one of the pointers as another buffer;
> > here, we use it to store the prev LRU page of the isolated page.
> >
> > 2)
> > The page-relation approach had a problem with contiguous pages:
> > the idea cannot work when the friend pages are isolated, too.
> > In that case prev_page->next == next_page is always _false_, and neither page is
> > on the LRU any more at that point. This was pointed out by Rik at the LSF/MM summit.
> > To solve the problem, I changed the idea:
> > we don't need the relation to both friend (prev, next) pages; it's enough to
> > consider either the prev or the next page, as long as it is still on the same LRU.
> >
> > In the worst case, the prev or next page is freed and newly allocated,
> > so it sits at the head of the LRU and our isolated page lands right after the head.
> > But that is almost the same as the current situation, so it doesn't make things
> > worse than they are now.
> > The new idea works as below.
> >
> > ===
> >
> > assume : we isolate pages P3~P7 and we consider only prev LRU pointer.
> > notation : (P3,P2) = (isolated page,prev LRU page of isolated page)
> >
> > H - P1 - P2 - P3 - P4 - P5 - P6 - P7 - P8 - P9 - P10 - T
> >
> > If we isolate P3, following as
> >
> > H - P1 - P2 - P4 - P5 - P6 - P7 - P8 - P9 - P10 - T
> > Isolated page list - (P3,P2)
> >
> > If we isolate P4, following as
> >
> > H - P1 - P2 - P5 - P6 - P7 - P8 - P9 - P10 - T
> > Isolated page list - (P4,P2) - (P3,P2)
> >
> > If we isolate P5, following as
> >
> > H - P1 - P2 - P6 - P7 - P8 - P9 - P10 - T
> > Isolated page list - (P5,P2) - (P4,P2) - (P3,P2)
> >
> > ..
> > ..
> >
> > If we isolate P7, following as
> >
> > H - P1 - P2 - P8 - P9 - P10 - T
> >
> > Isolated page list - (P7,P2) - (P6,P2) - (P5,P2) - (P4,P2) - (P3,P2)
> >
> > Let's start putback from P7
> >
> > P7)
> >
> > H - P1 - P2 - P8 - P9 - P10 - T
> > prev P2 is valid, too. So,
> >
> > H - P1 - P2 - P7 - P8 - P9 - P10 - T
> >
> > P6)
> >
> > H - P1 - P2 - P7 - P8 - P9 - P10 - T
> > Prev P2 is valid, too. So,
> >
> > H - P1 - P2 - P6 - P7 - P8 - P9 - P10 - T
> >
> > ..
> > ..
> >
> > P3)
> > H - P1 - P2 - P4 - P5 - P6 - P7 - P8 - P9 - P10 - T
> > Prev P2 is valid, too. So,
> >
> > H - P1 - P2 - P3 - P4 - P5 - P6 - P7 - P8 - P9 - P10 - T
> >
>
> Could move this explanation into migrate.h where you talk about why the
> list is singly-linked instead of doubly-linked.
Okay.
>
> > ===
> >
> > Cc: Johannes Weiner <[email protected]>
> > Cc: KAMEZAWA Hiroyuki <[email protected]>
> > Cc: KOSAKI Motohiro <[email protected]>
> > Cc: Mel Gorman <[email protected]>
> > Cc: Andrea Arcangeli <[email protected]>
> > Acked-by: Rik van Riel <[email protected]>
> > Signed-off-by: Minchan Kim <[email protected]>
> > ---
> > include/linux/migrate.h | 35 ++++++++++++++++++++++
> > include/linux/mm_types.h | 16 +++++++++-
> > include/linux/swap.h | 4 ++-
> > mm/compaction.c | 2 +-
> > mm/internal.h | 2 +
> > mm/memcontrol.c | 2 +-
> > mm/migrate.c | 24 +++++++++++++++
> > mm/swap.c | 2 +-
> > mm/vmscan.c | 72 ++++++++++++++++++++++++++++++++++++++++++++--
> > 9 files changed, 151 insertions(+), 8 deletions(-)
> >
> > diff --git a/include/linux/migrate.h b/include/linux/migrate.h
> > index e39aeec..5914282 100644
> > --- a/include/linux/migrate.h
> > +++ b/include/linux/migrate.h
> > @@ -9,7 +9,42 @@ typedef struct page *new_page_t(struct page *, unsigned long private, int **);
> > #ifdef CONFIG_MIGRATION
> > #define PAGE_MIGRATION 1
> >
> > +/*
> > + * Migratelist for compaction is singly linked list instead of double linked list.
> > + * Current list utility is useful in some sense but we can't make sure compatibilty.
> > + * Please use below functions instead of common list's ones.
> > + */
> > +static inline void INIT_MIGRATE_LIST(struct inorder_lru *list)
> > +{
> > + list->prev_page = NULL;
> > + list->next = list;
> > +}
> > +
> > +static inline int migratelist_empty(const struct inorder_lru *head)
> > +{
> > + return head->next == head;
> > +}
> > +
> > +static inline void migratelist_add(struct page *page,
> > + struct page *prev_page, struct inorder_lru *head)
> > +{
> > + VM_BUG_ON(PageLRU(page));
> > +
> > + page->ilru.prev_page = prev_page;
> > + page->ilru.next = head->next;
> > + head->next = &page->ilru;
> > +}
> > +
> > +static inline void migratelist_del(struct page *page, struct inorder_lru *head)
> > +{
> > + head->next = page->ilru.next;
> > +}
> > +
>
> Does this sort of list mangling not get upset when CONFIG_LIST_DEBUG is
> set?
It's no problem, as I didn't use the generic list utility functions.
For handling inorder_lru, we must use the migratelist_[xxx] functions.
I mentioned that in the comment header.
>
> > +#define list_for_each_migrate_entry list_for_each_entry
> > +#define list_for_each_migrate_entry_safe list_for_each_entry_safe
> > +
> > extern void putback_lru_pages(struct list_head *l);
> > +extern void putback_ilru_pages(struct inorder_lru *l);
> > extern int migrate_page(struct address_space *,
> > struct page *, struct page *);
> > extern int migrate_pages(struct list_head *l, new_page_t x,
> > diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> > index 02aa561..af46614 100644
> > --- a/include/linux/mm_types.h
> > +++ b/include/linux/mm_types.h
> > @@ -24,6 +24,17 @@ struct address_space;
> >
> > #define USE_SPLIT_PTLOCKS (NR_CPUS >= CONFIG_SPLIT_PTLOCK_CPUS)
> >
> > +struct page;
> > +
> > +/*
> > + * The inorder_lru is used by compaction for keeping LRU order
> > + * during migration.
> > + */
> > +struct inorder_lru {
> > + struct page *prev_page; /* prev LRU page of isolated page */
> > + struct inorder_lru *next; /* next pointer for singly linked list*/
> > +};
> > +
> > /*
> > * Each physical page in the system has a struct page associated with
> > * it to keep track of whatever it is we are using the page for at the
> > @@ -72,9 +83,12 @@ struct page {
> > pgoff_t index; /* Our offset within mapping. */
> > void *freelist; /* SLUB: freelist req. slab lock */
> > };
> > - struct list_head lru; /* Pageout list, eg. active_list
> > + union {
> > + struct inorder_lru ilru;/* compaction: migrated page list */
> > + struct list_head lru; /* Pageout list, eg. active_list
> > * protected by zone->lru_lock !
> > */
> > + };
> > /*
> > * On machines where all RAM is mapped into kernel address space,
> > * we can simply calculate the virtual address. On machines with
> > diff --git a/include/linux/swap.h b/include/linux/swap.h
> > index 731f5dd..854244a 100644
> > --- a/include/linux/swap.h
> > +++ b/include/linux/swap.h
> > @@ -226,6 +226,8 @@ extern int lru_add_drain_all(void);
> > extern void rotate_reclaimable_page(struct page *page);
> > extern void deactivate_page(struct page *page);
> > extern void swap_setup(void);
> > +extern void update_page_reclaim_stat(struct zone *zone, struct page *page,
> > + int file, int rotated);
> >
> > extern void add_page_to_unevictable_list(struct page *page);
> >
> > @@ -263,7 +265,7 @@ extern unsigned long mem_cgroup_shrink_node_zone(struct mem_cgroup *mem,
> > unsigned int swappiness,
> > struct zone *zone);
> > extern int __isolate_lru_page(struct page *page, enum ISOLATE_MODE mode,
> > - int file);
> > + int file, struct page **prev_page);
> > extern unsigned long shrink_all_memory(unsigned long nr_pages);
> > extern int vm_swappiness;
> > extern int remove_mapping(struct address_space *mapping, struct page *page);
> > diff --git a/mm/compaction.c b/mm/compaction.c
> > index 8079346..54dbdbe 100644
> > --- a/mm/compaction.c
> > +++ b/mm/compaction.c
> > @@ -331,7 +331,7 @@ static unsigned long isolate_migratepages(struct zone *zone,
> > mode |= ISOLATE_CLEAN;
> >
> > /* Try isolate the page */
> > - if (__isolate_lru_page(page, mode, 0) != 0)
> > + if (__isolate_lru_page(page, mode, 0, NULL) != 0)
> > continue;
> >
> > VM_BUG_ON(PageTransCompound(page));
> > diff --git a/mm/internal.h b/mm/internal.h
> > index 9d0ced8..a08d8c6 100644
> > --- a/mm/internal.h
> > +++ b/mm/internal.h
> > @@ -42,6 +42,8 @@ extern unsigned long highest_memmap_pfn;
> > /*
> > * in mm/vmscan.c:
> > */
> > +extern bool same_lru(struct page *page, struct page *prev);
> > +extern void putback_page_to_lru(struct page *page, struct page *head_page);
> > extern int isolate_lru_page(struct page *page);
> > extern void putback_lru_page(struct page *page);
> >
> > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > index f4c0b71..04d460d 100644
> > --- a/mm/memcontrol.c
> > +++ b/mm/memcontrol.c
> > @@ -1141,7 +1141,7 @@ unsigned long mem_cgroup_isolate_pages(unsigned long nr_to_scan,
> > continue;
> >
> > scan++;
> > - ret = __isolate_lru_page(page, mode, file);
> > + ret = __isolate_lru_page(page, mode, file, NULL);
> > switch (ret) {
> > case 0:
> > list_move(&page->lru, dst);
> > diff --git a/mm/migrate.c b/mm/migrate.c
> > index e797b5c..874c081 100644
> > --- a/mm/migrate.c
> > +++ b/mm/migrate.c
> > @@ -84,6 +84,30 @@ void putback_lru_pages(struct list_head *l)
> > }
> > }
> >
> > +void putback_ilru_pages(struct inorder_lru *l)
> > +{
> > + struct zone *zone;
> > + struct page *page, *page2, *prev;
> > +
> > + list_for_each_migrate_entry_safe(page, page2, l, ilru) {
> > + dec_zone_page_state(page, NR_ISOLATED_ANON +
> > + page_is_file_cache(page));
> > + zone = page_zone(page);
> > + spin_lock_irq(&zone->lru_lock);
> > + prev = page->ilru.prev_page;
> > + if (same_lru(page, prev)) {
> > + putback_page_to_lru(page, prev);
> > + spin_unlock_irq(&zone->lru_lock);
> > + }
>
>
> Ouch. So with putback_lru_pages(), pages get added to a pagevec but
> now we are taking the zone LRU lock for every page we want to put
> back? That is very painful. We're talking about THP promotion having to
> take the zone lru lock, many hundreds if not thousands of times.
True, but putback_ilru_pages is called only when migration fails.
In my test it was very _rare_ (0.1% of total compaction), but I am not sure about
other workloads.
In addition, we already do this for unevictable pages.
It would be more of a problem with many mlocked pages than with compaction.
My point is that calling putback_ilru_pages (i.e., failure of migrate_pages)
should be rare.
>
> Minimally, I now think this patchset needs to be split into two -
> the first set which avoids isolating pages from the LRU when it's
> pointless like dirty pages for async compaction and __zone_reclaim. I
> think that has clear merit and should be without cost. The second
> set would be this to determine how taking the lru_lock an excessive
> number of times can be avoided.
>
> > + else {
> > + spin_unlock_irq(&zone->lru_lock);
> > + putback_lru_page(page);
> > + }
> > +
> > + l->next = &page2->ilru;
> > + }
> > +}
> > +
> > /*
> > * Restore a potential migration pte to a working pte entry
> > */
> > diff --git a/mm/swap.c b/mm/swap.c
> > index 5602f1a..6c24a75 100644
> > --- a/mm/swap.c
> > +++ b/mm/swap.c
> > @@ -252,7 +252,7 @@ void rotate_reclaimable_page(struct page *page)
> > }
> > }
> >
> > -static void update_page_reclaim_stat(struct zone *zone, struct page *page,
> > +void update_page_reclaim_stat(struct zone *zone, struct page *page,
> > int file, int rotated)
> > {
> > struct zone_reclaim_stat *reclaim_stat = &zone->reclaim_stat;
> > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > index c08911d..7668e8d 100644
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -550,6 +550,56 @@ int remove_mapping(struct address_space *mapping, struct page *page)
> > return 0;
> > }
> >
> > +/*
> > + * If prev and page are on same LRU still, we can keep LRU order of page.
> > + * zone->lru_lock must be hold.
> > + */
> > +bool same_lru(struct page *page, struct page *prev)
> > +{
> > + bool ret = false;
> > + if (!prev || !PageLRU(prev))
> > + goto out;
> > +
> > + if (unlikely(PageUnevictable(prev)))
> > + goto out;
> > +
> > + if (page_lru_base_type(page) != page_lru_base_type(prev))
> > + goto out;
> > +
> > + ret = true;
> > +out:
> > + return ret;
> > +}
> > +
> > +/**
> > + * putback_page_to_lru - put isolated @page onto @head
> > + * @page: page to be put back to appropriate lru list
> > + * @head_page: lru position to be put back
> > + *
> > + * Insert previously isolated @page to appropriate position of lru list
> > + * zone->lru_lock must be hold.
> > + */
> > +void putback_page_to_lru(struct page *page, struct page *head_page)
> > +{
> > + int lru, active, file;
> > + struct zone *zone = page_zone(page);
> > +
> > + VM_BUG_ON(PageLRU(page));
> > +
> > + lru = page_lru(head_page);
> > + active = is_active_lru(lru);
> > + file = is_file_lru(lru);
> > +
> > + if (active)
> > + SetPageActive(page);
> > + else
> > + ClearPageActive(page);
> > +
> > + update_page_reclaim_stat(zone, page, file, active);
> > + SetPageLRU(page);
> > + __add_page_to_lru_list(zone, page, lru, &head_page->lru);
> > +}
> > +
> > /**
> > * putback_lru_page - put previously isolated page onto appropriate LRU list's head
> > * @page: page to be put back to appropriate lru list
> > @@ -954,10 +1004,13 @@ keep_lumpy:
> > *
> > * page: page to consider
> > * mode: one of the LRU isolation modes defined above
> > + * file: True [1] if isolating file [!anon] pages
> > + * prev_page: prev page of isolated page as LRU order
> > *
> > * returns 0 on success, -ve errno on failure.
> > */
> > -int __isolate_lru_page(struct page *page, enum ISOLATE_MODE mode, int file)
> > +int __isolate_lru_page(struct page *page, enum ISOLATE_MODE mode,
> > + int file, struct page **prev_page)
> > {
> > bool all_lru_mode;
> > int ret = -EINVAL;
> > @@ -1003,6 +1056,18 @@ int __isolate_lru_page(struct page *page, enum ISOLATE_MODE mode, int file)
> > * page release code relies on it.
> > */
> > ClearPageLRU(page);
> > + if (prev_page != NULL) {
> > + struct zone *zone = page_zone(page);
> > + enum lru_list l = page_lru(page);
> > +
> > + if (&zone->lru[l].list == page->lru.prev) {
> > + *prev_page = NULL;
> > + goto out;
> > + }
> > +
> > + *prev_page = lru_to_page(&page->lru);
> > + }
> > +out:
> > ret = 0;
> > }
> >
> > @@ -1052,7 +1117,7 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan,
> >
> > VM_BUG_ON(!PageLRU(page));
> >
> > - switch (__isolate_lru_page(page, mode, file)) {
> > + switch (__isolate_lru_page(page, mode, file, NULL)) {
> > case 0:
> > list_move(&page->lru, dst);
> > mem_cgroup_del_lru(page);
> > @@ -1111,7 +1176,8 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan,
> > !PageSwapCache(cursor_page))
> > break;
> >
> > - if (__isolate_lru_page(cursor_page, mode, file) == 0) {
> > + if (__isolate_lru_page(cursor_page,
> > + mode, file, NULL) == 0) {
> > list_move(&cursor_page->lru, dst);
> > mem_cgroup_del_lru(cursor_page);
> > nr_taken += hpage_nr_pages(page);
>
> --
> Mel Gorman
> SUSE Labs
--
Kind regards
Minchan Kim
(2011/06/07 23:38), Minchan Kim wrote:
> acct_isolated of compaction uses page_lru_base_type which returns only
> base type of LRU list so it never returns LRU_ACTIVE_ANON or LRU_ACTIVE_FILE.
> In addition, cc->nr_[anon|file] is used only in acct_isolated, so it doesn't need
> fields in compact_control.
> This patch removes the fields from compact_control and clarifies the role of
> acct_isolated, which counts the number of anon|file pages isolated.
>
> Cc: KOSAKI Motohiro <[email protected]>
> Cc: Mel Gorman <[email protected]>
> Cc: Andrea Arcangeli <[email protected]>
> Acked-by: Rik van Riel <[email protected]>
> Acked-by: Johannes Weiner <[email protected]>
> Reviewed-by: KAMEZAWA Hiroyuki <[email protected]>
> Signed-off-by: Minchan Kim <[email protected]>
Reviewed-by: KOSAKI Motohiro <[email protected]>
On Tue 07-06-11 23:38:14, Minchan Kim wrote:
> acct_isolated of compaction uses page_lru_base_type which returns only
> base type of LRU list so it never returns LRU_ACTIVE_ANON or LRU_ACTIVE_FILE.
> In addition, cc->nr_[anon|file] is used only in acct_isolated, so it doesn't need
> fields in compact_control.
> This patch removes the fields from compact_control and clarifies the role of
> acct_isolated, which counts the number of anon|file pages isolated.
>
> Cc: KOSAKI Motohiro <[email protected]>
> Cc: Mel Gorman <[email protected]>
> Cc: Andrea Arcangeli <[email protected]>
> Acked-by: Rik van Riel <[email protected]>
> Acked-by: Johannes Weiner <[email protected]>
> Reviewed-by: KAMEZAWA Hiroyuki <[email protected]>
> Signed-off-by: Minchan Kim <[email protected]>
Sorry for the late reply. I have looked at the previous posting but
didn't have time to comment on it.
Yes, stack usage reduction makes sense and the function also looks more
compact.
Reviewed-by: Michal Hocko <[email protected]>
> ---
> mm/compaction.c | 18 +++++-------------
> 1 files changed, 5 insertions(+), 13 deletions(-)
>
> diff --git a/mm/compaction.c b/mm/compaction.c
> index 021a296..61eab88 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -35,10 +35,6 @@ struct compact_control {
> unsigned long migrate_pfn; /* isolate_migratepages search base */
> bool sync; /* Synchronous migration */
>
> - /* Account for isolated anon and file pages */
> - unsigned long nr_anon;
> - unsigned long nr_file;
> -
> unsigned int order; /* order a direct compactor needs */
> int migratetype; /* MOVABLE, RECLAIMABLE etc */
> struct zone *zone;
> @@ -212,17 +208,13 @@ static void isolate_freepages(struct zone *zone,
> static void acct_isolated(struct zone *zone, struct compact_control *cc)
> {
> struct page *page;
> - unsigned int count[NR_LRU_LISTS] = { 0, };
> + unsigned int count[2] = { 0, };
>
> - list_for_each_entry(page, &cc->migratepages, lru) {
> - int lru = page_lru_base_type(page);
> - count[lru]++;
> - }
> + list_for_each_entry(page, &cc->migratepages, lru)
> + count[!!page_is_file_cache(page)]++;
>
> - cc->nr_anon = count[LRU_ACTIVE_ANON] + count[LRU_INACTIVE_ANON];
> - cc->nr_file = count[LRU_ACTIVE_FILE] + count[LRU_INACTIVE_FILE];
> - __mod_zone_page_state(zone, NR_ISOLATED_ANON, cc->nr_anon);
> - __mod_zone_page_state(zone, NR_ISOLATED_FILE, cc->nr_file);
> + __mod_zone_page_state(zone, NR_ISOLATED_ANON, count[0]);
> + __mod_zone_page_state(zone, NR_ISOLATED_FILE, count[1]);
> }
>
> /* Similar to reclaim, but different enough that they don't share logic */
> --
> 1.7.0.4
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to [email protected]. For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
> Don't email: <a href=mailto:"[email protected]"> [email protected] </a>
--
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9
Czech Republic
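As a side note on the idiom the patch switches to, the one-pass anon/file counting can be illustrated with a standalone userspace sketch. This is not kernel code: `struct fake_page` and `count_anon_file` are hypothetical stand-ins for `struct page` and the patched `acct_isolated()`, and an `int` field stands in for `page_is_file_cache()`.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal stand-in for struct page; only the bit this sketch needs.
 * In the kernel, page_is_file_cache() answers this question. */
struct fake_page {
	int is_file_cache;	/* nonzero for page-cache pages, 0 for anon */
};

/* One-pass anon/file counting, mirroring the patched acct_isolated():
 * !! collapses any nonzero value to 1, so the index is always
 * 0 (anon) or 1 (file) and a two-element array suffices. */
static void count_anon_file(const struct fake_page *pages, size_t n,
			    unsigned int count[2])
{
	size_t i;

	count[0] = count[1] = 0;
	for (i = 0; i < n; i++)
		count[!!pages[i].is_file_cache]++;
}
```

The `count[2]` array replaces the previous `NR_LRU_LISTS`-sized array on the stack, which is where the stack-usage reduction Michal mentions comes from.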
On Tue 07-06-11 23:38:16, Minchan Kim wrote:
> There are several places that isolate LRU pages, and I believe the
> users of isolate_lru_page will keep growing.
> Each has a different purpose, so part of the isolated pages should be
> put back to the LRU again.
>
> The problem is that when we put the page back into the LRU,
> we lose the LRU ordering and the page is inserted at the head of the LRU list.
> That causes unnecessary LRU churning, so the VM can evict working-set
> pages rather than idle pages.
I guess that, although this is true, it doesn't fit in with this patch
very much because this patch doesn't fix the problem. It is a
preparation for further work. I would expect this description with
the core patch that actually handles this issue.
>
> This patch adds new modes for isolating pages from the LRU so that we don't
> isolate pages we can't handle. That could reduce LRU churning.
>
> This patch doesn't change the old behavior; the new modes are only used by
> the next patches.
It doesn't because there is no user of those flags yet, but maybe it
would be better to have those users here to see why the flags can
actually reduce LRU isolations.
>
> Cc: KOSAKI Motohiro <[email protected]>
> Cc: Mel Gorman <[email protected]>
> Cc: Andrea Arcangeli <[email protected]>
> Cc: KAMEZAWA Hiroyuki <[email protected]>
> Acked-by: Rik van Riel <[email protected]>
> Acked-by: Johannes Weiner <[email protected]>
> Signed-off-by: Minchan Kim <[email protected]>
> ---
> include/linux/swap.h | 2 ++
> mm/vmscan.c | 6 ++++++
> 2 files changed, 8 insertions(+), 0 deletions(-)
>
> diff --git a/include/linux/swap.h b/include/linux/swap.h
> index 48d50e6..731f5dd 100644
> --- a/include/linux/swap.h
> +++ b/include/linux/swap.h
> @@ -248,6 +248,8 @@ enum ISOLATE_MODE {
> ISOLATE_NONE,
> ISOLATE_INACTIVE = 1, /* Isolate inactive pages */
> ISOLATE_ACTIVE = 2, /* Isolate active pages */
> + ISOLATE_CLEAN = 8, /* Isolate clean file */
> + ISOLATE_UNMAPPED = 16, /* Isolate unmapped file */
> };
>
> /* linux/mm/vmscan.c */
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 4cbe114..26aa627 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -990,6 +990,12 @@ int __isolate_lru_page(struct page *page, enum ISOLATE_MODE mode, int file)
>
> ret = -EBUSY;
>
> + if (mode & ISOLATE_CLEAN && (PageDirty(page) || PageWriteback(page)))
> + return ret;
> +
> + if (mode & ISOLATE_UNMAPPED && page_mapped(page))
> + return ret;
> +
> if (likely(get_page_unless_zero(page))) {
> /*
> * Be careful not to clear PageLRU until after we're
> --
> 1.7.0.4
>
--
Michal Hocko
SUSE Labs
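For readers following the flag semantics, the early-reject logic the new modes enable can be sketched outside the kernel. `struct page_state` and `can_isolate()` below are hypothetical stand-ins for the real page flags (`PageDirty()`, `PageWriteback()`, `page_mapped()`) and for `__isolate_lru_page()`; only the flag values are taken from the patch as posted.

```c
#include <assert.h>

/* The ISOLATE_MODE flag values as posted in the patch. */
enum isolate_mode {
	ISOLATE_NONE     = 0,
	ISOLATE_INACTIVE = 1,	/* Isolate inactive pages */
	ISOLATE_ACTIVE   = 2,	/* Isolate active pages */
	ISOLATE_CLEAN    = 8,	/* Isolate clean file */
	ISOLATE_UNMAPPED = 16,	/* Isolate unmapped file */
};

/* Hypothetical stand-in for the page state the kernel reads with
 * PageDirty(), PageWriteback() and page_mapped(). */
struct page_state {
	int dirty;
	int writeback;
	int mapped;
};

/* Sketch of the early-reject checks the patch adds to
 * __isolate_lru_page(): refuse (-1 here, standing in for -EBUSY)
 * any page the caller said it cannot handle. */
static int can_isolate(const struct page_state *p, enum isolate_mode mode)
{
	if ((mode & ISOLATE_CLEAN) && (p->dirty || p->writeback))
		return -1;
	if ((mode & ISOLATE_UNMAPPED) && p->mapped)
		return -1;
	return 0;
}
```

Because the modes are independent bits, callers can combine them freely, e.g. `ISOLATE_ACTIVE | ISOLATE_INACTIVE | ISOLATE_CLEAN` for async compaction.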
On Tue 07-06-11 23:38:17, Minchan Kim wrote:
> In async mode, compaction doesn't migrate dirty or writeback pages.
> So it's meaningless to pick such a page and re-add it to the LRU list.
>
> Of course, when we isolate the page in compaction, the page might be
> dirty or under writeback, but by the time we try to migrate it, it
> might no longer be dirty or under writeback and so could be migrated.
> But that is very unlikely, as the isolate-and-migrate cycle is much
> faster than writeout.
>
> So this patch saves CPU time and prevents unnecessary LRU churning.
I think you should introduce ISOLATE_CLEAN with this patch.
Apart from that it makes perfect sense. Feel free to add my
Reviewed-by: Michal Hocko <[email protected]>
>
> Cc: Andrea Arcangeli <[email protected]>
> Reviewed-by: KAMEZAWA Hiroyuki <[email protected]>
> Reviewed-by: KOSAKI Motohiro <[email protected]>
> Acked-by: Johannes Weiner <[email protected]>
> Acked-by: Mel Gorman <[email protected]>
> Acked-by: Rik van Riel <[email protected]>
> Signed-off-by: Minchan Kim <[email protected]>
> ---
> mm/compaction.c | 7 +++++--
> 1 files changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/mm/compaction.c b/mm/compaction.c
> index f0d75e9..8079346 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -243,6 +243,7 @@ static unsigned long isolate_migratepages(struct zone *zone,
> unsigned long last_pageblock_nr = 0, pageblock_nr;
> unsigned long nr_scanned = 0, nr_isolated = 0;
> struct list_head *migratelist = &cc->migratepages;
> + enum ISOLATE_MODE mode = ISOLATE_ACTIVE|ISOLATE_INACTIVE;
>
> /* Do not scan outside zone boundaries */
> low_pfn = max(cc->migrate_pfn, zone->zone_start_pfn);
> @@ -326,9 +327,11 @@ static unsigned long isolate_migratepages(struct zone *zone,
> continue;
> }
>
> + if (!cc->sync)
> + mode |= ISOLATE_CLEAN;
> +
> /* Try isolate the page */
> - if (__isolate_lru_page(page,
> - ISOLATE_ACTIVE|ISOLATE_INACTIVE, 0) != 0)
> + if (__isolate_lru_page(page, mode, 0) != 0)
> continue;
>
> VM_BUG_ON(PageTransCompound(page));
> --
> 1.7.0.4
>
--
Michal Hocko
SUSE Labs
On Tue 07-06-11 23:38:18, Minchan Kim wrote:
> In the __zone_reclaim case, we don't want to shrink mapped pages.
> Nonetheless, we have been isolating mapped pages and re-adding them to
> the LRU's head. That is unnecessary CPU overhead and causes LRU churning.
>
> Of course, when we isolate the page it might be mapped, but by the
> time we try to migrate it, it might no longer be mapped and so could
> be migrated. But the race is rare, and even if it happens,
> it's no big deal.
Same like with the previous patch. I think it would be better to
introduce ISOLATE_UNMAPPED here.
Other than that looks good to me
Reviewed-by: Michal Hocko <[email protected]>
>
> Cc: KOSAKI Motohiro <[email protected]>
> Cc: Mel Gorman <[email protected]>
> Cc: Rik van Riel <[email protected]>
> Cc: Andrea Arcangeli <[email protected]>
> Reviewed-by: KAMEZAWA Hiroyuki <[email protected]>
> Acked-by: Johannes Weiner <[email protected]>
> Signed-off-by: Minchan Kim <[email protected]>
> ---
> mm/vmscan.c | 17 +++++++++++++++--
> 1 files changed, 15 insertions(+), 2 deletions(-)
>
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 26aa627..c08911d 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1408,6 +1408,12 @@ shrink_inactive_list(unsigned long nr_to_scan, struct zone *zone,
> reclaim_mode |= ISOLATE_ACTIVE;
>
> lru_add_drain();
> +
> + if (!sc->may_unmap)
> + reclaim_mode |= ISOLATE_UNMAPPED;
> + if (!sc->may_writepage)
> + reclaim_mode |= ISOLATE_CLEAN;
> +
> spin_lock_irq(&zone->lru_lock);
>
> if (scanning_global_lru(sc)) {
> @@ -1525,19 +1531,26 @@ static void shrink_active_list(unsigned long nr_pages, struct zone *zone,
> struct page *page;
> struct zone_reclaim_stat *reclaim_stat = get_reclaim_stat(zone, sc);
> unsigned long nr_rotated = 0;
> + enum ISOLATE_MODE reclaim_mode = ISOLATE_ACTIVE;
>
> lru_add_drain();
> +
> + if (!sc->may_unmap)
> + reclaim_mode |= ISOLATE_UNMAPPED;
> + if (!sc->may_writepage)
> + reclaim_mode |= ISOLATE_CLEAN;
> +
> spin_lock_irq(&zone->lru_lock);
> if (scanning_global_lru(sc)) {
> nr_taken = isolate_pages_global(nr_pages, &l_hold,
> &pgscanned, sc->order,
> - ISOLATE_ACTIVE, zone,
> + reclaim_mode, zone,
> 1, file);
> zone->pages_scanned += pgscanned;
> } else {
> nr_taken = mem_cgroup_isolate_pages(nr_pages, &l_hold,
> &pgscanned, sc->order,
> - ISOLATE_ACTIVE, zone,
> + reclaim_mode, zone,
> sc->mem_cgroup, 1, file);
> /*
> * mem_cgroup_isolate_pages() keeps track of
> --
> 1.7.0.4
>
--
Michal Hocko
SUSE Labs
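The way the two vmscan call sites in this patch derive their isolation mode from scan_control can be sketched as a standalone snippet. This is illustrative only: `fake_scan_control` and `build_reclaim_mode` are hypothetical names (the kernel reads `sc->may_unmap` and `sc->may_writepage` inline), and the enum values are those posted earlier in the series.

```c
#include <assert.h>

/* Flag values as posted earlier in the series. */
enum isolate_mode {
	ISOLATE_INACTIVE = 1,
	ISOLATE_ACTIVE   = 2,
	ISOLATE_CLEAN    = 8,
	ISOLATE_UNMAPPED = 16,
};

/* Hypothetical stand-in for the two scan_control knobs the patch
 * consults. */
struct fake_scan_control {
	int may_unmap;
	int may_writepage;
};

/* Mirror of how shrink_active_list() builds its isolation mode in the
 * patch: start from ISOLATE_ACTIVE and tighten it whenever the caller
 * forbids touching mapped or dirty pages. */
static enum isolate_mode
build_reclaim_mode(const struct fake_scan_control *sc)
{
	enum isolate_mode mode = ISOLATE_ACTIVE;

	if (!sc->may_unmap)
		mode |= ISOLATE_UNMAPPED;
	if (!sc->may_writepage)
		mode |= ISOLATE_CLEAN;
	return mode;
}
```

In __zone_reclaim, `may_unmap` is 0, so `ISOLATE_UNMAPPED` is set and mapped pages are skipped at isolation time instead of being churned through the LRU head.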
(2011/06/07 23:38), Minchan Kim wrote:
> In the __zone_reclaim case, we don't want to shrink mapped pages.
> Nonetheless, we have been isolating mapped pages and re-adding them to
> the LRU's head. That is unnecessary CPU overhead and causes LRU churning.
>
> Of course, when we isolate the page it might be mapped, but by the
> time we try to migrate it, it might no longer be mapped and so could
> be migrated. But the race is rare, and even if it happens,
> it's no big deal.
>
> Cc: KOSAKI Motohiro <[email protected]>
> Cc: Mel Gorman <[email protected]>
> Cc: Rik van Riel <[email protected]>
> Cc: Andrea Arcangeli <[email protected]>
> Reviewed-by: KAMEZAWA Hiroyuki <[email protected]>
> Acked-by: Johannes Weiner <[email protected]>
> Signed-off-by: Minchan Kim <[email protected]>
Reviewed-by: KOSAKI Motohiro <[email protected]>
(2011/06/07 23:38), Minchan Kim wrote:
> unmap_and_move is one of the big, messy functions.
> This patch tries to clean it up.
>
> It helps readability and keeps unmap_and_move_ilru simple.
> unmap_and_move_ilru will be introduced by the next patch.
>
> Cc: Mel Gorman <[email protected]>
> Cc: Andrea Arcangeli <[email protected]>
> Cc: KAMEZAWA Hiroyuki <[email protected]>
> Cc: Rik van Riel <[email protected]>
> Cc: Johannes Weiner <[email protected]>
> Cc: KOSAKI Motohiro <[email protected]>
> Signed-off-by: Minchan Kim <[email protected]>
Reviewed-by: KOSAKI Motohiro <[email protected]>
On Sun, Jun 12, 2011 at 04:24:05PM +0200, Michal Hocko wrote:
> On Tue 07-06-11 23:38:14, Minchan Kim wrote:
> > acct_isolated of compaction uses page_lru_base_type which returns only
> > base type of LRU list so it never returns LRU_ACTIVE_ANON or LRU_ACTIVE_FILE.
> > In addition, cc->nr_[anon|file] are used only in acct_isolated, so they
> > don't need to be fields in compact_control.
> > This patch removes the fields from compact_control and makes the role of
> > acct_isolated clear: counting the number of anon|file pages isolated.
> >
> > Cc: KOSAKI Motohiro <[email protected]>
> > Cc: Mel Gorman <[email protected]>
> > Cc: Andrea Arcangeli <[email protected]>
> > Acked-by: Rik van Riel <[email protected]>
> > Acked-by: Johannes Weiner <[email protected]>
> > Reviewed-by: KAMEZAWA Hiroyuki <[email protected]>
> > Signed-off-by: Minchan Kim <[email protected]>
>
> Sorry for the late reply. I have looked at the previous posting but
No problem. Thanks for the review, Michal.
> didn't have time to comment on it.
>
> Yes, stack usage reduction makes sense and the function also looks more
> compact.
>
> Reviewed-by: Michal Hocko <[email protected]>
>
--
Kind regards,
Minchan Kim
On Sun, Jun 12, 2011 at 04:45:21PM +0200, Michal Hocko wrote:
> On Tue 07-06-11 23:38:16, Minchan Kim wrote:
> > There are several places that isolate LRU pages, and I believe the
> > users of isolate_lru_page will keep growing.
> > Each has a different purpose, so part of the isolated pages should be
> > put back to the LRU again.
> >
> > The problem is that when we put the page back into the LRU,
> > we lose the LRU ordering and the page is inserted at the head of the LRU list.
> > That causes unnecessary LRU churning, so the VM can evict working-set
> > pages rather than idle pages.
>
> I guess that, although this is true, it doesn't fit in with this patch
> very much because this patch doesn't fix the problem. It is a
> preparation for further work. I would expect this description with
> the core patch that actually handles this issue.
Okay.
>
> >
> > This patch adds new modes for isolating pages from the LRU so that we don't
> > isolate pages we can't handle. That could reduce LRU churning.
> >
> > This patch doesn't change the old behavior; the new modes are only used by
> > the next patches.
>
> It doesn't because there is no user of those flags yet, but maybe it
> would be better to have those users here to see why the flags can
> actually reduce LRU isolations.
Yes. Mel already pointed it out.
I will merge patches in next version.
And I have an idea to reduce lru_lock contention that Mel mentioned,
so maybe I will include it in the next version, too.
But for now I have no time to revise it :(
--
Kind regards,
Minchan Kim