There are several places that isolate LRU pages and put them back.
For example, compaction does it to get contiguous pages.
The problem is that if we isolate a page from the middle of the LRU and put it back,
we lose the LRU history, because putback_lru_page inserts the page at the head of the LRU list.
That means we can end up evicting working set pages.
This problem was discussed at LSF/MM.
This patch series attacks the problem with two methods:
* anti-churning
when we isolate a page from the LRU list, don't isolate a page we can't handle
* de-churning
when we put a page back onto the LRU list, insert the isolated page at its previous LRU position (rough sketch below)
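To summarize the interfaces this series introduces, here is a simplified sketch only
(the variable name plru is just illustrative; see the individual patches for the real code):
	struct pages_lru *plru = ...;	/* records page, prev_page, next_page */

	/* anti-churning: new filter arguments so unhandleable pages stay on the LRU */
	__isolate_lru_page(page, mode, file, not_dirty, not_mapped, plru);

	/* de-churning: if the old neighbors are still adjacent at putback time,
	 * reinsert the page between them instead of at the head of the LRU
	 * (zone->lru_lock held). */
	if (keep_lru_order(plru))
		putback_page_to_lru(page, &plru->prev_page->lru);
	else
		putback_lru_page(page);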
[1,2,3/8] are related to anti-churning.
[4,5/8] are things I found while making this series; they don't depend on
the rest of the series, so they could be merged on their own.
[6,7,8/8] are related to de-churning.
[6/8] is the core of the in-order putback support, but it has a problem with hugepages and
similar physically contiguous page streams, as pointed out by Rik. I have another idea to
solve it, which is written down in the description of [6/8]. Please take a look at it.
This is just an RFC, so I need more time to clean up the code, test it and gather data.
But before going further, I'd like to hear opinions about the approach and design,
and get review of the code (e.g. locking, naming and so on); anything is welcome.
These patches are based on mmotm-2011-04-14-15-08 and pass my simple tests.
Thanks.
Minchan Kim (8):
[1/8] Only isolate page we can handle
[2/8] compaction: make isolate_lru_page with filter aware
[3/8] vmscan: make isolate_lru_page with filter aware
[4/8] Make clear description of putback_lru_page
[5/8] compaction: remove active list counting
[6/8] In order putback lru core
[7/8] migration: make in-order-putback aware
[8/8] compaction: make compaction use in-order putback
include/linux/migrate.h | 6 ++-
include/linux/mm_types.h | 7 +++
include/linux/swap.h | 5 ++-
mm/compaction.c | 27 ++++++---
mm/internal.h | 2 +
mm/memcontrol.c | 2 +-
mm/memory-failure.c | 2 +-
mm/memory_hotplug.c | 2 +-
mm/mempolicy.c | 4 +-
mm/migrate.c | 133 +++++++++++++++++++++++++++++++++++++---------
mm/swap.c | 2 +-
mm/vmscan.c | 110 ++++++++++++++++++++++++++++++++++----
12 files changed, 247 insertions(+), 55 deletions(-)
There are several places that isolate LRU pages, and I believe
the number of isolate_lru_page users will keep growing.
Their purposes all differ, so part of the isolated pages
have to be put back onto the LRU again.
The problem is that when we put a page back onto the LRU,
we lose its LRU ordering: the page is inserted at the head of the LRU list.
That causes unnecessary LRU churning, so the VM can evict working set pages
rather than idle pages.
This patch adds new filter arguments used when we isolate a page from the LRU,
so that we don't isolate pages we can't handle.
That should reduce LRU churning.
This patch shouldn't change the old behavior.
It's only used by the following patches.
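For illustration, the intended users come in the following patches: the async
compaction path passes not_dirty and the zone reclaim path passes not_mapped, roughly:
	/* [2/8] async compaction: skip dirty/writeback pages */
	__isolate_lru_page(page, ISOLATE_BOTH, 0, !cc->sync, 0);

	/* [3/8] zone reclaim with may_unmap == 0: skip mapped pages */
	isolate_lru_pages(nr, &z->lru[lru].list, dst, scanned, sc->order,
			  mode, file, 0, !sc->may_unmap);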
Cc: KOSAKI Motohiro <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Signed-off-by: Minchan Kim <[email protected]>
---
include/linux/swap.h | 3 ++-
mm/compaction.c | 2 +-
mm/memcontrol.c | 2 +-
mm/vmscan.c | 26 ++++++++++++++++++++------
4 files changed, 24 insertions(+), 9 deletions(-)
diff --git a/include/linux/swap.h b/include/linux/swap.h
index 384eb5f..baef4ad 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -259,7 +259,8 @@ extern unsigned long mem_cgroup_shrink_node_zone(struct mem_cgroup *mem,
unsigned int swappiness,
struct zone *zone,
unsigned long *nr_scanned);
-extern int __isolate_lru_page(struct page *page, int mode, int file);
+extern int __isolate_lru_page(struct page *page, int mode, int file,
+ int not_dirty, int not_mapped);
extern unsigned long shrink_all_memory(unsigned long nr_pages);
extern int vm_swappiness;
extern int remove_mapping(struct address_space *mapping, struct page *page);
diff --git a/mm/compaction.c b/mm/compaction.c
index 021a296..dea32e3 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -335,7 +335,7 @@ static unsigned long isolate_migratepages(struct zone *zone,
}
/* Try isolate the page */
- if (__isolate_lru_page(page, ISOLATE_BOTH, 0) != 0)
+ if (__isolate_lru_page(page, ISOLATE_BOTH, 0, 0, 0) != 0)
continue;
VM_BUG_ON(PageTransCompound(page));
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index c2776f1..471e7fd 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1193,7 +1193,7 @@ unsigned long mem_cgroup_isolate_pages(unsigned long nr_to_scan,
continue;
scan++;
- ret = __isolate_lru_page(page, mode, file);
+ ret = __isolate_lru_page(page, mode, file, 0, 0);
switch (ret) {
case 0:
list_move(&page->lru, dst);
diff --git a/mm/vmscan.c b/mm/vmscan.c
index b3a569f..71d2da9 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -954,10 +954,13 @@ keep_lumpy:
*
* page: page to consider
* mode: one of the LRU isolation modes defined above
- *
+ * file: page be on a file LRU
+ * not_dirty: page should be not dirty or not writeback
+ * not_mapped: page should be not mapped
* returns 0 on success, -ve errno on failure.
*/
-int __isolate_lru_page(struct page *page, int mode, int file)
+int __isolate_lru_page(struct page *page, int mode, int file,
+ int not_dirty, int not_mapped)
{
int ret = -EINVAL;
@@ -976,6 +979,12 @@ int __isolate_lru_page(struct page *page, int mode, int file)
if (mode != ISOLATE_BOTH && page_is_file_cache(page) != file)
return ret;
+ if (not_dirty)
+ if (PageDirty(page) || PageWriteback(page))
+ return ret;
+ if (not_mapped)
+ if (page_mapped(page))
+ return ret;
/*
* When this function is being called for lumpy reclaim, we
* initially look into all LRU pages, active, inactive and
@@ -1016,12 +1025,15 @@ int __isolate_lru_page(struct page *page, int mode, int file)
* @order: The caller's attempted allocation order
* @mode: One of the LRU isolation modes
* @file: True [1] if isolating file [!anon] pages
+ * @not_dirty: True [1] if isolating file [!dirty] pages
+ * @not_mapped: True [1] if isolating file [!mapped] pages
*
* returns how many pages were moved onto *@dst.
*/
static unsigned long isolate_lru_pages(unsigned long nr_to_scan,
struct list_head *src, struct list_head *dst,
- unsigned long *scanned, int order, int mode, int file)
+ unsigned long *scanned, int order, int mode, int file,
+ int not_dirty, int not_mapped)
{
unsigned long nr_taken = 0;
unsigned long nr_lumpy_taken = 0;
@@ -1041,7 +1053,8 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan,
VM_BUG_ON(!PageLRU(page));
- switch (__isolate_lru_page(page, mode, file)) {
+ switch (__isolate_lru_page(page, mode, file,
+ not_dirty, not_mapped)) {
case 0:
list_move(&page->lru, dst);
mem_cgroup_del_lru(page);
@@ -1100,7 +1113,8 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan,
!PageSwapCache(cursor_page))
break;
- if (__isolate_lru_page(cursor_page, mode, file) == 0) {
+ if (__isolate_lru_page(cursor_page, mode, file,
+ not_dirty, not_mapped) == 0) {
list_move(&cursor_page->lru, dst);
mem_cgroup_del_lru(cursor_page);
nr_taken += hpage_nr_pages(page);
@@ -1143,7 +1157,7 @@ static unsigned long isolate_pages_global(unsigned long nr,
if (file)
lru += LRU_FILE;
return isolate_lru_pages(nr, &z->lru[lru].list, dst, scanned, order,
- mode, file);
+ mode, file, 0, 0);
}
/*
--
1.7.1
In some __zone_reclaim cases, we don't want to reclaim mapped pages.
Nonetheless, we isolate mapped pages and then re-add them at the
LRU's head. That is unnecessary CPU overhead and causes LRU churning.
Of course, a page might be mapped when we isolate it but no longer
mapped by the time we try to reclaim it, so it could still be reclaimed.
But that race is rare, and even if it happens, it's no big deal.
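For reference, may_unmap is already derived from zone_reclaim_mode in __zone_reclaim
(approximately as below), so the new filter just pushes that decision down to isolation time:
	/* in __zone_reclaim(), existing code (approximately) */
	struct scan_control sc = {
		.may_writepage	= !!(zone_reclaim_mode & RECLAIM_WRITE),
		.may_unmap	= !!(zone_reclaim_mode & RECLAIM_SWAP),
		.may_swap	= 1,
		/* ... */
	};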
Cc: Christoph Lameter <[email protected]>
Cc: KOSAKI Motohiro <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Signed-off-by: Minchan Kim <[email protected]>
---
mm/vmscan.c | 11 ++++++-----
1 files changed, 6 insertions(+), 5 deletions(-)
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 71d2da9..e8d6190 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1147,7 +1147,8 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan,
static unsigned long isolate_pages_global(unsigned long nr,
struct list_head *dst,
- unsigned long *scanned, int order,
+ unsigned long *scanned,
+ struct scan_control *sc,
int mode, struct zone *z,
int active, int file)
{
@@ -1156,8 +1157,8 @@ static unsigned long isolate_pages_global(unsigned long nr,
lru += LRU_ACTIVE;
if (file)
lru += LRU_FILE;
- return isolate_lru_pages(nr, &z->lru[lru].list, dst, scanned, order,
- mode, file, 0, 0);
+ return isolate_lru_pages(nr, &z->lru[lru].list, dst, scanned, sc->order,
+ mode, file, 0, !sc->may_unmap);
}
/*
@@ -1407,7 +1408,7 @@ shrink_inactive_list(unsigned long nr_to_scan, struct zone *zone,
if (scanning_global_lru(sc)) {
nr_taken = isolate_pages_global(nr_to_scan,
- &page_list, &nr_scanned, sc->order,
+ &page_list, &nr_scanned, sc,
sc->reclaim_mode & RECLAIM_MODE_LUMPYRECLAIM ?
ISOLATE_BOTH : ISOLATE_INACTIVE,
zone, 0, file);
@@ -1531,7 +1532,7 @@ static void shrink_active_list(unsigned long nr_pages, struct zone *zone,
spin_lock_irq(&zone->lru_lock);
if (scanning_global_lru(sc)) {
nr_taken = isolate_pages_global(nr_pages, &l_hold,
- &pgscanned, sc->order,
+ &pgscanned, sc,
ISOLATE_ACTIVE, zone,
1, file);
zone->pages_scanned += pgscanned;
--
1.7.1
In async mode, compaction doesn't migrate dirty or writeback pages,
so it's pointless to pick such a page and re-add it to the LRU list.
Of course, a page might be dirty or under writeback when compaction
isolates it but clean by the time we try to migrate it, so it could
still be migrated. But that's very unlikely, as the isolate-and-migrate
cycle is much faster than writeout.
So this patch saves CPU time and prevents unnecessary LRU churning.
Cc: KOSAKI Motohiro <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Signed-off-by: Minchan Kim <[email protected]>
---
mm/compaction.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/mm/compaction.c b/mm/compaction.c
index dea32e3..9f80b5a 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -335,7 +335,7 @@ static unsigned long isolate_migratepages(struct zone *zone,
}
/* Try isolate the page */
- if (__isolate_lru_page(page, ISOLATE_BOTH, 0, 0, 0) != 0)
+ if (__isolate_lru_page(page, ISOLATE_BOTH, 0, !cc->sync, 0) != 0)
continue;
VM_BUG_ON(PageTransCompound(page));
--
1.7.1
Commonly, putback_lru_page is used together with isolate_lru_page.
isolate_lru_page picks a page from the middle of the LRU, while
putback_lru_page inserts it at the head of the LRU.
That means the pair can cause LRU churning, so we have to be very careful.
Let's make the description of putback_lru_page explicit about this.
Cc: KOSAKI Motohiro <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Signed-off-by: Minchan Kim <[email protected]>
---
mm/migrate.c | 2 +-
mm/vmscan.c | 4 ++--
2 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/mm/migrate.c b/mm/migrate.c
index 34132f8..819d233 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -68,7 +68,7 @@ int migrate_prep_local(void)
}
/*
- * Add isolated pages on the list back to the LRU under page lock
+ * Add isolated pages on the list back to the LRU's head under page lock
* to avoid leaking evictable pages back onto unevictable list.
*/
void putback_lru_pages(struct list_head *l)
diff --git a/mm/vmscan.c b/mm/vmscan.c
index e8d6190..5196f0c 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -551,10 +551,10 @@ int remove_mapping(struct address_space *mapping, struct page *page)
}
/**
- * putback_lru_page - put previously isolated page onto appropriate LRU list
+ * putback_lru_page - put previously isolated page onto appropriate LRU list's head
* @page: page to be put back to appropriate lru list
*
- * Add previously isolated @page to appropriate LRU list.
+ * Add previously isolated @page to appropriate LRU list's head
* Page may still be unevictable for other reasons.
*
* lru_lock must not be held, interrupts must be enabled.
--
1.7.1
This patch defines new APIs to put a page back into its previous position in the LRU.
The idea is simple.
When we try to put a page back into the LRU list, if its old neighbors (prev, next)
are still adjacent, we can insert the isolated page right after prev instead of at the
head of the LRU list. So we keep the LRU history without losing the LRU information.
Before :
LRU POV : H - P1 - P2 - P3 - P4 - T
Isolate P3 :
LRU POV : H - P1 - P2 - P4 - T
Putback P3 :
if (P2->next == P4)
putback(P3, P2);
So,
LRU POV : H - P1 - P2 - P3 - P4 - T
For the implementation, we define a new structure, pages_lru, which remembers
both LRU neighbors of the isolated page, plus handling functions.
But this approach has a problem with contiguous pages.
In that case the idea cannot work, since the neighbor pages are isolated, too.
That means prev_page->next == next_page is always false, and neither page is on the
LRU any more at that time. This was pointed out by Rik at the LSF/MM summit.
To solve that problem, I could change the idea:
I think we don't need the relation to both neighbors (prev, next); it's enough to
check that either the prev or the next page is still on the same LRU.
In the worst case of that approach, the prev or next page is freed and reallocated,
so it sits at the head of the LRU and our isolated page ends up right behind the head.
But that is almost the same situation as the current problem, so it doesn't make things
worse than they are now, and it should be rare. In this version, however, I implemented
the idea discussed at LSF/MM. If my new idea makes sense, I will change it.
Any comment?
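To make the alternative concrete, a single-neighbor check could look roughly like this
(untested sketch only, the helper name is made up; called with zone->lru_lock held):
	static bool prev_still_on_lru(struct pages_lru *pages_lru)
	{
		struct page *prev = pages_lru->prev_page;

		/*
		 * As long as the old prev neighbor is still on an evictable LRU,
		 * we can insert the isolated page right after it, even if the old
		 * next neighbor was isolated as part of the same contiguous range.
		 */
		return prev && PageLRU(prev) && !PageUnevictable(prev);
	}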
Cc: KOSAKI Motohiro <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Signed-off-by: Minchan Kim <[email protected]>
---
include/linux/migrate.h | 2 +
include/linux/mm_types.h | 7 ++++
include/linux/swap.h | 4 ++-
mm/compaction.c | 3 +-
mm/internal.h | 2 +
mm/memcontrol.c | 2 +-
mm/migrate.c | 36 +++++++++++++++++++++
mm/swap.c | 2 +-
mm/vmscan.c | 79 +++++++++++++++++++++++++++++++++++++++++++--
9 files changed, 129 insertions(+), 8 deletions(-)
diff --git a/include/linux/migrate.h b/include/linux/migrate.h
index e39aeec..3aa5ab6 100644
--- a/include/linux/migrate.h
+++ b/include/linux/migrate.h
@@ -9,6 +9,7 @@ typedef struct page *new_page_t(struct page *, unsigned long private, int **);
#ifdef CONFIG_MIGRATION
#define PAGE_MIGRATION 1
+extern void putback_pages_lru(struct list_head *l);
extern void putback_lru_pages(struct list_head *l);
extern int migrate_page(struct address_space *,
struct page *, struct page *);
@@ -33,6 +34,7 @@ extern int migrate_huge_page_move_mapping(struct address_space *mapping,
#else
#define PAGE_MIGRATION 0
+static inline void putback_pages_lru(struct list_head *l) {}
static inline void putback_lru_pages(struct list_head *l) {}
static inline int migrate_pages(struct list_head *l, new_page_t x,
unsigned long private, bool offlining,
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index ca01ab2..35e80fb 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -102,6 +102,13 @@ struct page {
#endif
};
+/* This structure is used for keeping LRU ordering of isolated page */
+struct pages_lru {
+ struct page *page; /* isolated page */
+ struct page *prev_page; /* previous page of isolate page as LRU order */
+ struct page *next_page; /* next page of isolate page as LRU order */
+ struct list_head lru;
+};
/*
* A region containing a mapping of a non-memory backed file under NOMMU
* conditions. These are held in a global tree and are pinned by the VMAs that
diff --git a/include/linux/swap.h b/include/linux/swap.h
index baef4ad..4ad0a0c 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -227,6 +227,8 @@ extern void rotate_reclaimable_page(struct page *page);
extern void deactivate_page(struct page *page);
extern void swap_setup(void);
+extern void update_page_reclaim_stat(struct zone *zone, struct page *page,
+ int file, int rotated);
extern void add_page_to_unevictable_list(struct page *page);
/**
@@ -260,7 +262,7 @@ extern unsigned long mem_cgroup_shrink_node_zone(struct mem_cgroup *mem,
struct zone *zone,
unsigned long *nr_scanned);
extern int __isolate_lru_page(struct page *page, int mode, int file,
- int not_dirty, int not_mapped);
+ int not_dirty, int not_mapped, struct pages_lru *pages_lru);
extern unsigned long shrink_all_memory(unsigned long nr_pages);
extern int vm_swappiness;
extern int remove_mapping(struct address_space *mapping, struct page *page);
diff --git a/mm/compaction.c b/mm/compaction.c
index 653b02b..c453000 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -335,7 +335,8 @@ static unsigned long isolate_migratepages(struct zone *zone,
}
/* Try isolate the page */
- if (__isolate_lru_page(page, ISOLATE_BOTH, 0, !cc->sync, 0) != 0)
+ if (__isolate_lru_page(page, ISOLATE_BOTH, 0,
+ !cc->sync, 0, NULL) != 0)
continue;
VM_BUG_ON(PageTransCompound(page));
diff --git a/mm/internal.h b/mm/internal.h
index d071d38..3c8182c 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -43,6 +43,8 @@ extern unsigned long highest_memmap_pfn;
* in mm/vmscan.c:
*/
extern int isolate_lru_page(struct page *page);
+extern bool keep_lru_order(struct pages_lru *pages_lru);
+extern void putback_page_to_lru(struct page *page, struct list_head *head);
extern void putback_lru_page(struct page *page);
/*
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 471e7fd..92a9046 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1193,7 +1193,7 @@ unsigned long mem_cgroup_isolate_pages(unsigned long nr_to_scan,
continue;
scan++;
- ret = __isolate_lru_page(page, mode, file, 0, 0);
+ ret = __isolate_lru_page(page, mode, file, 0, 0, NULL);
switch (ret) {
case 0:
list_move(&page->lru, dst);
diff --git a/mm/migrate.c b/mm/migrate.c
index 819d233..9cfb63b 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -85,6 +85,42 @@ void putback_lru_pages(struct list_head *l)
}
/*
+ * This function is almost the same as putback_lru_pages.
+ * The difference is that this function receives a list of struct pages_lru
+ * and, if possible, we add pages back at their original position in the LRU
+ * instead of at the LRU's head.
+ */
+void putback_pages_lru(struct list_head *l)
+{
+ struct pages_lru *isolated_page;
+ struct pages_lru *isolated_page2;
+ struct page *page;
+
+ list_for_each_entry_safe(isolated_page, isolated_page2, l, lru) {
+ struct zone *zone;
+ page = isolated_page->page;
+ list_del(&isolated_page->lru);
+
+ dec_zone_page_state(page, NR_ISOLATED_ANON +
+ page_is_file_cache(page));
+
+ zone = page_zone(page);
+ spin_lock_irq(&zone->lru_lock);
+ if (keep_lru_order(isolated_page)) {
+ putback_page_to_lru(page, &isolated_page->prev_page->lru);
+ spin_unlock_irq(&zone->lru_lock);
+ }
+ else {
+ spin_unlock_irq(&zone->lru_lock);
+ putback_lru_page(page);
+ }
+
+ kfree(isolated_page);
+ }
+}
+
+
+/*
* Restore a potential migration pte to a working pte entry
*/
static int remove_migration_pte(struct page *new, struct vm_area_struct *vma,
diff --git a/mm/swap.c b/mm/swap.c
index a83ec5a..0cb15b7 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -252,7 +252,7 @@ void rotate_reclaimable_page(struct page *page)
}
}
-static void update_page_reclaim_stat(struct zone *zone, struct page *page,
+void update_page_reclaim_stat(struct zone *zone, struct page *page,
int file, int rotated)
{
struct zone_reclaim_stat *reclaim_stat = &zone->reclaim_stat;
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 5196f0c..06a7c9b 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -550,6 +550,58 @@ int remove_mapping(struct address_space *mapping, struct page *page)
return 0;
}
+/* zone->lru_lock must be held */
+bool keep_lru_order(struct pages_lru *pages_lru)
+{
+ bool ret = false;
+ struct page *prev, *next;
+
+ if (!pages_lru->prev_page)
+ return ret;
+
+ prev = pages_lru->prev_page;
+ next = pages_lru->next_page;
+
+ if (!PageLRU(prev) || !PageLRU(next))
+ return ret;
+
+ if (prev->lru.next == &next->lru)
+ ret = true;
+
+ if (unlikely(PageUnevictable(prev)))
+ ret = false;
+
+ return ret;
+}
+
+/**
+ * putback_page_to_lru - put isolated @page onto @head
+ * @page: page to be put back to appropriate lru list
+ * @head: lru position to be put back
+ *
+ * Insert previously isolated @page into the given position of the lru list.
+ * zone->lru_lock must be held.
+ */
+void putback_page_to_lru(struct page *page, struct list_head *head)
+{
+ int lru, active, file;
+ struct zone *zone = page_zone(page);
+ struct page *prev_page = container_of(head, struct page, lru);
+
+ lru = page_lru(prev_page);
+ active = is_active_lru(lru);
+ file = is_file_lru(lru);
+
+ if (active)
+ SetPageActive(page);
+ else
+ ClearPageActive(page);
+
+ update_page_reclaim_stat(zone, page, file, active);
+ SetPageLRU(page);
+ __add_page_to_lru_list(zone, page, lru, head);
+}
+
/**
* putback_lru_page - put previously isolated page onto appropriate LRU list's head
* @page: page to be put back to appropriate lru list
@@ -959,8 +1011,8 @@ keep_lumpy:
* not_mapped: page should be not mapped
* returns 0 on success, -ve errno on failure.
*/
-int __isolate_lru_page(struct page *page, int mode, int file,
- int not_dirty, int not_mapped)
+int __isolate_lru_page(struct page *page, int mode, int file, int not_dirty,
+ int not_mapped, struct pages_lru *pages_lru)
{
int ret = -EINVAL;
@@ -996,12 +1048,31 @@ int __isolate_lru_page(struct page *page, int mode, int file,
ret = -EBUSY;
if (likely(get_page_unless_zero(page))) {
+ struct zone *zone = page_zone(page);
+ enum lru_list l = page_lru(page);
/*
* Be careful not to clear PageLRU until after we're
* sure the page is not being freed elsewhere -- the
* page release code relies on it.
*/
ClearPageLRU(page);
+
+ if (!pages_lru)
+ goto skip;
+
+ pages_lru->page = page;
+ if (&zone->lru[l].list == pages_lru->lru.prev ||
+ &zone->lru[l].list == pages_lru->lru.next) {
+ pages_lru->prev_page = NULL;
+ pages_lru->next_page = NULL;
+ goto skip;
+ }
+
+ pages_lru->prev_page =
+ list_entry(page->lru.prev, struct page, lru);
+ pages_lru->next_page =
+ list_entry(page->lru.next, struct page, lru);
+skip:
ret = 0;
}
@@ -1054,7 +1125,7 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan,
VM_BUG_ON(!PageLRU(page));
switch (__isolate_lru_page(page, mode, file,
- not_dirty, not_mapped)) {
+ not_dirty, not_mapped, NULL)) {
case 0:
list_move(&page->lru, dst);
mem_cgroup_del_lru(page);
@@ -1114,7 +1185,7 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan,
break;
if (__isolate_lru_page(cursor_page, mode, file,
- not_dirty, not_mapped) == 0) {
+ not_dirty, not_mapped, NULL) == 0) {
list_move(&cursor_page->lru, dst);
mem_cgroup_del_lru(cursor_page);
nr_taken += hpage_nr_pages(page);
--
1.7.1
This patch makes migrate_pages aware of in-order putback.
This patch shouldn't change the old behavior.
It's used by the next patch.
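In other words, with keep_lru == true the caller passes a list of struct pages_lru
(not struct page) and cleans up failures with putback_pages_lru; with keep_lru == false
everything works as before. Roughly (caller names here are generic placeholders):
	/* keep_lru == false: 'from' holds struct page entries, old behavior */
	err = migrate_pages(&pagelist, new_page, private, false, sync, false);
	if (err)
		putback_lru_pages(&pagelist);

	/* keep_lru == true: 'from' holds struct pages_lru entries, which is how
	 * compaction ends up calling it in [8/8] (with the follow-up fix) */
	err = migrate_pages(&cc->migratepages, compaction_alloc,
			    (unsigned long)cc, false, cc->sync, true);
	if (err)
		putback_pages_lru(&cc->migratepages);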
Cc: KOSAKI Motohiro <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Signed-off-by: Minchan Kim <[email protected]>
---
include/linux/migrate.h | 4 +-
mm/compaction.c | 2 +-
mm/memory-failure.c | 2 +-
mm/memory_hotplug.c | 2 +-
mm/mempolicy.c | 4 +-
mm/migrate.c | 95 +++++++++++++++++++++++++++++++++++------------
6 files changed, 78 insertions(+), 31 deletions(-)
diff --git a/include/linux/migrate.h b/include/linux/migrate.h
index 3aa5ab6..f842fc8 100644
--- a/include/linux/migrate.h
+++ b/include/linux/migrate.h
@@ -15,7 +15,7 @@ extern int migrate_page(struct address_space *,
struct page *, struct page *);
extern int migrate_pages(struct list_head *l, new_page_t x,
unsigned long private, bool offlining,
- bool sync);
+ bool sync, bool keep_lru);
extern int migrate_huge_pages(struct list_head *l, new_page_t x,
unsigned long private, bool offlining,
bool sync);
@@ -38,7 +38,7 @@ static inline void putback_pages_lru(struct list_head *l) {}
static inline void putback_lru_pages(struct list_head *l) {}
static inline int migrate_pages(struct list_head *l, new_page_t x,
unsigned long private, bool offlining,
- bool sync) { return -ENOSYS; }
+ bool sync, bool keep_lru) { return -ENOSYS; }
static inline int migrate_huge_pages(struct list_head *l, new_page_t x,
unsigned long private, bool offlining,
bool sync) { return -ENOSYS; }
diff --git a/mm/compaction.c b/mm/compaction.c
index c453000..a2f6e96 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -529,7 +529,7 @@ static int compact_zone(struct zone *zone, struct compact_control *cc)
nr_migrate = cc->nr_migratepages;
err = migrate_pages(&cc->migratepages, compaction_alloc,
(unsigned long)cc, false,
- cc->sync);
+ cc->sync, false);
update_nr_listpages(cc);
nr_remaining = cc->nr_migratepages;
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 2b9a5ee..395a99e 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1466,7 +1466,7 @@ int soft_offline_page(struct page *page, int flags)
list_add(&page->lru, &pagelist);
ret = migrate_pages(&pagelist, new_page, MPOL_MF_MOVE_ALL,
- 0, true);
+ 0, true, false);
if (ret) {
putback_lru_pages(&pagelist);
pr_info("soft offline: %#lx: migration failed %d, type %lx\n",
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 59ac18f..75dd241 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -741,7 +741,7 @@ do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
}
/* this function returns # of failed pages */
ret = migrate_pages(&source, hotremove_migrate_alloc, 0,
- true, true);
+ true, true, false);
if (ret)
putback_lru_pages(&source);
}
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 8e57a72..9fe702a 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -938,7 +938,7 @@ static int migrate_to_node(struct mm_struct *mm, int source, int dest,
if (!list_empty(&pagelist)) {
err = migrate_pages(&pagelist, new_node_page, dest,
- false, true);
+ false, true, false);
if (err)
putback_lru_pages(&pagelist);
}
@@ -1159,7 +1159,7 @@ static long do_mbind(unsigned long start, unsigned long len,
if (!list_empty(&pagelist)) {
nr_failed = migrate_pages(&pagelist, new_vma_page,
(unsigned long)vma,
- false, true);
+ false, true, false);
if (nr_failed)
putback_lru_pages(&pagelist);
}
diff --git a/mm/migrate.c b/mm/migrate.c
index 9cfb63b..871e6ee 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -662,7 +662,8 @@ static int move_to_new_page(struct page *newpage, struct page *page,
* to the newly allocated page in newpage.
*/
static int unmap_and_move(new_page_t get_new_page, unsigned long private,
- struct page *page, int force, bool offlining, bool sync)
+ struct page *page, int force, bool offlining,
+ bool sync, struct pages_lru *pages_lru)
{
int rc = 0;
int *result = NULL;
@@ -671,6 +672,7 @@ static int unmap_and_move(new_page_t get_new_page, unsigned long private,
int charge = 0;
struct mem_cgroup *mem;
struct anon_vma *anon_vma = NULL;
+ bool del_pages_lru = false;
if (!newpage)
return -ENOMEM;
@@ -834,7 +836,13 @@ move_newpage:
* migrated will have kepts its references and be
* restored.
*/
- list_del(&page->lru);
+ if (pages_lru) {
+ list_del(&pages_lru->lru);
+ del_pages_lru = true;
+ }
+ else
+ list_del(&page->lru);
+
dec_zone_page_state(page, NR_ISOLATED_ANON +
page_is_file_cache(page));
putback_lru_page(page);
@@ -844,7 +852,21 @@ move_newpage:
* Move the new page to the LRU. If migration was not successful
* then this will free the page.
*/
- putback_lru_page(newpage);
+ if (pages_lru) {
+ struct zone *zone = page_zone(page);
+ spin_lock_irq(&zone->lru_lock);
+ if (keep_lru_order(pages_lru)) {
+ putback_page_to_lru(newpage, &pages_lru->prev_page->lru);
+ spin_unlock_irq(&zone->lru_lock);
+ }
+ else {
+ spin_unlock_irq(&zone->lru_lock);
+ putback_lru_page(newpage);
+ }
+
+ if (del_pages_lru)
+ kfree(pages_lru);
+ }
if (result) {
if (rc)
@@ -947,13 +969,13 @@ out:
*/
int migrate_pages(struct list_head *from,
new_page_t get_new_page, unsigned long private, bool offlining,
- bool sync)
+ bool sync, bool keep_lru)
{
int retry = 1;
int nr_failed = 0;
int pass = 0;
- struct page *page;
- struct page *page2;
+ struct page *page, *page2;
+ struct pages_lru *pages_lru, *pages_lru2;
int swapwrite = current->flags & PF_SWAPWRITE;
int rc;
@@ -962,26 +984,51 @@ int migrate_pages(struct list_head *from,
for(pass = 0; pass < 10 && retry; pass++) {
retry = 0;
+ if (!keep_lru) {
+ list_for_each_entry_safe(page, page2, from, lru) {
+ cond_resched();
- list_for_each_entry_safe(page, page2, from, lru) {
- cond_resched();
-
- rc = unmap_and_move(get_new_page, private,
+ rc = unmap_and_move(get_new_page, private,
page, pass > 2, offlining,
- sync);
-
- switch(rc) {
- case -ENOMEM:
- goto out;
- case -EAGAIN:
- retry++;
- break;
- case 0:
- break;
- default:
- /* Permanent failure */
- nr_failed++;
- break;
+ sync, NULL);
+
+ switch(rc) {
+ case -ENOMEM:
+ goto out;
+ case -EAGAIN:
+ retry++;
+ break;
+ case 0:
+ break;
+ default:
+ /* Permanent failure */
+ nr_failed++;
+ break;
+ }
+ }
+ }
+ else {
+
+ list_for_each_entry_safe(pages_lru, pages_lru2, from, lru) {
+ cond_resched();
+
+ rc = unmap_and_move(get_new_page, private,
+ pages_lru->page, pass > 2, offlining,
+ sync, pages_lru);
+
+ switch(rc) {
+ case -ENOMEM:
+ goto out;
+ case -EAGAIN:
+ retry++;
+ break;
+ case 0:
+ break;
+ default:
+ /* Permanent failure */
+ nr_failed++;
+ break;
+ }
}
}
}
--
1.7.1
Compaction is a good solution for getting contiguous pages, but it causes
LRU churning, which is not good.
This patch makes the compaction code use in-order putback, so that
after compaction completes, the migrated pages keep their LRU ordering.
Cc: KOSAKI Motohiro <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Signed-off-by: Minchan Kim <[email protected]>
---
mm/compaction.c | 22 +++++++++++++++-------
1 files changed, 15 insertions(+), 7 deletions(-)
diff --git a/mm/compaction.c b/mm/compaction.c
index a2f6e96..480d2ac 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -211,11 +211,11 @@ static void isolate_freepages(struct zone *zone,
/* Update the number of anon and file isolated pages in the zone */
static void acct_isolated(struct zone *zone, struct compact_control *cc)
{
- struct page *page;
+ struct pages_lru *pages_lru;
unsigned int count[NR_LRU_LISTS] = { 0, };
- list_for_each_entry(page, &cc->migratepages, lru) {
- int lru = page_lru_base_type(page);
+ list_for_each_entry(pages_lru, &cc->migratepages, lru) {
+ int lru = page_lru_base_type(pages_lru->page);
count[lru]++;
}
@@ -281,6 +281,7 @@ static unsigned long isolate_migratepages(struct zone *zone,
spin_lock_irq(&zone->lru_lock);
for (; low_pfn < end_pfn; low_pfn++) {
struct page *page;
+ struct pages_lru *pages_lru;
bool locked = true;
/* give a chance to irqs before checking need_resched() */
@@ -334,10 +335,16 @@ static unsigned long isolate_migratepages(struct zone *zone,
continue;
}
+ pages_lru = kmalloc(sizeof(struct pages_lru), GFP_ATOMIC);
+ if (!pages_lru)
+ continue;
+
/* Try isolate the page */
- if (__isolate_lru_page(page, ISOLATE_BOTH, 0,
- !cc->sync, 0, NULL) != 0)
+ if ( __isolate_lru_page(page, ISOLATE_BOTH, 0,
+ !cc->sync, 0, pages_lru) != 0) {
+ kfree(pages_lru);
continue;
+ }
VM_BUG_ON(PageTransCompound(page));
@@ -398,8 +405,9 @@ static void update_nr_listpages(struct compact_control *cc)
int nr_migratepages = 0;
int nr_freepages = 0;
struct page *page;
+ struct pages_lru *pages_lru;
- list_for_each_entry(page, &cc->migratepages, lru)
+ list_for_each_entry(pages_lru, &cc->migratepages, lru)
nr_migratepages++;
list_for_each_entry(page, &cc->freepages, lru)
nr_freepages++;
@@ -542,7 +550,7 @@ static int compact_zone(struct zone *zone, struct compact_control *cc)
/* Release LRU pages not migrated */
if (err) {
- putback_lru_pages(&cc->migratepages);
+ putback_pages_lru(&cc->migratepages);
cc->nr_migratepages = 0;
}
--
1.7.1
acct_isolated in compaction uses page_lru_base_type, which returns only the
base type of the LRU list, so it never returns LRU_ACTIVE_ANON or LRU_ACTIVE_FILE.
So it's pointless to add lru[LRU_ACTIVE_[ANON|FILE]] into the sum.
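For reference, page_lru_base_type only distinguishes anon from file and always
returns the inactive list:
	/* include/linux/mm_inline.h */
	static inline enum lru_list page_lru_base_type(struct page *page)
	{
		if (page_is_file_cache(page))
			return LRU_INACTIVE_FILE;
		return LRU_INACTIVE_ANON;
	}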
Cc: KOSAKI Motohiro <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Signed-off-by: Minchan Kim <[email protected]>
---
mm/compaction.c | 4 ++--
1 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/mm/compaction.c b/mm/compaction.c
index 9f80b5a..653b02b 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -219,8 +219,8 @@ static void acct_isolated(struct zone *zone, struct compact_control *cc)
count[lru]++;
}
- cc->nr_anon = count[LRU_ACTIVE_ANON] + count[LRU_INACTIVE_ANON];
- cc->nr_file = count[LRU_ACTIVE_FILE] + count[LRU_INACTIVE_FILE];
+ cc->nr_anon = count[LRU_INACTIVE_ANON];
+ cc->nr_file = count[LRU_INACTIVE_FILE];
__mod_zone_page_state(zone, NR_ISOLATED_ANON, cc->nr_anon);
__mod_zone_page_state(zone, NR_ISOLATED_FILE, cc->nr_file);
}
--
1.7.1
On Wed, Apr 27, 2011 at 1:25 AM, Minchan Kim <[email protected]> wrote:
> This patch defines new APIs to putback the page into previous position of LRU.
> The idea is simple.
>
> When we try to putback the page into lru list and if friends(prev, next) of the pages
> still is nearest neighbor, we can insert isolated page into prev's next instead of
> head of LRU list. So it keeps LRU history without losing the LRU information.
>
> Before :
> LRU POV : H - P1 - P2 - P3 - P4 -T
>
> Isolate P3 :
> LRU POV : H - P1 - P2 - P4 - T
>
> Putback P3 :
> if (P2->next == P4)
> putback(P3, P2);
> So,
> LRU POV : H - P1 - P2 - P3 - P4 -T
>
> For implement, we defines new structure pages_lru which remebers
> both lru friend pages of isolated one and handling functions.
>
> But this approach has a problem on contiguous pages.
> In this case, my idea can not work since friend pages are isolated, too.
> It means prev_page->next == next_page always is false and both pages are not
> LRU any more at that time. It's pointed out by Rik at LSF/MM summit.
> So for solving the problem, I can change the idea.
> I think we don't need both friend(prev, next) pages relation but
> just consider either prev or next page that it is still same LRU.
> Worset case in this approach, prev or next page is free and allocate new
> so it's in head of LRU and our isolated page is located on next of head.
> But it's almost same situation with current problem. So it doesn't make worse
> than now and it would be rare. But in this version, I implement based on idea
> discussed at LSF/MM. If my new idea makes sense, I will change it.
>
> Any comment?
>
> Cc: KOSAKI Motohiro <[email protected]>
> Cc: Mel Gorman <[email protected]>
> Cc: Rik van Riel <[email protected]>
> Cc: Andrea Arcangeli <[email protected]>
> Signed-off-by: Minchan Kim <[email protected]>
> ---
> include/linux/migrate.h | 2 +
> include/linux/mm_types.h | 7 ++++
> include/linux/swap.h | 4 ++-
> mm/compaction.c | 3 +-
> mm/internal.h | 2 +
> mm/memcontrol.c | 2 +-
> mm/migrate.c | 36 +++++++++++++++++++++
> mm/swap.c | 2 +-
> mm/vmscan.c | 79 +++++++++++++++++++++++++++++++++++++++++++--
> 9 files changed, 129 insertions(+), 8 deletions(-)
>
> diff --git a/include/linux/migrate.h b/include/linux/migrate.h
> index e39aeec..3aa5ab6 100644
> --- a/include/linux/migrate.h
> +++ b/include/linux/migrate.h
> @@ -9,6 +9,7 @@ typedef struct page *new_page_t(struct page *, unsigned long private, int **);
> #ifdef CONFIG_MIGRATION
> #define PAGE_MIGRATION 1
>
> +extern void putback_pages_lru(struct list_head *l);
> extern void putback_lru_pages(struct list_head *l);
> extern int migrate_page(struct address_space *,
> struct page *, struct page *);
> @@ -33,6 +34,7 @@ extern int migrate_huge_page_move_mapping(struct address_space *mapping,
> #else
> #define PAGE_MIGRATION 0
>
> +static inline void putback_pages_lru(struct list_head *l) {}
> static inline void putback_lru_pages(struct list_head *l) {}
> static inline int migrate_pages(struct list_head *l, new_page_t x,
> unsigned long private, bool offlining,
> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> index ca01ab2..35e80fb 100644
> --- a/include/linux/mm_types.h
> +++ b/include/linux/mm_types.h
> @@ -102,6 +102,13 @@ struct page {
> #endif
> };
>
> +/* This structure is used for keeping LRU ordering of isolated page */
> +struct pages_lru {
> + struct page *page; /* isolated page */
> + struct page *prev_page; /* previous page of isolate page as LRU order */
> + struct page *next_page; /* next page of isolate page as LRU order */
> + struct list_head lru;
> +};
> /*
> * A region containing a mapping of a non-memory backed file under NOMMU
> * conditions. These are held in a global tree and are pinned by the VMAs that
> diff --git a/include/linux/swap.h b/include/linux/swap.h
> index baef4ad..4ad0a0c 100644
> --- a/include/linux/swap.h
> +++ b/include/linux/swap.h
> @@ -227,6 +227,8 @@ extern void rotate_reclaimable_page(struct page *page);
> extern void deactivate_page(struct page *page);
> extern void swap_setup(void);
>
> +extern void update_page_reclaim_stat(struct zone *zone, struct page *page,
> + int file, int rotated);
> extern void add_page_to_unevictable_list(struct page *page);
>
> /**
> @@ -260,7 +262,7 @@ extern unsigned long mem_cgroup_shrink_node_zone(struct mem_cgroup *mem,
> struct zone *zone,
> unsigned long *nr_scanned);
> extern int __isolate_lru_page(struct page *page, int mode, int file,
> - int not_dirty, int not_mapped);
> + int not_dirty, int not_mapped, struct pages_lru *pages_lru);
> extern unsigned long shrink_all_memory(unsigned long nr_pages);
> extern int vm_swappiness;
> extern int remove_mapping(struct address_space *mapping, struct page *page);
> diff --git a/mm/compaction.c b/mm/compaction.c
> index 653b02b..c453000 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -335,7 +335,8 @@ static unsigned long isolate_migratepages(struct zone *zone,
> }
>
> /* Try isolate the page */
> - if (__isolate_lru_page(page, ISOLATE_BOTH, 0, !cc->sync, 0) != 0)
> + if (__isolate_lru_page(page, ISOLATE_BOTH, 0,
> + !cc->sync, 0, NULL) != 0)
> continue;
>
> VM_BUG_ON(PageTransCompound(page));
> diff --git a/mm/internal.h b/mm/internal.h
> index d071d38..3c8182c 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -43,6 +43,8 @@ extern unsigned long highest_memmap_pfn;
> * in mm/vmscan.c:
> */
> extern int isolate_lru_page(struct page *page);
> +extern bool keep_lru_order(struct pages_lru *pages_lru);
> +extern void putback_page_to_lru(struct page *page, struct list_head *head);
> extern void putback_lru_page(struct page *page);
>
> /*
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 471e7fd..92a9046 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -1193,7 +1193,7 @@ unsigned long mem_cgroup_isolate_pages(unsigned long nr_to_scan,
> continue;
>
> scan++;
> - ret = __isolate_lru_page(page, mode, file, 0, 0);
> + ret = __isolate_lru_page(page, mode, file, 0, 0, NULL);
> switch (ret) {
> case 0:
> list_move(&page->lru, dst);
> diff --git a/mm/migrate.c b/mm/migrate.c
> index 819d233..9cfb63b 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -85,6 +85,42 @@ void putback_lru_pages(struct list_head *l)
> }
>
> /*
> + * This function is almost same iwth putback_lru_pages.
> + * The difference is that function receives struct pages_lru list
> + * and if possible, we add pages into original position of LRU
> + * instead of LRU's head.
> + */
> +void putback_pages_lru(struct list_head *l)
> +{
> + struct pages_lru *isolated_page;
> + struct pages_lru *isolated_page2;
> + struct page *page;
> +
> + list_for_each_entry_safe(isolated_page, isolated_page2, l, lru) {
> + struct zone *zone;
> + page = isolated_page->page;
> + list_del(&isolated_page->lru);
> +
> + dec_zone_page_state(page, NR_ISOLATED_ANON +
> + page_is_file_cache(page));
> +
> + zone = page_zone(page);
> + spin_lock_irq(&zone->lru_lock);
> + if (keep_lru_order(isolated_page)) {
> + putback_page_to_lru(page, &isolated_page->prev_page->lru);
> + spin_unlock_irq(&zone->lru_lock);
> + }
> + else {
> + spin_unlock_irq(&zone->lru_lock);
> + putback_lru_page(page);
> + }
> +
> + kfree(isolated_page);
> + }
> +}
> +
> +
> +/*
> * Restore a potential migration pte to a working pte entry
> */
> static int remove_migration_pte(struct page *new, struct vm_area_struct *vma,
> diff --git a/mm/swap.c b/mm/swap.c
> index a83ec5a..0cb15b7 100644
> --- a/mm/swap.c
> +++ b/mm/swap.c
> @@ -252,7 +252,7 @@ void rotate_reclaimable_page(struct page *page)
> }
> }
>
> -static void update_page_reclaim_stat(struct zone *zone, struct page *page,
> +void update_page_reclaim_stat(struct zone *zone, struct page *page,
> int file, int rotated)
> {
> struct zone_reclaim_stat *reclaim_stat = &zone->reclaim_stat;
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 5196f0c..06a7c9b 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -550,6 +550,58 @@ int remove_mapping(struct address_space *mapping, struct page *page)
> return 0;
> }
>
> +/* zone->lru_lock must be hold */
> +bool keep_lru_order(struct pages_lru *pages_lru)
> +{
> + bool ret = false;
> + struct page *prev, *next;
> +
> + if (!pages_lru->prev_page)
> + return ret;
> +
> + prev = pages_lru->prev_page;
> + next = pages_lru->next_page;
> +
> + if (!PageLRU(prev) || !PageLRU(next))
> + return ret;
> +
> + if (prev->lru.next == &next->lru)
> + ret = true;
> +
> + if (unlikely(PageUnevictable(prev)))
> + ret = false;
> +
> + return ret;
> +}
> +
> +/**
> + * putback_page_to_lru - put isolated @page onto @head
> + * @page: page to be put back to appropriate lru list
> + * @head: lru position to be put back
> + *
> + * Insert previously isolated @page to appropriate position of lru list
> + * zone->lru_lock must be hold.
> + */
> +void putback_page_to_lru(struct page *page, struct list_head *head)
> +{
> + int lru, active, file;
> + struct zone *zone = page_zone(page);
> + struct page *prev_page = container_of(head, struct page, lru);
> +
> + lru = page_lru(prev_page);
> + active = is_active_lru(lru);
> + file = is_file_lru(lru);
> +
> + if (active)
> + SetPageActive(page);
> + else
> + ClearPageActive(page);
> +
> + update_page_reclaim_stat(zone, page, file, active);
> + SetPageLRU(page);
> + __add_page_to_lru_list(zone, page, lru, head);
> +}
> +
> /**
> * putback_lru_page - put previously isolated page onto appropriate LRU list's head
> * @page: page to be put back to appropriate lru list
> @@ -959,8 +1011,8 @@ keep_lumpy:
> * not_mapped: page should be not mapped
> * returns 0 on success, -ve errno on failure.
> */
> -int __isolate_lru_page(struct page *page, int mode, int file,
> - int not_dirty, int not_mapped)
> +int __isolate_lru_page(struct page *page, int mode, int file, int not_dirty,
> + int not_mapped, struct pages_lru *pages_lru)
> {
> int ret = -EINVAL;
>
> @@ -996,12 +1048,31 @@ int __isolate_lru_page(struct page *page, int mode, int file,
> ret = -EBUSY;
>
> if (likely(get_page_unless_zero(page))) {
> + struct zone *zone = page_zone(page);
> + enum lru_list l = page_lru(page);
> /*
> * Be careful not to clear PageLRU until after we're
> * sure the page is not being freed elsewhere -- the
> * page release code relies on it.
> */
> ClearPageLRU(page);
> +
> + if (!pages_lru)
> + goto skip;
> +
> + pages_lru->page = page;
> + if (&zone->lru[l].list == pages_lru->lru.prev ||
> + &zone->lru[l].list == pages_lru->lru.next) {
> + pages_lru->prev_page = NULL;
> + pages_lru->next_page = NULL;
> + goto skip;
> + }
While I was refactoring the code, I must have been half asleep.
It should be as follows:
@@ -1071,8 +1072,8 @@ int __isolate_lru_page(struct page *page, int mode, int file, int not_dirty,
goto skip;
pages_lru->page = page;
- if (&zone->lru[l].list == pages_lru->lru.prev ||
- &zone->lru[l].list == pages_lru->lru.next) {
+ if (&zone->lru[l].list == page->lru.prev ||
+ &zone->lru[l].list == page->lru.next) {
pages_lru->prev_page = NULL;
pages_lru->next_page = NULL;
goto skip;
--
Kind regards,
Minchan Kim
I should have added this piece as well.
I will resend the whole series after work.
===
diff --git a/mm/compaction.c b/mm/compaction.c
index 95af5bc..59a675c 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -327,7 +328,7 @@ static unsigned long isolate_migratepages(struct zone *zone,
/* Successfully isolated */
del_page_from_lru_list(zone, page, page_lru(page));
- list_add(&page->lru, migratelist);
+ list_add(&pages_lru->lru, migratelist);
cc->nr_migratepages++;
nr_isolated++;
@@ -525,7 +525,7 @@ static int compact_zone(struct zone *zone, struct compact_control *cc)
nr_migrate = cc->nr_migratepages;
migrate_pages(&cc->migratepages, compaction_alloc,
(unsigned long)cc, false,
- cc->sync, false);
+ cc->sync, true);
count_vm_event(PGMIGRATE);
update_nr_listpages(cc);
nr_remaining = cc->nr_migratepages;
==
--
Kind regards,
Minchan Kim
On Wed, 27 Apr 2011 01:25:18 +0900
Minchan Kim <[email protected]> wrote:
> There are some places to isolate lru page and I believe
> users of isolate_lru_page will be growing.
> The purpose of them is each different so part of isolated pages
> should put back to LRU, again.
>
> The problem is when we put back the page into LRU,
> we lose LRU ordering and the page is inserted at head of LRU list.
> It makes unnecessary LRU churning so that vm can evict working set pages
> rather than idle pages.
>
> This patch adds new filter mask when we isolate page in LRU.
> So, we don't isolate pages if we can't handle it.
> It could reduce LRU churning.
>
> This patch shouldn't change old behavior.
> It's just used by next patches.
>
> Cc: KOSAKI Motohiro <[email protected]>
> Cc: Mel Gorman <[email protected]>
> Cc: Rik van Riel <[email protected]>
> Cc: Andrea Arcangeli <[email protected]>
> Signed-off-by: Minchan Kim <[email protected]>
Reviewed-by: KAMEZAWA Hiroyuki <[email protected]>
nitpick below.
> ---
> include/linux/swap.h | 3 ++-
> mm/compaction.c | 2 +-
> mm/memcontrol.c | 2 +-
> mm/vmscan.c | 26 ++++++++++++++++++++------
> 4 files changed, 24 insertions(+), 9 deletions(-)
>
> diff --git a/include/linux/swap.h b/include/linux/swap.h
> index 384eb5f..baef4ad 100644
> --- a/include/linux/swap.h
> +++ b/include/linux/swap.h
> @@ -259,7 +259,8 @@ extern unsigned long mem_cgroup_shrink_node_zone(struct mem_cgroup *mem,
> unsigned int swappiness,
> struct zone *zone,
> unsigned long *nr_scanned);
> -extern int __isolate_lru_page(struct page *page, int mode, int file);
> +extern int __isolate_lru_page(struct page *page, int mode, int file,
> + int not_dirty, int not_mapped);
Hmm, which is better: four binary args, or a single flags argument with a bitmask?
Thanks,
-Kame
On Wed, 27 Apr 2011 01:25:19 +0900
Minchan Kim <[email protected]> wrote:
> In async mode, compaction doesn't migrate dirty or writeback pages.
> So, it's meaningless to pick the page and re-add it to lru list.
>
> Of course, when we isolate the page in compaction, the page might
> be dirty or writeback but when we try to migrate the page, the page
> would be not dirty, writeback. So it could be migrated. But it's
> very unlikely as isolate and migration cycle is much faster than
> writeout.
>
> So, this patch helps cpu and prevent unnecessary LRU churning.
>
> Cc: KOSAKI Motohiro <[email protected]>
> Cc: Mel Gorman <[email protected]>
> Cc: Rik van Riel <[email protected]>
> Cc: Andrea Arcangeli <[email protected]>
> Signed-off-by: Minchan Kim <[email protected]>
seems very nice to me. Thank you.
Reviewed-by: KAMEZAWA Hiroyuki <[email protected]>
On Wed, 27 Apr 2011 01:25:20 +0900
Minchan Kim <[email protected]> wrote:
> In some __zone_reclaim case, we don't want to shrink mapped page.
> Nonetheless, we have isolated mapped page and re-add it into
> LRU's head. It's unnecessary CPU overhead and makes LRU churning.
>
> Of course, when we isolate the page, the page might be mapped but
> when we try to migrate the page, the page would be not mapped.
> So it could be migrated. But race is rare and although it happens,
> it's no big deal.
>
> Cc: Christoph Lameter <[email protected]>
> Cc: KOSAKI Motohiro <[email protected]>
> Cc: Mel Gorman <[email protected]>
> Cc: Rik van Riel <[email protected]>
> Cc: Andrea Arcangeli <[email protected]>
> Signed-off-by: Minchan Kim <[email protected]>
Hmm, it seems mm/memcontrol.c::mem_cgroup_isolate_pages() should be updated, too.
But it's okay to start from the global LRU.
Reviewed-by: KAMEZAWA Hiroyuki <[email protected]>
> ---
> mm/vmscan.c | 11 ++++++-----
> 1 files changed, 6 insertions(+), 5 deletions(-)
>
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 71d2da9..e8d6190 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1147,7 +1147,8 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan,
>
> static unsigned long isolate_pages_global(unsigned long nr,
> struct list_head *dst,
> - unsigned long *scanned, int order,
> + unsigned long *scanned,
> + struct scan_control *sc,
> int mode, struct zone *z,
> int active, int file)
> {
> @@ -1156,8 +1157,8 @@ static unsigned long isolate_pages_global(unsigned long nr,
> lru += LRU_ACTIVE;
> if (file)
> lru += LRU_FILE;
> - return isolate_lru_pages(nr, &z->lru[lru].list, dst, scanned, order,
> - mode, file, 0, 0);
> + return isolate_lru_pages(nr, &z->lru[lru].list, dst, scanned, sc->order,
> + mode, file, 0, !sc->may_unmap);
> }
>
> /*
> @@ -1407,7 +1408,7 @@ shrink_inactive_list(unsigned long nr_to_scan, struct zone *zone,
>
> if (scanning_global_lru(sc)) {
> nr_taken = isolate_pages_global(nr_to_scan,
> - &page_list, &nr_scanned, sc->order,
> + &page_list, &nr_scanned, sc,
> sc->reclaim_mode & RECLAIM_MODE_LUMPYRECLAIM ?
> ISOLATE_BOTH : ISOLATE_INACTIVE,
> zone, 0, file);
> @@ -1531,7 +1532,7 @@ static void shrink_active_list(unsigned long nr_pages, struct zone *zone,
> spin_lock_irq(&zone->lru_lock);
> if (scanning_global_lru(sc)) {
> nr_taken = isolate_pages_global(nr_pages, &l_hold,
> - &pgscanned, sc->order,
> + &pgscanned, sc,
> ISOLATE_ACTIVE, zone,
> 1, file);
> zone->pages_scanned += pgscanned;
> --
> 1.7.1
On Wed, Apr 27, 2011 at 4:54 PM, KAMEZAWA Hiroyuki
<[email protected]> wrote:
> On Wed, 27 Apr 2011 01:25:18 +0900
> Minchan Kim <[email protected]> wrote:
>
>> There are some places to isolate lru page and I believe
>> users of isolate_lru_page will be growing.
>> The purpose of them is each different so part of isolated pages
>> should put back to LRU, again.
>>
>> The problem is when we put back the page into LRU,
>> we lose LRU ordering and the page is inserted at head of LRU list.
>> It makes unnecessary LRU churning so that vm can evict working set pages
>> rather than idle pages.
>>
>> This patch adds new filter mask when we isolate page in LRU.
>> So, we don't isolate pages if we can't handle it.
>> It could reduce LRU churning.
>>
>> This patch shouldn't change old behavior.
>> It's just used by next patches.
>>
>> Cc: KOSAKI Motohiro <[email protected]>
>> Cc: Mel Gorman <[email protected]>
>> Cc: Rik van Riel <[email protected]>
>> Cc: Andrea Arcangeli <[email protected]>
>> Signed-off-by: Minchan Kim <[email protected]>
>
> Reviewed-by: KAMEZAWA Hiroyuki <[email protected]>
>
> nitpick below.
>
>> ---
>> include/linux/swap.h | 3 ++-
>> mm/compaction.c | 2 +-
>> mm/memcontrol.c | 2 +-
>> mm/vmscan.c | 26 ++++++++++++++++++++------
>> 4 files changed, 24 insertions(+), 9 deletions(-)
>>
>> diff --git a/include/linux/swap.h b/include/linux/swap.h
>> index 384eb5f..baef4ad 100644
>> --- a/include/linux/swap.h
>> +++ b/include/linux/swap.h
>> @@ -259,7 +259,8 @@ extern unsigned long mem_cgroup_shrink_node_zone(struct mem_cgroup *mem,
>> unsigned int swappiness,
>> struct zone *zone,
>> unsigned long *nr_scanned);
>> -extern int __isolate_lru_page(struct page *page, int mode, int file);
>> +extern int __isolate_lru_page(struct page *page, int mode, int file,
>> + int not_dirty, int not_mapped);
>
> Hmm, which is better to use 4 binary args or a flag with bitmask ?
Yes. I even add one more flag in the next patch,
so I will switch to a bitmask flag in the next version, something like the sketch below.
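(Just to sketch the direction; the flag names here are made up:)
	/* possible isolation flags instead of extra int arguments */
	#define ISOLATE_CLEAN		0x1	/* skip dirty/writeback pages */
	#define ISOLATE_UNMAPPED	0x2	/* skip mapped pages */

	extern int __isolate_lru_page(struct page *page, int mode, int file,
					unsigned int isolate_flags);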
Thanks.
>
> Thanks,
> -Kame
>
>
--
Kind regards,
Minchan Kim
On Wed, Apr 27, 2011 at 1:25 AM, Minchan Kim <[email protected]> wrote:
> There are some places to isolate lru page and I believe
> users of isolate_lru_page will be growing.
> The purpose of them is each different so part of isolated pages
> should put back to LRU, again.
>
> The problem is when we put back the page into LRU,
> we lose LRU ordering and the page is inserted at head of LRU list.
> It makes unnecessary LRU churning so that vm can evict working set pages
> rather than idle pages.
>
> This patch adds new filter mask when we isolate page in LRU.
> So, we don't isolate pages if we can't handle it.
> It could reduce LRU churning.
>
> This patch shouldn't change old behavior.
> It's just used by next patches.
>
> Cc: KOSAKI Motohiro <[email protected]>
> Cc: Mel Gorman <[email protected]>
> Cc: Rik van Riel <[email protected]>
> Cc: Andrea Arcangeli <[email protected]>
> Signed-off-by: Minchan Kim <[email protected]>
> ---
> include/linux/swap.h | 3 ++-
> mm/compaction.c | 2 +-
> mm/memcontrol.c | 2 +-
> mm/vmscan.c | 26 ++++++++++++++++++++------
> 4 files changed, 24 insertions(+), 9 deletions(-)
>
> diff --git a/include/linux/swap.h b/include/linux/swap.h
> index 384eb5f..baef4ad 100644
> --- a/include/linux/swap.h
> +++ b/include/linux/swap.h
> @@ -259,7 +259,8 @@ extern unsigned long mem_cgroup_shrink_node_zone(struct mem_cgroup *mem,
> unsigned int swappiness,
> struct zone *zone,
> unsigned long *nr_scanned);
> -extern int __isolate_lru_page(struct page *page, int mode, int file);
> +extern int __isolate_lru_page(struct page *page, int mode, int file,
> + int not_dirty, int not_mapped);
> extern unsigned long shrink_all_memory(unsigned long nr_pages);
> extern int vm_swappiness;
> extern int remove_mapping(struct address_space *mapping, struct page *page);
> diff --git a/mm/compaction.c b/mm/compaction.c
> index 021a296..dea32e3 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -335,7 +335,7 @@ static unsigned long isolate_migratepages(struct zone *zone,
> }
>
> /* Try isolate the page */
> - if (__isolate_lru_page(page, ISOLATE_BOTH, 0) != 0)
> + if (__isolate_lru_page(page, ISOLATE_BOTH, 0, 0, 0) != 0)
> continue;
>
> VM_BUG_ON(PageTransCompound(page));
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index c2776f1..471e7fd 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -1193,7 +1193,7 @@ unsigned long mem_cgroup_isolate_pages(unsigned long nr_to_scan,
> continue;
>
> scan++;
> - ret = __isolate_lru_page(page, mode, file);
> + ret = __isolate_lru_page(page, mode, file, 0, 0);
> switch (ret) {
> case 0:
> list_move(&page->lru, dst);
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index b3a569f..71d2da9 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -954,10 +954,13 @@ keep_lumpy:
> *
> * page: page to consider
> * mode: one of the LRU isolation modes defined above
> - *
> + * file: page be on a file LRU
> + * not_dirty: page should be not dirty or not writeback
> + * not_mapped: page should be not mapped
> * returns 0 on success, -ve errno on failure.
> */
> -int __isolate_lru_page(struct page *page, int mode, int file)
> +int __isolate_lru_page(struct page *page, int mode, int file,
> + int not_dirty, int not_mapped)
> {
> int ret = -EINVAL;
>
> @@ -976,6 +979,12 @@ int __isolate_lru_page(struct page *page, int mode, int file)
> if (mode != ISOLATE_BOTH && page_is_file_cache(page) != file)
> return ret;
>
> + if (not_dirty)
> + if (PageDirty(page) || PageWriteback(page))
> + return ret;
> + if (not_mapped)
> + if (page_mapped(page))
> + return ret;
I should have fixed this return value.
Right now the caller treats -EINVAL from __isolate_lru_page as a BUG.
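For context, the switch in isolate_lru_pages only expects 0 or -EBUSY, roughly:
	switch (__isolate_lru_page(page, mode, file, not_dirty, not_mapped)) {
	case 0:
		list_move(&page->lru, dst);
		mem_cgroup_del_lru(page);
		nr_taken += hpage_nr_pages(page);
		break;
	case -EBUSY:
		/* else it is being freed elsewhere */
		list_move(&page->lru, src);
		mem_cgroup_rotate_lru_list(page, page_lru(page));
		continue;
	default:
		BUG();
	}
so returning the initial -EINVAL for a page that the filters rejected, but which is
still on the LRU, would trip that BUG(); those cases need to return -EBUSY (or a
dedicated value) instead.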
I will fix it in the next version.
--
Kind regards,
Minchan Kim
On Wed, 27 Apr 2011 01:25:21 +0900
Minchan Kim <[email protected]> wrote:
> Commonly, putback_lru_page is used with isolated_lru_page.
> The isolated_lru_page picks the page in middle of LRU and
> putback_lru_page insert the lru in head of LRU.
> It means it could make LRU churning so we have to be very careful.
> Let's clear description of putback_lru_page.
>
> Cc: KOSAKI Motohiro <[email protected]>
> Cc: Mel Gorman <[email protected]>
> Cc: Rik van Riel <[email protected]>
> Cc: Andrea Arcangeli <[email protected]>
> Signed-off-by: Minchan Kim <[email protected]>
seems good...
Reviewed-by: KAMEZAWA Hiroyuki <[email protected]>
But is there a consensus on which side of the LRU is the tail and which is the head?
I always have to go back to the code when I see the words head/tail....
On Wed, 27 Apr 2011 01:25:22 +0900
Minchan Kim <[email protected]> wrote:
> acct_isolated of compaction uses page_lru_base_type which returns only
> base type of LRU list so it never returns LRU_ACTIVE_ANON or LRU_ACTIVE_FILE.
> So it's pointless to add lru[LRU_ACTIVE_[ANON|FILE]] to get sum.
>
> Cc: KOSAKI Motohiro <[email protected]>
> Cc: Mel Gorman <[email protected]>
> Cc: Rik van Riel <[email protected]>
> Cc: Andrea Arcangeli <[email protected]>
> Signed-off-by: Minchan Kim <[email protected]>
I think some comments are necessary to explain why INACTIVE only.
Reviewed-by: KAMEZAWA Hiroyuki <[email protected]>
> ---
> mm/compaction.c | 4 ++--
> 1 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/mm/compaction.c b/mm/compaction.c
> index 9f80b5a..653b02b 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -219,8 +219,8 @@ static void acct_isolated(struct zone *zone, struct compact_control *cc)
> count[lru]++;
> }
>
> - cc->nr_anon = count[LRU_ACTIVE_ANON] + count[LRU_INACTIVE_ANON];
> - cc->nr_file = count[LRU_ACTIVE_FILE] + count[LRU_INACTIVE_FILE];
> + cc->nr_anon = count[LRU_INACTIVE_ANON];
> + cc->nr_file = count[LRU_INACTIVE_FILE];
> __mod_zone_page_state(zone, NR_ISOLATED_ANON, cc->nr_anon);
> __mod_zone_page_state(zone, NR_ISOLATED_FILE, cc->nr_file);
> }
> --
> 1.7.1
>
On Wed, 27 Apr 2011 01:25:23 +0900
Minchan Kim <[email protected]> wrote:
> This patch defines new APIs to putback the page into previous position of LRU.
> The idea is simple.
>
> When we try to putback the page into lru list and if friends(prev, next) of the pages
> still is nearest neighbor, we can insert isolated page into prev's next instead of
> head of LRU list. So it keeps LRU history without losing the LRU information.
>
> Before :
> LRU POV : H - P1 - P2 - P3 - P4 -T
>
> Isolate P3 :
> LRU POV : H - P1 - P2 - P4 - T
>
> Putback P3 :
> if (P2->next == P4)
> putback(P3, P2);
> So,
> LRU POV : H - P1 - P2 - P3 - P4 -T
>
> For implement, we defines new structure pages_lru which remebers
> both lru friend pages of isolated one and handling functions.
>
> But this approach has a problem on contiguous pages.
> In this case, my idea can not work since friend pages are isolated, too.
> It means prev_page->next == next_page always is false and both pages are not
> LRU any more at that time. It's pointed out by Rik at LSF/MM summit.
> So for solving the problem, I can change the idea.
> I think we don't need both friend(prev, next) pages relation but
> just consider either prev or next page that it is still same LRU.
> Worset case in this approach, prev or next page is free and allocate new
> so it's in head of LRU and our isolated page is located on next of head.
> But it's almost same situation with current problem. So it doesn't make worse
> than now and it would be rare. But in this version, I implement based on idea
> discussed at LSF/MM. If my new idea makes sense, I will change it.
>
I think using only 'next'(prev?) pointer will be enough.
Thanks,
-Kame
On Wed, 27 Apr 2011 01:25:25 +0900
Minchan Kim <[email protected]> wrote:
> Compaction is a good solution to get contiguous pages but it makes
> LRU churning, which is not good.
> This patch makes that compaction code use in-order putback so
> after compaction completion, migrated pages are keeping LRU ordering.
>
> Cc: KOSAKI Motohiro <[email protected]>
> Cc: Mel Gorman <[email protected]>
> Cc: Rik van Riel <[email protected]>
> Cc: Andrea Arcangeli <[email protected]>
> Signed-off-by: Minchan Kim <[email protected]>
> ---
> mm/compaction.c | 22 +++++++++++++++-------
> 1 files changed, 15 insertions(+), 7 deletions(-)
>
> diff --git a/mm/compaction.c b/mm/compaction.c
> index a2f6e96..480d2ac 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -211,11 +211,11 @@ static void isolate_freepages(struct zone *zone,
> /* Update the number of anon and file isolated pages in the zone */
> static void acct_isolated(struct zone *zone, struct compact_control *cc)
> {
> - struct page *page;
> + struct pages_lru *pages_lru;
> unsigned int count[NR_LRU_LISTS] = { 0, };
>
> - list_for_each_entry(page, &cc->migratepages, lru) {
> - int lru = page_lru_base_type(page);
> + list_for_each_entry(pages_lru, &cc->migratepages, lru) {
> + int lru = page_lru_base_type(pages_lru->page);
> count[lru]++;
> }
>
> @@ -281,6 +281,7 @@ static unsigned long isolate_migratepages(struct zone *zone,
> spin_lock_irq(&zone->lru_lock);
> for (; low_pfn < end_pfn; low_pfn++) {
> struct page *page;
> + struct pages_lru *pages_lru;
> bool locked = true;
>
> /* give a chance to irqs before checking need_resched() */
> @@ -334,10 +335,16 @@ static unsigned long isolate_migratepages(struct zone *zone,
> continue;
> }
>
> + pages_lru = kmalloc(sizeof(struct pages_lru), GFP_ATOMIC);
> + if (!pages_lru)
> + continue;
Hmm, can't we use fixed size of statically allocated pages_lru, per-node or
per-zone ? I think using kmalloc() in memory reclaim path is risky.
Thanks,
-Kame
On Wed, Apr 27, 2011 at 5:39 PM, KAMEZAWA Hiroyuki
<[email protected]> wrote:
> On Wed, 27 Apr 2011 01:25:25 +0900
> Minchan Kim <[email protected]> wrote:
>
>> Compaction is a good solution to get contiguous pages but it makes
>> LRU churning, which is not good.
>> This patch makes that compaction code use in-order putback so
>> after compaction completion, migrated pages are keeping LRU ordering.
>>
>> Cc: KOSAKI Motohiro <[email protected]>
>> Cc: Mel Gorman <[email protected]>
>> Cc: Rik van Riel <[email protected]>
>> Cc: Andrea Arcangeli <[email protected]>
>> Signed-off-by: Minchan Kim <[email protected]>
>> ---
>> mm/compaction.c | 22 +++++++++++++++-------
>> 1 files changed, 15 insertions(+), 7 deletions(-)
>>
>> diff --git a/mm/compaction.c b/mm/compaction.c
>> index a2f6e96..480d2ac 100644
>> --- a/mm/compaction.c
>> +++ b/mm/compaction.c
>> @@ -211,11 +211,11 @@ static void isolate_freepages(struct zone *zone,
>> /* Update the number of anon and file isolated pages in the zone */
>> static void acct_isolated(struct zone *zone, struct compact_control *cc)
>> {
>> - struct page *page;
>> + struct pages_lru *pages_lru;
>> unsigned int count[NR_LRU_LISTS] = { 0, };
>>
>> - list_for_each_entry(page, &cc->migratepages, lru) {
>> - int lru = page_lru_base_type(page);
>> + list_for_each_entry(pages_lru, &cc->migratepages, lru) {
>> + int lru = page_lru_base_type(pages_lru->page);
>> count[lru]++;
>> }
>>
>> @@ -281,6 +281,7 @@ static unsigned long isolate_migratepages(struct zone *zone,
>> spin_lock_irq(&zone->lru_lock);
>> for (; low_pfn < end_pfn; low_pfn++) {
>> struct page *page;
>> + struct pages_lru *pages_lru;
>> bool locked = true;
>>
>> /* give a chance to irqs before checking need_resched() */
>> @@ -334,10 +335,16 @@ static unsigned long isolate_migratepages(struct zone *zone,
>> continue;
>> }
>>
>> + pages_lru = kmalloc(sizeof(struct pages_lru), GFP_ATOMIC);
>> + if (!pages_lru)
>> + continue;
>
> Hmm, can't we use fixed size of statically allocated pages_lru, per-node or
> per-zone ? I think using kmalloc() in memory reclaim path is risky.
Yes, we can enhance it with a pagevec-like approach.
It's on my TODO list. :)
From compaction's POV, it is invoked to get high-order pages, so most of
the time order-0 pages for the kmalloc are still available. That's a basic
assumption of compaction, so it shouldn't be a problem.
Thanks for the review, Kame.
--
Kind regards,
Minchan Kim
On Wed, Apr 27, 2011 at 5:03 PM, KAMEZAWA Hiroyuki
<[email protected]> wrote:
> On Wed, 27 Apr 2011 01:25:20 +0900
> Minchan Kim <[email protected]> wrote:
>
>> In some __zone_reclaim case, we don't want to shrink mapped page.
>> Nonetheless, we have isolated mapped page and re-add it into
>> LRU's head. It's unnecessary CPU overhead and makes LRU churning.
>>
>> Of course, when we isolate the page, the page might be mapped but
>> when we try to migrate the page, the page would be not mapped.
>> So it could be migrated. But race is rare and although it happens,
>> it's no big deal.
>>
>> Cc: Christoph Lameter <[email protected]>
>> Cc: KOSAKI Motohiro <[email protected]>
>> Cc: Mel Gorman <[email protected]>
>> Cc: Rik van Riel <[email protected]>
>> Cc: Andrea Arcangeli <[email protected]>
>> Signed-off-by: Minchan Kim <[email protected]>
>
>
> Hmm, it seems mm/memcontrol.c::mem_cgroup_isolate_pages() should be updated, too.
>
> But it's okay you start from global LRU.
Yes. That's exactly what I want. :)
I plan to consider memcg after the concept is approved.
>
> Reviewed-by: KAMEZAWA Hiroyuki <[email protected]>
Thanks for the review, Kame.
--
Kind regards,
Minchan Kim
On Wed, Apr 27, 2011 at 5:11 PM, KAMEZAWA Hiroyuki
<[email protected]> wrote:
> On Wed, 27 Apr 2011 01:25:21 +0900
> Minchan Kim <[email protected]> wrote:
>
>> Commonly, putback_lru_page is used with isolated_lru_page.
>> The isolated_lru_page picks the page in middle of LRU and
>> putback_lru_page insert the lru in head of LRU.
>> It means it could make LRU churning so we have to be very careful.
>> Let's clear description of putback_lru_page.
>>
>> Cc: KOSAKI Motohiro <[email protected]>
>> Cc: Mel Gorman <[email protected]>
>> Cc: Rik van Riel <[email protected]>
>> Cc: Andrea Arcangeli <[email protected]>
>> Signed-off-by: Minchan Kim <[email protected]>
>
> seems good...
> Reviewed-by: KAMEZAWA Hiroyuki <[email protected]>
>
> But is there consensus which side of LRU is tail? head?
I don't know. I used to think it's head.
If other guys raise a concern as well, let's talk about it. :)
Thanks
> I always need to revisit codes when I see a word head/tail....
>
>
>
--
Kind regards,
Minchan Kim
On Wed, Apr 27, 2011 at 5:15 PM, KAMEZAWA Hiroyuki
<[email protected]> wrote:
> On Wed, 27 Apr 2011 01:25:22 +0900
> Minchan Kim <[email protected]> wrote:
>
>> acct_isolated of compaction uses page_lru_base_type which returns only
>> base type of LRU list so it never returns LRU_ACTIVE_ANON or LRU_ACTIVE_FILE.
>> So it's pointless to add lru[LRU_ACTIVE_[ANON|FILE]] to get sum.
>>
>> Cc: KOSAKI Motohiro <[email protected]>
>> Cc: Mel Gorman <[email protected]>
>> Cc: Rik van Riel <[email protected]>
>> Cc: Andrea Arcangeli <[email protected]>
>> Signed-off-by: Minchan Kim <[email protected]>
>
> I think some comments are necessary to explain why INACTIVE only.
As an alternative, I can replace page_lru_base_type with page_lru.
It's not a hot path, so I will do that.
Thanks, Kame.
> Reviewed-by: KAMEZAWA Hiroyuki <[email protected]>
--
Kind regards,
Minchan Kim
On Wed, Apr 27, 2011 at 5:34 PM, KAMEZAWA Hiroyuki
<[email protected]> wrote:
> On Wed, 27 Apr 2011 01:25:23 +0900
> Minchan Kim <[email protected]> wrote:
>
>> This patch defines new APIs to putback the page into previous position of LRU.
>> The idea is simple.
>>
>> When we try to putback the page into lru list and if friends(prev, next) of the pages
>> still is nearest neighbor, we can insert isolated page into prev's next instead of
>> head of LRU list. So it keeps LRU history without losing the LRU information.
>>
>> Before :
>> LRU POV : H - P1 - P2 - P3 - P4 -T
>>
>> Isolate P3 :
>> LRU POV : H - P1 - P2 - P4 - T
>>
>> Putback P3 :
>> if (P2->next == P4)
>> putback(P3, P2);
>> So,
>> LRU POV : H - P1 - P2 - P3 - P4 -T
>>
>> For implement, we defines new structure pages_lru which remebers
>> both lru friend pages of isolated one and handling functions.
>>
>> But this approach has a problem on contiguous pages.
>> In this case, my idea can not work since friend pages are isolated, too.
>> It means prev_page->next == next_page always is false and both pages are not
>> LRU any more at that time. It's pointed out by Rik at LSF/MM summit.
>> So for solving the problem, I can change the idea.
>> I think we don't need both friend(prev, next) pages relation but
>> just consider either prev or next page that it is still same LRU.
>> Worset case in this approach, prev or next page is free and allocate new
>> so it's in head of LRU and our isolated page is located on next of head.
>> But it's almost same situation with current problem. So it doesn't make worse
>> than now and it would be rare. But in this version, I implement based on idea
>> discussed at LSF/MM. If my new idea makes sense, I will change it.
>>
>
> I think using only 'next'(prev?) pointer will be enough.
I think so, but let's wait for others' opinions. :)
--
Kind regards,
Minchan Kim
On 04/26/2011 12:25 PM, Minchan Kim wrote:
> But this approach has a problem on contiguous pages.
> In this case, my idea can not work since friend pages are isolated, too.
> It means prev_page->next == next_page always is false and both pages are not
> LRU any more at that time. It's pointed out by Rik at LSF/MM summit.
> So for solving the problem, I can change the idea.
> I think we don't need both friend(prev, next) pages relation but
> just consider either prev or next page that it is still same LRU.
> Any comment?
If the friend pages are isolated too, then your condition
"either prev or next page that it is still same LRU" is
likely to be false, no?
--
All rights reversed
Hi Rik,
On Thu, Apr 28, 2011 at 8:46 AM, Rik van Riel <[email protected]> wrote:
> On 04/26/2011 12:25 PM, Minchan Kim wrote:
>
>> But this approach has a problem on contiguous pages.
>> In this case, my idea can not work since friend pages are isolated, too.
>> It means prev_page->next == next_page always is false and both pages are
>> not
>> LRU any more at that time. It's pointed out by Rik at LSF/MM summit.
>> So for solving the problem, I can change the idea.
>> I think we don't need both friend(prev, next) pages relation but
>> just consider either prev or next page that it is still same LRU.
>
>> Any comment?
>
> If the friend pages are isolated too, then your condition
> "either prev or next page that it is still same LRU" is
> likely to be false, no?
H - P1 - P2 - P3 - P4 - P5 - P6 - P7 - P8 - P9 - P10 - T
Assume we isolate pages P3~P7 and consider only the next pointer:
H - P1 - P2 - P8 - P9 - P10 - T
If we start the putback with P7, its saved next page P8 is still on the LRU, so
H - P1 - P2 - P7 - P8 - P9 - P10 - T
Then, when we consider P6, its next page P7 is valid, too. So,
H - P1 - P2 - P6 - P7 - P8 - P9 - P10 - T
and we continue like this until P3.
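To make that walk concrete, here is a rough sketch of the next-pointer-only
putback (the function and field names are assumptions, not the posted code,
and the memcg/statistics updates done by putback_page_to_lru are left out):

	/* walk the isolated list starting from the page nearest the tail (P7 first) */
	static void putback_before_next(struct pages_lru *pl)
	{
		struct page *page = pl->page;
		struct page *next = pl->next_page;
		struct zone *zone = page_zone(page);

		spin_lock_irq(&zone->lru_lock);
		if (next && PageLRU(next) && !PageUnevictable(next)) {
			SetPageLRU(page);
			/* insert page just in front of its old successor */
			list_add_tail(&page->lru, &next->lru);
			spin_unlock_irq(&zone->lru_lock);
			return;
		}
		spin_unlock_irq(&zone->lru_lock);
		/* the neighbour is gone: fall back to the usual head insertion */
		putback_lru_page(page);
	}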
--
Kind regards,
Minchan Kim
On Thu, Apr 28, 2011 at 08:20:32AM +0900, Minchan Kim wrote:
> On Wed, Apr 27, 2011 at 5:11 PM, KAMEZAWA Hiroyuki
> <[email protected]> wrote:
> > On Wed, 27 Apr 2011 01:25:21 +0900
> > Minchan Kim <[email protected]> wrote:
> >
> >> Commonly, putback_lru_page is used with isolated_lru_page.
> >> The isolated_lru_page picks the page in middle of LRU and
> >> putback_lru_page insert the lru in head of LRU.
> >> It means it could make LRU churning so we have to be very careful.
> >> Let's clear description of putback_lru_page.
> >>
> >> Cc: KOSAKI Motohiro <[email protected]>
> >> Cc: Mel Gorman <[email protected]>
> >> Cc: Rik van Riel <[email protected]>
> >> Cc: Andrea Arcangeli <[email protected]>
> >> Signed-off-by: Minchan Kim <[email protected]>
> >
> > seems good...
> > Reviewed-by: KAMEZAWA Hiroyuki <[email protected]>
> >
> > But is there consensus which side of LRU is tail? head?
>
> I don't know. I used to think it's head.
> If other guys raise a concern as well, let's talk about it. :)
> Thanks
I suppose we add new pages to the head of the LRU and reclaim old
pages from the tail.
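For reference, that is what the list helpers encode; roughly (paraphrasing
mm_inline.h and vmscan.c):

	/* adding a page: new pages go in at the head of the per-zone list */
	list_add(&page->lru, &zone->lru[l].list);

	/* reclaiming: lru_to_page() takes victims from the tail */
	page = list_entry(src->prev, struct page, lru);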
Acked-by: Johannes Weiner <[email protected]>
On Wed, Apr 27, 2011 at 01:25:19AM +0900, Minchan Kim wrote:
> In async mode, compaction doesn't migrate dirty or writeback pages.
> So, it's meaningless to pick the page and re-add it to lru list.
>
> Of course, when we isolate the page in compaction, the page might
> be dirty or writeback but when we try to migrate the page, the page
> would be not dirty, writeback. So it could be migrated. But it's
> very unlikely as isolate and migration cycle is much faster than
> writeout.
>
> So, this patch helps cpu and prevent unnecessary LRU churning.
>
> Cc: KOSAKI Motohiro <[email protected]>
> Cc: Mel Gorman <[email protected]>
> Cc: Rik van Riel <[email protected]>
> Cc: Andrea Arcangeli <[email protected]>
> Signed-off-by: Minchan Kim <[email protected]>
> ---
> mm/compaction.c | 2 +-
> 1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/mm/compaction.c b/mm/compaction.c
> index dea32e3..9f80b5a 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -335,7 +335,7 @@ static unsigned long isolate_migratepages(struct zone *zone,
> }
>
> /* Try isolate the page */
> - if (__isolate_lru_page(page, ISOLATE_BOTH, 0, 0, 0) != 0)
> + if (__isolate_lru_page(page, ISOLATE_BOTH, 0, !cc->sync, 0) != 0)
> continue;
With the suggested flags argument from 1/8, this would look like:
flags = ISOLATE_BOTH;
if (!cc->sync)
flags |= ISOLATE_CLEAN;
?
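Spelled out a bit more, the bitmask version might look like this (ISOLATE_CLEAN
and ISOLATE_UNMAPPED are only suggested names here, not existing flags):

	#define ISOLATE_INACTIVE	(1 << 0)
	#define ISOLATE_ACTIVE		(1 << 1)
	#define ISOLATE_BOTH		(ISOLATE_INACTIVE | ISOLATE_ACTIVE)
	#define ISOLATE_CLEAN		(1 << 2)	/* skip dirty/writeback pages */
	#define ISOLATE_UNMAPPED	(1 << 3)	/* skip mapped pages */

	/* then in __isolate_lru_page(), instead of the extra int arguments: */
	if ((mode & ISOLATE_CLEAN) && (PageDirty(page) || PageWriteback(page)))
		return ret;
	if ((mode & ISOLATE_UNMAPPED) && page_mapped(page))
		return ret;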
Anyway, nice change indeed!
Acked-by: Johannes Weiner <[email protected]>
On Wed, Apr 27, 2011 at 05:03:04PM +0900, KAMEZAWA Hiroyuki wrote:
> On Wed, 27 Apr 2011 01:25:20 +0900
> Minchan Kim <[email protected]> wrote:
>
> > In some __zone_reclaim case, we don't want to shrink mapped page.
> > Nonetheless, we have isolated mapped page and re-add it into
> > LRU's head. It's unnecessary CPU overhead and makes LRU churning.
> >
> > Of course, when we isolate the page, the page might be mapped but
> > when we try to migrate the page, the page would be not mapped.
> > So it could be migrated. But race is rare and although it happens,
> > it's no big deal.
> >
> > Cc: Christoph Lameter <[email protected]>
> > Cc: KOSAKI Motohiro <[email protected]>
> > Cc: Mel Gorman <[email protected]>
> > Cc: Rik van Riel <[email protected]>
> > Cc: Andrea Arcangeli <[email protected]>
> > Signed-off-by: Minchan Kim <[email protected]>
>
>
> Hmm, it seems mm/memcontrol.c::mem_cgroup_isolate_pages() should be updated, too.
memcg reclaim always does sc->may_unmap = 1. What is there to
communicate to mem_cgroup_isolate_pages?
On Wed, Apr 27, 2011 at 01:25:22AM +0900, Minchan Kim wrote:
> acct_isolated of compaction uses page_lru_base_type which returns only
> base type of LRU list so it never returns LRU_ACTIVE_ANON or LRU_ACTIVE_FILE.
> So it's pointless to add lru[LRU_ACTIVE_[ANON|FILE]] to get sum.
>
> Cc: KOSAKI Motohiro <[email protected]>
> Cc: Mel Gorman <[email protected]>
> Cc: Rik van Riel <[email protected]>
> Cc: Andrea Arcangeli <[email protected]>
> Signed-off-by: Minchan Kim <[email protected]>
> ---
> mm/compaction.c | 4 ++--
> 1 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/mm/compaction.c b/mm/compaction.c
> index 9f80b5a..653b02b 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -219,8 +219,8 @@ static void acct_isolated(struct zone *zone, struct compact_control *cc)
> count[lru]++;
> }
>
> - cc->nr_anon = count[LRU_ACTIVE_ANON] + count[LRU_INACTIVE_ANON];
> - cc->nr_file = count[LRU_ACTIVE_FILE] + count[LRU_INACTIVE_FILE];
> + cc->nr_anon = count[LRU_INACTIVE_ANON];
> + cc->nr_file = count[LRU_INACTIVE_FILE];
> __mod_zone_page_state(zone, NR_ISOLATED_ANON, cc->nr_anon);
> __mod_zone_page_state(zone, NR_ISOLATED_FILE, cc->nr_file);
> }
I don't see anything using cc->nr_anon and cc->nr_file outside this
code. Would it make sense to remove them completely?
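Something along these lines, as a sketch:

	static void acct_isolated(struct zone *zone, struct compact_control *cc)
	{
		struct page *page;
		unsigned int count[NR_LRU_LISTS] = { 0, };

		list_for_each_entry(page, &cc->migratepages, lru)
			count[page_lru_base_type(page)]++;

		/* account directly, no need to keep the totals in cc */
		__mod_zone_page_state(zone, NR_ISOLATED_ANON,
					count[LRU_INACTIVE_ANON]);
		__mod_zone_page_state(zone, NR_ISOLATED_FILE,
					count[LRU_INACTIVE_FILE]);
	}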
On Thu, Apr 28, 2011 at 08:42:28AM +0900, Minchan Kim wrote:
> On Wed, Apr 27, 2011 at 5:15 PM, KAMEZAWA Hiroyuki
> <[email protected]> wrote:
> > On Wed, 27 Apr 2011 01:25:22 +0900
> > Minchan Kim <[email protected]> wrote:
> >
> >> acct_isolated of compaction uses page_lru_base_type which returns only
> >> base type of LRU list so it never returns LRU_ACTIVE_ANON or LRU_ACTIVE_FILE.
> >> So it's pointless to add lru[LRU_ACTIVE_[ANON|FILE]] to get sum.
> >>
> >> Cc: KOSAKI Motohiro <[email protected]>
> >> Cc: Mel Gorman <[email protected]>
> >> Cc: Rik van Riel <[email protected]>
> >> Cc: Andrea Arcangeli <[email protected]>
> >> Signed-off-by: Minchan Kim <[email protected]>
> >
> > I think some comments are necessary to explain why INACTIVE only.
>
> As an alternative, I can replace page_lru_base_type with page_lru.
> It's not a hot path, so I will do that.
We immediately use those numbers to account
NR_ISOLATED_ANON/NR_ISOLATED_FILE - i.e. 'lru base types', so I think
using page_lru_base_type() makes perfect sense.
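For reference, page_lru_base_type() boils down to roughly this (paraphrasing
mm_inline.h):

	static inline enum lru_list page_lru_base_type(struct page *page)
	{
		if (page_is_file_cache(page))
			return LRU_INACTIVE_FILE;
		return LRU_INACTIVE_ANON;
	}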
On Thu, 28 Apr 2011 10:54:32 +0200
Johannes Weiner <[email protected]> wrote:
> On Wed, Apr 27, 2011 at 05:03:04PM +0900, KAMEZAWA Hiroyuki wrote:
> > On Wed, 27 Apr 2011 01:25:20 +0900
> > Minchan Kim <[email protected]> wrote:
> >
> > > In some __zone_reclaim case, we don't want to shrink mapped page.
> > > Nonetheless, we have isolated mapped page and re-add it into
> > > LRU's head. It's unnecessary CPU overhead and makes LRU churning.
> > >
> > > Of course, when we isolate the page, the page might be mapped but
> > > when we try to migrate the page, the page would be not mapped.
> > > So it could be migrated. But race is rare and although it happens,
> > > it's no big deal.
> > >
> > > Cc: Christoph Lameter <[email protected]>
> > > Cc: KOSAKI Motohiro <[email protected]>
> > > Cc: Mel Gorman <[email protected]>
> > > Cc: Rik van Riel <[email protected]>
> > > Cc: Andrea Arcangeli <[email protected]>
> > > Signed-off-by: Minchan Kim <[email protected]>
> >
> >
> > Hmm, it seems mm/memcontrol.c::mem_cgroup_isolate_pages() should be updated, too.
>
> memcg reclaim always does sc->may_unmap = 1. What is there to
> communicate to mem_cgroup_isolate_pages?
>
Hmm, maybe you're right and there is nothing to do until memcg needs to support
the soft limit in zone reclaim mode. I hope there will be no more users.
Thanks,
-Kame
On Thu, Apr 28, 2011 at 06:10:46PM +0900, KAMEZAWA Hiroyuki wrote:
> On Thu, 28 Apr 2011 10:54:32 +0200
> Johannes Weiner <[email protected]> wrote:
>
> > On Wed, Apr 27, 2011 at 05:03:04PM +0900, KAMEZAWA Hiroyuki wrote:
> > > On Wed, 27 Apr 2011 01:25:20 +0900
> > > Minchan Kim <[email protected]> wrote:
> > >
> > > > In some __zone_reclaim case, we don't want to shrink mapped page.
> > > > Nonetheless, we have isolated mapped page and re-add it into
> > > > LRU's head. It's unnecessary CPU overhead and makes LRU churning.
> > > >
> > > > Of course, when we isolate the page, the page might be mapped but
> > > > when we try to migrate the page, the page would be not mapped.
> > > > So it could be migrated. But race is rare and although it happens,
> > > > it's no big deal.
> > > >
> > > > Cc: Christoph Lameter <[email protected]>
> > > > Cc: KOSAKI Motohiro <[email protected]>
> > > > Cc: Mel Gorman <[email protected]>
> > > > Cc: Rik van Riel <[email protected]>
> > > > Cc: Andrea Arcangeli <[email protected]>
> > > > Signed-off-by: Minchan Kim <[email protected]>
> > >
> > >
> > > Hmm, it seems mm/memcontrol.c::mem_cgroup_isolate_pages() should be updated, too.
> >
> > memcg reclaim always does sc->may_unmap = 1. What is there to
> > communicate to mem_cgroup_isolate_pages?
> >
>
> Hmm, maybe you're right and there is nothing to do until memcg needs to support
> the soft limit in zone reclaim mode. I hope there will be no more users.
Ah, okay. I thought I may have missed something. Thanks for the
clarification, Kame.
Acked-by: Johannes Weiner <[email protected]>
On Wed, Apr 27, 2011 at 01:25:18AM +0900, Minchan Kim wrote:
> There are some places to isolate lru page and I believe
> users of isolate_lru_page will be growing.
> The purpose of them is each different so part of isolated pages
> should put back to LRU, again.
>
> The problem is when we put back the page into LRU,
> we lose LRU ordering and the page is inserted at head of LRU list.
> It makes unnecessary LRU churning so that vm can evict working set pages
> rather than idle pages.
>
> This patch adds new filter mask when we isolate page in LRU.
> So, we don't isolate pages if we can't handle it.
> It could reduce LRU churning.
>
> This patch shouldn't change old behavior.
> It's just used by next patches.
>
> Cc: KOSAKI Motohiro <[email protected]>
> Cc: Mel Gorman <[email protected]>
> Cc: Rik van Riel <[email protected]>
> Cc: Andrea Arcangeli <[email protected]>
> Signed-off-by: Minchan Kim <[email protected]>
> ---
> include/linux/swap.h | 3 ++-
> mm/compaction.c | 2 +-
> mm/memcontrol.c | 2 +-
> mm/vmscan.c | 26 ++++++++++++++++++++------
> 4 files changed, 24 insertions(+), 9 deletions(-)
>
> diff --git a/include/linux/swap.h b/include/linux/swap.h
> index 384eb5f..baef4ad 100644
> --- a/include/linux/swap.h
> +++ b/include/linux/swap.h
> @@ -259,7 +259,8 @@ extern unsigned long mem_cgroup_shrink_node_zone(struct mem_cgroup *mem,
> unsigned int swappiness,
> struct zone *zone,
> unsigned long *nr_scanned);
> -extern int __isolate_lru_page(struct page *page, int mode, int file);
> +extern int __isolate_lru_page(struct page *page, int mode, int file,
> + int not_dirty, int not_mapped);
bool? Though if you switch to a bitmask later, it won't matter.
> extern unsigned long shrink_all_memory(unsigned long nr_pages);
> extern int vm_swappiness;
> extern int remove_mapping(struct address_space *mapping, struct page *page);
> diff --git a/mm/compaction.c b/mm/compaction.c
> index 021a296..dea32e3 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -335,7 +335,7 @@ static unsigned long isolate_migratepages(struct zone *zone,
> }
>
> /* Try isolate the page */
> - if (__isolate_lru_page(page, ISOLATE_BOTH, 0) != 0)
> + if (__isolate_lru_page(page, ISOLATE_BOTH, 0, 0, 0) != 0)
> continue;
>
> VM_BUG_ON(PageTransCompound(page));
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index c2776f1..471e7fd 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -1193,7 +1193,7 @@ unsigned long mem_cgroup_isolate_pages(unsigned long nr_to_scan,
> continue;
>
> scan++;
> - ret = __isolate_lru_page(page, mode, file);
> + ret = __isolate_lru_page(page, mode, file, 0, 0);
> switch (ret) {
> case 0:
> list_move(&page->lru, dst);
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index b3a569f..71d2da9 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -954,10 +954,13 @@ keep_lumpy:
> *
> * page: page to consider
> * mode: one of the LRU isolation modes defined above
> - *
> + * file: page be on a file LRU
> + * not_dirty: page should be not dirty or not writeback
> + * not_mapped: page should be not mapped
> * returns 0 on success, -ve errno on failure.
> */
> -int __isolate_lru_page(struct page *page, int mode, int file)
> +int __isolate_lru_page(struct page *page, int mode, int file,
> + int not_dirty, int not_mapped)
> {
> int ret = -EINVAL;
>
> @@ -976,6 +979,12 @@ int __isolate_lru_page(struct page *page, int mode, int file)
> if (mode != ISOLATE_BOTH && page_is_file_cache(page) != file)
> return ret;
>
> + if (not_dirty)
> + if (PageDirty(page) || PageWriteback(page))
> + return ret;
> + if (not_mapped)
> + if (page_mapped(page))
> + return ret;
> /*
> * When this function is being called for lumpy reclaim, we
> * initially look into all LRU pages, active, inactive and
> @@ -1016,12 +1025,15 @@ int __isolate_lru_page(struct page *page, int mode, int file)
> * @order: The caller's attempted allocation order
> * @mode: One of the LRU isolation modes
> * @file: True [1] if isolating file [!anon] pages
> + * @not_dirty: True [1] if isolating file [!dirty] pages
> + * @not_mapped: True [1] if isolating file [!mapped] pages
> *
> * returns how many pages were moved onto *@dst.
> */
> static unsigned long isolate_lru_pages(unsigned long nr_to_scan,
> struct list_head *src, struct list_head *dst,
> - unsigned long *scanned, int order, int mode, int file)
> + unsigned long *scanned, int order, int mode, int file,
> + int not_dirty, int not_mapped)
> {
> unsigned long nr_taken = 0;
> unsigned long nr_lumpy_taken = 0;
> @@ -1041,7 +1053,8 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan,
>
> VM_BUG_ON(!PageLRU(page));
>
> - switch (__isolate_lru_page(page, mode, file)) {
> + switch (__isolate_lru_page(page, mode, file,
> + not_dirty, not_mapped)) {
> case 0:
> list_move(&page->lru, dst);
> mem_cgroup_del_lru(page);
> @@ -1100,7 +1113,8 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan,
> !PageSwapCache(cursor_page))
> break;
>
> - if (__isolate_lru_page(cursor_page, mode, file) == 0) {
> + if (__isolate_lru_page(cursor_page, mode, file,
> + not_dirty, not_mapped) == 0) {
> list_move(&cursor_page->lru, dst);
> mem_cgroup_del_lru(cursor_page);
> nr_taken += hpage_nr_pages(page);
> @@ -1143,7 +1157,7 @@ static unsigned long isolate_pages_global(unsigned long nr,
> if (file)
> lru += LRU_FILE;
> return isolate_lru_pages(nr, &z->lru[lru].list, dst, scanned, order,
> - mode, file);
> + mode, file, 0, 0);
> }
>
> /*
> --
> 1.7.1
>
--
Mel Gorman
SUSE Labs
On Wed, Apr 27, 2011 at 01:25:19AM +0900, Minchan Kim wrote:
> In async mode, compaction doesn't migrate dirty or writeback pages.
> So, it's meaningless to pick the page and re-add it to lru list.
>
> Of course, when we isolate the page in compaction, the page might
> be dirty or writeback but when we try to migrate the page, the page
> would be not dirty, writeback. So it could be migrated. But it's
> very unlikely as isolate and migration cycle is much faster than
> writeout.
>
> So, this patch helps cpu and prevent unnecessary LRU churning.
>
> Cc: KOSAKI Motohiro <[email protected]>
> Cc: Mel Gorman <[email protected]>
> Cc: Rik van Riel <[email protected]>
> Cc: Andrea Arcangeli <[email protected]>
> Signed-off-by: Minchan Kim <[email protected]>
Acked-by: Mel Gorman <[email protected]>
--
Mel Gorman
SUSE Labs
On Wed, Apr 27, 2011 at 01:25:20AM +0900, Minchan Kim wrote:
> In some __zone_reclaim case, we don't want to shrink mapped page.
> Nonetheless, we have isolated mapped page and re-add it into
> LRU's head. It's unnecessary CPU overhead and makes LRU churning.
>
> Of course, when we isolate the page, the page might be mapped but
> when we try to migrate the page, the page would be not mapped.
> So it could be migrated. But race is rare and although it happens,
> it's no big deal.
>
> Cc: Christoph Lameter <[email protected]>
> Cc: KOSAKI Motohiro <[email protected]>
> Cc: Mel Gorman <[email protected]>
> Cc: Rik van Riel <[email protected]>
> Cc: Andrea Arcangeli <[email protected]>
> Signed-off-by: Minchan Kim <[email protected]>
> index 71d2da9..e8d6190 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1147,7 +1147,8 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan,
>
> static unsigned long isolate_pages_global(unsigned long nr,
> struct list_head *dst,
> - unsigned long *scanned, int order,
> + unsigned long *scanned,
> + struct scan_control *sc,
> int mode, struct zone *z,
> int active, int file)
> {
> @@ -1156,8 +1157,8 @@ static unsigned long isolate_pages_global(unsigned long nr,
> lru += LRU_ACTIVE;
> if (file)
> lru += LRU_FILE;
> - return isolate_lru_pages(nr, &z->lru[lru].list, dst, scanned, order,
> - mode, file, 0, 0);
> + return isolate_lru_pages(nr, &z->lru[lru].list, dst, scanned, sc->order,
> + mode, file, 0, !sc->may_unmap);
> }
>
Why not take may_writepage into account for dirty pages?
> /*
> @@ -1407,7 +1408,7 @@ shrink_inactive_list(unsigned long nr_to_scan, struct zone *zone,
>
> if (scanning_global_lru(sc)) {
> nr_taken = isolate_pages_global(nr_to_scan,
> - &page_list, &nr_scanned, sc->order,
> + &page_list, &nr_scanned, sc,
> sc->reclaim_mode & RECLAIM_MODE_LUMPYRECLAIM ?
> ISOLATE_BOTH : ISOLATE_INACTIVE,
> zone, 0, file);
> @@ -1531,7 +1532,7 @@ static void shrink_active_list(unsigned long nr_pages, struct zone *zone,
> spin_lock_irq(&zone->lru_lock);
> if (scanning_global_lru(sc)) {
> nr_taken = isolate_pages_global(nr_pages, &l_hold,
> - &pgscanned, sc->order,
> + &pgscanned, sc,
> ISOLATE_ACTIVE, zone,
> 1, file);
> zone->pages_scanned += pgscanned;
> --
> 1.7.1
>
--
Mel Gorman
SUSE Labs
On Wed, Apr 27, 2011 at 01:25:22AM +0900, Minchan Kim wrote:
> acct_isolated of compaction uses page_lru_base_type which returns only
> base type of LRU list so it never returns LRU_ACTIVE_ANON or LRU_ACTIVE_FILE.
> So it's pointless to add lru[LRU_ACTIVE_[ANON|FILE]] to get sum.
>
> Cc: KOSAKI Motohiro <[email protected]>
> Cc: Mel Gorman <[email protected]>
> Cc: Rik van Riel <[email protected]>
> Cc: Andrea Arcangeli <[email protected]>
> Signed-off-by: Minchan Kim <[email protected]>
hmm, isolate_migratepages() is doing a linear scan of PFNs and is
calling __isolate_lru_page(..ISOLATE_BOTH..). Using page_lru_base_type
happens to work because we're only interested in the number of isolated
pages and your patch still covers that. Using page_lru might be more
accurate in terms of accountancy but does not seem necessary.
Adding a comment explaining why we account for it as inactive and why
that's ok would be nice although I admit this is something I should have
done when acct_isolated() was introduced.
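Something as short as this would do (the wording is only a suggestion):

	/*
	 * isolate_migratepages() scans PFNs linearly with ISOLATE_BOTH, so
	 * active pages may be isolated too.  page_lru_base_type() folds them
	 * into the inactive counters, which is fine here because we only need
	 * the per anon/file totals for the NR_ISOLATED_* accounting.
	 */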
--
Mel Gorman
SUSE Labs
On Wed, Apr 27, 2011 at 01:25:23AM +0900, Minchan Kim wrote:
> This patch defines new APIs to putback the page into previous position of LRU.
> The idea is simple.
>
> When we try to putback the page into lru list and if friends(prev, next) of the pages
> still is nearest neighbor, we can insert isolated page into prev's next instead of
> head of LRU list. So it keeps LRU history without losing the LRU information.
>
> Before :
> LRU POV : H - P1 - P2 - P3 - P4 -T
>
> Isolate P3 :
> LRU POV : H - P1 - P2 - P4 - T
>
> Putback P3 :
> if (P2->next == P4)
> putback(P3, P2);
> So,
> LRU POV : H - P1 - P2 - P3 - P4 -T
>
> For implement, we defines new structure pages_lru which remebers
> both lru friend pages of isolated one and handling functions.
>
> But this approach has a problem on contiguous pages.
> In this case, my idea can not work since friend pages are isolated, too.
> It means prev_page->next == next_page always is false and both pages are not
> LRU any more at that time. It's pointed out by Rik at LSF/MM summit.
> So for solving the problem, I can change the idea.
> I think we don't need both friend(prev, next) pages relation but
> just consider either prev or next page that it is still same LRU.
> Worset case in this approach, prev or next page is free and allocate new
> so it's in head of LRU and our isolated page is located on next of head.
> But it's almost same situation with current problem. So it doesn't make worse
> than now and it would be rare. But in this version, I implement based on idea
> discussed at LSF/MM. If my new idea makes sense, I will change it.
>
> Any comment?
>
> Cc: KOSAKI Motohiro <[email protected]>
> Cc: Mel Gorman <[email protected]>
> Cc: Rik van Riel <[email protected]>
> Cc: Andrea Arcangeli <[email protected]>
> Signed-off-by: Minchan Kim <[email protected]>
> ---
> include/linux/migrate.h | 2 +
> include/linux/mm_types.h | 7 ++++
> include/linux/swap.h | 4 ++-
> mm/compaction.c | 3 +-
> mm/internal.h | 2 +
> mm/memcontrol.c | 2 +-
> mm/migrate.c | 36 +++++++++++++++++++++
> mm/swap.c | 2 +-
> mm/vmscan.c | 79 +++++++++++++++++++++++++++++++++++++++++++--
> 9 files changed, 129 insertions(+), 8 deletions(-)
>
> diff --git a/include/linux/migrate.h b/include/linux/migrate.h
> index e39aeec..3aa5ab6 100644
> --- a/include/linux/migrate.h
> +++ b/include/linux/migrate.h
> @@ -9,6 +9,7 @@ typedef struct page *new_page_t(struct page *, unsigned long private, int **);
> #ifdef CONFIG_MIGRATION
> #define PAGE_MIGRATION 1
>
> +extern void putback_pages_lru(struct list_head *l);
> extern void putback_lru_pages(struct list_head *l);
> extern int migrate_page(struct address_space *,
> struct page *, struct page *);
> @@ -33,6 +34,7 @@ extern int migrate_huge_page_move_mapping(struct address_space *mapping,
> #else
> #define PAGE_MIGRATION 0
>
> +static inline void putback_pages_lru(struct list_head *l) {}
> static inline void putback_lru_pages(struct list_head *l) {}
> static inline int migrate_pages(struct list_head *l, new_page_t x,
> unsigned long private, bool offlining,
> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> index ca01ab2..35e80fb 100644
> --- a/include/linux/mm_types.h
> +++ b/include/linux/mm_types.h
> @@ -102,6 +102,13 @@ struct page {
> #endif
> };
>
> +/* This structure is used for keeping LRU ordering of isolated page */
> +struct pages_lru {
> + struct page *page; /* isolated page */
> + struct page *prev_page; /* previous page of isolate page as LRU order */
> + struct page *next_page; /* next page of isolate page as LRU order */
> + struct list_head lru;
> +};
> /*
So this thing has to be allocated from somewhere. We can't put it
on the stack as we're already in danger there so we must be using
kmalloc. In the reclaim paths, this should be avoided obviously.
For compaction, we might hurt the compaction success rates if pages
are pinned with control structures. It's something to be wary of.
At LSF/MM, I stated a preference for swapping the source and
destination pages in the LRU. This unfortunately means that the LRU
now contains a page in the process of being migrated to and the backout
paths for migration failure become a lot more complex. Reclaim should
be ok as it should fail to lock the page and recycle it in the list.
This avoids allocations but I freely admit that I'm not in the position
to implement such a thing right now :(
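Very roughly, the core of the idea would be something like this (error paths,
memcg accounting and most flag handling ignored; newpage is the already
allocated migration target):

	/* let the destination page inherit the source page's LRU slot */
	spin_lock_irq(&zone->lru_lock);
	SetPageLRU(newpage);
	list_replace(&page->lru, &newpage->lru);
	ClearPageLRU(page);
	spin_unlock_irq(&zone->lru_lock);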
> * A region containing a mapping of a non-memory backed file under NOMMU
> * conditions. These are held in a global tree and are pinned by the VMAs that
> diff --git a/include/linux/swap.h b/include/linux/swap.h
> index baef4ad..4ad0a0c 100644
> --- a/include/linux/swap.h
> +++ b/include/linux/swap.h
> @@ -227,6 +227,8 @@ extern void rotate_reclaimable_page(struct page *page);
> extern void deactivate_page(struct page *page);
> extern void swap_setup(void);
>
> +extern void update_page_reclaim_stat(struct zone *zone, struct page *page,
> + int file, int rotated);
> extern void add_page_to_unevictable_list(struct page *page);
>
> /**
> @@ -260,7 +262,7 @@ extern unsigned long mem_cgroup_shrink_node_zone(struct mem_cgroup *mem,
> struct zone *zone,
> unsigned long *nr_scanned);
> extern int __isolate_lru_page(struct page *page, int mode, int file,
> - int not_dirty, int not_mapped);
> + int not_dirty, int not_mapped, struct pages_lru *pages_lru);
> extern unsigned long shrink_all_memory(unsigned long nr_pages);
> extern int vm_swappiness;
> extern int remove_mapping(struct address_space *mapping, struct page *page);
> diff --git a/mm/compaction.c b/mm/compaction.c
> index 653b02b..c453000 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -335,7 +335,8 @@ static unsigned long isolate_migratepages(struct zone *zone,
> }
>
> /* Try isolate the page */
> - if (__isolate_lru_page(page, ISOLATE_BOTH, 0, !cc->sync, 0) != 0)
> + if (__isolate_lru_page(page, ISOLATE_BOTH, 0,
> + !cc->sync, 0, NULL) != 0)
> continue;
>
> VM_BUG_ON(PageTransCompound(page));
> diff --git a/mm/internal.h b/mm/internal.h
> index d071d38..3c8182c 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -43,6 +43,8 @@ extern unsigned long highest_memmap_pfn;
> * in mm/vmscan.c:
> */
> extern int isolate_lru_page(struct page *page);
> +extern bool keep_lru_order(struct pages_lru *pages_lru);
> +extern void putback_page_to_lru(struct page *page, struct list_head *head);
> extern void putback_lru_page(struct page *page);
>
> /*
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 471e7fd..92a9046 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -1193,7 +1193,7 @@ unsigned long mem_cgroup_isolate_pages(unsigned long nr_to_scan,
> continue;
>
> scan++;
> - ret = __isolate_lru_page(page, mode, file, 0, 0);
> + ret = __isolate_lru_page(page, mode, file, 0, 0, NULL);
> switch (ret) {
> case 0:
> list_move(&page->lru, dst);
> diff --git a/mm/migrate.c b/mm/migrate.c
> index 819d233..9cfb63b 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -85,6 +85,42 @@ void putback_lru_pages(struct list_head *l)
> }
>
> /*
> + * This function is almost the same as putback_lru_pages.
> + * The difference is that function receives struct pages_lru list
> + * and if possible, we add pages into original position of LRU
> + * instead of LRU's head.
> + */
> +void putback_pages_lru(struct list_head *l)
> +{
> + struct pages_lru *isolated_page;
> + struct pages_lru *isolated_page2;
> + struct page *page;
> +
> + list_for_each_entry_safe(isolated_page, isolated_page2, l, lru) {
> + struct zone *zone;
> + page = isolated_page->page;
> + list_del(&isolated_page->lru);
> +
> + dec_zone_page_state(page, NR_ISOLATED_ANON +
> + page_is_file_cache(page));
> +
> + zone = page_zone(page);
> + spin_lock_irq(&zone->lru_lock);
> + if (keep_lru_order(isolated_page)) {
> + putback_page_to_lru(page, &isolated_page->prev_page->lru);
> + spin_unlock_irq(&zone->lru_lock);
> + }
> + else {
> + spin_unlock_irq(&zone->lru_lock);
> + putback_lru_page(page);
> + }
> +
> + kfree(isolated_page);
> + }
> +}
I think we also need counters, at least at the discussion stage, to see
how successful this is.
For example, early in the system's life there is a causal relationship
between the age of a page and its location in physical memory. The
buddy allocator gives pages back in PFN order where possible and
there is a loose relationship between when pages get allocated and
when they get reclaimed. As compaction is a linear scanner, there
is a likelihood (that is highly variable) that physically contiguous
pages will have similar positions in the LRU. They'll be isolated at
the same time meaning they also won't be put back in order.
This might cease to matter when the system is running for some time but
it's a concern.
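Even a pair of ad-hoc vmstat events in putback_pages_lru() would be enough at
the discussion stage, e.g. (the event names are invented here; the locking from
the posted code is omitted):

	if (keep_lru_order(isolated_page)) {
		putback_page_to_lru(page, &isolated_page->prev_page->lru);
		count_vm_event(PGPUTBACK_INORDER);	/* hypothetical event */
	} else {
		putback_lru_page(page);
		count_vm_event(PGPUTBACK_HEAD);		/* hypothetical event */
	}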
> +
> +
> +/*
> * Restore a potential migration pte to a working pte entry
> */
> static int remove_migration_pte(struct page *new, struct vm_area_struct *vma,
> diff --git a/mm/swap.c b/mm/swap.c
> index a83ec5a..0cb15b7 100644
> --- a/mm/swap.c
> +++ b/mm/swap.c
> @@ -252,7 +252,7 @@ void rotate_reclaimable_page(struct page *page)
> }
> }
>
> -static void update_page_reclaim_stat(struct zone *zone, struct page *page,
> +void update_page_reclaim_stat(struct zone *zone, struct page *page,
> int file, int rotated)
> {
> struct zone_reclaim_stat *reclaim_stat = &zone->reclaim_stat;
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 5196f0c..06a7c9b 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -550,6 +550,58 @@ int remove_mapping(struct address_space *mapping, struct page *page)
> return 0;
> }
>
> +/* zone->lru_lock must be held */
> +bool keep_lru_order(struct pages_lru *pages_lru)
> +{
> + bool ret = false;
> + struct page *prev, *next;
> +
> + if (!pages_lru->prev_page)
> + return ret;
> +
> + prev = pages_lru->prev_page;
> + next = pages_lru->next_page;
> +
> + if (!PageLRU(prev) || !PageLRU(next))
> + return ret;
> +
> + if (prev->lru.next == &next->lru)
> + ret = true;
> +
> + if (unlikely(PageUnevictable(prev)))
> + ret = false;
> +
> + return ret;
> +}
Some whitespace issues there. There are a few formatting issues in the
patch but it's not the right time to worry about them.
> +
> +/**
> + * putback_page_to_lru - put isolated @page onto @head
> + * @page: page to be put back to appropriate lru list
> + * @head: lru position to be put back
> + *
> + * Insert previously isolated @page to appropriate position of lru list
> + * zone->lru_lock must be held.
> + */
> +void putback_page_to_lru(struct page *page, struct list_head *head)
> +{
> + int lru, active, file;
> + struct zone *zone = page_zone(page);
> + struct page *prev_page = container_of(head, struct page, lru);
> +
> + lru = page_lru(prev_page);
> + active = is_active_lru(lru);
> + file = is_file_lru(lru);
> +
> + if (active)
> + SetPageActive(page);
> + else
> + ClearPageActive(page);
> +
> + update_page_reclaim_stat(zone, page, file, active);
> + SetPageLRU(page);
> + __add_page_to_lru_list(zone, page, lru, head);
> +}
> +
> /**
> * putback_lru_page - put previously isolated page onto appropriate LRU list's head
> * @page: page to be put back to appropriate lru list
> @@ -959,8 +1011,8 @@ keep_lumpy:
> * not_mapped: page should be not mapped
> * returns 0 on success, -ve errno on failure.
> */
> -int __isolate_lru_page(struct page *page, int mode, int file,
> - int not_dirty, int not_mapped)
> +int __isolate_lru_page(struct page *page, int mode, int file, int not_dirty,
> + int not_mapped, struct pages_lru *pages_lru)
> {
> int ret = -EINVAL;
>
> @@ -996,12 +1048,31 @@ int __isolate_lru_page(struct page *page, int mode, int file,
> ret = -EBUSY;
>
> if (likely(get_page_unless_zero(page))) {
> + struct zone *zone = page_zone(page);
> + enum lru_list l = page_lru(page);
> /*
> * Be careful not to clear PageLRU until after we're
> * sure the page is not being freed elsewhere -- the
> * page release code relies on it.
> */
> ClearPageLRU(page);
> +
> + if (!pages_lru)
> + goto skip;
> +
> + pages_lru->page = page;
> + if (&zone->lru[l].list == pages_lru->lru.prev ||
> + &zone->lru[l].list == pages_lru->lru.next) {
> + pages_lru->prev_page = NULL;
> + pages_lru->next_page = NULL;
> + goto skip;
> + }
> +
> + pages_lru->prev_page =
> + list_entry(page->lru.prev, struct page, lru);
> + pages_lru->next_page =
> + list_entry(page->lru.next, struct page, lru);
> +skip:
> ret = 0;
> }
>
> @@ -1054,7 +1125,7 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan,
> VM_BUG_ON(!PageLRU(page));
>
> switch (__isolate_lru_page(page, mode, file,
> - not_dirty, not_mapped)) {
> + not_dirty, not_mapped, NULL)) {
> case 0:
> list_move(&page->lru, dst);
> mem_cgroup_del_lru(page);
> @@ -1114,7 +1185,7 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan,
> break;
>
> if (__isolate_lru_page(cursor_page, mode, file,
> - not_dirty, not_mapped) == 0) {
> + not_dirty, not_mapped, NULL) == 0) {
> list_move(&cursor_page->lru, dst);
> mem_cgroup_del_lru(cursor_page);
> nr_taken += hpage_nr_pages(page);
> --
> 1.7.1
>
--
Mel Gorman
SUSE Labs
Hi Hannes,
On Thu, Apr 28, 2011 at 5:48 PM, Johannes Weiner <[email protected]> wrote:
> On Wed, Apr 27, 2011 at 01:25:19AM +0900, Minchan Kim wrote:
>> In async mode, compaction doesn't migrate dirty or writeback pages.
>> So, it's meaningless to pick the page and re-add it to lru list.
>>
>> Of course, when we isolate the page in compaction, the page might
>> be dirty or writeback but when we try to migrate the page, the page
>> would be not dirty, writeback. So it could be migrated. But it's
>> very unlikely as isolate and migration cycle is much faster than
>> writeout.
>>
>> So, this patch helps cpu and prevent unnecessary LRU churning.
>>
>> Cc: KOSAKI Motohiro <[email protected]>
>> Cc: Mel Gorman <[email protected]>
>> Cc: Rik van Riel <[email protected]>
>> Cc: Andrea Arcangeli <[email protected]>
>> Signed-off-by: Minchan Kim <[email protected]>
>> ---
>> mm/compaction.c | 2 +-
>> 1 files changed, 1 insertions(+), 1 deletions(-)
>>
>> diff --git a/mm/compaction.c b/mm/compaction.c
>> index dea32e3..9f80b5a 100644
>> --- a/mm/compaction.c
>> +++ b/mm/compaction.c
>> @@ -335,7 +335,7 @@ static unsigned long isolate_migratepages(struct zone *zone,
>> }
>>
>> /* Try isolate the page */
>> - if (__isolate_lru_page(page, ISOLATE_BOTH, 0, 0, 0) != 0)
>> + if (__isolate_lru_page(page, ISOLATE_BOTH, 0, !cc->sync, 0) != 0)
>> continue;
>
> With the suggested flags argument from 1/8, this would look like:
>
> flags = ISOLATE_BOTH;
> if (!cc->sync)
> flags |= ISOLATE_CLEAN;
>
> ?
Yes. I will change it.
>
> Anyway, nice change indeed!
Thanks!
--
Kind regards,
Minchan Kim
On Thu, Apr 28, 2011 at 7:35 PM, Mel Gorman <[email protected]> wrote:
> On Wed, Apr 27, 2011 at 01:25:20AM +0900, Minchan Kim wrote:
>> In some __zone_reclaim case, we don't want to shrink mapped page.
>> Nonetheless, we have isolated mapped page and re-add it into
>> LRU's head. It's unnecessary CPU overhead and makes LRU churning.
>>
>> Of course, when we isolate the page, the page might be mapped but
>> when we try to migrate the page, the page would be not mapped.
>> So it could be migrated. But race is rare and although it happens,
>> it's no big deal.
>>
>> Cc: Christoph Lameter <[email protected]>
>> Cc: KOSAKI Motohiro <[email protected]>
>> Cc: Mel Gorman <[email protected]>
>> Cc: Rik van Riel <[email protected]>
>> Cc: Andrea Arcangeli <[email protected]>
>> Signed-off-by: Minchan Kim <[email protected]>
>
>> index 71d2da9..e8d6190 100644
>> --- a/mm/vmscan.c
>> +++ b/mm/vmscan.c
>> @@ -1147,7 +1147,8 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan,
>>
>> static unsigned long isolate_pages_global(unsigned long nr,
>> struct list_head *dst,
>> - unsigned long *scanned, int order,
>> + unsigned long *scanned,
>> + struct scan_control *sc,
>> int mode, struct zone *z,
>> int active, int file)
>> {
>> @@ -1156,8 +1157,8 @@ static unsigned long isolate_pages_global(unsigned long nr,
>> lru += LRU_ACTIVE;
>> if (file)
>> lru += LRU_FILE;
>> - return isolate_lru_pages(nr, &z->lru[lru].list, dst, scanned, order,
>> - mode, file, 0, 0);
>> + return isolate_lru_pages(nr, &z->lru[lru].list, dst, scanned, sc->order,
>> + mode, file, 0, !sc->may_unmap);
>> }
>>
>
> Why not take may_writepage into account for dirty pages?
I missed it.
I will consider it in the next version.
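Presumably it is just (untested):

	return isolate_lru_pages(nr, &z->lru[lru].list, dst, scanned, sc->order,
				mode, file, !sc->may_writepage, !sc->may_unmap);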
Thanks, Mel.
--
Kind regards,
Minchan Kim
On Thu, Apr 28, 2011 at 5:58 PM, Johannes Weiner <[email protected]> wrote:
> On Wed, Apr 27, 2011 at 01:25:22AM +0900, Minchan Kim wrote:
>> acct_isolated of compaction uses page_lru_base_type which returns only
>> base type of LRU list so it never returns LRU_ACTIVE_ANON or LRU_ACTIVE_FILE.
>> So it's pointless to add lru[LRU_ACTIVE_[ANON|FILE]] to get sum.
>>
>> Cc: KOSAKI Motohiro <[email protected]>
>> Cc: Mel Gorman <[email protected]>
>> Cc: Rik van Riel <[email protected]>
>> Cc: Andrea Arcangeli <[email protected]>
>> Signed-off-by: Minchan Kim <[email protected]>
>> ---
>> mm/compaction.c | 4 ++--
>> 1 files changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/mm/compaction.c b/mm/compaction.c
>> index 9f80b5a..653b02b 100644
>> --- a/mm/compaction.c
>> +++ b/mm/compaction.c
>> @@ -219,8 +219,8 @@ static void acct_isolated(struct zone *zone, struct compact_control *cc)
>> count[lru]++;
>> }
>>
>> - cc->nr_anon = count[LRU_ACTIVE_ANON] + count[LRU_INACTIVE_ANON];
>> - cc->nr_file = count[LRU_ACTIVE_FILE] + count[LRU_INACTIVE_FILE];
>> + cc->nr_anon = count[LRU_INACTIVE_ANON];
>> + cc->nr_file = count[LRU_INACTIVE_FILE];
>> __mod_zone_page_state(zone, NR_ISOLATED_ANON, cc->nr_anon);
>> __mod_zone_page_state(zone, NR_ISOLATED_FILE, cc->nr_file);
>> }
>
> I don't see anything using cc->nr_anon and cc->nr_file outside this
> code. Would it make sense to remove them completely?
>
Good idea.
I will remove them entirely in the next version.
Thanks, Hannes.
--
Kind regards,
Minchan Kim
On Thu, Apr 28, 2011 at 7:50 PM, Mel Gorman <[email protected]> wrote:
> On Wed, Apr 27, 2011 at 01:25:22AM +0900, Minchan Kim wrote:
>> acct_isolated of compaction uses page_lru_base_type which returns only
>> base type of LRU list so it never returns LRU_ACTIVE_ANON or LRU_ACTIVE_FILE.
>> So it's pointless to add lru[LRU_ACTIVE_[ANON|FILE]] to get sum.
>>
>> Cc: KOSAKI Motohiro <[email protected]>
>> Cc: Mel Gorman <[email protected]>
>> Cc: Rik van Riel <[email protected]>
>> Cc: Andrea Arcangeli <[email protected]>
>> Signed-off-by: Minchan Kim <[email protected]>
>
> hmm, isolate_migratepages() is doing a linear scan of PFNs and is
> calling __isolate_lru_page(..ISOLATE_BOTH..). Using page_lru_base_type
> happens to work because we're only interested in the number of isolated
> pages and your patch still covers that. Using page_lru might be more
> accurate in terms of accountancy but does not seem necessary.
True.
>
> Adding a comment explaining why we account for it as inactive and why
> that's ok would be nice although I admit this is something I should have
> done when acct_isolated() was introduced.
When Kame asked for a comment, I wanted to avoid an unnecessary one,
so I decided to switch to page_lru even though it adds a little overhead.
But Hannes, you and maybe Kame would rather not do that. I don't mind
adding a comment instead.
Okay, I will fix it in the next version.
Thanks.
--
Kind regards,
Minchan Kim
On Thu, Apr 28, 2011 at 8:06 PM, Mel Gorman <[email protected]> wrote:
> On Wed, Apr 27, 2011 at 01:25:23AM +0900, Minchan Kim wrote:
>> This patch defines new APIs to putback the page into previous position of LRU.
>> The idea is simple.
>>
>> When we try to putback the page into lru list and if friends(prev, next) of the pages
>> still is nearest neighbor, we can insert isolated page into prev's next instead of
>> head of LRU list. So it keeps LRU history without losing the LRU information.
>>
>> Before :
>> LRU POV : H - P1 - P2 - P3 - P4 -T
>>
>> Isolate P3 :
>> LRU POV : H - P1 - P2 - P4 - T
>>
>> Putback P3 :
>> if (P2->next == P4)
>> putback(P3, P2);
>> So,
>> LRU POV : H - P1 - P2 - P3 - P4 -T
>>
>> For implement, we defines new structure pages_lru which remebers
>> both lru friend pages of isolated one and handling functions.
>>
>> But this approach has a problem on contiguous pages.
>> In this case, my idea can not work since friend pages are isolated, too.
>> It means prev_page->next == next_page always is false and both pages are not
>> LRU any more at that time. It's pointed out by Rik at LSF/MM summit.
>> So for solving the problem, I can change the idea.
>> I think we don't need both friend(prev, next) pages relation but
>> just consider either prev or next page that it is still same LRU.
>> Worset case in this approach, prev or next page is free and allocate new
>> so it's in head of LRU and our isolated page is located on next of head.
>> But it's almost same situation with current problem. So it doesn't make worse
>> than now and it would be rare. But in this version, I implement based on idea
>> discussed at LSF/MM. If my new idea makes sense, I will change it.
>>
>> Any comment?
>>
>> Cc: KOSAKI Motohiro <[email protected]>
>> Cc: Mel Gorman <[email protected]>
>> Cc: Rik van Riel <[email protected]>
>> Cc: Andrea Arcangeli <[email protected]>
>> Signed-off-by: Minchan Kim <[email protected]>
>> ---
>> include/linux/migrate.h | 2 +
>> include/linux/mm_types.h | 7 ++++
>> include/linux/swap.h | 4 ++-
>> mm/compaction.c | 3 +-
>> mm/internal.h | 2 +
>> mm/memcontrol.c | 2 +-
>> mm/migrate.c | 36 +++++++++++++++++++++
>> mm/swap.c | 2 +-
>> mm/vmscan.c | 79 +++++++++++++++++++++++++++++++++++++++++++--
>> 9 files changed, 129 insertions(+), 8 deletions(-)
>>
>> diff --git a/include/linux/migrate.h b/include/linux/migrate.h
>> index e39aeec..3aa5ab6 100644
>> --- a/include/linux/migrate.h
>> +++ b/include/linux/migrate.h
>> @@ -9,6 +9,7 @@ typedef struct page *new_page_t(struct page *, unsigned long private, int **);
>> #ifdef CONFIG_MIGRATION
>> #define PAGE_MIGRATION 1
>>
>> +extern void putback_pages_lru(struct list_head *l);
>> extern void putback_lru_pages(struct list_head *l);
>> extern int migrate_page(struct address_space *,
>> struct page *, struct page *);
>> @@ -33,6 +34,7 @@ extern int migrate_huge_page_move_mapping(struct address_space *mapping,
>> #else
>> #define PAGE_MIGRATION 0
>>
>> +static inline void putback_pages_lru(struct list_head *l) {}
>> static inline void putback_lru_pages(struct list_head *l) {}
>> static inline int migrate_pages(struct list_head *l, new_page_t x,
>> unsigned long private, bool offlining,
>> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
>> index ca01ab2..35e80fb 100644
>> --- a/include/linux/mm_types.h
>> +++ b/include/linux/mm_types.h
>> @@ -102,6 +102,13 @@ struct page {
>> #endif
>> };
>>
>> +/* This structure is used for keeping LRU ordering of isolated page */
>> +struct pages_lru {
>> + struct page *page; /* isolated page */
>> + struct page *prev_page; /* previous page of isolate page as LRU order */
>> + struct page *next_page; /* next page of isolate page as LRU order */
>> + struct list_head lru;
>> +};
>> /*
>
> So this thing has to be allocated from somewhere. We can't put it
> on the stack as we're already in danger there so we must be using
> kmalloc. In the reclaim paths, this should be avoided obviously.
> For compaction, we might hurt the compaction success rates if pages
> are pinned with control structures. It's something to be wary of.
Actually, compaction's 32-page migration unit shouldn't be a big problem.
Maybe a single page is enough to hold the pages_lru entries, but I admit
that dynamic allocation in the reclaim path and pinning pages in the
compaction path aren't good.
So I have been thinking about a statically allocated, pagevec-like approach.
But that approach assumes the migration unit stays small,
like SWAP_CLUSTER_MAX. That's true today, but I'd rather not
depend on the size.
To be free of that size limitation, we would have to allocate new pages dynamically.
Hmm, any ideas?
Personally, I don't have anything better than a pagevec-like approach of 32 entries.
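The pagevec-like fallback would look roughly like this (the names and the
fixed size are placeholders):

	#define PAGES_LRU_NR	32	/* matches today's SWAP_CLUSTER_MAX-sized batches */

	struct pages_lru_vec {
		unsigned int nr;
		struct pages_lru entries[PAGES_LRU_NR];
	};

	static inline struct pages_lru *pages_lru_vec_get(struct pages_lru_vec *vec)
	{
		if (vec->nr >= PAGES_LRU_NR)
			return NULL;	/* caller falls back to putback_lru_page */
		return &vec->entries[vec->nr++];
	}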
>
> At LSF/MM, I stated a preference for swapping the source and
> destination pages in the LRU. This unfortunately means that the LRU
I can't quite follow your point. Swapping the source and destination?
Could you elaborate on it?
> now contains a page in the process of being migrated to and the backout
> paths for migration failure become a lot more complex. Reclaim should
> be ok as it'll should fail to lock the page and recycle it in the list.
> This avoids allocations but I freely admit that I'm not in the position
> to implement such a thing right now :(
>
>> * A region containing a mapping of a non-memory backed file under NOMMU
>> * conditions. These are held in a global tree and are pinned by the VMAs that
>> diff --git a/include/linux/swap.h b/include/linux/swap.h
>> index baef4ad..4ad0a0c 100644
>> --- a/include/linux/swap.h
>> +++ b/include/linux/swap.h
>> @@ -227,6 +227,8 @@ extern void rotate_reclaimable_page(struct page *page);
>> extern void deactivate_page(struct page *page);
>> extern void swap_setup(void);
>>
>> +extern void update_page_reclaim_stat(struct zone *zone, struct page *page,
>> + int file, int rotated);
>> extern void add_page_to_unevictable_list(struct page *page);
>>
>> /**
>> @@ -260,7 +262,7 @@ extern unsigned long mem_cgroup_shrink_node_zone(struct mem_cgroup *mem,
>> struct zone *zone,
>> unsigned long *nr_scanned);
>> extern int __isolate_lru_page(struct page *page, int mode, int file,
>> - int not_dirty, int not_mapped);
>> + int not_dirty, int not_mapped, struct pages_lru *pages_lru);
>> extern unsigned long shrink_all_memory(unsigned long nr_pages);
>> extern int vm_swappiness;
>> extern int remove_mapping(struct address_space *mapping, struct page *page);
>> diff --git a/mm/compaction.c b/mm/compaction.c
>> index 653b02b..c453000 100644
>> --- a/mm/compaction.c
>> +++ b/mm/compaction.c
>> @@ -335,7 +335,8 @@ static unsigned long isolate_migratepages(struct zone *zone,
>> }
>>
>> /* Try isolate the page */
>> - if (__isolate_lru_page(page, ISOLATE_BOTH, 0, !cc->sync, 0) != 0)
>> + if (__isolate_lru_page(page, ISOLATE_BOTH, 0,
>> + !cc->sync, 0, NULL) != 0)
>> continue;
>>
>> VM_BUG_ON(PageTransCompound(page));
>> diff --git a/mm/internal.h b/mm/internal.h
>> index d071d38..3c8182c 100644
>> --- a/mm/internal.h
>> +++ b/mm/internal.h
>> @@ -43,6 +43,8 @@ extern unsigned long highest_memmap_pfn;
>> * in mm/vmscan.c:
>> */
>> extern int isolate_lru_page(struct page *page);
>> +extern bool keep_lru_order(struct pages_lru *pages_lru);
>> +extern void putback_page_to_lru(struct page *page, struct list_head *head);
>> extern void putback_lru_page(struct page *page);
>>
>> /*
>> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
>> index 471e7fd..92a9046 100644
>> --- a/mm/memcontrol.c
>> +++ b/mm/memcontrol.c
>> @@ -1193,7 +1193,7 @@ unsigned long mem_cgroup_isolate_pages(unsigned long nr_to_scan,
>> continue;
>>
>> scan++;
>> - ret = __isolate_lru_page(page, mode, file, 0, 0);
>> + ret = __isolate_lru_page(page, mode, file, 0, 0, NULL);
>> switch (ret) {
>> case 0:
>> list_move(&page->lru, dst);
>> diff --git a/mm/migrate.c b/mm/migrate.c
>> index 819d233..9cfb63b 100644
>> --- a/mm/migrate.c
>> +++ b/mm/migrate.c
>> @@ -85,6 +85,42 @@ void putback_lru_pages(struct list_head *l)
>> }
>>
>> /*
>> + * This function is almost the same as putback_lru_pages.
>> + * The difference is that this function receives a list of struct pages_lru
>> + * and, if possible, adds pages back at their original position in the LRU
>> + * instead of at the LRU's head.
>> + */
>> +void putback_pages_lru(struct list_head *l)
>> +{
>> + struct pages_lru *isolated_page;
>> + struct pages_lru *isolated_page2;
>> + struct page *page;
>> +
>> + list_for_each_entry_safe(isolated_page, isolated_page2, l, lru) {
>> + struct zone *zone;
>> + page = isolated_page->page;
>> + list_del(&isolated_page->lru);
>> +
>> + dec_zone_page_state(page, NR_ISOLATED_ANON +
>> + page_is_file_cache(page));
>> +
>> + zone = page_zone(page);
>> + spin_lock_irq(&zone->lru_lock);
>> + if (keep_lru_order(isolated_page)) {
>> + putback_page_to_lru(page, &isolated_page->prev_page->lru);
>> + spin_unlock_irq(&zone->lru_lock);
>> + }
>> + else {
>> + spin_unlock_irq(&zone->lru_lock);
>> + putback_lru_page(page);
>> + }
>> +
>> + kfree(isolated_page);
>> + }
>> +}
>
> I think we also need counters at least at discussion stage to see
> how successful this is.
Actually, I had some numbers but didn't publish them.
(4 cores, 1G DRAM, THP always on, heavy stress (qemu, compile, web
browsing, eclipse and other programs forking); the success rate is about
50%.)
That's because I have another idea to solve the contiguous-PFN problem.
Please see the description and my reply to Rik's question on this thread.
I guess that approach could give a high success rate as long as there are no
concurrent direct compaction processes.
>
> For example, early in the system there is a casual relationship
> between the age of a page and its location in physical memory. The
> buddy allocator gives pages back in PFN order where possible and
> there is a loose relationship between when pages get allocated and
> when they get reclaimed. As compaction is a linear scanner, there
> is a likelihood (that is highly variable) that physically contiguous
> pages will have similar positions in the LRU. They'll be isolated at
> the same time meaning they also won't be put back in order.
Indeed! But I hope my second idea will solve the problem.
What do you think about it?
>
> This might cease to matter when the system is running for some time but
> it's a concern.
Yes. It depends on the aging of the workload.
I don't think it's easy to get one meaningful number that justifies it all. :(
--
Kind regards,
Minchan Kim
> > With the suggested flags argument from 1/8, this would look like:
> >
> > flags = ISOLATE_BOTH;
> > if (!cc->sync)
> > flags |= ISOLATE_CLEAN;
> >
> > ?
>
> Yes. I will change it.
>
> >
> > Anyway, nice change indeed!
>
> Thanks!
Yeah. That's very nice.
Reviewed-by: KOSAKI Motohiro <[email protected]>
> On Thu, Apr 28, 2011 at 08:20:32AM +0900, Minchan Kim wrote:
> > On Wed, Apr 27, 2011 at 5:11 PM, KAMEZAWA Hiroyuki
> > <[email protected]> wrote:
> > > On Wed, 27 Apr 2011 01:25:21 +0900
> > > Minchan Kim <[email protected]> wrote:
> > >
> > >> Commonly, putback_lru_page is used with isolated_lru_page.
> > >> The isolated_lru_page picks the page in middle of LRU and
> > >> putback_lru_page insert the lru in head of LRU.
> > >> It means it could make LRU churning so we have to be very careful.
> > >> Let's clear description of putback_lru_page.
> > >>
> > >> Cc: KOSAKI Motohiro <[email protected]>
> > >> Cc: Mel Gorman <[email protected]>
> > >> Cc: Rik van Riel <[email protected]>
> > >> Cc: Andrea Arcangeli <[email protected]>
> > >> Signed-off-by: Minchan Kim <[email protected]>
> > >
> > > seems good...
> > > Reviewed-by: KAMEZAWA Hiroyuki <[email protected]>
> > >
> > > But is there consensus which side of LRU is tail? head?
> >
> > I don't know. I used to think it's head.
> > If other guys raise a concern as well, let's talk about it. :)
> > Thanks
>
> I suppose we add new pages to the head of the LRU and reclaim old
> pages from the tail.
>
> Acked-by: Johannes Weiner <[email protected]>
So, it would be better if isolate_lru_page() also had an "LRU tail blah blah
blah" comment.
Anyway,
Reviewed-by: KOSAKI Motohiro <[email protected]>
> On Thu, Apr 28, 2011 at 7:50 PM, Mel Gorman <[email protected]> wrote:
> > On Wed, Apr 27, 2011 at 01:25:22AM +0900, Minchan Kim wrote:
> >> acct_isolated of compaction uses page_lru_base_type which returns only
> >> base type of LRU list so it never returns LRU_ACTIVE_ANON or LRU_ACTIVE_FILE.
> >> So it's pointless to add lru[LRU_ACTIVE_[ANON|FILE]] to get sum.
> >>
> >> Cc: KOSAKI Motohiro <[email protected]>
> >> Cc: Mel Gorman <[email protected]>
> >> Cc: Rik van Riel <[email protected]>
> >> Cc: Andrea Arcangeli <[email protected]>
> >> Signed-off-by: Minchan Kim <[email protected]>
> >
> > hmm, isolate_migratepages() is doing a linear scan of PFNs and is
> > calling __isolate_lru_page(..ISOLATE_BOTH..). Using page_lru_base_type
> > happens to work because we're only interested in the number of isolated
> > pages and your patch still covers that. Using page_lru might be more
> > accurate in terms of accountancy but does not seem necessary.
>
> True.
>
> >
> > Adding a comment explaining why we account for it as inactive and why
> > that's ok would be nice although I admit this is something I should have
> > done when acct_isolated() was introduced.
>
> When Kame pointed out comment, I wanted to avoid unnecessary comment
> so decided changing it with page_lru although it adds overhead a
> little bit. But Hannes, you and maybe Kame don't want it. I don't mind
> adding comment.
> Okay. fix it in next version.
Or
	unsigned int count[2] = { 0, };

	list_for_each_entry(page, &cc->migratepages, lru) {
		count[page_is_file_cache(page)]++;
	}
is also clear to me.
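Spelled out, that would be roughly the following. This is an untested sketch,
not the exact function in the tree; it assumes the callers only need the
NR_ISOLATED_* counters updated.

	static void acct_isolated(struct zone *zone, struct compact_control *cc)
	{
		struct page *page;
		unsigned int count[2] = { 0, };

		list_for_each_entry(page, &cc->migratepages, lru)
			count[page_is_file_cache(page)]++;

		/* page_is_file_cache() is 0 for anon, 1 for file cache */
		__mod_zone_page_state(zone, NR_ISOLATED_ANON, count[0]);
		__mod_zone_page_state(zone, NR_ISOLATED_FILE, count[1]);
	}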
> > +/* This structure is used for keeping LRU ordering of isolated page */
> > +struct pages_lru {
> > + struct page *page; /* isolated page */
> > + struct page *prev_page; /* previous page of isolate page as LRU order */
> > + struct page *next_page; /* next page of isolate page as LRU order */
> > + struct list_head lru;
> > +};
> > /*
>
> So this thing has to be allocated from somewhere. We can't put it
> on the stack as we're already in danger there so we must be using
> kmalloc. In the reclaim paths, this should be avoided obviously.
> For compaction, we might hurt the compaction success rates if pages
> are pinned with control structures. It's something to be wary of.
>
> At LSF/MM, I stated a preference for swapping the source and
> destination pages in the LRU. This unfortunately means that the LRU
> now contains a page in the process of being migrated to and the backout
> paths for migration failure become a lot more complex. Reclaim should
> be ok as it'll should fail to lock the page and recycle it in the list.
> This avoids allocations but I freely admit that I'm not in the position
> to implement such a thing right now :(
I like swapping to a fake page. A one-way pointer might become dangerous.
vmscan can detect the fake page and ignore it, i.e.,

	static bool is_fake_page(struct page *page)
	{
		if (is_stack_addr((void *)page))
			return true;
		return false;
	}
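In other words, a placeholder could keep the isolated page's LRU slot and
vmscan would just skip it. This is purely a sketch: the fake_page type and
the two helpers below are made up for illustration, and callers are assumed
to hold zone->lru_lock.

	/* park a stack-allocated placeholder in the page's LRU slot */
	struct fake_page {
		struct page dummy;	/* only dummy.lru is ever touched */
	};

	static void isolate_keep_slot(struct page *page, struct fake_page *fake)
	{
		/* preserve the position: the placeholder sits where @page was */
		list_replace(&page->lru, &fake->dummy.lru);
	}

	static void putback_to_slot(struct page *page, struct fake_page *fake)
	{
		/* vmscan has skipped the placeholder via is_fake_page() meanwhile */
		list_replace(&fake->dummy.lru, &page->lru);
	}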
Also, I would like to use the stack rather than kmalloc in compaction.
On Sun, May 1, 2011 at 10:19 PM, KOSAKI Motohiro
<[email protected]> wrote:
>> On Thu, Apr 28, 2011 at 7:50 PM, Mel Gorman <[email protected]> wrote:
>> > On Wed, Apr 27, 2011 at 01:25:22AM +0900, Minchan Kim wrote:
>> >> acct_isolated of compaction uses page_lru_base_type which returns only
>> >> base type of LRU list so it never returns LRU_ACTIVE_ANON or LRU_ACTIVE_FILE.
>> >> So it's pointless to add lru[LRU_ACTIVE_[ANON|FILE]] to get sum.
>> >>
>> >> Cc: KOSAKI Motohiro <[email protected]>
>> >> Cc: Mel Gorman <[email protected]>
>> >> Cc: Rik van Riel <[email protected]>
>> >> Cc: Andrea Arcangeli <[email protected]>
>> >> Signed-off-by: Minchan Kim <[email protected]>
>> >
>> > hmm, isolate_migratepages() is doing a linear scan of PFNs and is
>> > calling __isolate_lru_page(..ISOLATE_BOTH..). Using page_lru_base_type
>> > happens to work because we're only interested in the number of isolated
>> > pages and your patch still covers that. Using page_lru might be more
>> > accurate in terms of accountancy but does not seem necessary.
>>
>> True.
>>
>> >
>> > Adding a comment explaining why we account for it as inactive and why
>> > that's ok would be nice although I admit this is something I should have
>> > done when acct_isolated() was introduced.
>>
>> When Kame pointed out comment, I wanted to avoid unnecessary comment
>> so decided changing it with page_lru although it adds overhead a
>> little bit. But Hannes, you and maybe Kame don't want it. I don't mind
>> adding comment.
>> Okay. fix it in next version.
>
> Or
>
> unsigned int count[2];
>
> list_for_each_entry(page, &cc->migratepages, lru) {
> count[page_is_file_cache(page)]++;
> }
>
> is also clear to me.
>
That's very clear to me, too.
Thanks, KOSAKI.
--
Kind regards,
Minchan Kim
On Sun, May 1, 2011 at 10:13 PM, KOSAKI Motohiro
<[email protected]> wrote:
>> On Thu, Apr 28, 2011 at 08:20:32AM +0900, Minchan Kim wrote:
>> > On Wed, Apr 27, 2011 at 5:11 PM, KAMEZAWA Hiroyuki
>> > <[email protected]> wrote:
>> > > On Wed, 27 Apr 2011 01:25:21 +0900
>> > > Minchan Kim <[email protected]> wrote:
>> > >
>> > >> Commonly, putback_lru_page is used with isolated_lru_page.
>> > >> The isolated_lru_page picks the page in middle of LRU and
>> > >> putback_lru_page insert the lru in head of LRU.
>> > >> It means it could make LRU churning so we have to be very careful.
>> > >> Let's clear description of putback_lru_page.
>> > >>
>> > >> Cc: KOSAKI Motohiro <[email protected]>
>> > >> Cc: Mel Gorman <[email protected]>
>> > >> Cc: Rik van Riel <[email protected]>
>> > >> Cc: Andrea Arcangeli <[email protected]>
>> > >> Signed-off-by: Minchan Kim <[email protected]>
>> > >
>> > > seems good...
>> > > Reviewed-by: KAMEZAWA Hiroyuki <[email protected]>
>> > >
>> > > But is there consensus which side of LRU is tail? head?
>> >
>> > I don't know. I used to think it's head.
>> > If other guys raise a concern as well, let's talk about it. :)
>> > Thanks
>>
>> I suppose we add new pages to the head of the LRU and reclaim old
>> pages from the tail.
>>
>> Acked-by: Johannes Weiner <[email protected]>
>
> So, It would be better if isolate_lru_page() also have "LRU tail blah blah blah"
> comments.
Okay. I will try it.
>
> anyway,
> Reviewed-by: KOSAKI Motohiro <[email protected]>
>
Thanks.
--
Kind regards,
Minchan Kim
On Sun, May 1, 2011 at 10:47 PM, KOSAKI Motohiro
<[email protected]> wrote:
>> > +/* This structure is used for keeping LRU ordering of isolated page */
>> > +struct pages_lru {
>> > + struct page *page; /* isolated page */
>> > + struct page *prev_page; /* previous page of isolate page as LRU order */
>> > + struct page *next_page; /* next page of isolate page as LRU order */
>> > + struct list_head lru;
>> > +};
>> > /*
>>
>> So this thing has to be allocated from somewhere. We can't put it
>> on the stack as we're already in danger there so we must be using
>> kmalloc. In the reclaim paths, this should be avoided obviously.
>> For compaction, we might hurt the compaction success rates if pages
>> are pinned with control structures. It's something to be wary of.
>>
>> At LSF/MM, I stated a preference for swapping the source and
>> destination pages in the LRU. This unfortunately means that the LRU
>> now contains a page in the process of being migrated to and the backout
>> paths for migration failure become a lot more complex. Reclaim should
>> be ok as it'll should fail to lock the page and recycle it in the list.
>> This avoids allocations but I freely admit that I'm not in the position
>> to implement such a thing right now :(
>
> I like swaping to fake page. one way pointer might become dangerous. vmscan can
> detect fake page and ignore it.
I guess that means swapping between the migrated-from page and the
migrated-to page. Right? If so, the migrated-from page has already been
removed from the LRU list, and the migrated-to page isn't on the LRU because
it is a newly allocated page, so neither of them carries any LRU information.
How can we swap them? We would still need some place to keep the LRU
information before removing the page from the LRU list. :(
Could you explain the swapping in detail if I'm missing something?
About the one-way pointer, I think it's no problem. The worst case I can
imagine is that we put the page back at the head of the LRU list, which is
exactly the issue we have today, so it doesn't make things worse than now.
The kind of check I have in mind is sketched below.
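Just to illustrate the fallback: the check could look roughly like this.
It is only a sketch of the idea, not the keep_lru_order() implementation in
the patch, and it assumes both neighbour pointers were recorded at isolation
time and that the caller holds zone->lru_lock.

	/* can we still re-insert the isolated page between its old neighbours? */
	static bool keep_lru_order_sketch(struct pages_lru *pages_lru)
	{
		struct page *prev = pages_lru->prev_page;
		struct page *next = pages_lru->next_page;

		if (!prev || !next)
			return false;
		if (!PageLRU(prev) || !PageLRU(next))
			return false;
		/* the neighbours must not have been reordered meanwhile */
		if (prev->lru.next != &next->lru)
			return false;
		return true;
	}

If any of those checks fails we simply fall back to putback_lru_page(), i.e.
the LRU head, which is the behaviour we already have today.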
>
> ie,
> is_fake_page(page)
> {
> if (is_stack_addr((void*)page))
> return true;
> return false;
> }
>
> Also, I like to use stack rather than kmalloc in compaction.
>
Compaction is part of the reclaim path. As you know, we have had problems
with stack usage in the reclaim path before.
I admit the kmalloc approach isn't good.
I will try to solve that issue as a TODO.
Thanks for the review, KOSAKI.
--
Kind regards,
Minchan Kim
> On Sun, May 1, 2011 at 10:47 PM, KOSAKI Motohiro
> <[email protected]> wrote:
> >> > +/* This structure is used for keeping LRU ordering of isolated page */
> >> > +struct pages_lru {
> >> > + struct page *page; /* isolated page */
> >> > + struct page *prev_page; /* previous page of isolate page as LRU order */
> >> > + struct page *next_page; /* next page of isolate page as LRU order */
> >> > + struct list_head lru;
> >> > +};
> >> > /*
> >>
> >> So this thing has to be allocated from somewhere. We can't put it
> >> on the stack as we're already in danger there so we must be using
> >> kmalloc. In the reclaim paths, this should be avoided obviously.
> >> For compaction, we might hurt the compaction success rates if pages
> >> are pinned with control structures. It's something to be wary of.
> >>
> >> At LSF/MM, I stated a preference for swapping the source and
> >> destination pages in the LRU. This unfortunately means that the LRU
> >> now contains a page in the process of being migrated to and the backout
> >> paths for migration failure become a lot more complex. Reclaim should
> >> be ok as it'll should fail to lock the page and recycle it in the list.
> >> This avoids allocations but I freely admit that I'm not in the position
> >> to implement such a thing right now :(
> >
> > I like swaping to fake page. one way pointer might become dangerous. vmscan can
> > detect fake page and ignore it.
>
>
> I guess it means swapping between migrated-from page and migrated-to page.
> Right?
No, I intended to use a fake struct page, but this idea also sounds good to me.
> If so, migrated-from page is already removed from LRU list and
> migrated-to page isn't LRU as it's page allocated newly so they don't
> have any LRU information. How can we swap them? We need space keeps
> LRU information before removing the page from LRU list. :(
A pure fake struct page, or a preallocated migrated-to page?
>
> Could you explain in detail about swapping if I miss something?
>
> About one way pointer, I think it's no problem. Worst case I imagine
> is to put the page in head of LRU list. It means it's same issue now.
> So it doesn't make worse than now.
>
> >
> > ie,
> > is_fake_page(page)
> > {
> > if (is_stack_addr((void*)page))
> > return true;
> > return false;
> > }
> >
> > Also, I like to use stack rather than kmalloc in compaction.
> >
>
> Compaction is a procedure of reclaim. As you know, we had a problem
> about using of stack during reclaim path.
> I admit kmalloc-thing isn't good.
> I will try to solve the issue as TODO.
It depends on the stack consumption, because we don't call pageout() from
the compaction path. That's a big difference from the regular reclaim path.
For instance, the putback side could take an on-stack array, as sketched
below.
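Mirroring the putback_pages_lru() hunk earlier in this thread, the on-stack
variant could look roughly like this. It is a sketch only: the function name
is made up, and keep_lru_order()/putback_page_to_lru() are the helpers your
patch introduces.

	/* put back @nr isolated pages whose hints live on the caller's stack */
	static void putback_pages_lru_onstack(struct pages_lru *plru,
					      unsigned int nr)
	{
		unsigned int i;

		for (i = 0; i < nr; i++) {
			struct page *page = plru[i].page;
			struct zone *zone = page_zone(page);

			dec_zone_page_state(page, NR_ISOLATED_ANON +
						page_is_file_cache(page));

			spin_lock_irq(&zone->lru_lock);
			if (keep_lru_order(&plru[i])) {
				putback_page_to_lru(page,
						&plru[i].prev_page->lru);
				spin_unlock_irq(&zone->lru_lock);
			} else {
				spin_unlock_irq(&zone->lru_lock);
				putback_lru_page(page);
			}
			/* nothing to kfree(): entries live on the stack */
		}
	}

At roughly 40 bytes per entry on a 64-bit kernel, 32 entries cost about
1.25KB of stack per call, which seems tolerable exactly because compaction
never calls pageout().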
>
> Thanks for the review, KOSAKI.
Yeah, thank you for making a very good patch!