2013-05-17 09:48:16

by Mel Gorman

Subject: [PATCH 0/5] Obey mark_page_accessed hint given by filesystems v3r1

This series could still do with some Tested-by's from the original bug
reporters. Andrew Perepechko?

Changelog since V2
o Beef up the comments in a number of places (akpm)
o Remove lru parameter from __lru_cache_add, lru_cache_add_lru (akpm)
o Use congestion_wait instead of wait_on_page_writeback in case
of storage disconnects (akpm)

Changelog since V1
o Add tracepoint to model age of page types (mel)

Andrew Perepechko reported a problem whereby pages are being prematurely
evicted as the mark_page_accessed() hint is ignored for pages that are
currently on a pagevec -- http://www.spinics.net/lists/linux-ext4/msg37340.html .
Alexey Lyahkov and Robin Dong have also reported problems recently that
could be due to hot pages reaching the end of the inactive list too quickly
and being reclaimed.

Rather than addressing this on a per-filesystem basis, this series aims
to fix the mark_page_accessed() interface by deferring the decision on
what LRU a page is added to until pagevec drain time and by allowing
mark_page_accessed() to call SetPageActive on a pagevec page.
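
In outline, the intended mark_page_accessed() behaviour looks like the
sketch below. This is simplified for illustration only -- it omits the
unevictable check and the local-pagevec search -- and the real logic is
introduced by patches 2 and 3:

	void mark_page_accessed(struct page *page)	/* sketch only */
	{
		if (!PageActive(page) && PageReferenced(page)) {
			if (PageLRU(page))
				/* already on an LRU list, move it now */
				activate_page(page);
			else
				/* on a pagevec, honoured at drain time */
				SetPageActive(page);
			ClearPageReferenced(page);
		} else if (!PageReferenced(page)) {
			SetPageReferenced(page);
		}
	}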

Patch 1 adds two tracepoints for LRU page activation and insertion. Using
these tracepoints it's possible to build a model of pages in the
LRU that can be processed offline.

Patch 2 defers making the decision on what LRU to add a page to until
the pagevec is drained.

Patch 3 searches the local pagevec for pages to mark PageActive on
mark_page_accessed. The changelog explains why only the local
pagevec is examined.

Patches 4 and 5 tidy up the API.

postmark, a dd-based test and fs-mark in both single and threaded mode
were run, but none of them showed any performance degradation or gain as
a result of the series.

Using patch 1, I built a *very* basic model of the LRU to examine
offline what the average age of different page types on the LRU was in
milliseconds. Of course, capturing the trace distorts the test as it's
written to local disk, but that does not matter for the purposes of this
test. The average ages of pages in milliseconds were:

                                vanilla  deferdrain
Average age mapped anon:           1454        1250
Average age mapped file:         127841      155552
Average age unmapped anon:           85         235
Average age unmapped file:        73633       38884
Average age unmapped buffers:     74054      116155

The LRU activity was mostly file pages, which you'd expect for a dd-based
workload. Note that the average age of buffer pages is increased by the
series; this is expected to be because buffer pages are now added to the
active list when drained from the pagevecs. Note also that the average age
of unmapped file data is decreased, as those pages are still added to the
inactive list and are reclaimed before the buffers. There is no guarantee
this is a universal win for all workloads and it would be nice if the
filesystem people gave some thought as to whether this decision is
generally a win or a loss.

fs/cachefiles/rdwr.c | 30 +++---------
fs/nfs/dir.c | 7 +--
include/linux/pagevec.h | 34 +-------------
include/linux/swap.h | 11 +++--
include/trace/events/pagemap.h | 89 +++++++++++++++++++++++++++++++++++
mm/rmap.c | 7 +--
mm/swap.c | 103 ++++++++++++++++++++++++++---------------
mm/vmscan.c | 4 +-
8 files changed, 176 insertions(+), 109 deletions(-)
create mode 100644 include/trace/events/pagemap.h

--
1.8.1.4


2013-05-17 09:48:18

by Mel Gorman

Subject: [PATCH 1/5] mm: Add tracepoints for LRU activation and insertions

Using these tracepoints it is possible to model LRU activity and the
average residency of pages of different types. This can be used to
debug problems related to premature reclaim of pages of particular
types.
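
With the patch applied, the events appear under the pagemap trace system
and can be enabled through the usual ftrace interface (the paths below are
assumed from the TRACE_SYSTEM name, not taken from the patch itself):

	/sys/kernel/debug/tracing/events/pagemap/mm_lru_insertion
	/sys/kernel/debug/tracing/events/pagemap/mm_lru_activate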

Signed-off-by: Mel Gorman <[email protected]>
Reviewed-by: Rik van Riel <[email protected]>
---
include/trace/events/pagemap.h | 89 ++++++++++++++++++++++++++++++++++++++++++
mm/swap.c | 5 +++
2 files changed, 94 insertions(+)
create mode 100644 include/trace/events/pagemap.h

diff --git a/include/trace/events/pagemap.h b/include/trace/events/pagemap.h
new file mode 100644
index 0000000..1c9fabd
--- /dev/null
+++ b/include/trace/events/pagemap.h
@@ -0,0 +1,89 @@
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM pagemap
+
+#if !defined(_TRACE_PAGEMAP_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_PAGEMAP_H
+
+#include <linux/tracepoint.h>
+#include <linux/mm.h>
+
+#define PAGEMAP_MAPPED 0x0001u
+#define PAGEMAP_ANONYMOUS 0x0002u
+#define PAGEMAP_FILE 0x0004u
+#define PAGEMAP_SWAPCACHE 0x0008u
+#define PAGEMAP_SWAPBACKED 0x0010u
+#define PAGEMAP_MAPPEDDISK 0x0020u
+#define PAGEMAP_BUFFERS 0x0040u
+
+#define trace_pagemap_flags(page) ( \
+ (PageAnon(page) ? PAGEMAP_ANONYMOUS : PAGEMAP_FILE) | \
+ (page_mapped(page) ? PAGEMAP_MAPPED : 0) | \
+ (PageSwapCache(page) ? PAGEMAP_SWAPCACHE : 0) | \
+ (PageSwapBacked(page) ? PAGEMAP_SWAPBACKED : 0) | \
+ (PageMappedToDisk(page) ? PAGEMAP_MAPPEDDISK : 0) | \
+ (page_has_private(page) ? PAGEMAP_BUFFERS : 0) \
+ )
+
+TRACE_EVENT(mm_lru_insertion,
+
+ TP_PROTO(
+ struct page *page,
+ unsigned long pfn,
+ int lru,
+ unsigned long flags
+ ),
+
+ TP_ARGS(page, pfn, lru, flags),
+
+ TP_STRUCT__entry(
+ __field(struct page *, page )
+ __field(unsigned long, pfn )
+ __field(int, lru )
+ __field(unsigned long, flags )
+ ),
+
+ TP_fast_assign(
+ __entry->page = page;
+ __entry->pfn = pfn;
+ __entry->lru = lru;
+ __entry->flags = flags;
+ ),
+
+ /* Flag format is based on page-types.c formatting for pagemap */
+ TP_printk("page=%p pfn=%lu lru=%d flags=%s%s%s%s%s%s",
+ __entry->page,
+ __entry->pfn,
+ __entry->lru,
+ __entry->flags & PAGEMAP_MAPPED ? "M" : " ",
+ __entry->flags & PAGEMAP_ANONYMOUS ? "a" : "f",
+ __entry->flags & PAGEMAP_SWAPCACHE ? "s" : " ",
+ __entry->flags & PAGEMAP_SWAPBACKED ? "b" : " ",
+ __entry->flags & PAGEMAP_MAPPEDDISK ? "d" : " ",
+ __entry->flags & PAGEMAP_BUFFERS ? "B" : " ")
+);
+
+TRACE_EVENT(mm_lru_activate,
+
+ TP_PROTO(struct page *page, unsigned long pfn),
+
+ TP_ARGS(page, pfn),
+
+ TP_STRUCT__entry(
+ __field(struct page *, page )
+ __field(unsigned long, pfn )
+ ),
+
+ TP_fast_assign(
+ __entry->page = page;
+ __entry->pfn = pfn;
+ ),
+
+ /* Flag format is based on page-types.c formatting for pagemap */
+ TP_printk("page=%p pfn=%lu", __entry->page, __entry->pfn)
+
+);
+
+#endif /* _TRACE_PAGEMAP_H */
+
+/* This part must be outside protection */
+#include <trace/define_trace.h>
diff --git a/mm/swap.c b/mm/swap.c
index dfd7d71..53c9ceb 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -34,6 +34,9 @@

#include "internal.h"

+#define CREATE_TRACE_POINTS
+#include <trace/events/pagemap.h>
+
/* How many pages do we try to swap or page in/out together? */
int page_cluster;

@@ -384,6 +387,7 @@ static void __activate_page(struct page *page, struct lruvec *lruvec,
SetPageActive(page);
lru += LRU_ACTIVE;
add_page_to_lru_list(page, lruvec, lru);
+ trace_mm_lru_activate(page, page_to_pfn(page));

__count_vm_event(PGACTIVATE);
update_page_reclaim_stat(lruvec, file, 1);
@@ -808,6 +812,7 @@ static void __pagevec_lru_add_fn(struct page *page, struct lruvec *lruvec,
SetPageActive(page);
add_page_to_lru_list(page, lruvec, lru);
update_page_reclaim_stat(lruvec, file, active);
+ trace_mm_lru_insertion(page, page_to_pfn(page), lru, trace_pagemap_flags(page));
}

/*
--
1.8.1.4

2013-05-17 09:48:23

by Mel Gorman

Subject: [PATCH 4/5] mm: Remove lru parameter from __pagevec_lru_add and remove parts of pagevec API

Now that the LRU to add a page to is decided at LRU-add time, remove the
misleading lru parameter from __pagevec_lru_add. A consequence of this is
that the pagevec_lru_add_file, pagevec_lru_add_anon and similar helpers
are misleading as the caller no longer has direct control over what LRU
the page is added to. Unused helpers are removed by this patch and existing
users of pagevec_lru_add_file() are converted to use lru_cache_add_file()
directly and use the per-cpu pagevecs instead of creating their own pagevec.
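
The conversion is mechanical. As a sketch of the pattern (taken from the
cachefiles changes below; note that lru_cache_add_file() takes the page
reference itself, replacing the caller's page_cache_get()):

	/* Before: caller-managed pagevec */
	page_cache_get(backpage);
	if (!pagevec_add(&lru_pvec, backpage))
		__pagevec_lru_add_file(&lru_pvec);

	/* After: use the per-cpu pagevec via the lru_cache API */
	lru_cache_add_file(backpage);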

Signed-off-by: Mel Gorman <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
Reviewed-by: Rik van Riel <[email protected]>
---
fs/cachefiles/rdwr.c | 30 +++++++-----------------------
fs/nfs/dir.c | 7 ++-----
include/linux/pagevec.h | 34 +---------------------------------
mm/swap.c | 12 ++++--------
4 files changed, 14 insertions(+), 69 deletions(-)

diff --git a/fs/cachefiles/rdwr.c b/fs/cachefiles/rdwr.c
index 317f9ee..ebaff36 100644
--- a/fs/cachefiles/rdwr.c
+++ b/fs/cachefiles/rdwr.c
@@ -12,6 +12,7 @@
#include <linux/mount.h>
#include <linux/slab.h>
#include <linux/file.h>
+#include <linux/swap.h>
#include "internal.h"

/*
@@ -227,8 +228,7 @@ static void cachefiles_read_copier(struct fscache_operation *_op)
*/
static int cachefiles_read_backing_file_one(struct cachefiles_object *object,
struct fscache_retrieval *op,
- struct page *netpage,
- struct pagevec *pagevec)
+ struct page *netpage)
{
struct cachefiles_one_read *monitor;
struct address_space *bmapping;
@@ -237,8 +237,6 @@ static int cachefiles_read_backing_file_one(struct cachefiles_object *object,

_enter("");

- pagevec_reinit(pagevec);
-
_debug("read back %p{%lu,%d}",
netpage, netpage->index, page_count(netpage));

@@ -283,9 +281,7 @@ installed_new_backing_page:
backpage = newpage;
newpage = NULL;

- page_cache_get(backpage);
- pagevec_add(pagevec, backpage);
- __pagevec_lru_add_file(pagevec);
+ lru_cache_add_file(backpage);

read_backing_page:
ret = bmapping->a_ops->readpage(NULL, backpage);
@@ -452,8 +448,7 @@ int cachefiles_read_or_alloc_page(struct fscache_retrieval *op,
if (block) {
/* submit the apparently valid page to the backing fs to be
* read from disk */
- ret = cachefiles_read_backing_file_one(object, op, page,
- &pagevec);
+ ret = cachefiles_read_backing_file_one(object, op, page);
} else if (cachefiles_has_space(cache, 0, 1) == 0) {
/* there's space in the cache we can use */
fscache_mark_page_cached(op, page);
@@ -482,14 +477,11 @@ static int cachefiles_read_backing_file(struct cachefiles_object *object,
{
struct cachefiles_one_read *monitor = NULL;
struct address_space *bmapping = object->backer->d_inode->i_mapping;
- struct pagevec lru_pvec;
struct page *newpage = NULL, *netpage, *_n, *backpage = NULL;
int ret = 0;

_enter("");

- pagevec_init(&lru_pvec, 0);
-
list_for_each_entry_safe(netpage, _n, list, lru) {
list_del(&netpage->lru);

@@ -534,9 +526,7 @@ static int cachefiles_read_backing_file(struct cachefiles_object *object,
backpage = newpage;
newpage = NULL;

- page_cache_get(backpage);
- if (!pagevec_add(&lru_pvec, backpage))
- __pagevec_lru_add_file(&lru_pvec);
+ lru_cache_add_file(backpage);

reread_backing_page:
ret = bmapping->a_ops->readpage(NULL, backpage);
@@ -559,9 +549,7 @@ static int cachefiles_read_backing_file(struct cachefiles_object *object,
goto nomem;
}

- page_cache_get(netpage);
- if (!pagevec_add(&lru_pvec, netpage))
- __pagevec_lru_add_file(&lru_pvec);
+ lru_cache_add_file(netpage);

/* install a monitor */
page_cache_get(netpage);
@@ -643,9 +631,7 @@ static int cachefiles_read_backing_file(struct cachefiles_object *object,

fscache_mark_page_cached(op, netpage);

- page_cache_get(netpage);
- if (!pagevec_add(&lru_pvec, netpage))
- __pagevec_lru_add_file(&lru_pvec);
+ lru_cache_add_file(netpage);

/* the netpage is unlocked and marked up to date here */
fscache_end_io(op, netpage, 0);
@@ -661,8 +647,6 @@ static int cachefiles_read_backing_file(struct cachefiles_object *object,

out:
/* tidy up */
- pagevec_lru_add_file(&lru_pvec);
-
if (newpage)
page_cache_release(newpage);
if (netpage)
diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index e093e73..fcce7b7 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -33,6 +33,7 @@
#include <linux/pagevec.h>
#include <linux/namei.h>
#include <linux/mount.h>
+#include <linux/swap.h>
#include <linux/sched.h>
#include <linux/kmemleak.h>
#include <linux/xattr.h>
@@ -1759,7 +1760,6 @@ EXPORT_SYMBOL_GPL(nfs_unlink);
*/
int nfs_symlink(struct inode *dir, struct dentry *dentry, const char *symname)
{
- struct pagevec lru_pvec;
struct page *page;
char *kaddr;
struct iattr attr;
@@ -1799,11 +1799,8 @@ int nfs_symlink(struct inode *dir, struct dentry *dentry, const char *symname)
* No big deal if we can't add this page to the page cache here.
* READLINK will get the missing page from the server if needed.
*/
- pagevec_init(&lru_pvec, 0);
- if (!add_to_page_cache(page, dentry->d_inode->i_mapping, 0,
+ if (!add_to_page_cache_lru(page, dentry->d_inode->i_mapping, 0,
GFP_KERNEL)) {
- pagevec_add(&lru_pvec, page);
- pagevec_lru_add_file(&lru_pvec);
SetPageUptodate(page);
unlock_page(page);
} else
diff --git a/include/linux/pagevec.h b/include/linux/pagevec.h
index 2aa12b8..e4dbfab3 100644
--- a/include/linux/pagevec.h
+++ b/include/linux/pagevec.h
@@ -21,7 +21,7 @@ struct pagevec {
};

void __pagevec_release(struct pagevec *pvec);
-void __pagevec_lru_add(struct pagevec *pvec, enum lru_list lru);
+void __pagevec_lru_add(struct pagevec *pvec);
unsigned pagevec_lookup(struct pagevec *pvec, struct address_space *mapping,
pgoff_t start, unsigned nr_pages);
unsigned pagevec_lookup_tag(struct pagevec *pvec,
@@ -64,36 +64,4 @@ static inline void pagevec_release(struct pagevec *pvec)
__pagevec_release(pvec);
}

-static inline void __pagevec_lru_add_anon(struct pagevec *pvec)
-{
- __pagevec_lru_add(pvec, LRU_INACTIVE_ANON);
-}
-
-static inline void __pagevec_lru_add_active_anon(struct pagevec *pvec)
-{
- __pagevec_lru_add(pvec, LRU_ACTIVE_ANON);
-}
-
-static inline void __pagevec_lru_add_file(struct pagevec *pvec)
-{
- __pagevec_lru_add(pvec, LRU_INACTIVE_FILE);
-}
-
-static inline void __pagevec_lru_add_active_file(struct pagevec *pvec)
-{
- __pagevec_lru_add(pvec, LRU_ACTIVE_FILE);
-}
-
-static inline void pagevec_lru_add_file(struct pagevec *pvec)
-{
- if (pagevec_count(pvec))
- __pagevec_lru_add_file(pvec);
-}
-
-static inline void pagevec_lru_add_anon(struct pagevec *pvec)
-{
- if (pagevec_count(pvec))
- __pagevec_lru_add_anon(pvec);
-}
-
#endif /* _LINUX_PAGEVEC_H */
diff --git a/mm/swap.c b/mm/swap.c
index c53d161..6a9d0c4 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -505,7 +505,7 @@ void __lru_cache_add(struct page *page, enum lru_list lru)

page_cache_get(page);
if (!pagevec_space(pvec))
- __pagevec_lru_add(pvec, lru);
+ __pagevec_lru_add(pvec);
pagevec_add(pvec, page);
put_cpu_var(lru_add_pvec);
}
@@ -628,7 +628,7 @@ void lru_add_drain_cpu(int cpu)
struct pagevec *pvec = &per_cpu(lru_add_pvec, cpu);

if (pagevec_count(pvec))
- __pagevec_lru_add(pvec, NR_LRU_LISTS);
+ __pagevec_lru_add(pvec);

pvec = &per_cpu(lru_rotate_pvecs, cpu);
if (pagevec_count(pvec)) {
@@ -832,12 +832,10 @@ void lru_add_page_tail(struct page *page, struct page *page_tail,
static void __pagevec_lru_add_fn(struct page *page, struct lruvec *lruvec,
void *arg)
{
- enum lru_list requested_lru = (enum lru_list)arg;
int file = page_is_file_cache(page);
int active = PageActive(page);
enum lru_list lru = page_lru(page);

- WARN_ON_ONCE(requested_lru < NR_LRU_LISTS && requested_lru != lru);
VM_BUG_ON(PageUnevictable(page));
VM_BUG_ON(PageLRU(page));

@@ -851,11 +849,9 @@ static void __pagevec_lru_add_fn(struct page *page, struct lruvec *lruvec,
* Add the passed pages to the LRU, then drop the caller's refcount
* on them. Reinitialises the caller's pagevec.
*/
-void __pagevec_lru_add(struct pagevec *pvec, enum lru_list lru)
+void __pagevec_lru_add(struct pagevec *pvec)
{
- VM_BUG_ON(is_unevictable_lru(lru));
-
- pagevec_lru_move_fn(pvec, __pagevec_lru_add_fn, (void *)lru);
+ pagevec_lru_move_fn(pvec, __pagevec_lru_add_fn, NULL);
}
EXPORT_SYMBOL(__pagevec_lru_add);

--
1.8.1.4

2013-05-17 09:48:49

by Mel Gorman

Subject: [PATCH 5/5] mm: Remove lru parameter from __lru_cache_add and lru_cache_add_lru

Similar to __pagevec_lru_add, this patch removes the LRU parameter
from __lru_cache_add and lru_cache_add_lru as the caller does not
control the exact LRU the page gets added to. lru_cache_add_lru gets
renamed to lru_cache_add, as the name is silly without the lru parameter.
With the parameter removed, the caller must indicate whether they want
the page added to the active or inactive list by setting or clearing
PageActive respectively.
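
As an illustration of the new caller pattern (the rmap.c hunk below does
exactly this for a new anonymous page):

	/* Before: the lru parameter selected the list */
	lru_cache_add_lru(page, LRU_ACTIVE_ANON);

	/* After: the intent is expressed via the page flag */
	SetPageActive(page);
	lru_cache_add(page);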

[[email protected]: Suggested the patch]
Signed-off-by: Mel Gorman <[email protected]>
---
include/linux/swap.h | 11 +++++++----
mm/rmap.c | 7 ++++---
mm/swap.c | 14 ++++----------
mm/vmscan.c | 4 +---
4 files changed, 16 insertions(+), 20 deletions(-)

diff --git a/include/linux/swap.h b/include/linux/swap.h
index 1701ce4..85d7437 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -10,6 +10,7 @@
#include <linux/node.h>
#include <linux/fs.h>
#include <linux/atomic.h>
+#include <linux/page-flags.h>
#include <asm/page.h>

struct notifier_block;
@@ -233,8 +234,8 @@ extern unsigned long nr_free_pagecache_pages(void);


/* linux/mm/swap.c */
-extern void __lru_cache_add(struct page *, enum lru_list lru);
-extern void lru_cache_add_lru(struct page *, enum lru_list lru);
+extern void __lru_cache_add(struct page *);
+extern void lru_cache_add(struct page *);
extern void lru_add_page_tail(struct page *page, struct page *page_tail,
struct lruvec *lruvec, struct list_head *head);
extern void activate_page(struct page *);
@@ -254,12 +255,14 @@ extern void add_page_to_unevictable_list(struct page *page);
*/
static inline void lru_cache_add_anon(struct page *page)
{
- __lru_cache_add(page, LRU_INACTIVE_ANON);
+ ClearPageActive(page);
+ __lru_cache_add(page);
}

static inline void lru_cache_add_file(struct page *page)
{
- __lru_cache_add(page, LRU_INACTIVE_FILE);
+ ClearPageActive(page);
+ __lru_cache_add(page);
}

/* linux/mm/vmscan.c */
diff --git a/mm/rmap.c b/mm/rmap.c
index 6280da8..e22ceeb 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1093,9 +1093,10 @@ void page_add_new_anon_rmap(struct page *page,
else
__inc_zone_page_state(page, NR_ANON_TRANSPARENT_HUGEPAGES);
__page_set_anon_rmap(page, vma, address, 1);
- if (!mlocked_vma_newpage(vma, page))
- lru_cache_add_lru(page, LRU_ACTIVE_ANON);
- else
+ if (!mlocked_vma_newpage(vma, page)) {
+ SetPageActive(page);
+ lru_cache_add(page);
+ } else
add_page_to_unevictable_list(page);
}

diff --git a/mm/swap.c b/mm/swap.c
index 6a9d0c4..ac23602 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -494,15 +494,10 @@ EXPORT_SYMBOL(mark_page_accessed);
* pagevec is drained. This gives a chance for the caller of __lru_cache_add()
* to have the page added to the active list using mark_page_accessed().
*/
-void __lru_cache_add(struct page *page, enum lru_list lru)
+void __lru_cache_add(struct page *page)
{
struct pagevec *pvec = &get_cpu_var(lru_add_pvec);

- if (is_active_lru(lru))
- SetPageActive(page);
- else
- ClearPageActive(page);
-
page_cache_get(page);
if (!pagevec_space(pvec))
__pagevec_lru_add(pvec);
@@ -512,11 +507,10 @@ void __lru_cache_add(struct page *page, enum lru_list lru)
EXPORT_SYMBOL(__lru_cache_add);

/**
- * lru_cache_add_lru - add a page to a page list
+ * lru_cache_add - add a page to a page list
* @page: the page to be added to the LRU.
- * @lru: the LRU list to which the page is added.
*/
-void lru_cache_add_lru(struct page *page, enum lru_list lru)
+void lru_cache_add(struct page *page)
{
if (PageActive(page)) {
VM_BUG_ON(PageUnevictable(page));
@@ -525,7 +519,7 @@ void lru_cache_add_lru(struct page *page, enum lru_list lru)
}

VM_BUG_ON(PageLRU(page));
- __lru_cache_add(page, lru);
+ __lru_cache_add(page);
}

/**
diff --git a/mm/vmscan.c b/mm/vmscan.c
index fa6a853..50088ba 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -546,7 +546,6 @@ int remove_mapping(struct address_space *mapping, struct page *page)
void putback_lru_page(struct page *page)
{
int lru;
- int active = !!TestClearPageActive(page);
int was_unevictable = PageUnevictable(page);

VM_BUG_ON(PageLRU(page));
@@ -561,8 +560,7 @@ redo:
* unevictable page on [in]active list.
* We know how to handle that.
*/
- lru = active + page_lru_base_type(page);
- lru_cache_add_lru(page, lru);
+ lru_cache_add(page);
} else {
/*
* Put unevictable pages directly on zone's unevictable
--
1.8.1.4

2013-05-17 09:49:13

by Mel Gorman

Subject: [PATCH 3/5] mm: Activate !PageLRU pages on mark_page_accessed if page is on local pagevec

If a page is on a pagevec then it is !PageLRU and mark_page_accessed()
may fail to move the page to the active list as expected. Now that the LRU
is selected at LRU drain time, mark pages PageActive if they are on the
local pagevec so they get moved to the correct list at LRU drain time.
Using a debugging patch it was found that, for a simple git checkout based
workload, pages were never added to the active file list in practice,
but with this patch applied they are.

                          before      after
LRU Add Active File            0     750583
LRU Add Active Anon      2640587    2702818
LRU Add Inactive File    8833662    8068353
LRU Add Inactive Anon        207        200

Note that only pages on the local pagevec are considered on purpose. A
!PageLRU page could be in the process of being released, reclaimed, migrated
or on a remote pagevec that is currently being drained. Marking it PageActive
is vulnerable to races where the PageLRU and PageActive bits are checked at
the wrong time. Page reclaim will trigger VM_BUG_ONs but, depending on when
the race hits, it could also free a PageActive page to the page allocator and
trigger a bad_page warning. Similarly, a potential race exists between a
per-cpu drain on a pagevec list and an activation on a remote CPU.

lru_add_drain_cpu
__pagevec_lru_add
lru = page_lru(page);
mark_page_accessed
if (PageLRU(page))
activate_page
else
SetPageActive
SetPageLRU(page);
add_page_to_lru_list(page, lruvec, lru);

In this case a PageActive page is added to the inactive list and later the
inactive/active stats will get skewed. While the PageActive checks in vmscan
could be removed and the race potentially dealt with, a skew in the statistics
would be very difficult to detect. Hence this patch deals just with the common
case where a page being marked accessed has just been added to the local
pagevec.

Signed-off-by: Mel Gorman <[email protected]>
---
mm/swap.c | 41 +++++++++++++++++++++++++++++++++++++++--
1 file changed, 39 insertions(+), 2 deletions(-)

diff --git a/mm/swap.c b/mm/swap.c
index 868b493..c53d161 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -432,6 +432,33 @@ void activate_page(struct page *page)
}
#endif

+static void __lru_cache_activate_page(struct page *page)
+{
+ struct pagevec *pvec = &get_cpu_var(lru_add_pvec);
+ int i;
+
+ /*
+ * Search backwards on the optimistic assumption that the page being
+ * activated has just been added to this pagevec. Note that only
+ * the local pagevec is examined as a !PageLRU page could be in the
+ * process of being released, reclaimed, migrated or on a remote
+ * pagevec that is currently being drained. Furthermore, marking
+ * a remote pagevec's page PageActive potentially hits a race where
+ * a page is marked PageActive just after it is added to the inactive
+ * list causing accounting errors and BUG_ON checks to trigger.
+ */
+ for (i = pagevec_count(pvec) - 1; i >= 0; i--) {
+ struct page *pagevec_page = pvec->pages[i];
+
+ if (pagevec_page == page) {
+ SetPageActive(page);
+ break;
+ }
+ }
+
+ put_cpu_var(lru_add_pvec);
+}
+
/*
* Mark a page as having seen activity.
*
@@ -442,8 +469,18 @@ void activate_page(struct page *page)
void mark_page_accessed(struct page *page)
{
if (!PageActive(page) && !PageUnevictable(page) &&
- PageReferenced(page) && PageLRU(page)) {
- activate_page(page);
+ PageReferenced(page)) {
+
+ /*
+ * If the page is on the LRU, queue it for activation via
+ * activate_page_pvecs. Otherwise, assume the page is on a
+ * pagevec, mark it active and it'll be moved to the active
+ * LRU on the next drain.
+ */
+ if (PageLRU(page))
+ activate_page(page);
+ else
+ __lru_cache_activate_page(page);
ClearPageReferenced(page);
} else if (!PageReferenced(page)) {
SetPageReferenced(page);
--
1.8.1.4

2013-05-17 09:49:43

by Mel Gorman

Subject: [PATCH 2/5] mm: pagevec: Defer deciding what LRU to add a page to until pagevec drain time

mark_page_accessed cannot activate an inactive page that is located on
an inactive LRU pagevec. Hints from filesystems may be ignored as a
result. In preparation for fixing that problem, this patch removes the
per-LRU pagevecs and leaves just one pagevec. The final LRU the page is
added to is deferred until the pagevec is drained.

This means that fewer pagevecs are available and potentially there is
greater contention on the LRU lock. However, this only applies in the case
where there is an almost perfect mix of file, anon, active and inactive
pages being added to the LRU. In practice I expect that we are adding a
stream of pages of a particular type and that the changes in contention
will barely be measurable.
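
The core of the change is visible in the per-cpu declarations (excerpted
from the diff below):

	/* Before: one pagevec per LRU list */
	static DEFINE_PER_CPU(struct pagevec[NR_LRU_LISTS], lru_add_pvecs);

	/* After: a single pagevec; page_lru(page) picks the list at drain */
	static DEFINE_PER_CPU(struct pagevec, lru_add_pvec);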

Signed-off-by: Mel Gorman <[email protected]>
Acked-by: Rik van Riel <[email protected]>
---
mm/swap.c | 47 +++++++++++++++++++++--------------------------
1 file changed, 21 insertions(+), 26 deletions(-)

diff --git a/mm/swap.c b/mm/swap.c
index 53c9ceb..868b493 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -40,7 +40,7 @@
/* How many pages do we try to swap or page in/out together? */
int page_cluster;

-static DEFINE_PER_CPU(struct pagevec[NR_LRU_LISTS], lru_add_pvecs);
+static DEFINE_PER_CPU(struct pagevec, lru_add_pvec);
static DEFINE_PER_CPU(struct pagevec, lru_rotate_pvecs);
static DEFINE_PER_CPU(struct pagevec, lru_deactivate_pvecs);

@@ -452,22 +452,25 @@ void mark_page_accessed(struct page *page)
EXPORT_SYMBOL(mark_page_accessed);

/*
- * Order of operations is important: flush the pagevec when it's already
- * full, not when adding the last page, to make sure that last page is
- * not added to the LRU directly when passed to this function. Because
- * mark_page_accessed() (called after this when writing) only activates
- * pages that are on the LRU, linear writes in subpage chunks would see
- * every PAGEVEC_SIZE page activated, which is unexpected.
+ * Queue the page for addition to the LRU via pagevec. The decision on whether
+ * to add the page to the [in]active [file|anon] list is deferred until the
+ * pagevec is drained. This gives a chance for the caller of __lru_cache_add()
+ * to have the page added to the active list using mark_page_accessed().
*/
void __lru_cache_add(struct page *page, enum lru_list lru)
{
- struct pagevec *pvec = &get_cpu_var(lru_add_pvecs)[lru];
+ struct pagevec *pvec = &get_cpu_var(lru_add_pvec);
+
+ if (is_active_lru(lru))
+ SetPageActive(page);
+ else
+ ClearPageActive(page);

page_cache_get(page);
if (!pagevec_space(pvec))
__pagevec_lru_add(pvec, lru);
pagevec_add(pvec, page);
- put_cpu_var(lru_add_pvecs);
+ put_cpu_var(lru_add_pvec);
}
EXPORT_SYMBOL(__lru_cache_add);

@@ -480,13 +483,11 @@ void lru_cache_add_lru(struct page *page, enum lru_list lru)
{
if (PageActive(page)) {
VM_BUG_ON(PageUnevictable(page));
- ClearPageActive(page);
} else if (PageUnevictable(page)) {
VM_BUG_ON(PageActive(page));
- ClearPageUnevictable(page);
}

- VM_BUG_ON(PageLRU(page) || PageActive(page) || PageUnevictable(page));
+ VM_BUG_ON(PageLRU(page));
__lru_cache_add(page, lru);
}

@@ -587,15 +588,10 @@ static void lru_deactivate_fn(struct page *page, struct lruvec *lruvec,
*/
void lru_add_drain_cpu(int cpu)
{
- struct pagevec *pvecs = per_cpu(lru_add_pvecs, cpu);
- struct pagevec *pvec;
- int lru;
+ struct pagevec *pvec = &per_cpu(lru_add_pvec, cpu);

- for_each_lru(lru) {
- pvec = &pvecs[lru - LRU_BASE];
- if (pagevec_count(pvec))
- __pagevec_lru_add(pvec, lru);
- }
+ if (pagevec_count(pvec))
+ __pagevec_lru_add(pvec, NR_LRU_LISTS);

pvec = &per_cpu(lru_rotate_pvecs, cpu);
if (pagevec_count(pvec)) {
@@ -799,17 +795,16 @@ void lru_add_page_tail(struct page *page, struct page *page_tail,
static void __pagevec_lru_add_fn(struct page *page, struct lruvec *lruvec,
void *arg)
{
- enum lru_list lru = (enum lru_list)arg;
- int file = is_file_lru(lru);
- int active = is_active_lru(lru);
+ enum lru_list requested_lru = (enum lru_list)arg;
+ int file = page_is_file_cache(page);
+ int active = PageActive(page);
+ enum lru_list lru = page_lru(page);

- VM_BUG_ON(PageActive(page));
+ WARN_ON_ONCE(requested_lru < NR_LRU_LISTS && requested_lru != lru);
VM_BUG_ON(PageUnevictable(page));
VM_BUG_ON(PageLRU(page));

SetPageLRU(page);
- if (active)
- SetPageActive(page);
add_page_to_lru_list(page, lruvec, lru);
update_page_reclaim_stat(lruvec, file, active);
trace_mm_lru_insertion(page, page_to_pfn(page), lru, trace_pagemap_flags(page));
--
1.8.1.4

2013-05-17 18:43:25

by Andrew Perepechko

Subject: Re: [PATCH 0/5] Obey mark_page_accessed hint given by filesystems v3r1

Mel, thank you very much for the patches!

Unfortunately, it may take some time for us to test them,
because currently we depend very much on an older (2.6.32-based)
kernel.

Andrew

On 05/17/2013 01:48 PM, Mel Gorman wrote:
> This series could still do with some Tested-by's from the original bug
> reporters. Andrew Perepechko?
> [...]

2013-05-17 20:38:07

by Johannes Weiner

Subject: Re: [PATCH 2/5] mm: pagevec: Defer deciding what LRU to add a page to until pagevec drain time

On Fri, May 17, 2013 at 10:48:04AM +0100, Mel Gorman wrote:
> mark_page_accessed cannot activate an inactive page that is located on
> an inactive LRU pagevec. Hints from filesystems may be ignored as a
> result. In preparation for fixing that problem, this patch removes the
> per-LRU pagevecs and leaves just one pagevec. The final LRU the page is
> added to is deferred until the pagevec is drained.
>
> [...]
>
> Signed-off-by: Mel Gorman <[email protected]>
> Acked-by: Rik van Riel <[email protected]>

Acked-by: Johannes Weiner <[email protected]>

2013-05-17 20:39:14

by Johannes Weiner

Subject: Re: [PATCH 3/5] mm: Activate !PageLRU pages on mark_page_accessed if page is on local pagevec

On Fri, May 17, 2013 at 10:48:05AM +0100, Mel Gorman wrote:
> If a page is on a pagevec then it is !PageLRU and mark_page_accessed()
> may fail to move the page to the active list as expected. Now that the LRU
> is selected at LRU drain time, mark pages PageActive if they are on the
> local pagevec so they get moved to the correct list at LRU drain time.
> Using a debugging patch it was found that, for a simple git checkout based
> workload, pages were never added to the active file list in practice,
> but with this patch applied they are.
>
> [...]
>
> Signed-off-by: Mel Gorman <[email protected]>

Acked-by: Johannes Weiner <[email protected]>

2013-05-17 20:45:08

by Johannes Weiner

Subject: Re: [PATCH 4/5] mm: Remove lru parameter from __pagevec_lru_add and remove parts of pagevec API

On Fri, May 17, 2013 at 10:48:06AM +0100, Mel Gorman wrote:
> Now that the LRU to add a page to is decided at LRU-add time, remove the
> misleading lru parameter from __pagevec_lru_add. A consequence of this is
> that the pagevec_lru_add_file, pagevec_lru_add_anon and similar helpers
> are misleading as the caller no longer has direct control over what LRU
> the page is added to. Unused helpers are removed by this patch and existing
> users of pagevec_lru_add_file() are converted to use lru_cache_add_file()
> directly and use the per-cpu pagevecs instead of creating their own pagevec.
>
> Signed-off-by: Mel Gorman <[email protected]>
> Reviewed-by: Jan Kara <[email protected]>
> Reviewed-by: Rik van Riel <[email protected]>

> @@ -452,8 +448,7 @@ int cachefiles_read_or_alloc_page(struct fscache_retrieval *op,
> if (block) {
> /* submit the apparently valid page to the backing fs to be
> * read from disk */
> - ret = cachefiles_read_backing_file_one(object, op, page,
> - &pagevec);
> + ret = cachefiles_read_backing_file_one(object, op, page);

Also remove the declaration and pagevec_init a few lines up? Minor
detail, though.

Acked-by: Johannes Weiner <[email protected]>

2013-05-17 20:50:04

by Johannes Weiner

Subject: Re: [PATCH 5/5] mm: Remove lru parameter from __lru_cache_add and lru_cache_add_lru

On Fri, May 17, 2013 at 10:48:07AM +0100, Mel Gorman wrote:
> Similar to __pagevec_lru_add, this patch removes the LRU parameter
> from __lru_cache_add and lru_cache_add_lru as the caller does not
> control the exact LRU the page gets added to. lru_cache_add_lru gets
> renamed to lru_cache_add, as the name is silly without the lru parameter.
> With the parameter removed, the caller must indicate whether they want
> the page added to the active or inactive list by setting or clearing
> PageActive respectively.
>
> [[email protected]: Suggested the patch]
> Signed-off-by: Mel Gorman <[email protected]>

Acked-by: Johannes Weiner <[email protected]>

2013-05-20 14:38:56

by Mel Gorman

Subject: Re: [PATCH 4/5] mm: Remove lru parameter from __pagevec_lru_add and remove parts of pagevec API

On Fri, May 17, 2013 at 04:44:52PM -0400, Johannes Weiner wrote:
> On Fri, May 17, 2013 at 10:48:06AM +0100, Mel Gorman wrote:
> > [...]
>
> > @@ -452,8 +448,7 @@ int cachefiles_read_or_alloc_page(struct fscache_retrieval *op,
> > if (block) {
> > /* submit the apparently valid page to the backing fs to be
> > * read from disk */
> > - ret = cachefiles_read_backing_file_one(object, op, page,
> > - &pagevec);
> > + ret = cachefiles_read_backing_file_one(object, op, page);
>
> Also remove the declaration and pagevec_init a few lines up? Minor
> detail, though.
>

It's still used and passed to fscache_mark_pages_cached.

> Acked-by: Johannes Weiner <[email protected]>

Thanks

--
Mel Gorman
SUSE Labs