2013-04-29 16:31:56

by Mel Gorman

Subject: [RFC PATCH 0/3] Obey mark_page_accessed hint given by filesystems

Andrew Perepechko reported a problem whereby pages are being prematurely
evicted as the mark_page_accessed() hint is ignored for pages that are
currently on a pagevec -- http://www.spinics.net/lists/linux-ext4/msg37340.html .
Alexey Lyahkov and Robin Dong have also reported problems recently that
could be due to hot pages reaching the end of the inactive list too quickly
and being reclaimed.

Rather than addressing this on a per-filesystem basis, this series aims
to fix the mark_page_accessed() interface by deferring the decision of
which LRU a page is added to until pagevec drain time and allowing
mark_page_accessed() to call SetPageActive on a pagevec page. This opens
some important races that I think should be harmless but need double
checking. The races and the VM_BUG_ON checks that are removed are all
described in patch 2.

This series received only very light testing but it did not immediately
blow up and a debugging patch confirmed that pages are now getting added
to the active file LRU list that would previously have been added to the
inactive list.
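
In short, the end result of the series looks like this (a condensed
sketch assembled from the diffs in patches 1 and 2 below, not a hunk
from the series itself): the desired LRU is encoded in the page flags
when a page is queued, mark_page_accessed() may set PageActive on a
queued page, and the drain derives the final list from the flags.

/* Sketch only, condensed from patches 1 and 2 -- not code from the series */

/* Queueing (__lru_cache_add): record the intended LRU in the page flags */
if (is_active_lru(lru))
	SetPageActive(page);
else
	ClearPageActive(page);
pagevec_add(pvec, page);

/* Hinting (mark_page_accessed): a !PageLRU page is assumed to be queued */
if (PageLRU(page))
	activate_page(page);
else
	SetPageActive(page);

/* Draining (__pagevec_lru_add_fn): derive the final LRU from the flags */
lru = page_lru(page);
SetPageLRU(page);
add_page_to_lru_list(page, lruvec, lru);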

fs/cachefiles/rdwr.c | 30 ++++++------------------
fs/nfs/dir.c | 7 ++----
include/linux/pagevec.h | 34 +--------------------------
mm/swap.c | 61 ++++++++++++++++++++++++-------------------------
mm/vmscan.c | 3 ---
5 files changed, 40 insertions(+), 95 deletions(-)

--
1.8.1.4



2013-04-29 16:31:57

by Mel Gorman

Subject: [PATCH 1/3] mm: pagevec: Defer deciding what LRU to add a page to until pagevec drain time

mark_page_accessed cannot activate an inactive page that is located on
an inactive LRU pagevec. Hints from filesystems may be ignored as a
result. In preparation for fixing that problem, this patch removes the
per-LRU pagevecs and leaves just one pagevec. The final LRU the page is
added to is deferred until the pagevec is drained.

This means that fewer pagevecs are available and potentially there is
greater contention on the LRU lock. However, this only applies in the case
where there is an almost perfect mix of file, anon, active and inactive
pages being added to the LRU. In practice I expect that we are adding a
stream of pages of a particular type and that the changes in contention
will barely be measurable.

Signed-off-by: Mel Gorman <[email protected]>
---
mm/swap.c | 37 +++++++++++++++++--------------------
1 file changed, 17 insertions(+), 20 deletions(-)

diff --git a/mm/swap.c b/mm/swap.c
index 8a529a0..80fbc37 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -36,7 +36,7 @@
/* How many pages do we try to swap or page in/out together? */
int page_cluster;

-static DEFINE_PER_CPU(struct pagevec[NR_LRU_LISTS], lru_add_pvecs);
+static DEFINE_PER_CPU(struct pagevec, lru_add_pvec);
static DEFINE_PER_CPU(struct pagevec, lru_rotate_pvecs);
static DEFINE_PER_CPU(struct pagevec, lru_deactivate_pvecs);

@@ -456,13 +456,18 @@ EXPORT_SYMBOL(mark_page_accessed);
*/
void __lru_cache_add(struct page *page, enum lru_list lru)
{
- struct pagevec *pvec = &get_cpu_var(lru_add_pvecs)[lru];
+ struct pagevec *pvec = &get_cpu_var(lru_add_pvec);
+
+ if (is_active_lru(lru))
+ SetPageActive(page);
+ else
+ ClearPageActive(page);

page_cache_get(page);
if (!pagevec_space(pvec))
__pagevec_lru_add(pvec, lru);
pagevec_add(pvec, page);
- put_cpu_var(lru_add_pvecs);
+ put_cpu_var(lru_add_pvec);
}
EXPORT_SYMBOL(__lru_cache_add);

@@ -475,13 +480,11 @@ void lru_cache_add_lru(struct page *page, enum lru_list lru)
{
if (PageActive(page)) {
VM_BUG_ON(PageUnevictable(page));
- ClearPageActive(page);
} else if (PageUnevictable(page)) {
VM_BUG_ON(PageActive(page));
- ClearPageUnevictable(page);
}

- VM_BUG_ON(PageLRU(page) || PageActive(page) || PageUnevictable(page));
+ VM_BUG_ON(PageLRU(page));
__lru_cache_add(page, lru);
}

@@ -582,15 +585,10 @@ static void lru_deactivate_fn(struct page *page, struct lruvec *lruvec,
*/
void lru_add_drain_cpu(int cpu)
{
- struct pagevec *pvecs = per_cpu(lru_add_pvecs, cpu);
- struct pagevec *pvec;
- int lru;
+ struct pagevec *pvec = &per_cpu(lru_add_pvec, cpu);

- for_each_lru(lru) {
- pvec = &pvecs[lru - LRU_BASE];
- if (pagevec_count(pvec))
- __pagevec_lru_add(pvec, lru);
- }
+ if (pagevec_count(pvec))
+ __pagevec_lru_add(pvec, NR_LRU_LISTS);

pvec = &per_cpu(lru_rotate_pvecs, cpu);
if (pagevec_count(pvec)) {
@@ -789,17 +787,16 @@ void lru_add_page_tail(struct page *page, struct page *page_tail,
static void __pagevec_lru_add_fn(struct page *page, struct lruvec *lruvec,
void *arg)
{
- enum lru_list lru = (enum lru_list)arg;
- int file = is_file_lru(lru);
- int active = is_active_lru(lru);
+ enum lru_list requested_lru = (enum lru_list)arg;
+ int file = page_is_file_cache(page);
+ int active = PageActive(page);
+ enum lru_list lru = page_lru(page);

- VM_BUG_ON(PageActive(page));
+ WARN_ON_ONCE(requested_lru < NR_LRU_LISTS && requested_lru != lru);
VM_BUG_ON(PageUnevictable(page));
VM_BUG_ON(PageLRU(page));

SetPageLRU(page);
- if (active)
- SetPageActive(page);
add_page_to_lru_list(page, lruvec, lru);
update_page_reclaim_stat(lruvec, file, active);
}
--
1.8.1.4


2013-04-29 16:31:58

by Mel Gorman

Subject: [PATCH 2/3] mm: Ensure that mark_page_accessed moves pages to the active list

If a page is on a pagevec then it is !PageLRU and mark_page_accessed()
may fail to move a page to the active list as expected. Now that the
LRU is selected at LRU drain time, mark pages PageActive if they are
on a pagevec so they get moved to the correct list at LRU drain time.
Using a debugging patch it was found that, for a simple git checkout
based workload, pages were never added to the active file list in
practice but with this patch applied they are.

                            before      after
LRU Add Active File              0     757121
LRU Add Active Anon        2678833    2633924
LRU Add Inactive File      8821711    8085543
LRU Add Inactive Anon          183        200

The question to consider is if this is universally safe. If the page
was isolated for reclaim and there is a parallel mark_page_accessed()
then vmscan.c will get upset when it finds an isolated PageActive page.
Similarly a potential race exists between a per-cpu drain on a pagevec
list and an activation on a remote CPU.

CPU 0                                    CPU 1
lru_add_drain_cpu
__pagevec_lru_add
  lru = page_lru(page);
                                         mark_page_accessed
                                           if (PageLRU(page))
                                             activate_page
                                           else
                                             SetPageActive
SetPageLRU(page);
add_page_to_lru_list(page, lruvec, lru);

A PageActive page is now added to the inactive list.

While this looks strange, I think it is sufficiently harmless that additional
barriers to address the case are not justified. Unfortunately, while I never
witnessed it myself, these parallel updates potentially trigger defensive
DEBUG_VM checks on PageActive and hence they are removed by this patch.

Signed-off-by: Mel Gorman <[email protected]>
---
mm/swap.c | 18 ++++++++++++------
mm/vmscan.c | 3 ---
2 files changed, 12 insertions(+), 9 deletions(-)

diff --git a/mm/swap.c b/mm/swap.c
index 80fbc37..2a10d08 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -437,8 +437,17 @@ void activate_page(struct page *page)
void mark_page_accessed(struct page *page)
{
if (!PageActive(page) && !PageUnevictable(page) &&
- PageReferenced(page) && PageLRU(page)) {
- activate_page(page);
+ PageReferenced(page)) {
+
+ /*
+ * If the page is on the LRU, promote immediately. Otherwise,
+ * assume the page is on a pagevec, mark it active and it'll
+ * be moved to the active LRU on the next drain
+ */
+ if (PageLRU(page))
+ activate_page(page);
+ else
+ SetPageActive(page);
ClearPageReferenced(page);
} else if (!PageReferenced(page)) {
SetPageReferenced(page);
@@ -478,11 +487,8 @@ EXPORT_SYMBOL(__lru_cache_add);
*/
void lru_cache_add_lru(struct page *page, enum lru_list lru)
{
- if (PageActive(page)) {
+ if (PageActive(page))
VM_BUG_ON(PageUnevictable(page));
- } else if (PageUnevictable(page)) {
- VM_BUG_ON(PageActive(page));
- }

VM_BUG_ON(PageLRU(page));
__lru_cache_add(page, lru);
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 88c5fed..751b897 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -704,7 +704,6 @@ static unsigned long shrink_page_list(struct list_head *page_list,
if (!trylock_page(page))
goto keep;

- VM_BUG_ON(PageActive(page));
VM_BUG_ON(page_zone(page) != zone);

sc->nr_scanned++;
@@ -935,7 +934,6 @@ activate_locked:
/* Not a candidate for swapping, so reclaim swap space. */
if (PageSwapCache(page) && vm_swap_full())
try_to_free_swap(page);
- VM_BUG_ON(PageActive(page));
SetPageActive(page);
pgactivate++;
keep_locked:
@@ -3488,7 +3486,6 @@ void check_move_unevictable_pages(struct page **pages, int nr_pages)
if (page_evictable(page)) {
enum lru_list lru = page_lru_base_type(page);

- VM_BUG_ON(PageActive(page));
ClearPageUnevictable(page);
del_page_from_lru_list(page, lruvec, LRU_UNEVICTABLE);
add_page_to_lru_list(page, lruvec, lru);
--
1.8.1.4


2013-04-29 16:31:59

by Mel Gorman

Subject: [PATCH 3/3] mm: Remove lru parameter from __pagevec_lru_add and remove parts of pagevec API

Now that the LRU to add a page to is decided at LRU-add time, remove the
misleading lru parameter from __pagevec_lru_add. A consequence of this is
that the pagevec_lru_add_file, pagevec_lru_add_anon and similar helpers
are misleading as the caller no longer has direct control over what LRU
the page is added to. Unused helpers are removed by this patch and existing
users of pagevec_lru_add_file() are converted to use lru_cache_add_file()
directly and use the per-cpu pagevecs instead of creating their own pagevec.

Signed-off-by: Mel Gorman <[email protected]>
---
fs/cachefiles/rdwr.c | 30 +++++++-----------------------
fs/nfs/dir.c | 7 ++-----
include/linux/pagevec.h | 34 +---------------------------------
mm/swap.c | 12 ++++--------
4 files changed, 14 insertions(+), 69 deletions(-)

diff --git a/fs/cachefiles/rdwr.c b/fs/cachefiles/rdwr.c
index 4809922..d24c13e 100644
--- a/fs/cachefiles/rdwr.c
+++ b/fs/cachefiles/rdwr.c
@@ -12,6 +12,7 @@
#include <linux/mount.h>
#include <linux/slab.h>
#include <linux/file.h>
+#include <linux/swap.h>
#include "internal.h"

/*
@@ -227,8 +228,7 @@ static void cachefiles_read_copier(struct fscache_operation *_op)
*/
static int cachefiles_read_backing_file_one(struct cachefiles_object *object,
struct fscache_retrieval *op,
- struct page *netpage,
- struct pagevec *pagevec)
+ struct page *netpage)
{
struct cachefiles_one_read *monitor;
struct address_space *bmapping;
@@ -237,8 +237,6 @@ static int cachefiles_read_backing_file_one(struct cachefiles_object *object,

_enter("");

- pagevec_reinit(pagevec);
-
_debug("read back %p{%lu,%d}",
netpage, netpage->index, page_count(netpage));

@@ -283,9 +281,7 @@ installed_new_backing_page:
backpage = newpage;
newpage = NULL;

- page_cache_get(backpage);
- pagevec_add(pagevec, backpage);
- __pagevec_lru_add_file(pagevec);
+ lru_cache_add_file(backpage);

read_backing_page:
ret = bmapping->a_ops->readpage(NULL, backpage);
@@ -452,8 +448,7 @@ int cachefiles_read_or_alloc_page(struct fscache_retrieval *op,
if (block) {
/* submit the apparently valid page to the backing fs to be
* read from disk */
- ret = cachefiles_read_backing_file_one(object, op, page,
- &pagevec);
+ ret = cachefiles_read_backing_file_one(object, op, page);
} else if (cachefiles_has_space(cache, 0, 1) == 0) {
/* there's space in the cache we can use */
fscache_mark_page_cached(op, page);
@@ -482,14 +477,11 @@ static int cachefiles_read_backing_file(struct cachefiles_object *object,
{
struct cachefiles_one_read *monitor = NULL;
struct address_space *bmapping = object->backer->d_inode->i_mapping;
- struct pagevec lru_pvec;
struct page *newpage = NULL, *netpage, *_n, *backpage = NULL;
int ret = 0;

_enter("");

- pagevec_init(&lru_pvec, 0);
-
list_for_each_entry_safe(netpage, _n, list, lru) {
list_del(&netpage->lru);

@@ -534,9 +526,7 @@ static int cachefiles_read_backing_file(struct cachefiles_object *object,
backpage = newpage;
newpage = NULL;

- page_cache_get(backpage);
- if (!pagevec_add(&lru_pvec, backpage))
- __pagevec_lru_add_file(&lru_pvec);
+ lru_cache_add_file(backpage);

reread_backing_page:
ret = bmapping->a_ops->readpage(NULL, backpage);
@@ -559,9 +549,7 @@ static int cachefiles_read_backing_file(struct cachefiles_object *object,
goto nomem;
}

- page_cache_get(netpage);
- if (!pagevec_add(&lru_pvec, netpage))
- __pagevec_lru_add_file(&lru_pvec);
+ lru_cache_add_file(netpage);

/* install a monitor */
page_cache_get(netpage);
@@ -643,9 +631,7 @@ static int cachefiles_read_backing_file(struct cachefiles_object *object,

fscache_mark_page_cached(op, netpage);

- page_cache_get(netpage);
- if (!pagevec_add(&lru_pvec, netpage))
- __pagevec_lru_add_file(&lru_pvec);
+ lru_cache_add_file(netpage);

/* the netpage is unlocked and marked up to date here */
fscache_end_io(op, netpage, 0);
@@ -661,8 +647,6 @@ static int cachefiles_read_backing_file(struct cachefiles_object *object,

out:
/* tidy up */
- pagevec_lru_add_file(&lru_pvec);
-
if (newpage)
page_cache_release(newpage);
if (netpage)
diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index f23f455..787d89c 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -33,6 +33,7 @@
#include <linux/pagevec.h>
#include <linux/namei.h>
#include <linux/mount.h>
+#include <linux/swap.h>
#include <linux/sched.h>
#include <linux/kmemleak.h>
#include <linux/xattr.h>
@@ -1757,7 +1758,6 @@ EXPORT_SYMBOL_GPL(nfs_unlink);
*/
int nfs_symlink(struct inode *dir, struct dentry *dentry, const char *symname)
{
- struct pagevec lru_pvec;
struct page *page;
char *kaddr;
struct iattr attr;
@@ -1797,11 +1797,8 @@ int nfs_symlink(struct inode *dir, struct dentry *dentry, const char *symname)
* No big deal if we can't add this page to the page cache here.
* READLINK will get the missing page from the server if needed.
*/
- pagevec_init(&lru_pvec, 0);
- if (!add_to_page_cache(page, dentry->d_inode->i_mapping, 0,
+ if (!add_to_page_cache_lru(page, dentry->d_inode->i_mapping, 0,
GFP_KERNEL)) {
- pagevec_add(&lru_pvec, page);
- pagevec_lru_add_file(&lru_pvec);
SetPageUptodate(page);
unlock_page(page);
} else
diff --git a/include/linux/pagevec.h b/include/linux/pagevec.h
index 2aa12b8..e4dbfab3 100644
--- a/include/linux/pagevec.h
+++ b/include/linux/pagevec.h
@@ -21,7 +21,7 @@ struct pagevec {
};

void __pagevec_release(struct pagevec *pvec);
-void __pagevec_lru_add(struct pagevec *pvec, enum lru_list lru);
+void __pagevec_lru_add(struct pagevec *pvec);
unsigned pagevec_lookup(struct pagevec *pvec, struct address_space *mapping,
pgoff_t start, unsigned nr_pages);
unsigned pagevec_lookup_tag(struct pagevec *pvec,
@@ -64,36 +64,4 @@ static inline void pagevec_release(struct pagevec *pvec)
__pagevec_release(pvec);
}

-static inline void __pagevec_lru_add_anon(struct pagevec *pvec)
-{
- __pagevec_lru_add(pvec, LRU_INACTIVE_ANON);
-}
-
-static inline void __pagevec_lru_add_active_anon(struct pagevec *pvec)
-{
- __pagevec_lru_add(pvec, LRU_ACTIVE_ANON);
-}
-
-static inline void __pagevec_lru_add_file(struct pagevec *pvec)
-{
- __pagevec_lru_add(pvec, LRU_INACTIVE_FILE);
-}
-
-static inline void __pagevec_lru_add_active_file(struct pagevec *pvec)
-{
- __pagevec_lru_add(pvec, LRU_ACTIVE_FILE);
-}
-
-static inline void pagevec_lru_add_file(struct pagevec *pvec)
-{
- if (pagevec_count(pvec))
- __pagevec_lru_add_file(pvec);
-}
-
-static inline void pagevec_lru_add_anon(struct pagevec *pvec)
-{
- if (pagevec_count(pvec))
- __pagevec_lru_add_anon(pvec);
-}
-
#endif /* _LINUX_PAGEVEC_H */
diff --git a/mm/swap.c b/mm/swap.c
index 2a10d08..83eff2f 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -474,7 +474,7 @@ void __lru_cache_add(struct page *page, enum lru_list lru)

page_cache_get(page);
if (!pagevec_space(pvec))
- __pagevec_lru_add(pvec, lru);
+ __pagevec_lru_add(pvec);
pagevec_add(pvec, page);
put_cpu_var(lru_add_pvec);
}
@@ -594,7 +594,7 @@ void lru_add_drain_cpu(int cpu)
struct pagevec *pvec = &per_cpu(lru_add_pvec, cpu);

if (pagevec_count(pvec))
- __pagevec_lru_add(pvec, NR_LRU_LISTS);
+ __pagevec_lru_add(pvec);

pvec = &per_cpu(lru_rotate_pvecs, cpu);
if (pagevec_count(pvec)) {
@@ -793,12 +793,10 @@ void lru_add_page_tail(struct page *page, struct page *page_tail,
static void __pagevec_lru_add_fn(struct page *page, struct lruvec *lruvec,
void *arg)
{
- enum lru_list requested_lru = (enum lru_list)arg;
int file = page_is_file_cache(page);
int active = PageActive(page);
enum lru_list lru = page_lru(page);

- WARN_ON_ONCE(requested_lru < NR_LRU_LISTS && requested_lru != lru);
VM_BUG_ON(PageUnevictable(page));
VM_BUG_ON(PageLRU(page));

@@ -811,11 +809,9 @@ static void __pagevec_lru_add_fn(struct page *page, struct lruvec *lruvec,
* Add the passed pages to the LRU, then drop the caller's refcount
* on them. Reinitialises the caller's pagevec.
*/
-void __pagevec_lru_add(struct pagevec *pvec, enum lru_list lru)
+void __pagevec_lru_add(struct pagevec *pvec)
{
- VM_BUG_ON(is_unevictable_lru(lru));
-
- pagevec_lru_move_fn(pvec, __pagevec_lru_add_fn, (void *)lru);
+ pagevec_lru_move_fn(pvec, __pagevec_lru_add_fn, NULL);
}
EXPORT_SYMBOL(__pagevec_lru_add);

--
1.8.1.4


2013-04-29 16:50:40

by Rik van Riel

Subject: Re: [PATCH 1/3] mm: pagevec: Defer deciding what LRU to add a page to until pagevec drain time

On 04/29/2013 12:31 PM, Mel Gorman wrote:
> mark_page_accessed cannot activate an inactive page that is located on
> an inactive LRU pagevec. Hints from filesystems may be ignored as a
> result. In preparation for fixing that problem, this patch removes the
> per-LRU pagevecs and leaves just one pagevec. The final LRU the page is
> added to is deferred until the pagevec is drained.
>
> This means that fewer pagevecs are available and potentially there is
> greater contention on the LRU lock. However, this only applies in the case
> where there is an almost perfect mix of file, anon, active and inactive
> pages being added to the LRU. In practice I expect that we are adding a
> stream of pages of a particular type and that the changes in contention
> will barely be measurable.
>
> Signed-off-by: Mel Gorman <[email protected]>

Acked-by: Rik van Riel <[email protected]>


--
All rights reversed


2013-04-29 17:12:03

by Rik van Riel

Subject: Re: [PATCH 2/3] mm: Ensure that mark_page_accessed moves pages to the active list

On 04/29/2013 12:31 PM, Mel Gorman wrote:

> A PageActive page is now added to the inactive list.
>
> While this looks strange, I think it is sufficiently harmless that additional
> barriers to address the case are not justified. Unfortunately, while I never
> witnessed it myself, these parallel updates potentially trigger defensive
> DEBUG_VM checks on PageActive and hence they are removed by this patch.

Could this not cause issues with __page_cache_release, called from
munmap, exit, truncate, etc.?

Could the eventual skewing of active vs inactive numbers break page
reclaim heuristics?

I wonder if we would need to move to a scheme where the PG_active bit
is always the authoritative one, and we never pass an overriding "lru"
parameter to __pagevec_lru_add.

Would memory ordering between SetPageLRU and testing for PageLRU be
enough to then prevent the statistics from going off?

--
All rights reversed


2013-04-29 21:53:27

by Mel Gorman

Subject: Re: [PATCH 2/3] mm: Ensure that mark_page_accessed moves pages to the active list

On Mon, Apr 29, 2013 at 01:12:03PM -0400, Rik van Riel wrote:
> On 04/29/2013 12:31 PM, Mel Gorman wrote:
>
> >A PageActive page is now added to the inactive list.
> >
> >While this looks strange, I think it is sufficiently harmless that additional
> >barriers to address the case are not justified. Unfortunately, while I never
> >witnessed it myself, these parallel updates potentially trigger defensive
> >DEBUG_VM checks on PageActive and hence they are removed by this patch.
>
> Could this not cause issues with __page_cache_release, called from
> munmap, exit, truncate, etc.?
>

Possibly if the page was activated before it was freed to the page allocator.

> Could the eventual skewing of active vs inactive numbers break page
> reclaim heuristics?
>

Yes, good point and it would be very difficult to detect that it happened.

> I wonder if we would need to move to a scheme where the PG_active bit
> is always the authoritative one, and we never pass an overriding "lru"
> parameter to __pagevec_lru_add.
>

I don't think that necessarily gets away from the various races.

> Would memory ordering between SetPageLRU and testing for PageLRU be
> enough to then prevent the statistics from going off?
>

I do not think all the holes in every case can be closed. There are a
lot of races, and bugs would creep in eventually. As the common case of
interest is when a page has recently been added to the local CPU's pagevec,
the following should be sufficient.

---8<---
mm: Activate !PageLRU pages on mark_page_accessed if page is on local pagevec

If a page is on a pagevec then it is !PageLRU and mark_page_accessed()
may fail to move a page to the active list as expected. Now that the LRU
is selected at LRU drain time, mark pages PageActive if they are on the
local pagevec so they get moved to the correct list at LRU drain time.
Using a debugging patch it was found that, for a simple git checkout based
workload, pages were never added to the active file list in practice
but with this patch applied they are.

                            before      after
LRU Add Active File              0     750583
LRU Add Active Anon        2640587    2702818
LRU Add Inactive File      8833662    8068353
LRU Add Inactive Anon          207        200

Note that only pages on the local pagevec are considered on purpose. A
!PageLRU page could be in the process of being released, reclaimed, migrated
or on a remote pagevec that is currently being drained. Marking it PageActive
is vulnerable to races where the PageLRU and Active bits are checked at the
wrong time. Page reclaim will trigger VM_BUG_ONs but, depending on when the
race hits, it could also free a PageActive page to the page allocator and
trigger a bad_page warning. Similarly a potential race exists between a
per-cpu drain on a pagevec list and an activation on a remote CPU.

CPU 0                                    CPU 1
lru_add_drain_cpu
__pagevec_lru_add
  lru = page_lru(page);
                                         mark_page_accessed
                                           if (PageLRU(page))
                                             activate_page
                                           else
                                             SetPageActive
SetPageLRU(page);
add_page_to_lru_list(page, lruvec, lru);

In this case a PageActive page is added to the inactive list and later the
inactive/active stats will get skewed. While the PageActive checks in vmscan
could be removed and potentially dealt with, a skew in the statistics would
be very difficult to detect. Hence this patch deals just with the common case
where a page being marked accessed has just been added to the local pagevec.

Signed-off-by: Mel Gorman <[email protected]>
---
mm/swap.c | 34 ++++++++++++++++++++++++++++++++--
1 file changed, 32 insertions(+), 2 deletions(-)

diff --git a/mm/swap.c b/mm/swap.c
index 80fbc37..96565eb 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -427,6 +427,27 @@ void activate_page(struct page *page)
}
#endif

+static void __lru_cache_activate_page(struct page *page)
+{
+ struct pagevec *pvec = &get_cpu_var(lru_add_pvec);
+ int i;
+
+ /*
+ * Search backwards on the optimistic assumption that the page being
+ * activated has just been added to this pagevec
+ */
+ for (i = pagevec_count(pvec) - 1; i >= 0; i--) {
+ struct page *pagevec_page = pvec->pages[i];
+
+ if (pagevec_page == page) {
+ SetPageActive(page);
+ break;
+ }
+ }
+
+ put_cpu_var(lru_add_pvec);
+}
+
/*
* Mark a page as having seen activity.
*
@@ -437,8 +458,17 @@ void activate_page(struct page *page)
void mark_page_accessed(struct page *page)
{
if (!PageActive(page) && !PageUnevictable(page) &&
- PageReferenced(page) && PageLRU(page)) {
- activate_page(page);
+ PageReferenced(page)) {
+
+ /*
+ * If the page is on the LRU, promote immediately. Otherwise,
+ * assume the page is on a pagevec, mark it active and it'll
+ * be moved to the active LRU on the next drain
+ */
+ if (PageLRU(page))
+ activate_page(page);
+ else
+ __lru_cache_activate_page(page);
ClearPageReferenced(page);
} else if (!PageReferenced(page)) {
SetPageReferenced(page);


2013-05-01 05:41:34

by Sam Ben

Subject: Re: [PATCH 2/3] mm: Ensure that mark_page_accessed moves pages to the active list

Hi Mel,
On 04/30/2013 12:31 AM, Mel Gorman wrote:
> If a page is on a pagevec then it is !PageLRU and mark_page_accessed()
> may fail to move a page to the active list as expected. Now that the
> LRU is selected at LRU drain time, mark pages PageActive if they are
> on a pagevec so they get moved to the correct list at LRU drain time.
> Using a debugging patch it was found that, for a simple git checkout
> based workload, pages were never added to the active file list in

Could you show us the details of your workload?

> practice but with this patch applied they are.
>
>                             before      after
> LRU Add Active File              0     757121
> LRU Add Active Anon        2678833    2633924
> LRU Add Inactive File      8821711    8085543
> LRU Add Inactive Anon          183        200
>
> The question to consider is if this is universally safe. If the page
> was isolated for reclaim and there is a parallel mark_page_accessed()
> then vmscan.c will get upset when it finds an isolated PageActive page.
> Similarly a potential race exists between a per-cpu drain on a pagevec
> list and an activation on a remote CPU.
>
> CPU 0                                    CPU 1
> lru_add_drain_cpu
> __pagevec_lru_add
>   lru = page_lru(page);
>                                          mark_page_accessed
>                                            if (PageLRU(page))
>                                              activate_page
>                                            else
>                                              SetPageActive
> SetPageLRU(page);
> add_page_to_lru_list(page, lruvec, lru);
>
> A PageActive page is now added to the inactive list.
>
> While this looks strange, I think it is sufficiently harmless that additional
> barriers to address the case are not justified. Unfortunately, while I never
> witnessed it myself, these parallel updates potentially trigger defensive
> DEBUG_VM checks on PageActive and hence they are removed by this patch.
>
> Signed-off-by: Mel Gorman <[email protected]>
> ---
> mm/swap.c | 18 ++++++++++++------
> mm/vmscan.c | 3 ---
> 2 files changed, 12 insertions(+), 9 deletions(-)
>
> diff --git a/mm/swap.c b/mm/swap.c
> index 80fbc37..2a10d08 100644
> --- a/mm/swap.c
> +++ b/mm/swap.c
> @@ -437,8 +437,17 @@ void activate_page(struct page *page)
> void mark_page_accessed(struct page *page)
> {
> if (!PageActive(page) && !PageUnevictable(page) &&
> - PageReferenced(page) && PageLRU(page)) {
> - activate_page(page);
> + PageReferenced(page)) {
> +
> + /*
> + * If the page is on the LRU, promote immediately. Otherwise,
> + * assume the page is on a pagevec, mark it active and it'll
> + * be moved to the active LRU on the next drain
> + */
> + if (PageLRU(page))
> + activate_page(page);
> + else
> + SetPageActive(page);
> ClearPageReferenced(page);
> } else if (!PageReferenced(page)) {
> SetPageReferenced(page);
> @@ -478,11 +487,8 @@ EXPORT_SYMBOL(__lru_cache_add);
> */
> void lru_cache_add_lru(struct page *page, enum lru_list lru)
> {
> - if (PageActive(page)) {
> + if (PageActive(page))
> VM_BUG_ON(PageUnevictable(page));
> - } else if (PageUnevictable(page)) {
> - VM_BUG_ON(PageActive(page));
> - }
>
> VM_BUG_ON(PageLRU(page));
> __lru_cache_add(page, lru);
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 88c5fed..751b897 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -704,7 +704,6 @@ static unsigned long shrink_page_list(struct list_head *page_list,
> if (!trylock_page(page))
> goto keep;
>
> - VM_BUG_ON(PageActive(page));
> VM_BUG_ON(page_zone(page) != zone);
>
> sc->nr_scanned++;
> @@ -935,7 +934,6 @@ activate_locked:
> /* Not a candidate for swapping, so reclaim swap space. */
> if (PageSwapCache(page) && vm_swap_full())
> try_to_free_swap(page);
> - VM_BUG_ON(PageActive(page));
> SetPageActive(page);
> pgactivate++;
> keep_locked:
> @@ -3488,7 +3486,6 @@ void check_move_unevictable_pages(struct page **pages, int nr_pages)
> if (page_evictable(page)) {
> enum lru_list lru = page_lru_base_type(page);
>
> - VM_BUG_ON(PageActive(page));
> ClearPageUnevictable(page);
> del_page_from_lru_list(page, lruvec, LRU_UNEVICTABLE);
> add_page_to_lru_list(page, lruvec, lru);


2013-05-01 06:28:17

by Will Huck

Subject: Re: [RFC PATCH 0/3] Obey mark_page_accessed hint given by filesystems

Hi Mel,
On 04/30/2013 12:31 AM, Mel Gorman wrote:
> Andrew Perepechko reported a problem whereby pages are being prematurely
> evicted as the mark_page_accessed() hint is ignored for pages that are
> currently on a pagevec -- http://www.spinics.net/lists/linux-ext4/msg37340.html .
> Alexey Lyahkov and Robin Dong have also reported problems recently that
> could be due to hot pages reaching the end of the inactive list too quickly
> and being reclaimed.

Both shrink_active_list and shrink_inactive_list can call
lru_add_drain(), so why can't the hot pages be marked Active at that point?

> Rather than addressing this on a per-filesystem basis, this series aims
> to fix the mark_page_accessed() interface by deferring the decision of
> which LRU a page is added to until pagevec drain time and allowing
> mark_page_accessed() to call SetPageActive on a pagevec page. This opens
> some important races that I think should be harmless but need double
> checking. The races and the VM_BUG_ON checks that are removed are all
> described in patch 2.
>
> This series received only very light testing but it did not immediately
> blow up and a debugging patch confirmed that pages are now getting added
> to the active file LRU list that would previously have been added to the
> inactive list.
>
> fs/cachefiles/rdwr.c | 30 ++++++------------------
> fs/nfs/dir.c | 7 ++----
> include/linux/pagevec.h | 34 +--------------------------
> mm/swap.c | 61 ++++++++++++++++++++++++-------------------------
> mm/vmscan.c | 3 ---
> 5 files changed, 40 insertions(+), 95 deletions(-)
>


2013-05-01 08:06:44

by Mel Gorman

Subject: Re: [PATCH 2/3] mm: Ensure that mark_page_accessed moves pages to the active list

On Wed, May 01, 2013 at 01:41:34PM +0800, Sam Ben wrote:
> Hi Mel,
> On 04/30/2013 12:31 AM, Mel Gorman wrote:
> >If a page is on a pagevec then it is !PageLRU and mark_page_accessed()
> >may fail to move a page to the active list as expected. Now that the
> >LRU is selected at LRU drain time, mark pages PageActive if they are
> >on a pagevec so they get moved to the correct list at LRU drain time.
> >Using a debugging patch it was found that, for a simple git checkout
> >based workload, pages were never added to the active file list in
>
> Could you show us the details of your workload?
>

The workload is git checkouts of a fixed number of commits for the
kernel git tree. It starts with a warm-up run that is not timed and then
records the time for a number of iterations.

--
Mel Gorman
SUSE Labs


2013-05-01 08:08:19

by Mel Gorman

Subject: Re: [RFC PATCH 0/3] Obey mark_page_accessed hint given by filesystems

On Wed, May 01, 2013 at 02:28:17PM +0800, Will Huck wrote:
> Hi Mel,
> On 04/30/2013 12:31 AM, Mel Gorman wrote:
> >Andrew Perepechko reported a problem whereby pages are being prematurely
> >evicted as the mark_page_accessed() hint is ignored for pages that are
> >currently on a pagevec -- http://www.spinics.net/lists/linux-ext4/msg37340.html .
> >Alexey Lyahkov and Robin Dong have also reported problems recently that
> >could be due to hot pages reaching the end of the inactive list too quickly
> >and being reclaimed.
>
> Both shrink_active_list and shrink_inactive_list can call
> lru_add_drain(), so why can't the hot pages be marked Active at that
> point?
>

Because by then it's not known that the filesystem had called
mark_page_accessed() to indicate the page should be activated.
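
For reference, this is the check that patch 2 relaxes. In the old code the
PageLRU test meant the hint was silently dropped for a page still sitting
on a pagevec, so nothing survived for the drain to act on:

	/* old mark_page_accessed(): a !PageLRU (pagevec) page fails the
	 * last test and the activation hint is lost */
	if (!PageActive(page) && !PageUnevictable(page) &&
	    PageReferenced(page) && PageLRU(page)) {
		activate_page(page);
		ClearPageReferenced(page);
	}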

--
Mel Gorman
SUSE Labs


2013-05-01 08:14:25

by Ric Mason

Subject: Re: [PATCH 2/3] mm: Ensure that mark_page_accessed moves pages to the active list

On 05/01/2013 04:06 PM, Mel Gorman wrote:
> On Wed, May 01, 2013 at 01:41:34PM +0800, Sam Ben wrote:
>> Hi Mel,
>> On 04/30/2013 12:31 AM, Mel Gorman wrote:
>>> If a page is on a pagevec then it is !PageLRU and mark_page_accessed()
>>> may fail to move a page to the active list as expected. Now that the
>>> LRU is selected at LRU drain time, mark pages PageActive if they are
>>> on a pagevec so they get moved to the correct list at LRU drain time.
>>> Using a debugging patch it was found that, for a simple git checkout
>>> based workload, pages were never added to the active file list in
>> Could you show us the details of your workload?
>>
> The workload is git checkouts of a fixed number of commits for the

Is there a script which you used?

> kernel git tree. It starts with a warm-up run that is not timed and then
> records the time for a number of iterations.

How do you record the time for a number of iterations? Does the iteration
here mean an lru scan?

>


2013-05-01 08:31:39

by Mel Gorman

Subject: Re: [PATCH 2/3] mm: Ensure that mark_page_accessed moves pages to the active list

On Wed, May 01, 2013 at 04:14:16PM +0800, Ric Mason wrote:
> On 05/01/2013 04:06 PM, Mel Gorman wrote:
> >On Wed, May 01, 2013 at 01:41:34PM +0800, Sam Ben wrote:
> >>Hi Mel,
> >>On 04/30/2013 12:31 AM, Mel Gorman wrote:
> >>>If a page is on a pagevec then it is !PageLRU and mark_page_accessed()
> >>>may fail to move a page to the active list as expected. Now that the
> >>>LRU is selected at LRU drain time, mark pages PageActive if they are
> >>>on a pagevec so they get moved to the correct list at LRU drain time.
> >>>Using a debugging patch it was found that, for a simple git checkout
> >>>based workload, pages were never added to the active file list in
> >>Could you show us the details of your workload?
> >>
> >The workload is git checkouts of a fixed number of commits for the
>
> Is there script which you used?
>

mmtests with config-global-dhp__io-gitcheckout-randread-starvation. The
parallel randread was to see whether the random file read would push the
metadata blocks out or not. I expected it would not be enough to trigger
the reported problem but it would be enough to determine whether file pages
were getting added to the active lists or not.

> >kernel git tree. It starts with a warm-up run that is not timed and then
> >records the time for a number of iterations.
>
> How do you record the time for a number of iterations? Does the iteration
> here mean an lru scan?
>

/usr/bin/time

--
Mel Gorman
SUSE Labs


2013-05-03 07:51:58

by Jan Kara

Subject: Re: [PATCH 1/3] mm: pagevec: Defer deciding what LRU to add a page to until pagevec drain time

On Mon 29-04-13 17:31:57, Mel Gorman wrote:
> mark_page_accessed cannot activate an inactive page that is located on
> an inactive LRU pagevec. Hints from filesystems may be ignored as a
> result. In preparation for fixing that problem, this patch removes the
> per-LRU pagevecs and leaves just one pagevec. The final LRU the page is
> added to is deferred until the pagevec is drained.
>
> This means that fewer pagevecs are available and potentially there is
> greater contention on the LRU lock. However, this only applies in the case
> where there is an almost perfect mix of file, anon, active and inactive
> pages being added to the LRU. In practice I expect that we are adding a
> stream of pages of a particular type and that the changes in contention
> will barely be measurable.
>
> Signed-off-by: Mel Gorman <[email protected]>
> ---
> mm/swap.c | 37 +++++++++++++++++--------------------
> 1 file changed, 17 insertions(+), 20 deletions(-)
>
> diff --git a/mm/swap.c b/mm/swap.c
> index 8a529a0..80fbc37 100644
> --- a/mm/swap.c
> +++ b/mm/swap.c
> @@ -36,7 +36,7 @@
> /* How many pages do we try to swap or page in/out together? */
> int page_cluster;
>
> -static DEFINE_PER_CPU(struct pagevec[NR_LRU_LISTS], lru_add_pvecs);
> +static DEFINE_PER_CPU(struct pagevec, lru_add_pvec);
> static DEFINE_PER_CPU(struct pagevec, lru_rotate_pvecs);
> static DEFINE_PER_CPU(struct pagevec, lru_deactivate_pvecs);
>
> @@ -456,13 +456,18 @@ EXPORT_SYMBOL(mark_page_accessed);
> */
> void __lru_cache_add(struct page *page, enum lru_list lru)
> {
> - struct pagevec *pvec = &get_cpu_var(lru_add_pvecs)[lru];
> + struct pagevec *pvec = &get_cpu_var(lru_add_pvec);
> +
> + if (is_active_lru(lru))
> + SetPageActive(page);
> + else
> + ClearPageActive(page);
>
> page_cache_get(page);
> if (!pagevec_space(pvec))
> __pagevec_lru_add(pvec, lru);
> pagevec_add(pvec, page);
> - put_cpu_var(lru_add_pvecs);
> + put_cpu_var(lru_add_pvec);
> }
> EXPORT_SYMBOL(__lru_cache_add);
>
> @@ -475,13 +480,11 @@ void lru_cache_add_lru(struct page *page, enum lru_list lru)
> {
> if (PageActive(page)) {
> VM_BUG_ON(PageUnevictable(page));
> - ClearPageActive(page);
> } else if (PageUnevictable(page)) {
> VM_BUG_ON(PageActive(page));
> - ClearPageUnevictable(page);
> }
>
> - VM_BUG_ON(PageLRU(page) || PageActive(page) || PageUnevictable(page));
> + VM_BUG_ON(PageLRU(page));
> __lru_cache_add(page, lru);
> }
>
> @@ -582,15 +585,10 @@ static void lru_deactivate_fn(struct page *page, struct lruvec *lruvec,
> */
> void lru_add_drain_cpu(int cpu)
> {
> - struct pagevec *pvecs = per_cpu(lru_add_pvecs, cpu);
> - struct pagevec *pvec;
> - int lru;
> + struct pagevec *pvec = &per_cpu(lru_add_pvec, cpu);
>
> - for_each_lru(lru) {
> - pvec = &pvecs[lru - LRU_BASE];
> - if (pagevec_count(pvec))
> - __pagevec_lru_add(pvec, lru);
> - }
> + if (pagevec_count(pvec))
> + __pagevec_lru_add(pvec, NR_LRU_LISTS);
>
> pvec = &per_cpu(lru_rotate_pvecs, cpu);
> if (pagevec_count(pvec)) {
> @@ -789,17 +787,16 @@ void lru_add_page_tail(struct page *page, struct page *page_tail,
> static void __pagevec_lru_add_fn(struct page *page, struct lruvec *lruvec,
> void *arg)
> {
> - enum lru_list lru = (enum lru_list)arg;
> - int file = is_file_lru(lru);
> - int active = is_active_lru(lru);
> + enum lru_list requested_lru = (enum lru_list)arg;
> + int file = page_is_file_cache(page);
> + int active = PageActive(page);
> + enum lru_list lru = page_lru(page);
>
> - VM_BUG_ON(PageActive(page));
> + WARN_ON_ONCE(requested_lru < NR_LRU_LISTS && requested_lru != lru);
Hum, so __lru_cache_add() calls this with 'requested_lru' set to whatever
LRU we currently want to add a page to. How could this always be equal to the
LRU of all the pages we have cached in the pagevec?

And if I'm right, there doesn't seem to be a reason to pass requested_lru
to this function at all, does there?

> VM_BUG_ON(PageUnevictable(page));
> VM_BUG_ON(PageLRU(page));
>
> SetPageLRU(page);
> - if (active)
> - SetPageActive(page);
> add_page_to_lru_list(page, lruvec, lru);
> update_page_reclaim_stat(lruvec, file, active);
> }
> --
> 1.8.1.4
>
--
Jan Kara <[email protected]>
SUSE Labs, CR


2013-05-03 08:00:43

by Jan Kara

Subject: Re: [PATCH 3/3] mm: Remove lru parameter from __pagevec_lru_add and remove parts of pagevec API

On Mon 29-04-13 17:31:59, Mel Gorman wrote:
> Now that the LRU to add a page to is decided at LRU-add time, remove the
> misleading lru parameter from __pagevec_lru_add. A consequence of this is
> that the pagevec_lru_add_file, pagevec_lru_add_anon and similar helpers
> are misleading as the caller no longer has direct control over what LRU
> the page is added to. Unused helpers are removed by this patch and existing
> users of pagevec_lru_add_file() are converted to use lru_cache_add_file()
> directly and use the per-cpu pagevecs instead of creating their own pagevec.
Ah, OK, in this patch you remove the bogus argument and the warning. You
can add:
Reviewed-by: Jan Kara <[email protected]>
for whatever it is worth for mm related patches :)

Honza
>
> Signed-off-by: Mel Gorman <[email protected]>
> ---
> fs/cachefiles/rdwr.c | 30 +++++++-----------------------
> fs/nfs/dir.c | 7 ++-----
> include/linux/pagevec.h | 34 +---------------------------------
> mm/swap.c | 12 ++++--------
> 4 files changed, 14 insertions(+), 69 deletions(-)
>
> diff --git a/fs/cachefiles/rdwr.c b/fs/cachefiles/rdwr.c
> index 4809922..d24c13e 100644
> --- a/fs/cachefiles/rdwr.c
> +++ b/fs/cachefiles/rdwr.c
> @@ -12,6 +12,7 @@
> #include <linux/mount.h>
> #include <linux/slab.h>
> #include <linux/file.h>
> +#include <linux/swap.h>
> #include "internal.h"
>
> /*
> @@ -227,8 +228,7 @@ static void cachefiles_read_copier(struct fscache_operation *_op)
> */
> static int cachefiles_read_backing_file_one(struct cachefiles_object *object,
> struct fscache_retrieval *op,
> - struct page *netpage,
> - struct pagevec *pagevec)
> + struct page *netpage)
> {
> struct cachefiles_one_read *monitor;
> struct address_space *bmapping;
> @@ -237,8 +237,6 @@ static int cachefiles_read_backing_file_one(struct cachefiles_object *object,
>
> _enter("");
>
> - pagevec_reinit(pagevec);
> -
> _debug("read back %p{%lu,%d}",
> netpage, netpage->index, page_count(netpage));
>
> @@ -283,9 +281,7 @@ installed_new_backing_page:
> backpage = newpage;
> newpage = NULL;
>
> - page_cache_get(backpage);
> - pagevec_add(pagevec, backpage);
> - __pagevec_lru_add_file(pagevec);
> + lru_cache_add_file(backpage);
>
> read_backing_page:
> ret = bmapping->a_ops->readpage(NULL, backpage);
> @@ -452,8 +448,7 @@ int cachefiles_read_or_alloc_page(struct fscache_retrieval *op,
> if (block) {
> /* submit the apparently valid page to the backing fs to be
> * read from disk */
> - ret = cachefiles_read_backing_file_one(object, op, page,
> - &pagevec);
> + ret = cachefiles_read_backing_file_one(object, op, page);
> } else if (cachefiles_has_space(cache, 0, 1) == 0) {
> /* there's space in the cache we can use */
> fscache_mark_page_cached(op, page);
> @@ -482,14 +477,11 @@ static int cachefiles_read_backing_file(struct cachefiles_object *object,
> {
> struct cachefiles_one_read *monitor = NULL;
> struct address_space *bmapping = object->backer->d_inode->i_mapping;
> - struct pagevec lru_pvec;
> struct page *newpage = NULL, *netpage, *_n, *backpage = NULL;
> int ret = 0;
>
> _enter("");
>
> - pagevec_init(&lru_pvec, 0);
> -
> list_for_each_entry_safe(netpage, _n, list, lru) {
> list_del(&netpage->lru);
>
> @@ -534,9 +526,7 @@ static int cachefiles_read_backing_file(struct cachefiles_object *object,
> backpage = newpage;
> newpage = NULL;
>
> - page_cache_get(backpage);
> - if (!pagevec_add(&lru_pvec, backpage))
> - __pagevec_lru_add_file(&lru_pvec);
> + lru_cache_add_file(backpage);
>
> reread_backing_page:
> ret = bmapping->a_ops->readpage(NULL, backpage);
> @@ -559,9 +549,7 @@ static int cachefiles_read_backing_file(struct cachefiles_object *object,
> goto nomem;
> }
>
> - page_cache_get(netpage);
> - if (!pagevec_add(&lru_pvec, netpage))
> - __pagevec_lru_add_file(&lru_pvec);
> + lru_cache_add_file(netpage);
>
> /* install a monitor */
> page_cache_get(netpage);
> @@ -643,9 +631,7 @@ static int cachefiles_read_backing_file(struct cachefiles_object *object,
>
> fscache_mark_page_cached(op, netpage);
>
> - page_cache_get(netpage);
> - if (!pagevec_add(&lru_pvec, netpage))
> - __pagevec_lru_add_file(&lru_pvec);
> + lru_cache_add_file(netpage);
>
> /* the netpage is unlocked and marked up to date here */
> fscache_end_io(op, netpage, 0);
> @@ -661,8 +647,6 @@ static int cachefiles_read_backing_file(struct cachefiles_object *object,
>
> out:
> /* tidy up */
> - pagevec_lru_add_file(&lru_pvec);
> -
> if (newpage)
> page_cache_release(newpage);
> if (netpage)
> diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
> index f23f455..787d89c 100644
> --- a/fs/nfs/dir.c
> +++ b/fs/nfs/dir.c
> @@ -33,6 +33,7 @@
> #include <linux/pagevec.h>
> #include <linux/namei.h>
> #include <linux/mount.h>
> +#include <linux/swap.h>
> #include <linux/sched.h>
> #include <linux/kmemleak.h>
> #include <linux/xattr.h>
> @@ -1757,7 +1758,6 @@ EXPORT_SYMBOL_GPL(nfs_unlink);
> */
> int nfs_symlink(struct inode *dir, struct dentry *dentry, const char *symname)
> {
> - struct pagevec lru_pvec;
> struct page *page;
> char *kaddr;
> struct iattr attr;
> @@ -1797,11 +1797,8 @@ int nfs_symlink(struct inode *dir, struct dentry *dentry, const char *symname)
> * No big deal if we can't add this page to the page cache here.
> * READLINK will get the missing page from the server if needed.
> */
> - pagevec_init(&lru_pvec, 0);
> - if (!add_to_page_cache(page, dentry->d_inode->i_mapping, 0,
> + if (!add_to_page_cache_lru(page, dentry->d_inode->i_mapping, 0,
> GFP_KERNEL)) {
> - pagevec_add(&lru_pvec, page);
> - pagevec_lru_add_file(&lru_pvec);
> SetPageUptodate(page);
> unlock_page(page);
> } else
> diff --git a/include/linux/pagevec.h b/include/linux/pagevec.h
> index 2aa12b8..e4dbfab3 100644
> --- a/include/linux/pagevec.h
> +++ b/include/linux/pagevec.h
> @@ -21,7 +21,7 @@ struct pagevec {
> };
>
> void __pagevec_release(struct pagevec *pvec);
> -void __pagevec_lru_add(struct pagevec *pvec, enum lru_list lru);
> +void __pagevec_lru_add(struct pagevec *pvec);
> unsigned pagevec_lookup(struct pagevec *pvec, struct address_space *mapping,
> pgoff_t start, unsigned nr_pages);
> unsigned pagevec_lookup_tag(struct pagevec *pvec,
> @@ -64,36 +64,4 @@ static inline void pagevec_release(struct pagevec *pvec)
> __pagevec_release(pvec);
> }
>
> -static inline void __pagevec_lru_add_anon(struct pagevec *pvec)
> -{
> - __pagevec_lru_add(pvec, LRU_INACTIVE_ANON);
> -}
> -
> -static inline void __pagevec_lru_add_active_anon(struct pagevec *pvec)
> -{
> - __pagevec_lru_add(pvec, LRU_ACTIVE_ANON);
> -}
> -
> -static inline void __pagevec_lru_add_file(struct pagevec *pvec)
> -{
> - __pagevec_lru_add(pvec, LRU_INACTIVE_FILE);
> -}
> -
> -static inline void __pagevec_lru_add_active_file(struct pagevec *pvec)
> -{
> - __pagevec_lru_add(pvec, LRU_ACTIVE_FILE);
> -}
> -
> -static inline void pagevec_lru_add_file(struct pagevec *pvec)
> -{
> - if (pagevec_count(pvec))
> - __pagevec_lru_add_file(pvec);
> -}
> -
> -static inline void pagevec_lru_add_anon(struct pagevec *pvec)
> -{
> - if (pagevec_count(pvec))
> - __pagevec_lru_add_anon(pvec);
> -}
> -
> #endif /* _LINUX_PAGEVEC_H */
> diff --git a/mm/swap.c b/mm/swap.c
> index 2a10d08..83eff2f 100644
> --- a/mm/swap.c
> +++ b/mm/swap.c
> @@ -474,7 +474,7 @@ void __lru_cache_add(struct page *page, enum lru_list lru)
>
> page_cache_get(page);
> if (!pagevec_space(pvec))
> - __pagevec_lru_add(pvec, lru);
> + __pagevec_lru_add(pvec);
> pagevec_add(pvec, page);
> put_cpu_var(lru_add_pvec);
> }
> @@ -594,7 +594,7 @@ void lru_add_drain_cpu(int cpu)
> struct pagevec *pvec = &per_cpu(lru_add_pvec, cpu);
>
> if (pagevec_count(pvec))
> - __pagevec_lru_add(pvec, NR_LRU_LISTS);
> + __pagevec_lru_add(pvec);
>
> pvec = &per_cpu(lru_rotate_pvecs, cpu);
> if (pagevec_count(pvec)) {
> @@ -793,12 +793,10 @@ void lru_add_page_tail(struct page *page, struct page *page_tail,
> static void __pagevec_lru_add_fn(struct page *page, struct lruvec *lruvec,
> void *arg)
> {
> - enum lru_list requested_lru = (enum lru_list)arg;
> int file = page_is_file_cache(page);
> int active = PageActive(page);
> enum lru_list lru = page_lru(page);
>
> - WARN_ON_ONCE(requested_lru < NR_LRU_LISTS && requested_lru != lru);
> VM_BUG_ON(PageUnevictable(page));
> VM_BUG_ON(PageLRU(page));
>
> @@ -811,11 +809,9 @@ static void __pagevec_lru_add_fn(struct page *page, struct lruvec *lruvec,
> * Add the passed pages to the LRU, then drop the caller's refcount
> * on them. Reinitialises the caller's pagevec.
> */
> -void __pagevec_lru_add(struct pagevec *pvec, enum lru_list lru)
> +void __pagevec_lru_add(struct pagevec *pvec)
> {
> - VM_BUG_ON(is_unevictable_lru(lru));
> -
> - pagevec_lru_move_fn(pvec, __pagevec_lru_add_fn, (void *)lru);
> + pagevec_lru_move_fn(pvec, __pagevec_lru_add_fn, NULL);
> }
> EXPORT_SYMBOL(__pagevec_lru_add);
>
> --
> 1.8.1.4
>
--
Jan Kara <[email protected]>
SUSE Labs, CR


2013-05-03 08:37:49

by Mel Gorman

Subject: Re: [PATCH 1/3] mm: pagevec: Defer deciding what LRU to add a page to until pagevec drain time

On Fri, May 03, 2013 at 09:51:58AM +0200, Jan Kara wrote:
> > @@ -789,17 +787,16 @@ void lru_add_page_tail(struct page *page, struct page *page_tail,
> > static void __pagevec_lru_add_fn(struct page *page, struct lruvec *lruvec,
> > void *arg)
> > {
> > - enum lru_list lru = (enum lru_list)arg;
> > - int file = is_file_lru(lru);
> > - int active = is_active_lru(lru);
> > + enum lru_list requested_lru = (enum lru_list)arg;
> > + int file = page_is_file_cache(page);
> > + int active = PageActive(page);
> > + enum lru_list lru = page_lru(page);
> >
> > - VM_BUG_ON(PageActive(page));
> > + WARN_ON_ONCE(requested_lru < NR_LRU_LISTS && requested_lru != lru);
> Hum, so __lru_cache_add() calls this with 'requested_lru' set to whatever
> LRU we currently want to add a page to. How could this always be equal to the
> LRU of all the pages we have cached in the pagevec?
>

It wouldn't necessarily be, and for a pagevec drain it's ignored
completely.

> And if I'm right, there doesn't seem to be a reason to pass requested_lru
> to this function at all, does there?
>

You've already noticed that it gets thrown away later in the third
patch. It was left in this patch as a debugging aid in case there was a
direct pagevec user that expected to place pages on an LRU that was at
odds with the page flags.

--
Mel Gorman
SUSE Labs
