2011-06-24 13:49:37

by Andrea Righi

[permalink] [raw]
Subject: [PATCH v3 0/2] fadvise: support POSIX_FADV_NOREUSE

There were some reported problems in the past about trashing page cache
when a backup software (i.e., rsync) touches a huge amount of pages (see
for example [1]).

This problem has been almost fixed by the Minchan Kim's patch [2] and a
proper use of fadvise() in the backup software. For example this patch
set [3] has been proposed for inclusion in rsync.

However, there can be still other similar trashing problems: when the
backup software reads all the source files, some of them may be part of
the actual working set of the system. When a POSIX_FADV_DONTNEED is
performed _all_ pages are evicted from pagecache, both the working set
and the use-once pages touched only by the backup software.

A previous proposal [4] tried to resolve this problem being less
agressive in invalidating active pages, moving them to the inactive list
intead of just evict them from the page cache.

However, this approach changed completely the old behavior of
invalidate_mapping_pages(), that is not only used by fadvise.

The new solution maps POSIX_FADV_NOREUSE to the less-agressive page
invalidation policy.

With POSIX_FADV_NOREUSE active pages are moved to the tail of the
inactive list, and pages in the inactive list are just removed from page
cache. Pages mapped by other processes or unevictable pages are not
touched at all.

In this way if the backup was the only user of a page, that page will be
immediately removed from the page cache by calling POSIX_FADV_NOREUSE.
If the page was also touched by other tasks it'll be moved to the
inactive list, having another chance of being re-added to the working
set, or simply reclaimed when memory is needed.

In conclusion, now userspace applications that want to drop some page
cache pages can choose between the following advices:

POSIX_FADV_DONTNEED = drop page cache if possible
POSIX_FADV_NOREUSE = reduce page cache eligibility

Testcase:

- create a 1GB file called "zero"
- run md5sum zero to read all the pages in page cache (this is to
simulate the user activity on this file)
- run rsync zero zero_copy
- re-run md5sum zero (user activity on the working set) and measure
the time to complete this command

rsync has been patched with [3], using POSIX_FADV_DONTNEED in one case and
POSIX_FADV_NOREUSE in the other case.

Results:

- after the backup run:
# perf stat -e block:block_bio_queue md5sum zero

avg elapsed time block:block_bio_queue
rsync-dontneed 4.24s 2,072
rsync-noreuse 2.22s 0

[1] http://marc.info/?l=rsync&m=128885034930933&w=2
[2] https://lkml.org/lkml/2011/2/20/57
[3] http://lists.samba.org/archive/rsync/2010-November/025827.html
[4] https://lkml.org/lkml/2011/6/23/35

ChangeLog v2 -> v3:
- do not change the old POSIX_FADV_DONTNEED behavior and implement the less
aggressive page cache invalidation policy using POSIX_FADV_NOREUSE
- fix comments in the code

[PATCH v3 1/2] mm: introduce __invalidate_mapping_pages()
[PATCH v3 2/2] fadvise: implement POSIX_FADV_NOREUSE

include/linux/fs.h | 7 +++++--
mm/fadvise.c | 11 ++++++-----
mm/swap.c | 2 +-
mm/truncate.c | 40 ++++++++++++++++++++++++++++++----------
4 files changed, 42 insertions(+), 18 deletions(-)


2011-06-24 13:49:46

by Andrea Righi

[permalink] [raw]
Subject: [PATCH v3 1/2] mm: introduce __invalidate_mapping_pages()

This new function accepts an additional parameter respect to the old
invalidate_mapping_pages() that allows to specify when we want to apply
an aggressive policy to drop file cache pages or when we just want to
reduce cache eligibility.

The new prototype is the following:

unsigned long __invalidate_mapping_pages(struct address_space *mapping,
pgoff_t start, pgoff_t end, bool force)

When force is true pages are always dropped if possible. When force is
false inactive pages are dropped and active pages are moved to the
inactive list.

This can be used to apply different levels of page cache invalidation
(e.g, by fadvise).

The old invalidate_mapping_pages() behavior can be mapped to
__invalidate_mapping_pages(..., true) using a C-preprocessor macro.

Signed-off-by: Andrea Righi <[email protected]>
---
include/linux/fs.h | 7 +++++--
mm/swap.c | 2 +-
mm/truncate.c | 40 ++++++++++++++++++++++++++++++----------
3 files changed, 36 insertions(+), 13 deletions(-)

diff --git a/include/linux/fs.h b/include/linux/fs.h
index 6e73e2e..33beefe 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2149,8 +2149,11 @@ extern int check_disk_change(struct block_device *);
extern int __invalidate_device(struct block_device *, bool);
extern int invalidate_partition(struct gendisk *, int);
#endif
-unsigned long invalidate_mapping_pages(struct address_space *mapping,
- pgoff_t start, pgoff_t end);
+
+#define invalidate_mapping_pages(__mapping, __start, __end) \
+ __invalidate_mapping_pages(__mapping, __start, __end, true)
+unsigned long __invalidate_mapping_pages(struct address_space *mapping,
+ pgoff_t start, pgoff_t end, bool force);

static inline void invalidate_remote_inode(struct inode *inode)
{
diff --git a/mm/swap.c b/mm/swap.c
index 3a442f1..a8fe6ac 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -413,7 +413,7 @@ void add_page_to_unevictable_list(struct page *page)
* 2. active, dirty/writeback page -> inactive, head, PG_reclaim
* 3. inactive, mapped page -> none
* 4. inactive, dirty/writeback page -> inactive, head, PG_reclaim
- * 5. inactive, clean -> inactive, tail
+ * 5. [in]active, clean -> inactive, tail
* 6. Others -> none
*
* In 4, why it moves inactive's head, the VM expects the page would
diff --git a/mm/truncate.c b/mm/truncate.c
index 3a29a61..90f3a97 100644
--- a/mm/truncate.c
+++ b/mm/truncate.c
@@ -312,20 +312,27 @@ void truncate_inode_pages(struct address_space *mapping, loff_t lstart)
EXPORT_SYMBOL(truncate_inode_pages);

/**
- * invalidate_mapping_pages - Invalidate all the unlocked pages of one inode
+ * __invalidate_mapping_pages - Invalidate all the unlocked pages of one inode
* @mapping: the address_space which holds the pages to invalidate
* @start: the offset 'from' which to invalidate
* @end: the offset 'to' which to invalidate (inclusive)
+ * @force: always drop pages when true (otherwise, reduce cache eligibility)
*
* This function only removes the unlocked pages, if you want to
* remove all the pages of one inode, you must call truncate_inode_pages.
*
- * invalidate_mapping_pages() will not block on IO activity. It will not
- * invalidate pages which are dirty, locked, under writeback or mapped into
- * pagetables.
+ * The @force parameter can be used to apply a more aggressive policy (when
+ * true) that will always drop pages from page cache when possible, or to just
+ * reduce cache eligibility (when false). In the last case active pages will be
+ * moved to the tail of the inactive list by deactivate_page(); inactive pages
+ * will be dropped in both cases.
+ *
+ * __invalidate_mapping_pages() will not block on IO activity. It will not
+ * invalidate pages which are dirty, locked, under writeback, mapped into
+ * pagetables, or on active lru when @force is false.
*/
-unsigned long invalidate_mapping_pages(struct address_space *mapping,
- pgoff_t start, pgoff_t end)
+unsigned long __invalidate_mapping_pages(struct address_space *mapping,
+ pgoff_t start, pgoff_t end, bool force)
{
struct pagevec pvec;
pgoff_t next = start;
@@ -356,11 +363,24 @@ unsigned long invalidate_mapping_pages(struct address_space *mapping,
next++;
if (lock_failed)
continue;
-
- ret = invalidate_inode_page(page);
+ /*
+ * Invalidation of active page is rather aggressive as
+ * we can't make sure it's not a working set of other
+ * processes.
+ *
+ * When force is false, deactivate_page() would move
+ * active page into inactive's tail so the page will
+ * have a chance to activate again if other processes
+ * touch it.
+ */
+ if (!force && PageActive(page))
+ ret = 0;
+ else
+ ret = invalidate_inode_page(page);
unlock_page(page);
/*
- * Invalidation is a hint that the page is no longer
+ * Invalidation of an inactive page (or any page when
+ * force is true) is a hint that the page is no longer
* of interest and try to speed up its reclaim.
*/
if (!ret)
@@ -375,7 +395,7 @@ unsigned long invalidate_mapping_pages(struct address_space *mapping,
}
return count;
}
-EXPORT_SYMBOL(invalidate_mapping_pages);
+EXPORT_SYMBOL(__invalidate_mapping_pages);

/*
* This is like invalidate_complete_page(), except it ignores the page's
--
1.7.4.1

2011-06-24 13:50:03

by Andrea Righi

[permalink] [raw]
Subject: [PATCH v3 2/2] fadvise: implement POSIX_FADV_NOREUSE

There were some reported problems in the past about trashing page cache
when a backup software (i.e., rsync) touches a huge amount of pages (see
for example [1]).

This problem has been almost fixed by the Minchan Kim's patch [2] and a
proper use of fadvise() in the backup software. For example this patch
set [3] has been proposed for inclusion in rsync.

However, there can be still other similar trashing problems: when the
backup software reads all the source files, some of them may be part of
the actual working set of the system. When a POSIX_FADV_DONTNEED is
performed _all_ pages are evicted from pagecache, both the working set
and the use-once pages touched only by the backup software.

A previous proposal [4] tried to resolve this problem being less
agressive in invalidating active pages, moving them to the inactive list
intead of just evict them from the page cache.

However, this approach changed completely the old behavior of
invalidate_mapping_pages(), that is not only used by fadvise.

The new solution maps POSIX_FADV_NOREUSE to the less-agressive page
invalidation policy.

With POSIX_FADV_NOREUSE active pages are moved to the tail of the
inactive list, and pages in the inactive list are just removed from page
cache. Pages mapped by other processes or unevictable pages are not
touched at all.

In this way if the backup was the only user of a page, that page will be
immediately removed from the page cache by calling POSIX_FADV_NOREUSE.
If the page was also touched by other tasks it'll be moved to the
inactive list, having another chance of being re-added to the working
set, or simply reclaimed when memory is needed.

In conclusion, now userspace applications that want to drop some page
cache pages can choose between the following advices:

POSIX_FADV_DONTNEED = drop page cache if possible
POSIX_FADV_NOREUSE = reduce page cache eligibility

[1] http://marc.info/?l=rsync&m=128885034930933&w=2
[2] https://lkml.org/lkml/2011/2/20/57
[3] http://lists.samba.org/archive/rsync/2010-November/025827.html
[4] https://lkml.org/lkml/2011/6/23/35

Signed-off-by: Andrea Righi <[email protected]>
---
mm/fadvise.c | 11 ++++++-----
1 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/mm/fadvise.c b/mm/fadvise.c
index 8d723c9..bcc79ac 100644
--- a/mm/fadvise.c
+++ b/mm/fadvise.c
@@ -33,7 +33,7 @@ SYSCALL_DEFINE(fadvise64_64)(int fd, loff_t offset, loff_t len, int advice)
pgoff_t start_index;
pgoff_t end_index;
unsigned long nrpages;
- int ret = 0;
+ int ret = 0, force = true;

if (!file)
return -EBADF;
@@ -106,7 +106,7 @@ SYSCALL_DEFINE(fadvise64_64)(int fd, loff_t offset, loff_t len, int advice)
nrpages = end_index - start_index + 1;
if (!nrpages)
nrpages = ~0UL;
-
+
ret = force_page_cache_readahead(mapping, file,
start_index,
nrpages);
@@ -114,7 +114,8 @@ SYSCALL_DEFINE(fadvise64_64)(int fd, loff_t offset, loff_t len, int advice)
ret = 0;
break;
case POSIX_FADV_NOREUSE:
- break;
+ /* Reduce cache eligibility */
+ force = false;
case POSIX_FADV_DONTNEED:
if (!bdi_write_congested(mapping->backing_dev_info))
filemap_flush(mapping);
@@ -124,8 +125,8 @@ SYSCALL_DEFINE(fadvise64_64)(int fd, loff_t offset, loff_t len, int advice)
end_index = (endbyte >> PAGE_CACHE_SHIFT);

if (end_index >= start_index)
- invalidate_mapping_pages(mapping, start_index,
- end_index);
+ __invalidate_mapping_pages(mapping, start_index,
+ end_index, force);
break;
default:
ret = -EINVAL;
--
1.7.4.1

2011-06-26 22:49:15

by Rik van Riel

[permalink] [raw]
Subject: Re: [PATCH v3 1/2] mm: introduce __invalidate_mapping_pages()

On 06/24/2011 09:49 AM, Andrea Righi wrote:

> diff --git a/mm/truncate.c b/mm/truncate.c
> index 3a29a61..90f3a97 100644
> --- a/mm/truncate.c
> +++ b/mm/truncate.c
> @@ -312,20 +312,27 @@ void truncate_inode_pages(struct address_space *mapping, loff_t lstart)
> EXPORT_SYMBOL(truncate_inode_pages);
>
> /**
> - * invalidate_mapping_pages - Invalidate all the unlocked pages of one inode
> + * __invalidate_mapping_pages - Invalidate all the unlocked pages of one inode
> * @mapping: the address_space which holds the pages to invalidate
> * @start: the offset 'from' which to invalidate
> * @end: the offset 'to' which to invalidate (inclusive)
> + * @force: always drop pages when true (otherwise, reduce cache eligibility)

I don't like the parameter name "force".

The parameter determines whether or not pages actually get
invalidated, so I'm guessing the parameter name should
reflect the function...

Maybe something like "invalidate"?

--
All rights reversed

2011-06-26 22:49:54

by Rik van Riel

[permalink] [raw]
Subject: Re: [PATCH v3 2/2] fadvise: implement POSIX_FADV_NOREUSE

On 06/24/2011 09:49 AM, Andrea Righi wrote:

> @@ -114,7 +114,8 @@ SYSCALL_DEFINE(fadvise64_64)(int fd, loff_t offset, loff_t len, int advice)
> ret = 0;
> break;
> case POSIX_FADV_NOREUSE:
> - break;
> + /* Reduce cache eligibility */
> + force = false;
> case POSIX_FADV_DONTNEED:
> if (!bdi_write_congested(mapping->backing_dev_info))
> filemap_flush(mapping);

And the same is true here. "force" is just not a very
descriptive name.

> @@ -124,8 +125,8 @@ SYSCALL_DEFINE(fadvise64_64)(int fd, loff_t offset, loff_t len, int advice)
> end_index = (endbyte>> PAGE_CACHE_SHIFT);
>
> if (end_index>= start_index)
> - invalidate_mapping_pages(mapping, start_index,
> - end_index);
> + __invalidate_mapping_pages(mapping, start_index,
> + end_index, force);
> break;
> default:
> ret = -EINVAL;


--
All rights reversed

2011-06-27 03:05:53

by KOSAKI Motohiro

[permalink] [raw]
Subject: Re: [PATCH v3 0/2] fadvise: support POSIX_FADV_NOREUSE

(2011/06/24 22:49), Andrea Righi wrote:
> There were some reported problems in the past about trashing page cache
> when a backup software (i.e., rsync) touches a huge amount of pages (see
> for example [1]).
>
> This problem has been almost fixed by the Minchan Kim's patch [2] and a
> proper use of fadvise() in the backup software. For example this patch
> set [3] has been proposed for inclusion in rsync.
>
> However, there can be still other similar trashing problems: when the
> backup software reads all the source files, some of them may be part of
> the actual working set of the system. When a POSIX_FADV_DONTNEED is
> performed _all_ pages are evicted from pagecache, both the working set
> and the use-once pages touched only by the backup software.
>
> A previous proposal [4] tried to resolve this problem being less
> agressive in invalidating active pages, moving them to the inactive list
> intead of just evict them from the page cache.
>
> However, this approach changed completely the old behavior of
> invalidate_mapping_pages(), that is not only used by fadvise.
>
> The new solution maps POSIX_FADV_NOREUSE to the less-agressive page
> invalidation policy.
>
> With POSIX_FADV_NOREUSE active pages are moved to the tail of the
> inactive list, and pages in the inactive list are just removed from page
> cache. Pages mapped by other processes or unevictable pages are not
> touched at all.
>
> In this way if the backup was the only user of a page, that page will be
> immediately removed from the page cache by calling POSIX_FADV_NOREUSE.
> If the page was also touched by other tasks it'll be moved to the
> inactive list, having another chance of being re-added to the working
> set, or simply reclaimed when memory is needed.
>
> In conclusion, now userspace applications that want to drop some page
> cache pages can choose between the following advices:
>
> POSIX_FADV_DONTNEED = drop page cache if possible
> POSIX_FADV_NOREUSE = reduce page cache eligibility

Eeek.

Your POSIX_FADV_NOREUSE is very different from POSIX definition.
POSIX says,

POSIX_FADV_NOREUSE
Specifies that the application expects to access the specified data once and then
not reuse it thereafter.

IfI understand correctly, it designed for calling _before_ data access
and to be expected may prevent lru activation. But your NORESE is designed
for calling _after_ data access. Big difference might makes a chance of
portability issue.




2011-06-27 07:07:45

by Andrea Righi

[permalink] [raw]
Subject: Re: [PATCH v3 1/2] mm: introduce __invalidate_mapping_pages()

On Sun, Jun 26, 2011 at 06:46:46PM -0400, Rik van Riel wrote:
> On 06/24/2011 09:49 AM, Andrea Righi wrote:
>
> >diff --git a/mm/truncate.c b/mm/truncate.c
> >index 3a29a61..90f3a97 100644
> >--- a/mm/truncate.c
> >+++ b/mm/truncate.c
> >@@ -312,20 +312,27 @@ void truncate_inode_pages(struct address_space *mapping, loff_t lstart)
> > EXPORT_SYMBOL(truncate_inode_pages);
> >
> > /**
> >- * invalidate_mapping_pages - Invalidate all the unlocked pages of one inode
> >+ * __invalidate_mapping_pages - Invalidate all the unlocked pages of one inode
> > * @mapping: the address_space which holds the pages to invalidate
> > * @start: the offset 'from' which to invalidate
> > * @end: the offset 'to' which to invalidate (inclusive)
> >+ * @force: always drop pages when true (otherwise, reduce cache eligibility)
>
> I don't like the parameter name "force".

Agreed.

>
> The parameter determines whether or not pages actually get
> invalidated, so I'm guessing the parameter name should
> reflect the function...
>
> Maybe something like "invalidate"?

Sounds better.

-Andrea

2011-06-27 07:07:51

by Andrea Righi

[permalink] [raw]
Subject: Re: [PATCH v3 2/2] fadvise: implement POSIX_FADV_NOREUSE

On Sun, Jun 26, 2011 at 06:47:37PM -0400, Rik van Riel wrote:
> On 06/24/2011 09:49 AM, Andrea Righi wrote:
>
> >@@ -114,7 +114,8 @@ SYSCALL_DEFINE(fadvise64_64)(int fd, loff_t offset, loff_t len, int advice)
> > ret = 0;
> > break;
> > case POSIX_FADV_NOREUSE:
> >- break;
> >+ /* Reduce cache eligibility */
> >+ force = false;
> > case POSIX_FADV_DONTNEED:
> > if (!bdi_write_congested(mapping->backing_dev_info))
> > filemap_flush(mapping);
>
> And the same is true here. "force" is just not a very
> descriptive name.

OK, I'll change the name to "invalidate" in the next version of the
patch.

Thanks,
-Andrea

>
> >@@ -124,8 +125,8 @@ SYSCALL_DEFINE(fadvise64_64)(int fd, loff_t offset, loff_t len, int advice)
> > end_index = (endbyte>> PAGE_CACHE_SHIFT);
> >
> > if (end_index>= start_index)
> >- invalidate_mapping_pages(mapping, start_index,
> >- end_index);
> >+ __invalidate_mapping_pages(mapping, start_index,
> >+ end_index, force);
> > break;
> > default:
> > ret = -EINVAL;
>
>
> --
> All rights reversed

2011-06-27 07:13:05

by Andrea Righi

[permalink] [raw]
Subject: Re: [PATCH v3 0/2] fadvise: support POSIX_FADV_NOREUSE

On Mon, Jun 27, 2011 at 12:04:41PM +0900, KOSAKI Motohiro wrote:
> (2011/06/24 22:49), Andrea Righi wrote:
> > There were some reported problems in the past about trashing page cache
> > when a backup software (i.e., rsync) touches a huge amount of pages (see
> > for example [1]).
> >
> > This problem has been almost fixed by the Minchan Kim's patch [2] and a
> > proper use of fadvise() in the backup software. For example this patch
> > set [3] has been proposed for inclusion in rsync.
> >
> > However, there can be still other similar trashing problems: when the
> > backup software reads all the source files, some of them may be part of
> > the actual working set of the system. When a POSIX_FADV_DONTNEED is
> > performed _all_ pages are evicted from pagecache, both the working set
> > and the use-once pages touched only by the backup software.
> >
> > A previous proposal [4] tried to resolve this problem being less
> > agressive in invalidating active pages, moving them to the inactive list
> > intead of just evict them from the page cache.
> >
> > However, this approach changed completely the old behavior of
> > invalidate_mapping_pages(), that is not only used by fadvise.
> >
> > The new solution maps POSIX_FADV_NOREUSE to the less-agressive page
> > invalidation policy.
> >
> > With POSIX_FADV_NOREUSE active pages are moved to the tail of the
> > inactive list, and pages in the inactive list are just removed from page
> > cache. Pages mapped by other processes or unevictable pages are not
> > touched at all.
> >
> > In this way if the backup was the only user of a page, that page will be
> > immediately removed from the page cache by calling POSIX_FADV_NOREUSE.
> > If the page was also touched by other tasks it'll be moved to the
> > inactive list, having another chance of being re-added to the working
> > set, or simply reclaimed when memory is needed.
> >
> > In conclusion, now userspace applications that want to drop some page
> > cache pages can choose between the following advices:
> >
> > POSIX_FADV_DONTNEED = drop page cache if possible
> > POSIX_FADV_NOREUSE = reduce page cache eligibility
>
> Eeek.
>
> Your POSIX_FADV_NOREUSE is very different from POSIX definition.
> POSIX says,
>
> POSIX_FADV_NOREUSE
> Specifies that the application expects to access the specified data once and then
> not reuse it thereafter.
>
> IfI understand correctly, it designed for calling _before_ data access
> and to be expected may prevent lru activation. But your NORESE is designed
> for calling _after_ data access. Big difference might makes a chance of
> portability issue.

You're right. NOREUSE is designed to implement drop behind policy.

I'll post a new patch that will plug this logic in DONTNEED (like the
presious version), but without breaking the old /proc/sys/vm/drop_caches
behavior.

Thanks,
-Andrea

2011-06-27 07:45:23

by KOSAKI Motohiro

[permalink] [raw]
Subject: Re: [PATCH v3 0/2] fadvise: support POSIX_FADV_NOREUSE

>>> POSIX_FADV_DONTNEED = drop page cache if possible
>>> POSIX_FADV_NOREUSE = reduce page cache eligibility
>>
>> Eeek.
>>
>> Your POSIX_FADV_NOREUSE is very different from POSIX definition.
>> POSIX says,
>>
>> POSIX_FADV_NOREUSE
>> Specifies that the application expects to access the specified data once and then
>> not reuse it thereafter.
>>
>> IfI understand correctly, it designed for calling _before_ data access
>> and to be expected may prevent lru activation. But your NORESE is designed
>> for calling _after_ data access. Big difference might makes a chance of
>> portability issue.
>
> You're right. NOREUSE is designed to implement drop behind policy.
>
> I'll post a new patch that will plug this logic in DONTNEED (like the
> presious version), but without breaking the old /proc/sys/vm/drop_caches
> behavior.

Great!

thanks.


2011-06-27 10:19:07

by Pádraig Brady

[permalink] [raw]
Subject: Re: [PATCH v3 0/2] fadvise: support POSIX_FADV_NOREUSE

On 27/06/11 08:11, Andrea Righi wrote:
> On Mon, Jun 27, 2011 at 12:04:41PM +0900, KOSAKI Motohiro wrote:
>> (2011/06/24 22:49), Andrea Righi wrote:
>>> There were some reported problems in the past about trashing page cache
>>> when a backup software (i.e., rsync) touches a huge amount of pages (see
>>> for example [1]).
>>>
>>> This problem has been almost fixed by the Minchan Kim's patch [2] and a
>>> proper use of fadvise() in the backup software. For example this patch
>>> set [3] has been proposed for inclusion in rsync.
>>>
>>> However, there can be still other similar trashing problems: when the
>>> backup software reads all the source files, some of them may be part of
>>> the actual working set of the system. When a POSIX_FADV_DONTNEED is
>>> performed _all_ pages are evicted from pagecache, both the working set
>>> and the use-once pages touched only by the backup software.
>>>
>>> A previous proposal [4] tried to resolve this problem being less
>>> agressive in invalidating active pages, moving them to the inactive list
>>> intead of just evict them from the page cache.
>>>
>>> However, this approach changed completely the old behavior of
>>> invalidate_mapping_pages(), that is not only used by fadvise.
>>>
>>> The new solution maps POSIX_FADV_NOREUSE to the less-agressive page
>>> invalidation policy.
>>>
>>> With POSIX_FADV_NOREUSE active pages are moved to the tail of the
>>> inactive list, and pages in the inactive list are just removed from page
>>> cache. Pages mapped by other processes or unevictable pages are not
>>> touched at all.
>>>
>>> In this way if the backup was the only user of a page, that page will be
>>> immediately removed from the page cache by calling POSIX_FADV_NOREUSE.
>>> If the page was also touched by other tasks it'll be moved to the
>>> inactive list, having another chance of being re-added to the working
>>> set, or simply reclaimed when memory is needed.
>>>
>>> In conclusion, now userspace applications that want to drop some page
>>> cache pages can choose between the following advices:
>>>
>>> POSIX_FADV_DONTNEED = drop page cache if possible
>>> POSIX_FADV_NOREUSE = reduce page cache eligibility
>>
>> Eeek.
>>
>> Your POSIX_FADV_NOREUSE is very different from POSIX definition.
>> POSIX says,
>>
>> POSIX_FADV_NOREUSE
>> Specifies that the application expects to access the specified data once and then
>> not reuse it thereafter.
>>
>> IfI understand correctly, it designed for calling _before_ data access
>> and to be expected may prevent lru activation. But your NORESE is designed
>> for calling _after_ data access. Big difference might makes a chance of
>> portability issue.
>
> You're right. NOREUSE is designed to implement drop behind policy.

Hmm fair enough.
NOREUSE is meant for specifying you _will_ need the data _once_

Isn't this what rsync actually wants though?
I.E. to specify NOREUSE for the file up front
so it would drop from cache automatically as processed,
(if not already in cache).

I realize that would be a more invasive patch.

> I'll post a new patch that will plug this logic in DONTNEED (like the
> presious version), but without breaking the old /proc/sys/vm/drop_caches
> behavior.

But will that break existing apps (running as root) that expect DONTNEED
to drop cache for a _file_. Perhaps posix_fadvise() is meant to have
process rather than system scope, but that has not been the case until now.

cheers,
P?draig.

2011-06-27 10:31:01

by Andrea Righi

[permalink] [raw]
Subject: Re: [PATCH v3 0/2] fadvise: support POSIX_FADV_NOREUSE

On Mon, Jun 27, 2011 at 11:17:51AM +0100, P?draig Brady wrote:
> On 27/06/11 08:11, Andrea Righi wrote:
> > On Mon, Jun 27, 2011 at 12:04:41PM +0900, KOSAKI Motohiro wrote:
> >> (2011/06/24 22:49), Andrea Righi wrote:
> >>> There were some reported problems in the past about trashing page cache
> >>> when a backup software (i.e., rsync) touches a huge amount of pages (see
> >>> for example [1]).
> >>>
> >>> This problem has been almost fixed by the Minchan Kim's patch [2] and a
> >>> proper use of fadvise() in the backup software. For example this patch
> >>> set [3] has been proposed for inclusion in rsync.
> >>>
> >>> However, there can be still other similar trashing problems: when the
> >>> backup software reads all the source files, some of them may be part of
> >>> the actual working set of the system. When a POSIX_FADV_DONTNEED is
> >>> performed _all_ pages are evicted from pagecache, both the working set
> >>> and the use-once pages touched only by the backup software.
> >>>
> >>> A previous proposal [4] tried to resolve this problem being less
> >>> agressive in invalidating active pages, moving them to the inactive list
> >>> intead of just evict them from the page cache.
> >>>
> >>> However, this approach changed completely the old behavior of
> >>> invalidate_mapping_pages(), that is not only used by fadvise.
> >>>
> >>> The new solution maps POSIX_FADV_NOREUSE to the less-agressive page
> >>> invalidation policy.
> >>>
> >>> With POSIX_FADV_NOREUSE active pages are moved to the tail of the
> >>> inactive list, and pages in the inactive list are just removed from page
> >>> cache. Pages mapped by other processes or unevictable pages are not
> >>> touched at all.
> >>>
> >>> In this way if the backup was the only user of a page, that page will be
> >>> immediately removed from the page cache by calling POSIX_FADV_NOREUSE.
> >>> If the page was also touched by other tasks it'll be moved to the
> >>> inactive list, having another chance of being re-added to the working
> >>> set, or simply reclaimed when memory is needed.
> >>>
> >>> In conclusion, now userspace applications that want to drop some page
> >>> cache pages can choose between the following advices:
> >>>
> >>> POSIX_FADV_DONTNEED = drop page cache if possible
> >>> POSIX_FADV_NOREUSE = reduce page cache eligibility
> >>
> >> Eeek.
> >>
> >> Your POSIX_FADV_NOREUSE is very different from POSIX definition.
> >> POSIX says,
> >>
> >> POSIX_FADV_NOREUSE
> >> Specifies that the application expects to access the specified data once and then
> >> not reuse it thereafter.
> >>
> >> IfI understand correctly, it designed for calling _before_ data access
> >> and to be expected may prevent lru activation. But your NORESE is designed
> >> for calling _after_ data access. Big difference might makes a chance of
> >> portability issue.
> >
> > You're right. NOREUSE is designed to implement drop behind policy.
>
> Hmm fair enough.
> NOREUSE is meant for specifying you _will_ need the data _once_
>
> Isn't this what rsync actually wants though?
> I.E. to specify NOREUSE for the file up front
> so it would drop from cache automatically as processed,
> (if not already in cache).
>
> I realize that would be a more invasive patch.
>
> > I'll post a new patch that will plug this logic in DONTNEED (like the
> > presious version), but without breaking the old /proc/sys/vm/drop_caches
> > behavior.
>
> But will that break existing apps (running as root) that expect DONTNEED
> to drop cache for a _file_. Perhaps posix_fadvise() is meant to have
> process rather than system scope, but that has not been the case until now.

The actual problem I think is that apps expect that DONTNEED can be used
to drop cache, but this is not written anywhere in the POSIX standard.

I would also like to have both functionalities: 1) be sure to drop page
cache pages (now there's only a system-wide knob to do this:
/proc/sys/vm/drop_caches), 2) give an advice to the kernel that I will
not reuse some pages in the future.

The standard can only provide 2). If we also want 1) at the file
granularity, I think we'd need to introduce something linux specific to
avoid having portability problems.

-Andrea

2011-06-27 11:55:29

by Pádraig Brady

[permalink] [raw]
Subject: Re: [PATCH v3 0/2] fadvise: support POSIX_FADV_NOREUSE

On 27/06/11 11:29, Andrea Righi wrote:
> The actual problem I think is that apps expect that DONTNEED can be used
> to drop cache, but this is not written anywhere in the POSIX standard.
>
> I would also like to have both functionalities: 1) be sure to drop page
> cache pages (now there's only a system-wide knob to do this:
> /proc/sys/vm/drop_caches), 2) give an advice to the kernel that I will
> not reuse some pages in the future.
>
> The standard can only provide 2). If we also want 1) at the file
> granularity, I think we'd need to introduce something linux specific to
> avoid having portability problems.

True, though Linux is the reference for posix_fadvise() implementations,
given its lack of support on other platforms.

So just to summarize for _my_ reference.
You're changing DONTNEED to mean "drop if !PageActive()".
I.E. according to http://linux-mm.org/PageReplacementDesign
"drop if files only accessed once".

This will mean that there is no way currently to
remove a particular file from the cache on linux.
Hopefully that won't affect any of:
http://codesearch.google.com/#search/&q=POSIX_FADV_DONTNEED

Ideally I'd like cache functions for:
DROP, ADD, ADD if space?
which could correspond to:
DONTNEED, WILLNEED, NOREUSE
but what we're going for are these somewhat overlapping functions:
DROP if used once?, ADD, ADD if space

cheers,
P?draig.

? Not implemented yet.

? Hopefully there are no access patterns a single
process can do to make a PageActive as that would
probably not be desired in relation to "Drop if used once"
functionality.

2011-06-27 12:41:09

by Andrea Righi

[permalink] [raw]
Subject: Re: [PATCH v3 0/2] fadvise: support POSIX_FADV_NOREUSE

On Mon, Jun 27, 2011 at 12:53:53PM +0100, P?draig Brady wrote:
> On 27/06/11 11:29, Andrea Righi wrote:
> > The actual problem I think is that apps expect that DONTNEED can be used
> > to drop cache, but this is not written anywhere in the POSIX standard.
> >
> > I would also like to have both functionalities: 1) be sure to drop page
> > cache pages (now there's only a system-wide knob to do this:
> > /proc/sys/vm/drop_caches), 2) give an advice to the kernel that I will
> > not reuse some pages in the future.
> >
> > The standard can only provide 2). If we also want 1) at the file
> > granularity, I think we'd need to introduce something linux specific to
> > avoid having portability problems.
>
> True, though Linux is the reference for posix_fadvise() implementations,
> given its lack of support on other platforms.
>
> So just to summarize for _my_ reference.
> You're changing DONTNEED to mean "drop if !PageActive()".
> I.E. according to http://linux-mm.org/PageReplacementDesign
> "drop if files only accessed once".

Drop if pages were only accessed once, they're not mapped by any other
process and they're not unevictable.

>
> This will mean that there is no way currently to
> remove a particular file from the cache on linux.

Correct. There's not a way to do this for a single file (except running
POSIX_FADV_DONTNEED twice...).

> Hopefully that won't affect any of:
> http://codesearch.google.com/#search/&q=POSIX_FADV_DONTNEED
>
> Ideally I'd like cache functions for:
> DROP, ADD, ADD if space?
> which could correspond to:
> DONTNEED, WILLNEED, NOREUSE
> but what we're going for are these somewhat overlapping functions:
> DROP if used once?, ADD, ADD if space

IIUC, NOREUSE means "the application will use this range of the file
once". It's something that we do _before_ accessing the file. And the
kernel needs to remember the ranges of NOREUSE data for each file, so
that page cache can be immediately dropped after the data has been
accessed (if possible).

-Andrea

>
> cheers,
> P?draig.
>
> ? Not implemented yet.
>
> ? Hopefully there are no access patterns a single
> process can do to make a PageActive as that would
> probably not be desired in relation to "Drop if used once"
> functionality.