Hi, Dan.
I reviewed quickly, so I may be wrong. :)
On Fri, May 28, 2010 at 10:36:10AM -0700, Dan Magenheimer wrote:
> [PATCH V2 3/7] Cleancache (was Transcendent Memory): VFS hooks
>
> Implement core hooks in VFS for: initializing cleancache
> per filesystem; capturing clean pages evicted by page cache;
> attempting to get pages from cleancache before filesystem
> read; and ensuring coherency between pagecache, disk,
> and cleancache. All hooks become no-ops if CONFIG_CLEANCACHE
> is unset, or become compare-pointer-to-NULL if
> CONFIG_CLEANCACHE is set but no cleancache "backend" has
> claimed cleancache_ops.
>
> Signed-off-by: Chris Mason <[email protected]>
> Signed-off-by: Dan Magenheimer <[email protected]>
>
> Diffstat:
> fs/buffer.c | 5 +++++
> fs/mpage.c | 7 +++++++
> fs/super.c | 8 ++++++++
> mm/filemap.c | 11 +++++++++++
> mm/truncate.c | 10 ++++++++++
> 5 files changed, 41 insertions(+)
>
> --- linux-2.6.34/fs/super.c 2010-05-16 15:17:36.000000000 -0600
> +++ linux-2.6.34-cleancache/fs/super.c 2010-05-24 12:15:20.000000000 -0600
> @@ -38,6 +38,7 @@
> #include <linux/mutex.h>
> #include <linux/file.h>
> #include <linux/backing-dev.h>
> +#include <linux/cleancache.h>
> #include <asm/uaccess.h>
> #include "internal.h"
>
> @@ -105,6 +106,7 @@ static struct super_block *alloc_super(s
> s->s_qcop = sb_quotactl_ops;
> s->s_op = &default_op;
> s->s_time_gran = 1000000000;
> + s->cleancache_poolid = -1;
> }
> out:
> return s;
> @@ -195,6 +197,11 @@ void deactivate_super(struct super_block
> vfs_dq_off(s, 0);
> down_write(&s->s_umount);
> fs->kill_sb(s);
> + if (s->cleancache_poolid > 0) {
> + int cleancache_poolid = s->cleancache_poolid;
> + s->cleancache_poolid = -1; /* avoid races */
> + cleancache_flush_fs(cleancache_poolid);
> + }
> put_filesystem(fs);
> put_super(s);
> }
> @@ -221,6 +228,7 @@ void deactivate_locked_super(struct supe
> spin_unlock(&sb_lock);
> vfs_dq_off(s, 0);
> fs->kill_sb(s);
> + cleancache_flush_fs(s->cleancache_poolid);
> put_filesystem(fs);
> put_super(s);
> } else {
> --- linux-2.6.34/fs/buffer.c 2010-05-16 15:17:36.000000000 -0600
> +++ linux-2.6.34-cleancache/fs/buffer.c 2010-05-24 12:14:44.000000000 -0600
> @@ -41,6 +41,7 @@
> #include <linux/bitops.h>
> #include <linux/mpage.h>
> #include <linux/bit_spinlock.h>
> +#include <linux/cleancache.h>
>
> static int fsync_buffers_list(spinlock_t *lock, struct list_head *list);
>
> @@ -276,6 +277,10 @@ void invalidate_bdev(struct block_device
>
> invalidate_bh_lrus();
> invalidate_mapping_pages(mapping, 0, -1);
> + /* 99% of the time, we don't need to flush the cleancache on the bdev.
> + * But, for the strange corners, lets be cautious
> + */
> + cleancache_flush_inode(mapping);
> }
> EXPORT_SYMBOL(invalidate_bdev);
>
> --- linux-2.6.34/fs/mpage.c 2010-05-16 15:17:36.000000000 -0600
> +++ linux-2.6.34-cleancache/fs/mpage.c 2010-05-24 12:29:28.000000000 -0600
> @@ -27,6 +27,7 @@
> #include <linux/writeback.h>
> #include <linux/backing-dev.h>
> #include <linux/pagevec.h>
> +#include <linux/cleancache.h>
>
> /*
> * I/O completion handler for multipage BIOs.
> @@ -286,6 +287,12 @@ do_mpage_readpage(struct bio *bio, struc
> SetPageMappedToDisk(page);
> }
>
> + if (fully_mapped && blocks_per_page == 1 && !PageUptodate(page) &&
> + cleancache_get_page(page) == CLEANCACHE_GET_PAGE_SUCCESS) {
> + SetPageUptodate(page);
> + goto confused;
> + }
> +
> /*
> * This page will go to BIO. Do we need to send this BIO off first?
> */
> --- linux-2.6.34/mm/filemap.c 2010-05-16 15:17:36.000000000 -0600
> +++ linux-2.6.34-cleancache/mm/filemap.c 2010-05-24 12:14:44.000000000 -0600
> @@ -34,6 +34,7 @@
> #include <linux/hardirq.h> /* for BUG_ON(!in_atomic()) only */
> #include <linux/memcontrol.h>
> #include <linux/mm_inline.h> /* for page_is_file_cache() */
> +#include <linux/cleancache.h>
> #include "internal.h"
>
> /*
> @@ -119,6 +120,16 @@ void __remove_from_page_cache(struct pag
> {
> struct address_space *mapping = page->mapping;
>
> + /*
> + * if we're uptodate, flush out into the cleancache, otherwise
> + * invalidate any existing cleancache entries. We can't leave
> + * stale data around in the cleancache once our page is gone
> + */
> + if (PageUptodate(page))
> + cleancache_put_page(page);
> + else
> + cleancache_flush_page(mapping, page);
> +
> radix_tree_delete(&mapping->page_tree, page->index);
I doubt this is the right place with respect to the PFRA.
1)
You mentioned the PFRA in your description, and I understood that cleancache
holds cold clean pages evicted by the reclaimer.
But __remove_from_page_cache can be called from other call sites as well.
For example, shmem_writepage calls it to move a page from the page cache
to the swap cache. Although the page is no longer in the page cache, it is
in the swap cache, so subsequent reads/writes of the shmem page (until
swapout happens) can still be served from the swap cache.
I didn't look into all of the call sites, but please review them again.
2)
While reviewing this, I noticed the GFP_ATOMIC in add_to_swap_cache.
I don't know why it is there, but at least when you review the call sites,
please consider the gfp flags of the context.
If the PFRA is running, there is memory pressure on the system,
so we can't allocate pages easily. Each function could be using GFP_NOIO,
GFP_NOFS, GFP_ATOMIC, and so on. That means your backend can't call
alloc_pages freely.
I think it would be best if your hooks were called only from unrestricted
contexts. Maybe you have already done that. :)
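To illustrate the constraint with a user-space sketch (the flag name, pool, and
function names here are invented for illustration, not the real cleancache or
kernel API): a backend put invoked from a reclaim-like context must not
allocate, so it can only use storage it reserved in advance, and it must be
free to reject the page otherwise:

```c
#include <stdlib.h>
#include <string.h>

#define TM_MAY_SLEEP 0x1   /* hypothetical stand-in for a blocking gfp flag */

#define POOL_PAGES 2
static char pool[POOL_PAGES][64];   /* storage reserved ahead of time */
static int pool_used;

/* Returns 0 on success, -1 if the page had to be dropped. */
static int backend_put(const char *data, int gfp)
{
	char *slot;

	if (pool_used < POOL_PAGES)
		slot = pool[pool_used++];   /* preallocated: safe in any context */
	else if (gfp & TM_MAY_SLEEP)
		slot = malloc(64);          /* allocate only when the caller may sleep */
	else
		return -1;                  /* atomic/reclaim context: drop the page */
	if (!slot)
		return -1;
	strncpy(slot, data, 63);
	slot[63] = '\0';
	return 0;
}
```

The key point is the last branch: rejecting a put is always legal for a clean
cache, so a backend under gfp restrictions can simply decline rather than
allocate.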
3)
Please consider system memory pressure.
Without that, the PFRA might reclaim a page while cleancache's backend
(in a non-virtualized environment) consumes another page to store the clean
one. That could change system behavior, although it can reduce I/O cost.
I don't know which matters more. Let's discuss it in another thread when we
get the chance.
In this thread, my concern is that we need a new hook that notifies the
backend of urgent memory reclaim (e.g., direct reclaim).
I think that, at least when system memory pressure is high, the backend
needs to free its own cache memory. Of course, the backend could monitor
this itself, but that is awkward; I think the frontend should do it.
And I hope Nitin considers this, too.
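A minimal sketch of what such a notification could look like (the shrink
callback, its signature, and all names below are purely hypothetical and not
part of the posted patch): the frontend calls into the backend when direct
reclaim starts, and a backend that doesn't care simply leaves the hook NULL:

```c
#include <stddef.h>

/* Hypothetical extension of cleancache_ops with a pressure callback. */
struct cc_ops {
	long (*shrink)(long nr_to_free);  /* returns pages actually freed */
};

static long demo_cached = 100;        /* pages the demo backend holds */

static long demo_shrink(long nr)
{
	long freed = nr < demo_cached ? nr : demo_cached;

	demo_cached -= freed;             /* every page is clean: just drop */
	return freed;
}

/* Frontend side: called from (for example) the direct reclaim path. */
static long cc_notify_pressure(struct cc_ops *ops, long nr)
{
	if (ops && ops->shrink)
		return ops->shrink(nr);
	return 0;                         /* no hook registered: nothing to do */
}
```

This keeps the policy in the backend while sparing it the awkward job of
detecting memory pressure on its own.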
--
Kind regards,
Minchan Kim
> Hi, Dan.
> I reviewed quickly, so I may be wrong. :)
Hi Minchan --
Thanks for your thorough review! I don't think anyone
else yet has examined the semantics of the cleancache
patch as deeply as you have. Excellent!
> > + /*
> > + * if we're uptodate, flush out into the cleancache, otherwise
> > + * invalidate any existing cleancache entries. We can't leave
> > + * stale data around in the cleancache once our page is gone
> > + */
> > + if (PageUptodate(page))
> > + cleancache_put_page(page);
> > + else
> > + cleancache_flush_page(mapping, page);
>
> I doubt it's right place related to PFRA.
I agree it doesn't seem to be the right place, but it does work
and there doesn't seem to be a better place.
> 1)
> You mentioned the PFRA in your description, and I understood that
> cleancache holds cold clean pages evicted by the reclaimer.
> But __remove_from_page_cache can be called from other call sites as well.
>
> For example, shmem_writepage calls it to move a page from the page cache
> to the swap cache. Although the page is no longer in the page cache, it
> is in the swap cache, so subsequent reads/writes of the shmem page (until
> swapout happens) can still be served from the swap cache.
>
> I didn't look into all of the call sites, but please review them again.
I think the "if (PageUptodate(page))" eliminates all the cases
where bad things can happen.
Note that there may be cases where some unnecessary puts/flushes
occur. The focus of the patch is on correctness first; it may
be possible to increase performance (marginally) in the future by
reducing unnecessary cases.
> 3) Please consider system memory pressure.
> And I hope Nitin consider this, too.
This is definitely very important but remember that cleancache
provides a great deal of flexibility: Any page in cleancache
can be thrown away at any time as every page is clean! It
can even accept a page and throw it away immediately. Clearly
the backend needs to do this intelligently so this will
take some policy work.
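The "ephemeral" contract can be sketched in user space like this (toy store
with invented names; the real API operates on struct page and pool ids): a put
may silently discard the page, so the read path must always be prepared to
fall back to disk on a failed get, exactly as the do_mpage_readpage hook in
the patch does:

```c
#include <string.h>

#define CC_GET_SUCCESS 0

static char cc_data[64];
static int cc_valid;

/* Put always "succeeds": the backend may keep the page or drop it at will. */
static void cc_put(const char *data, int keep)
{
	if (keep) {
		strncpy(cc_data, data, 63);
		cc_data[63] = '\0';
		cc_valid = 1;
	}
	/* else: silently discard -- legal, since every cached page is clean */
}

static int cc_get(char *out, size_t len)
{
	if (!cc_valid)
		return -1;                /* miss: caller must read from disk */
	strncpy(out, cc_data, len);
	return CC_GET_SUCCESS;
}

/* Read path: try the clean cache first, fall back to the backing store. */
static void read_page(char *out, size_t len, const char *disk)
{
	if (cc_get(out, len) != CC_GET_SUCCESS)
		strncpy(out, disk, len);
}
```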
Since I saw you sent a separate response to Nitin, I'll
let him answer for his in-kernel page cache compression
work. The solution to the similar problem for Xen is
described in the tmem internals document that I think
I pointed to earlier here:
http://oss.oracle.com/projects/tmem/documentation/internals/
Thanks,
Dan
On Fri, Jun 04, 2010 at 08:13:14AM -0700, Dan Magenheimer wrote:
> > 1)
> > You mentioned the PFRA in your description, and I understood that
> > cleancache holds cold clean pages evicted by the reclaimer.
> > But __remove_from_page_cache can be called from other call sites as well.
> >
> > For example, shmem_writepage calls it to move a page from the page cache
> > to the swap cache. Although the page is no longer in the page cache, it
> > is in the swap cache, so subsequent reads/writes of the shmem page (until
> > swapout happens) can still be served from the swap cache.
> >
> > I didn't look into all of the call sites, but please review them again.
>
> I think the "if (PageUptodate(page))" eliminates all the cases
> where bad things can happen.
I missed it. My first concern is gone. :)
>
> Note that there may be cases where some unnecessary puts/flushes
> occur. The focus of the patch is on correctness first; it may
> be possible to increase performance (marginally) in the future by
> reducing unnecessary cases.
I don't think it would be marginal; it depends on the implementation
of the backend.
I think the frontend should notify the backend only from the exact
place. As your description says, we could call it in shrink_page_list
with some check, or change __remove_mapping to take an argument that
says "this call is from the reclaim path".
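A sketch of that second option (the flag and function names are invented;
__remove_mapping's real signature and body differ): thread a "from reclaim"
flag down so the cleancache put fires only for clean, uptodate pages that the
PFRA is actually evicting:

```c
#include <stdbool.h>

static int cleancache_puts;   /* counts pages handed to the backend */

/*
 * Model of __remove_mapping with an extra flag: only the reclaim path
 * pushes uptodate pages into cleancache; other callers (for example,
 * shmem moving a page to the swap cache) skip the put entirely.
 */
static void remove_mapping_model(bool uptodate, bool from_reclaim)
{
	if (from_reclaim && uptodate)
		cleancache_puts++;
	/* ... the actual page-cache removal would happen here ... */
}
```

Compared with hooking __remove_from_page_cache, this avoids unnecessary puts
from non-reclaim callers at the cost of touching every caller's signature.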
>
> > 3) Please consider system memory pressure.
> > And I hope Nitin consider this, too.
>
> This is definitely very important but remember that cleancache
> provides a great deal of flexibility: Any page in cleancache
> can be thrown away at any time as every page is clean! It
> can even accept a page and throw it away immediately. Clearly
> the backend needs to do this intelligently so this will
> take some policy work.
I admit the design goal of cleancache is to give a great deal of flexibility.
But I think system memory pressure (i.e., direct reclaim and even OOM) is
exceptional. Whenever we implement a backend, every backend (in a
non-virtualized environment) has to implement a policy for dealing with
system-memory-pressure emergencies to prevent the system from hanging, I think.
And the backend might need some hack to know the situation, which is horrible.
So I hope the frontend gives the backend at least a little information.
A backend that doesn't need it can just ignore it,
but for a backend that does need it, it can be a big deal. :)
>
> Since I saw you sent a separate response to Nitin, I'll
> let him answer for his in-kernel page cache compression
> work. The solution to the similar problem for Xen is
> described in the tmem internals document that I think
> I pointed to earlier here:
> http://oss.oracle.com/projects/tmem/documentation/internals/
I will read it when I have time.
Thanks for the quick reply, but I can't continue now;
it's time to sleep, and the weekend is here.
See you soon, and have a nice weekend.
>
> Thanks,
> Dan
>
--
Kind regards,
Minchan Kim