2009-04-03 09:42:16

by David Howells

[permalink] [raw]
Subject: [PATCH 1/4] CacheFiles: Use the ->write() file op rather than a special kernel aop

Make CacheFiles use the ->write() file operation rather than a special kernel
address space operation optimised for writing whole pages at page boundaries.

This reverts the patch:

CacheFiles: Add a hook to write a single page of data to an inode

Add an address space operation to write one single page of data to an
inode at a page-aligned location (thus permitting the implementation to
be highly optimised). The data source is a single page.

This is used by CacheFiles to store the contents of netfs pages into
their backing file pages.

Supply a generic implementation for this that uses the write_begin()
and write_end() address_space operations to bind a copy directly into
the page cache.

Hook the Ext2 and Ext3 operations to the generic implementation.

And then opens the target file for each write, calls ->write() and then
fput()'s the file. The file has to be released, rather than being cached for
multiple uses, as caching it is liable to cause ENFILE errors to start cropping
up when a large tar job is run on a netfs. Optimisation may be possible to
write several pages to the same file in one go, thus reducing the amount of
dentry_open()/fput() calls required.

Despite executing a fair bit more code, this appears to perform much better
than use of the write_one_page() address space op. There are two possible
reasons for this:

(1) ->write() is doing some scheduling stuff that write_one_page() is not.

(2) fput() is doing some scheduling stuff, probably by way of the ->release()
file op.

Whether the cache would still perform better under a real load with this patch
is hard to measure.

Signed-off-by: David Howells <[email protected]>
---

fs/cachefiles/namei.c | 3 +--
fs/cachefiles/rdwr.c | 42 ++++++++++++++++++++++++++++++++++--------
2 files changed, 35 insertions(+), 10 deletions(-)


diff --git a/fs/cachefiles/namei.c b/fs/cachefiles/namei.c
index 83b1988..4ce818a 100644
--- a/fs/cachefiles/namei.c
+++ b/fs/cachefiles/namei.c
@@ -446,8 +446,7 @@ lookup_again:

ret = -EPERM;
aops = object->dentry->d_inode->i_mapping->a_ops;
- if (!aops->bmap ||
- !aops->write_one_page)
+ if (!aops->bmap)
goto check_error;

object->backer = object->dentry;
diff --git a/fs/cachefiles/rdwr.c b/fs/cachefiles/rdwr.c
index 84bb8b7..a69787e 100644
--- a/fs/cachefiles/rdwr.c
+++ b/fs/cachefiles/rdwr.c
@@ -9,6 +9,8 @@
* 2 of the Licence, or (at your option) any later version.
*/

+#include <linux/mount.h>
+#include <linux/file.h>
#include "internal.h"

/*
@@ -796,7 +798,11 @@ int cachefiles_allocate_pages(struct fscache_retrieval *op,
int cachefiles_write_page(struct fscache_storage *op, struct page *page)
{
struct cachefiles_object *object;
- struct address_space *mapping;
+ struct cachefiles_cache *cache;
+ mm_segment_t old_fs;
+ struct file *file;
+ loff_t pos;
+ void *data;
int ret;

ASSERT(op != NULL);
@@ -814,13 +820,33 @@ int cachefiles_write_page(struct fscache_storage *op, struct page *page)

ASSERT(S_ISREG(object->backer->d_inode->i_mode));

- /* copy the page to ext3 and let it store it in its own time */
- mapping = object->backer->d_inode->i_mapping;
- ret = -EIO;
- if (mapping->a_ops->write_one_page) {
- ret = mapping->a_ops->write_one_page(mapping, page->index,
- page);
- _debug("write_one_page -> %d", ret);
+ cache = container_of(object->fscache.cache,
+ struct cachefiles_cache, cache);
+
+ /* write the page to the backing filesystem and let it store it in its
+ * own time */
+ dget(object->backer);
+ mntget(cache->mnt);
+ file = dentry_open(object->backer, cache->mnt, O_RDWR,
+ cache->cache_cred);
+ if (IS_ERR(file)) {
+ ret = PTR_ERR(file);
+ } else {
+ ret = -EIO;
+ if (file->f_op->write) {
+ pos = (loff_t) page->index << PAGE_SHIFT;
+ data = kmap(page);
+ old_fs = get_fs();
+ set_fs(KERNEL_DS);
+ ret = file->f_op->write(
+ file, (const void __user *) data, PAGE_SIZE,
+ &pos);
+ set_fs(old_fs);
+ kunmap(page);
+ if (ret != PAGE_SIZE)
+ ret = -EIO;
+ }
+ fput(file);
}

if (ret < 0) {


2009-04-03 09:42:34

by David Howells

[permalink] [raw]
Subject: [PATCH 2/4] CacheFiles: Revert the addition of write_one_page()

Revert the addition of the write_one_page() address space op as CacheFiles now
uses the write() file op instead.

Signed-off-by: David Howells <[email protected]>
---

fs/ext2/inode.c | 2 --
fs/ext3/inode.c | 3 ---
include/linux/fs.h | 7 ------
mm/filemap.c | 61 ----------------------------------------------------
4 files changed, 0 insertions(+), 73 deletions(-)


diff --git a/fs/ext2/inode.c b/fs/ext2/inode.c
index 2366895..b43b955 100644
--- a/fs/ext2/inode.c
+++ b/fs/ext2/inode.c
@@ -799,7 +799,6 @@ const struct address_space_operations ext2_aops = {
.writepages = ext2_writepages,
.migratepage = buffer_migrate_page,
.is_partially_uptodate = block_is_partially_uptodate,
- .write_one_page = generic_file_buffered_write_one_page,
};

const struct address_space_operations ext2_aops_xip = {
@@ -818,7 +817,6 @@ const struct address_space_operations ext2_nobh_aops = {
.direct_IO = ext2_direct_IO,
.writepages = ext2_writepages,
.migratepage = buffer_migrate_page,
- .write_one_page = generic_file_buffered_write_one_page,
};

/*
diff --git a/fs/ext3/inode.c b/fs/ext3/inode.c
index 41e5d2d..716f964 100644
--- a/fs/ext3/inode.c
+++ b/fs/ext3/inode.c
@@ -1812,7 +1812,6 @@ static const struct address_space_operations ext3_ordered_aops = {
.direct_IO = ext3_direct_IO,
.migratepage = buffer_migrate_page,
.is_partially_uptodate = block_is_partially_uptodate,
- .write_one_page = generic_file_buffered_write_one_page,
};

static const struct address_space_operations ext3_writeback_aops = {
@@ -1828,7 +1827,6 @@ static const struct address_space_operations ext3_writeback_aops = {
.direct_IO = ext3_direct_IO,
.migratepage = buffer_migrate_page,
.is_partially_uptodate = block_is_partially_uptodate,
- .write_one_page = generic_file_buffered_write_one_page,
};

static const struct address_space_operations ext3_journalled_aops = {
@@ -1843,7 +1841,6 @@ static const struct address_space_operations ext3_journalled_aops = {
.invalidatepage = ext3_invalidatepage,
.releasepage = ext3_releasepage,
.is_partially_uptodate = block_is_partially_uptodate,
- .write_one_page = generic_file_buffered_write_one_page,
};

void ext3_set_aops(struct inode *inode)
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 0bf1f1e..a09e17c 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -532,11 +532,6 @@ struct address_space_operations {
int (*launder_page) (struct page *);
int (*is_partially_uptodate) (struct page *, read_descriptor_t *,
unsigned long);
-
- /* write the contents of the source page over the page at the specified
- * index in the target address space (the source page does not need to
- * be related to the target address space) */
- int (*write_one_page)(struct address_space *, pgoff_t, struct page *);
};

/*
@@ -2138,8 +2133,6 @@ extern ssize_t generic_file_direct_write(struct kiocb *, const struct iovec *,
unsigned long *, loff_t, loff_t *, size_t, size_t);
extern ssize_t generic_file_buffered_write(struct kiocb *, const struct iovec *,
unsigned long, loff_t, loff_t *, size_t, ssize_t);
-extern int generic_file_buffered_write_one_page(struct address_space *,
- pgoff_t, struct page *);
extern ssize_t do_sync_read(struct file *filp, char __user *buf, size_t len, loff_t *ppos);
extern ssize_t do_sync_write(struct file *filp, const char __user *buf, size_t len, loff_t *ppos);
extern int generic_segment_checks(const struct iovec *iov,
diff --git a/mm/filemap.c b/mm/filemap.c
index 15dda3c..a07d714 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2338,67 +2338,6 @@ generic_file_buffered_write(struct kiocb *iocb, const struct iovec *iov,
}
EXPORT_SYMBOL(generic_file_buffered_write);

-/**
- * generic_file_buffered_write_one_page - Write a single page of data to an
- * inode
- * @mapping - The address space of the target inode
- * @index - The target page in the target inode to fill
- * @source - The data to write into the target page
- *
- * Write the data from the source page to the page in the nominated address
- * space at the @index specified. Note that the file will not be extended if
- * the page crosses the EOF marker, in which case only the first part of the
- * page will be written.
- *
- * The @source page does not need to have any association with the file or the
- * target page offset.
- */
-int generic_file_buffered_write_one_page(struct address_space *mapping,
- pgoff_t index,
- struct page *source)
-{
- const struct address_space_operations *a_ops = mapping->a_ops;
- struct page *page;
- unsigned len;
- loff_t isize, pos;
- void *fsdata;
- int ret;
-
- pos = index;
- pos <<= PAGE_CACHE_SHIFT;
-
- len = PAGE_CACHE_SIZE;
- isize = i_size_read(mapping->host);
- if ((isize >> PAGE_CACHE_SHIFT) == index)
- len = isize & (PAGE_CACHE_SIZE - 1);
-
- ret = pagecache_write_begin(NULL, mapping, pos, len,
- AOP_FLAG_UNINTERRUPTIBLE, &page, &fsdata);
- if (ret < 0)
- goto sync;
-
- copy_highpage(page, source);
-
- ret = pagecache_write_end(NULL, mapping, pos, len, len, page, fsdata);
- if (ret < 0)
- goto sync;
-
- balance_dirty_pages_ratelimited(mapping);
- cond_resched();
-
-sync:
- /* the caller must handle O_SYNC themselves, but we handle S_SYNC and
- * MS_SYNCHRONOUS here */
- if (unlikely(IS_SYNC(mapping->host)) && !a_ops->writepage)
- ret = generic_osync_inode(mapping->host, mapping,
- OSYNC_METADATA | OSYNC_DATA);
-
- /* the caller must handle O_DIRECT for themselves */
-
- return ret;
-}
-EXPORT_SYMBOL(generic_file_buffered_write_one_page);
-
static ssize_t
__generic_file_aio_write_nolock(struct kiocb *iocb, const struct iovec *iov,
unsigned long nr_segs, loff_t *ppos)

2009-04-03 09:43:21

by David Howells

[permalink] [raw]
Subject: [PATCH 4/4] FS-Cache: Revert the addition of PG_owner_priv2/PG_fscache_write

Revert the addition of PG_owner_priv2/PG_fscache_write as it's no longer used
by FS-Cache.

Signed-off-by: David Howells <[email protected]>
---

include/linux/page-flags.h | 3 ---
include/linux/pagemap.h | 16 ----------------
mm/filemap.c | 17 -----------------
3 files changed, 0 insertions(+), 36 deletions(-)


diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 0633e06..62214c7 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -79,7 +79,6 @@ enum pageflags {
PG_active,
PG_slab,
PG_owner_priv_1, /* Owner use. If pagecache, fs may use*/
- PG_owner_priv_2, /* Owner use. fs may use in pagecache */
PG_arch_1,
PG_reserved,
PG_private, /* If pagecache, has fs-private data */
@@ -115,7 +114,6 @@ enum pageflags {
* when those inodes are being locally cached.
*/
PG_fscache = PG_private_2, /* page backed by cache */
- PG_fscache_write = PG_owner_priv_2, /* page being written to cache */

/* XEN */
PG_pinned = PG_owner_priv_1,
@@ -220,7 +218,6 @@ PAGEFLAG(Private, private) __SETPAGEFLAG(Private, private)
__CLEARPAGEFLAG(Private, private)
PAGEFLAG(Private2, private_2) TESTSCFLAG(Private2, private_2)
PAGEFLAG(OwnerPriv1, owner_priv_1) TESTCLEARFLAG(OwnerPriv1, owner_priv_1)
-PAGEFLAG(OwnerPriv2, owner_priv_2) TESTSCFLAG(OwnerPriv2, owner_priv_2)

/*
* Only test-and-set exist for PG_writeback. The unconditional operators are
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 7140f74..34da523 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -383,22 +383,6 @@ static inline void wait_on_page_writeback(struct page *page)

extern void end_page_writeback(struct page *page);

-/**
- * wait_on_page_owner_priv_2 - Wait for PG_owner_priv_2 to become clear
- * @page: The page to monitor
- *
- * Wait for a PG_owner_priv_2 to become clear on the specified page. This is
- * also used to monitor PG_fscache_write (which is an alternate name for the
- * same bit).
- */
-static inline void wait_on_page_owner_priv_2(struct page *page)
-{
- if (PageOwnerPriv2(page))
- wait_on_page_bit(page, PG_owner_priv_2);
-}
-
-extern void end_page_owner_priv_2(struct page *page);
-
/*
* Add an arbitrary waiter to a page's wait queue
*/
diff --git a/mm/filemap.c b/mm/filemap.c
index a07d714..fc11974 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -621,23 +621,6 @@ void end_page_writeback(struct page *page)
EXPORT_SYMBOL(end_page_writeback);

/**
- * end_page_owner_priv_2 - Clear PG_owner_priv_2 and wake up any waiters
- * @page: the page
- *
- * Clear PG_owner_priv_2 and wake up any processes waiting for that event.
- * This is used to indicate - using PG_fscache_write (an alternate name for the
- * same bit) - that a page has finished being written to the local disk cache.
- */
-void end_page_owner_priv_2(struct page *page)
-{
- if (!TestClearPageOwnerPriv2(page))
- BUG();
- smp_mb__after_clear_bit();
- wake_up_page(page, PG_owner_priv_2);
-}
-EXPORT_SYMBOL(end_page_owner_priv_2);
-
-/**
* __lock_page - get a lock on the page, assuming we need to sleep to get it
* @page: the page to lock
*

2009-04-03 09:42:53

by David Howells

[permalink] [raw]
Subject: [PATCH 3/4] FS-Cache: Use a radix tree to track pages being written rather than a page flag

Use a radix tree attached to struct fscache_cookie to track what pages are
undergoing write, rather than using a page flag.

The radix tree that was resident in struct fscache_object to track pages that
need writing is moved to fscache_cookie. Pages that need writing and pages
that are being written are both held in there. The difference being that the
former are tagged with FSCACHE_COOKIE_PENDING_TAG.

Signed-off-by: David Howells <[email protected]>
---

Documentation/filesystems/caching/netfs-api.txt | 88 +++++++++--------------
fs/afs/file.c | 6 +-
fs/afs/write.c | 2 -
fs/fscache/cookie.c | 2 +
fs/fscache/page.c | 77 ++++++++++++++++----
fs/nfs/file.c | 5 +
fs/nfs/fscache.c | 26 ++++---
fs/nfs/fscache.h | 12 +++
include/linux/fscache-cache.h | 5 +
include/linux/fscache.h | 54 ++++++++++----
10 files changed, 172 insertions(+), 105 deletions(-)


diff --git a/Documentation/filesystems/caching/netfs-api.txt b/Documentation/filesystems/caching/netfs-api.txt
index da8f92f..4db125b 100644
--- a/Documentation/filesystems/caching/netfs-api.txt
+++ b/Documentation/filesystems/caching/netfs-api.txt
@@ -640,7 +640,18 @@ Note that pages can't be explicitly deleted from the a data file. The whole
data file must be retired (see the relinquish cookie function below).

Furthermore, note that this does not cancel the asynchronous read or write
-operation started by the read/alloc and write functions.
+operation started by the read/alloc and write functions, so the page
+invalidation and release functions must use:
+
+ bool fscache_check_page_write(struct fscache_cookie *cookie,
+ struct page *page);
+
+to see if a page is being written to the cache, and:
+
+ void fscache_wait_on_page_write(struct fscache_cookie *cookie,
+ struct page *page);
+
+to wait for it to finish if it is.


==========================
@@ -730,52 +741,32 @@ this, the caller should relinquish and retire the cookie they have, and then
acquire a new one.


-============================
-FS-CACHE SPECIFIC PAGE FLAGS
-============================
-
-FS-Cache makes use of two page flags, PG_private_2 and PG_owner_priv_2, for
-its own purpose. The first is given the alternative name PG_fscache and the
-second PG_fscache_write.
-
-FS-Cache uses these flags to keep track of two bits of information per cached
-netfs page:
+===========================
+FS-CACHE SPECIFIC PAGE FLAG
+===========================

- (1) PG_fscache.
+FS-Cache makes use of a page flag, PG_private_2, for its own purpose. This is
+given the alternative name PG_fscache.

- This indicates that the page is known by the cache, and that the cache
- must be informed if the page is going to go away. It's an indication to
- the netfs that the cache has an interest in this page, where an interest
- may be a pointer to it, resources allocated or reserved for it, or I/O in
- progress upon it.
+PG_fscache is used to indicate that the page is known by the cache, and that
+the cache must be informed if the page is going to go away. It's an indication
+to the netfs that the cache has an interest in this page, where an interest may
+be a pointer to it, resources allocated or reserved for it, or I/O in progress
+upon it.

- The netfs can use this information in methods such as releasepage() to
- determine whether it needs to uncache a page or update it.
+The netfs can use this information in methods such as releasepage() to
+determine whether it needs to uncache a page or update it.

- Furthermore, if this bit is set, releasepage() and invalidatepage()
- operations will be called on a page to get rid of it, even if PG_private
- is not set. This allows caching to attempted on a page before
- read_cache_pages() to be called after fscache_read_or_alloc_pages() as
- the former will try and release pages it was given under certain
- circumstances.
+Furthermore, if this bit is set, releasepage() and invalidatepage() operations
+will be called on a page to get rid of it, even if PG_private is not set. This
+allows caching to attempted on a page before read_cache_pages() to be called
+after fscache_read_or_alloc_pages() as the former will try and release pages it
+was given under certain circumstances.

- (2) PG_fscache_write.
+This bit does not overlap with such as PG_private. This means that FS-Cache
+can be used with a filesystem that uses the block buffering code.

- This indicates that the page is being written to disk by the cache, and
- that it cannot be released until completion. Ideally it shouldn't be
- changed until completion either so as to maintain the known state of the
- cache. This cannot be unified with PG_writeback as the page may be being
- written to both the server and the cache at the same time or at different
- times.
-
- This can be used by the netfs to wait for a page to be written out to the
- cache before, say, releasing or invalidating it, or before allowing
- someone to modify it in page_mkwrite(), say.
-
-Neither of these two bits overlaps with such as PG_private. This means that
-FS-Cache can be used with a filesystem that uses the block buffering code.
-
-There are a number of operations defined on these two bits:
+There are a number of operations defined on this flag:

int PageFsCache(struct page *page);
void SetPageFsCache(struct page *page)
@@ -783,18 +774,5 @@ There are a number of operations defined on these two bits:
int TestSetPageFsCache(struct page *page)
int TestClearPageFsCache(struct page *page)

- int PageFsCacheWrite(struct page *page)
- void SetPageFsCacheWrite(struct page *page)
- void ClearPageFsCacheWrite(struct page *page)
- int TestSetPageFsCacheWrite(struct page *page)
- int TestClearPageFsCacheWrite(struct page *page)
-
These functions are bit test, bit set, bit clear, bit test and set and bit
-test and clear operations on PG_fscache and PG_fscache_write.
-
- void wait_on_page_fscache_write(struct page *page)
- void end_page_fscache_write(struct page *page)
-
-The first of these two functions waits uninterruptibly for PG_fscache_write to
-become clear, if it isn't already so. The second clears PG_fscache_write and
-wakes up anyone waiting for it.
+test and clear operations on PG_fscache.
diff --git a/fs/afs/file.c b/fs/afs/file.c
index aeb6cdd..7a1d942 100644
--- a/fs/afs/file.c
+++ b/fs/afs/file.c
@@ -299,7 +299,7 @@ static void afs_invalidatepage(struct page *page, unsigned long offset)
#ifdef CONFIG_AFS_FSCACHE
if (PageFsCache(page)) {
struct afs_vnode *vnode = AFS_FS_I(page->mapping->host);
- wait_on_page_fscache_write(page);
+ fscache_wait_on_page_write(vnode->cache, page);
fscache_uncache_page(vnode->cache, page);
ClearPageFsCache(page);
}
@@ -336,12 +336,12 @@ static int afs_releasepage(struct page *page, gfp_t gfp_flags)
* elected to wait */
#ifdef CONFIG_AFS_FSCACHE
if (PageFsCache(page)) {
- if (PageFsCacheWrite(page)) {
+ if (fscache_check_page_write(vnode->cache, page)) {
if (!(gfp_flags & __GFP_WAIT)) {
_leave(" = F [cache busy]");
return 0;
}
- wait_on_page_fscache_write(page);
+ fscache_wait_on_page_write(vnode->cache, page);
}

fscache_uncache_page(vnode->cache, page);
diff --git a/fs/afs/write.c b/fs/afs/write.c
index 7884518..c2e7a7f 100644
--- a/fs/afs/write.c
+++ b/fs/afs/write.c
@@ -795,7 +795,7 @@ int afs_page_mkwrite(struct vm_area_struct *vma, struct page *page)
/* wait for the page to be written to the cache before we allow it to
* be modified */
#ifdef CONFIG_AFS_FSCACHE
- wait_on_page_fscache_write(page);
+ fscache_wait_on_page_write(vnode->cache, page);
#endif

_leave(" = 0");
diff --git a/fs/fscache/cookie.c b/fs/fscache/cookie.c
index cd9d065..72fd18f 100644
--- a/fs/fscache/cookie.c
+++ b/fs/fscache/cookie.c
@@ -102,6 +102,8 @@ struct fscache_cookie *__fscache_acquire_cookie(
cookie->netfs_data = netfs_data;
cookie->flags = 0;

+ INIT_RADIX_TREE(&cookie->stores, GFP_NOFS);
+
switch (cookie->def->type) {
case FSCACHE_COOKIE_TYPE_INDEX:
fscache_stat(&fscache_n_cookie_index);
diff --git a/fs/fscache/page.c b/fs/fscache/page.c
index 512ec2c..2568e0e 100644
--- a/fs/fscache/page.c
+++ b/fs/fscache/page.c
@@ -17,6 +17,47 @@
#include "internal.h"

/*
+ * check to see if a page is being written to the cache
+ */
+bool __fscache_check_page_write(struct fscache_cookie *cookie, struct page *page)
+{
+ void *val;
+
+ rcu_read_lock();
+ val = radix_tree_lookup(&cookie->stores, page->index);
+ rcu_read_unlock();
+
+ return val != NULL;
+}
+EXPORT_SYMBOL(__fscache_check_page_write);
+
+/*
+ * wait for a page to finish being written to the cache
+ */
+void __fscache_wait_on_page_write(struct fscache_cookie *cookie, struct page *page)
+{
+ wait_queue_head_t *wq = bit_waitqueue(&cookie->flags, 0);
+
+ wait_event(*wq, !__fscache_check_page_write(cookie, page));
+}
+EXPORT_SYMBOL(__fscache_wait_on_page_write);
+
+/*
+ * note that a page has finished being written to the cache
+ */
+static void fscache_end_page_write(struct fscache_cookie *cookie, struct page *page)
+{
+ struct page *xpage;
+
+ spin_lock(&cookie->lock);
+ xpage = radix_tree_delete(&cookie->stores, page->index);
+ spin_unlock(&cookie->lock);
+ ASSERT(xpage != NULL);
+
+ wake_up_bit(&cookie->flags, 0);
+}
+
+/*
* actually apply the changed attributes to a cache object
*/
static void fscache_attr_changed_op(struct fscache_operation *op)
@@ -480,6 +521,7 @@ static void fscache_write_op(struct fscache_operation *_op)
struct fscache_storage *op =
container_of(_op, struct fscache_storage, op);
struct fscache_object *object = op->op.object;
+ struct fscache_cookie *cookie = object->cookie;
struct page *page;
unsigned n;
void *results[1];
@@ -487,10 +529,12 @@ static void fscache_write_op(struct fscache_operation *_op)

_enter("{OP%x,%d}", op->op.debug_id, atomic_read(&op->op.usage));

+ spin_lock(&cookie->lock);
spin_lock(&object->lock);

if (!fscache_object_is_active(object)) {
spin_unlock(&object->lock);
+ spin_unlock(&cookie->lock);
_leave("");
return;
}
@@ -499,23 +543,24 @@ static void fscache_write_op(struct fscache_operation *_op)

/* find a page to store */
page = NULL;
- n = radix_tree_gang_lookup(&object->stores, results, 0, 1);
- if (n == 1) {
- page = results[0];
- _debug("gang %d [%lx]", n, page->index);
- if (page->index <= op->store_limit)
- radix_tree_delete(&object->stores, page->index);
- else
- goto superseded;
- } else {
+ n = radix_tree_gang_lookup_tag(&cookie->stores, results, 0, 1,
+ FSCACHE_COOKIE_PENDING_TAG);
+ if (n != 1)
+ goto superseded;
+ page = results[0];
+ _debug("gang %d [%lx]", n, page->index);
+ if (page->index > op->store_limit)
goto superseded;
- }
+
+ radix_tree_tag_clear(&cookie->stores, page->index,
+ FSCACHE_COOKIE_PENDING_TAG);

spin_unlock(&object->lock);
+ spin_unlock(&cookie->lock);

if (page) {
ret = object->cache->ops->write_page(op, page);
- end_page_fscache_write(page);
+ fscache_end_page_write(cookie, page);
page_cache_release(page);
if (ret < 0)
fscache_abort_object(object);
@@ -532,6 +577,7 @@ superseded:
_debug("cease");
clear_bit(FSCACHE_OBJECT_PENDING_WRITE, &object->flags);
spin_unlock(&object->lock);
+ spin_unlock(&cookie->lock);
_leave("");
}

@@ -609,7 +655,7 @@ int __fscache_write_page(struct fscache_cookie *cookie,

_debug("store limit %llx", (unsigned long long) object->store_limit);

- ret = radix_tree_insert(&object->stores, page->index, page);
+ ret = radix_tree_insert(&cookie->stores, page->index, page);
if (ret < 0) {
if (ret == -EEXIST)
goto already_queued;
@@ -617,9 +663,9 @@ int __fscache_write_page(struct fscache_cookie *cookie,
goto nobufs_unlock_obj;
}

+ radix_tree_tag_set(&cookie->stores, page->index,
+ FSCACHE_COOKIE_PENDING_TAG);
page_cache_get(page);
- if (TestSetPageFsCacheWrite(page))
- BUG();

/* we only want one writer at a time, but we do need to queue new
* writers after exclusive ops */
@@ -656,8 +702,7 @@ already_pending:
return 0;

submit_failed:
- radix_tree_delete(&object->stores, page->index);
- end_page_fscache_write(page);
+ radix_tree_delete(&cookie->stores, page->index);
page_cache_release(page);
ret = -ENOBUFS;
goto nobufs;
diff --git a/fs/nfs/file.c b/fs/nfs/file.c
index d3060c4..3523b89 100644
--- a/fs/nfs/file.c
+++ b/fs/nfs/file.c
@@ -456,11 +456,12 @@ static int nfs_release_page(struct page *page, gfp_t gfp)
static int nfs_launder_page(struct page *page)
{
struct inode *inode = page->mapping->host;
+ struct nfs_inode *nfsi = NFS_I(inode);

dfprintk(PAGECACHE, "NFS: launder_page(%ld, %llu)\n",
inode->i_ino, (long long)page_offset(page));

- wait_on_page_fscache_write(page);
+ nfs_fscache_wait_on_page_write(nfsi, page);
return nfs_wb_page(inode, page);
}

@@ -498,7 +499,7 @@ static int nfs_vm_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
(long long)page_offset(page));

/* make sure the cache has finished storing the page */
- wait_on_page_fscache_write(page);
+ nfs_fscache_wait_on_page_write(NFS_I(dentry->d_inode), page);

lock_page(page);
mapping = page->mapping;
diff --git a/fs/nfs/fscache.c b/fs/nfs/fscache.c
index 968cf5d..379be67 100644
--- a/fs/nfs/fscache.c
+++ b/fs/nfs/fscache.c
@@ -337,21 +337,22 @@ void nfs_fscache_reset_inode_cookie(struct inode *inode)
*/
int nfs_fscache_release_page(struct page *page, gfp_t gfp)
{
- if (PageFsCacheWrite(page)) {
+ struct nfs_inode *nfsi = NFS_I(page->mapping->host);
+ struct fscache_cookie *cookie = nfsi->fscache;
+
+ BUG_ON(!cookie);
+
+ if (fscache_check_page_write(cookie, page)) {
if (!(gfp & __GFP_WAIT))
return 0;
- wait_on_page_fscache_write(page);
+ fscache_wait_on_page_write(cookie, page);
}

if (PageFsCache(page)) {
- struct nfs_inode *nfsi = NFS_I(page->mapping->host);
-
- BUG_ON(!nfsi->fscache);
-
dfprintk(FSCACHE, "NFS: fscache releasepage (0x%p/0x%p/0x%p)\n",
- nfsi->fscache, page, nfsi);
+ cookie, page, nfsi);

- fscache_uncache_page(nfsi->fscache, page);
+ fscache_uncache_page(cookie, page);
nfs_add_fscache_stats(page->mapping->host,
NFSIOS_FSCACHE_PAGES_UNCACHED, 1);
}
@@ -366,16 +367,17 @@ int nfs_fscache_release_page(struct page *page, gfp_t gfp)
void __nfs_fscache_invalidate_page(struct page *page, struct inode *inode)
{
struct nfs_inode *nfsi = NFS_I(inode);
+ struct fscache_cookie *cookie = nfsi->fscache;

- BUG_ON(!nfsi->fscache);
+ BUG_ON(!cookie);

dfprintk(FSCACHE, "NFS: fscache invalidatepage (0x%p/0x%p/0x%p)\n",
- nfsi->fscache, page, nfsi);
+ cookie, page, nfsi);

- wait_on_page_fscache_write(page);
+ fscache_wait_on_page_write(cookie, page);

BUG_ON(!PageLocked(page));
- fscache_uncache_page(nfsi->fscache, page);
+ fscache_uncache_page(cookie, page);
nfs_add_fscache_stats(page->mapping->host,
NFSIOS_FSCACHE_PAGES_UNCACHED, 1);
}
diff --git a/fs/nfs/fscache.h b/fs/nfs/fscache.h
index 2d43b67..6e809bb 100644
--- a/fs/nfs/fscache.h
+++ b/fs/nfs/fscache.h
@@ -94,6 +94,16 @@ extern int __nfs_readpages_from_fscache(struct nfs_open_context *,
extern void __nfs_readpage_to_fscache(struct inode *, struct page *, int);

/*
+ * wait for a page to complete writing to the cache
+ */
+static inline void nfs_fscache_wait_on_page_write(struct nfs_inode *nfsi,
+ struct page *page)
+{
+ if (PageFsCache(page))
+ fscache_wait_on_page_write(nfsi->fscache, page);
+}
+
+/*
* release the caching state associated with a page if undergoing complete page
* invalidation
*/
@@ -181,6 +191,8 @@ static inline int nfs_fscache_release_page(struct page *page, gfp_t gfp)
}
static inline void nfs_fscache_invalidate_page(struct page *page,
struct inode *inode) {}
+static inline void nfs_fscache_wait_on_page_write(struct nfs_inode *nfsi,
+ struct page *page) {}

static inline int nfs_readpage_from_fscache(struct nfs_open_context *ctx,
struct inode *inode,
diff --git a/include/linux/fscache-cache.h b/include/linux/fscache-cache.h
index 0410bd9..84d3532 100644
--- a/include/linux/fscache-cache.h
+++ b/include/linux/fscache-cache.h
@@ -301,6 +301,9 @@ struct fscache_cookie {
const struct fscache_cookie_def *def; /* definition */
struct fscache_cookie *parent; /* parent of this entry */
void *netfs_data; /* back pointer to netfs */
+ struct radix_tree_root stores; /* pages to be stored on this cookie */
+#define FSCACHE_COOKIE_PENDING_TAG 0 /* pages tag: pending write to cache */
+
unsigned long flags;
#define FSCACHE_COOKIE_LOOKING_UP 0 /* T if non-index cookie being looked up still */
#define FSCACHE_COOKIE_CREATING 1 /* T if non-index object being created still */
@@ -370,7 +373,6 @@ struct fscache_object {
struct list_head dependents; /* FIFO of dependent objects */
struct list_head dep_link; /* link in parent's dependents list */
struct list_head pending_ops; /* unstarted operations on this object */
- struct radix_tree_root stores; /* data to be stored */
pgoff_t store_limit; /* current storage limit */
};

@@ -407,7 +409,6 @@ void fscache_object_init(struct fscache_object *object,
INIT_LIST_HEAD(&object->dependents);
INIT_LIST_HEAD(&object->dep_link);
INIT_LIST_HEAD(&object->pending_ops);
- INIT_RADIX_TREE(&object->stores, GFP_NOFS);
object->n_children = 0;
object->n_ops = object->n_in_progress = object->n_exclusive = 0;
object->events = object->event_mask = 0;
diff --git a/include/linux/fscache.h b/include/linux/fscache.h
index 006c919..6d8ee46 100644
--- a/include/linux/fscache.h
+++ b/include/linux/fscache.h
@@ -42,20 +42,6 @@
#define TestSetPageFsCache(page) TestSetPagePrivate2((page))
#define TestClearPageFsCache(page) TestClearPagePrivate2((page))

-/*
- * overload PG_owner_priv_2 to give us PG_fscache_write - this is used to
- * indicate that a page is currently being written to a local disk cache
- */
-#define PageFsCacheWrite(page) PageOwnerPriv2((page))
-#define SetPageFsCacheWrite(page) SetPageOwnerPriv2((page))
-#define ClearPageFsCacheWrite(page) ClearPageOwnerPriv2((page))
-#define TestSetPageFsCacheWrite(page) TestSetPageOwnerPriv2((page))
-#define TestClearPageFsCacheWrite(page) TestClearPageOwnerPriv2((page))
-
-#define wait_on_page_fscache_write(page) wait_on_page_owner_priv_2((page))
-#define end_page_fscache_write(page) end_page_owner_priv_2((page))
-
-
/* pattern used to fill dead space in an index entry */
#define FSCACHE_INDEX_DEADFILL_PATTERN 0x79

@@ -214,6 +200,8 @@ extern int __fscache_read_or_alloc_pages(struct fscache_cookie *,
extern int __fscache_alloc_page(struct fscache_cookie *, struct page *, gfp_t);
extern int __fscache_write_page(struct fscache_cookie *, struct page *, gfp_t);
extern void __fscache_uncache_page(struct fscache_cookie *, struct page *);
+extern bool __fscache_check_page_write(struct fscache_cookie *, struct page *);
+extern void __fscache_wait_on_page_write(struct fscache_cookie *, struct page *);

/**
* fscache_register_netfs - Register a filesystem as desiring caching services
@@ -589,4 +577,42 @@ void fscache_uncache_page(struct fscache_cookie *cookie,
__fscache_uncache_page(cookie, page);
}

+/**
+ * fscache_check_page_write - Ask if a page is being writing to the cache
+ * @cookie: The cookie representing the cache object
+ * @page: The netfs page that is being cached.
+ *
+ * Ask the cache if a page is being written to the cache.
+ *
+ * See Documentation/filesystems/caching/netfs-api.txt for a complete
+ * description.
+ */
+static inline
+bool fscache_check_page_write(struct fscache_cookie *cookie,
+ struct page *page)
+{
+ if (fscache_cookie_valid(cookie))
+ return __fscache_check_page_write(cookie, page);
+ return false;
+}
+
+/**
+ * fscache_wait_on_page_write - Wait for a page to complete writing to the cache
+ * @cookie: The cookie representing the cache object
+ * @page: The netfs page that is being cached.
+ *
+ * Ask the cache to wake us up when a page is no longer being written to the
+ * cache.
+ *
+ * See Documentation/filesystems/caching/netfs-api.txt for a complete
+ * description.
+ */
+static inline
+void fscache_wait_on_page_write(struct fscache_cookie *cookie,
+ struct page *page)
+{
+ if (fscache_cookie_valid(cookie))
+ __fscache_wait_on_page_write(cookie, page);
+}
+
#endif /* _LINUX_FSCACHE_H */

2009-04-03 09:52:23

by David Howells

[permalink] [raw]
Subject: Re: [PATCH 1/4] CacheFiles: Use the ->write() file op rather than a special kernel aop

David Howells <[email protected]> wrote:

> This reverts the patch:
>
> CacheFiles: Add a hook to write a single page of data to an inode
>
> Add an address space operation to write one single page of data to an
> inode at a page-aligned location (thus permitting the implementation to
> be highly optimised). The data source is a single page.
>
> This is used by CacheFiles to store the contents of netfs pages into
> their backing file pages.
>
> Supply a generic implementation for this that uses the write_begin()
> and write_end() address_space operations to bind a copy directly into
> the page cache.
>
> Hook the Ext2 and Ext3 operations to the generic implementation.

Actually, it doesn't. I split that out into the next patch, but I forgot to
edit the patch description.

David

2009-04-03 13:49:42

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH 1/4] CacheFiles: Use the ->write() file op rather than a special kernel aop

On Fri, Apr 03, 2009 at 10:41:38AM +0100, David Howells wrote:
> + if (!aops->bmap)
> goto check_error;

What are you doing using bmap? You really shouldn't call into it from
anywhere but FIBMAP. Yes, swap currently does but it's a major pain
in the neck and Peter has been working on a proper interface for swap
for a while.

> + if (IS_ERR(file)) {
> + ret = PTR_ERR(file);

goto out_something;

> + } else {
> + ret = -EIO;
> + if (file->f_op->write) {

if (!file->f_op->write)
goto out_fput;

2009-04-03 13:50:43

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH 2/4] CacheFiles: Revert the addition of write_one_page()

On Fri, Apr 03, 2009 at 10:41:43AM +0100, David Howells wrote:
> Revert the addition of the write_one_page() address space op as CacheFiles now
> uses the write() file op instead.

Note that in the patch series going into mainline eventually this should
not be a separate patch but the new ops should never be introduced.

2009-04-03 13:54:00

by David Howells

[permalink] [raw]
Subject: Re: [PATCH 2/4] CacheFiles: Revert the addition of write_one_page()

Christoph Hellwig <[email protected]> wrote:

> Note that in the patch series going into mainline eventually this should
> not be a separate patch but the new ops should never be introduced.

Yes, probably, but this makes it easier for people to see what I'm proposing
to change.

David

2009-04-03 13:57:54

by David Howells

[permalink] [raw]
Subject: Re: [PATCH 1/4] CacheFiles: Use the ->write() file op rather than a special kernel aop

Christoph Hellwig <[email protected]> wrote:

> > + if (!aops->bmap)
> > goto check_error;
>
> What are you doing using bmap? You really shouldn't call into it from
> anywhere but FIBMAP. Yes, swap currently does but it's a major pain
> in the neck and Peter has been working on a proper interface for swap
> for a while.

Checking to see whether there's a hole in the file. If there's a hole, that
represents data I need to fetch; if there isn't that represents data I have in
the cache. I don't care _where_ the data is, only whether it exists or not.

David

2009-04-04 05:56:30

by Nick Piggin

[permalink] [raw]
Subject: Re: [PATCH 2/4] CacheFiles: Revert the addition of write_one_page()

On Saturday 04 April 2009 00:53:29 David Howells wrote:
> Christoph Hellwig <[email protected]> wrote:
>
> > Note that in the patch series going into mainline eventually this should
> > not be a separate patch but the new ops should never be introduced.
>
> Yes, probably, but this makes it easier for people to see what I'm proposing
> to change.

Definitely yes please. kmap resources are not infinite but there is obviously
a proper kmap allocator so long as you only kmap one page at a time, then
others will just block waiting for more. On sane platforms of course there
is no penalty at all.

So this change is definitely required if you want to get this patchset in.
The alternative is to come up with a new API and get that merged first, but
I figure you don't want that kind of dependency.