2013-08-21 21:26:40

by Milosz Tanski

Subject: [PATCHv4 0/5] ceph: persistent caching with fscache

This is an updated version of the fscache support for the Ceph filesystem. What
has changed since the last patchset:

1. Separated the readpages refactor into its own patches. These were already
accepted into the testing branch.

2. Tracked down the BUG in the readahead cleanup code. We were returning with
pages marked as private_2 from readpages(). I added a simple convenience
function to the fscache netfs interface for cleaning up the page list at the
end of aops readpages(). I know other filesystems (NFS) have run into this,
since I've seen a few similar traces in Google searches and on the cachefs
mailing list. The second new patch makes Ceph use this interface.

I've been running this code (minus the BUG fix) on clients for a couple of
weeks with moderate 24/7 use without issues. At this point I feel it's solid
enough to go into the Ceph kclient.

Please pull the code from my repository:
https://bitbucket.org/adfin/linux-fs.git branch: wip-ceph-fscache

The first two patches were not written by me; they were written by Hongyi
Jia, who implemented the cookie re-validation scheme in the fscache core.

Finally, I CCed a couple of other mailing lists (fsdevel, nfs) because the new
fscache_readpages_cancel() should be used by other filesystems to avoid the
same problem.
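For reference, the intended call pattern is roughly the following. This is a hedged kernel-side sketch, not compilable standalone; example_submit_reads() and example_cookie() are placeholders for a filesystem's own read submission and cookie lookup:

```c
/* Sketch: error path of a netfs ->readpages() whose pages were marked
 * with PG_private_2 by fscache_read_or_alloc_pages(). Pages still on
 * the list when we bail out must have that state stripped, otherwise
 * the readahead cleanup code hits the BUG described above. */
static int example_readpages(struct file *file, struct address_space *mapping,
			     struct list_head *pages, unsigned nr_pages)
{
	int rc = example_submit_reads(mapping, pages, &nr_pages);

	if (rc < 0 && rc != -ENOBUFS && rc != -ENODATA) {
		/* new helper: clean fscache state off every page left
		 * on the list before handing them back to the VM */
		fscache_readpages_cancel(example_cookie(mapping->host), pages);
	}
	return rc;
}
```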


Hongyi Jia (2):
new cachefiles interface to check cache consistency
new fscache interface to check cache consistency

Milosz Tanski (3):
ceph: use fscache as a local persistent cache
fscache: netfs function for cleanup post readpages
ceph: clean PgPrivate2 on returning from readpages

fs/cachefiles/interface.c | 19 +++
fs/cachefiles/internal.h | 1 +
fs/cachefiles/xattr.c | 39 ++++++
fs/ceph/Kconfig | 9 ++
fs/ceph/Makefile | 2 +
fs/ceph/addr.c | 39 +++++-
fs/ceph/cache.c | 311 ++++++++++++++++++++++++++++++++++++++++++
fs/ceph/cache.h | 130 ++++++++++++++++++
fs/ceph/caps.c | 19 ++-
fs/ceph/file.c | 17 +++
fs/ceph/inode.c | 66 ++++++++-
fs/ceph/super.c | 47 ++++++-
fs/ceph/super.h | 17 +++
fs/fscache/cookie.c | 22 +++
fs/fscache/page.c | 16 +++
include/linux/fscache-cache.h | 4 +
include/linux/fscache.h | 39 ++++++
17 files changed, 785 insertions(+), 12 deletions(-)
create mode 100644 fs/ceph/cache.c
create mode 100644 fs/ceph/cache.h

--
1.8.1.2


2013-08-21 21:29:23

by Milosz Tanski

Subject: [PATCH 1/5] new cachefiles interface to check cache consistency

Signed-off-by: Hongyi Jia <[email protected]>
Tested-by: Milosz Tanski <[email protected]>
---
fs/cachefiles/interface.c | 19 +++++++++++++++++++
fs/cachefiles/internal.h | 1 +
fs/cachefiles/xattr.c | 39 +++++++++++++++++++++++++++++++++++++++
3 files changed, 59 insertions(+)

diff --git a/fs/cachefiles/interface.c b/fs/cachefiles/interface.c
index d4c1206..0805f23 100644
--- a/fs/cachefiles/interface.c
+++ b/fs/cachefiles/interface.c
@@ -378,6 +378,24 @@ static void cachefiles_sync_cache(struct fscache_cache *_cache)
}

/*
+ * check if the backing cache is updated to FS-Cache
+ * - called by FS-Cache when evaluates if need to invalidate the cache
+ */
+static bool cachefiles_check_consistency(struct fscache_object *_object)
+{
+ struct cachefiles_object *object;
+ int ret;
+
+ _enter("{OBJ%x}", _object->debug_id);
+
+ object = container_of(_object, struct cachefiles_object, fscache);
+ ret = cachefiles_check_auxdata(object);
+
+ _leave(" = %d", ret);
+ return ret;
+}
+
+/*
* notification the attributes on an object have changed
* - called with reads/writes excluded by FS-Cache
*/
@@ -522,4 +540,5 @@ const struct fscache_cache_ops cachefiles_cache_ops = {
.write_page = cachefiles_write_page,
.uncache_page = cachefiles_uncache_page,
.dissociate_pages = cachefiles_dissociate_pages,
+ .check_consistency = cachefiles_check_consistency,
};
diff --git a/fs/cachefiles/internal.h b/fs/cachefiles/internal.h
index 4938251..e102d22 100644
--- a/fs/cachefiles/internal.h
+++ b/fs/cachefiles/internal.h
@@ -239,6 +239,7 @@ extern int cachefiles_check_object_xattr(struct cachefiles_object *object,
struct cachefiles_xattr *auxdata);
extern int cachefiles_remove_object_xattr(struct cachefiles_cache *cache,
struct dentry *dentry);
+extern bool cachefiles_check_auxdata(struct cachefiles_object *object);


/*
diff --git a/fs/cachefiles/xattr.c b/fs/cachefiles/xattr.c
index 2476e51..aafaac3 100644
--- a/fs/cachefiles/xattr.c
+++ b/fs/cachefiles/xattr.c
@@ -157,6 +157,45 @@ int cachefiles_update_object_xattr(struct cachefiles_object *object,
}

/*
+ * check the consistency between the backing cache and the FS-Cache cookie
+ */
+bool cachefiles_check_auxdata(struct cachefiles_object *object)
+{
+ struct cachefiles_xattr *auxbuf;
+ struct dentry *dentry = object->dentry;
+ unsigned int dlen;
+ int ret;
+
+ ASSERT(dentry);
+ ASSERT(dentry->d_inode);
+
+ auxbuf = kmalloc(sizeof(struct cachefiles_xattr) + 512, GFP_KERNEL);
+ if (!auxbuf)
+ return false;
+
+ auxbuf->len = vfs_getxattr(dentry, cachefiles_xattr_cache,
+ &auxbuf->type, 512 + 1);
+ if (auxbuf->len < 1)
+ return false;
+
+ if (auxbuf->type != object->fscache.cookie->def->type)
+ return false;
+
+ if (!object->fscache.cookie->def->check_aux)
+ return false;
+
+ dlen = auxbuf->len - 1;
+ ret = fscache_check_aux(&object->fscache,
+ &auxbuf->data, dlen);
+
+ kfree(auxbuf);
+ if (ret == FSCACHE_CHECKAUX_OKAY)
+ return true;
+ else
+ return false;
+}
+
+/*
* check the state xattr on a cache file
* - return -ESTALE if the object should be deleted
*/
--
1.8.1.2

2013-08-21 21:29:38

by Milosz Tanski

Subject: [PATCH 2/5] new fscache interface to check cache consistency

Signed-off-by: Hongyi Jia <[email protected]>
Tested-by: Milosz Tanski <[email protected]>
---
fs/fscache/cookie.c | 22 ++++++++++++++++++++++
include/linux/fscache-cache.h | 4 ++++
include/linux/fscache.h | 17 +++++++++++++++++
3 files changed, 43 insertions(+)

diff --git a/fs/fscache/cookie.c b/fs/fscache/cookie.c
index 0e91a3c..bfa1d3b 100644
--- a/fs/fscache/cookie.c
+++ b/fs/fscache/cookie.c
@@ -533,6 +533,28 @@ void __fscache_relinquish_cookie(struct fscache_cookie *cookie, int retire)
EXPORT_SYMBOL(__fscache_relinquish_cookie);

/*
+ * check the consistency between the netfs inode and the backing cache
+ *
+ * NOTE: it only serves no-index type
+ */
+bool __fscache_check_consistency(struct fscache_cookie *cookie)
+{
+ struct fscache_object *object;
+
+ if (cookie->def->type != FSCACHE_COOKIE_TYPE_DATAFILE)
+ return false;
+
+ if (hlist_empty(&cookie->backing_objects))
+ return false;
+
+ object = hlist_entry(cookie->backing_objects.first,
+ struct fscache_object, cookie_link);
+
+ return object->cache->ops->check_consistency(object);
+}
+EXPORT_SYMBOL(__fscache_check_consistency);
+
+/*
* destroy a cookie
*/
void __fscache_cookie_put(struct fscache_cookie *cookie)
diff --git a/include/linux/fscache-cache.h b/include/linux/fscache-cache.h
index a9ff9a3..5513342 100644
--- a/include/linux/fscache-cache.h
+++ b/include/linux/fscache-cache.h
@@ -302,6 +302,10 @@ struct fscache_cache_ops {

/* dissociate a cache from all the pages it was backing */
void (*dissociate_pages)(struct fscache_cache *cache);
+
+ /* check the consistency between the backing cache and the FS-Cache
+ * cookie */
+ bool (*check_consistency)(struct fscache_object *object);
};

/*
diff --git a/include/linux/fscache.h b/include/linux/fscache.h
index 7a08623..7a49e8f 100644
--- a/include/linux/fscache.h
+++ b/include/linux/fscache.h
@@ -183,6 +183,7 @@ extern struct fscache_cookie *__fscache_acquire_cookie(
const struct fscache_cookie_def *,
void *);
extern void __fscache_relinquish_cookie(struct fscache_cookie *, int);
+extern bool __fscache_check_consistency(struct fscache_cookie *);
extern void __fscache_update_cookie(struct fscache_cookie *);
extern int __fscache_attr_changed(struct fscache_cookie *);
extern void __fscache_invalidate(struct fscache_cookie *);
@@ -326,6 +327,22 @@ void fscache_relinquish_cookie(struct fscache_cookie *cookie, int retire)
}

/**
+ * fscache_check_consistency - Request that if the cache is updated
+ * @cookie: The cookie representing the cache object
+ *
+ * Request an consistency check from fscache, which resorts to backing
+ * cache.
+ */
+static inline
+bool fscache_check_consistency(struct fscache_cookie *cookie)
+{
+ if (fscache_cookie_valid(cookie))
+ return __fscache_check_consistency(cookie);
+ else
+ return false;
+}
+
+/**
* fscache_update_cookie - Request that a cache object be updated
* @cookie: The cookie representing the cache object
*
--
1.8.1.2

2013-08-21 21:29:56

by Milosz Tanski

Subject: [PATCH 3/5] ceph: use fscache as a local persistent cache

Add support for fscache to the Ceph filesystem. This brings it on par with
some of the other network filesystems in Linux (such as NFS, AFS, etc.).

In order to mount the filesystem with fscache the 'fsc' mount option must be
passed.
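For example (illustrative; the monitor address and credentials below are placeholders, not part of the patch):

```sh
# mount a Ceph filesystem with the local FS-Cache backed cache enabled
mount -t ceph 192.168.0.1:6789:/ /mnt/ceph -o name=admin,fsc

# or disable it explicitly with the matching 'nofsc' option
mount -t ceph 192.168.0.1:6789:/ /mnt/ceph -o name=admin,nofsc
```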

Signed-off-by: Milosz Tanski <[email protected]>
---
fs/ceph/Kconfig | 9 ++
fs/ceph/Makefile | 2 +
fs/ceph/addr.c | 37 +++++--
fs/ceph/cache.c | 311 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
fs/ceph/cache.h | 123 ++++++++++++++++++++++
fs/ceph/caps.c | 19 +++-
fs/ceph/file.c | 17 +++
fs/ceph/inode.c | 66 +++++++++++-
fs/ceph/super.c | 47 ++++++++-
fs/ceph/super.h | 17 +++
10 files changed, 636 insertions(+), 12 deletions(-)
create mode 100644 fs/ceph/cache.c
create mode 100644 fs/ceph/cache.h

diff --git a/fs/ceph/Kconfig b/fs/ceph/Kconfig
index 49bc782..ac9a2ef 100644
--- a/fs/ceph/Kconfig
+++ b/fs/ceph/Kconfig
@@ -16,3 +16,12 @@ config CEPH_FS

If unsure, say N.

+if CEPH_FS
+config CEPH_FSCACHE
+ bool "Enable Ceph client caching support"
+ depends on CEPH_FS=m && FSCACHE || CEPH_FS=y && FSCACHE=y
+ help
+ Choose Y here to enable persistent, read-only local
+ caching support for Ceph clients using FS-Cache
+
+endif
diff --git a/fs/ceph/Makefile b/fs/ceph/Makefile
index bd35212..0af0678 100644
--- a/fs/ceph/Makefile
+++ b/fs/ceph/Makefile
@@ -9,3 +9,5 @@ ceph-y := super.o inode.o dir.o file.o locks.o addr.o ioctl.o \
mds_client.o mdsmap.o strings.o ceph_frag.o \
debugfs.o

+ceph-$(CONFIG_CEPH_FSCACHE) += cache.o
+
diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
index cb78ce8..632bb48 100644
--- a/fs/ceph/addr.c
+++ b/fs/ceph/addr.c
@@ -11,6 +11,7 @@

#include "super.h"
#include "mds_client.h"
+#include "cache.h"
#include <linux/ceph/osd_client.h>

/*
@@ -159,6 +160,11 @@ static void ceph_invalidatepage(struct page *page, unsigned int offset,
return;
}

+ ceph_invalidate_fscache_page(inode, page);
+
+ if (!PagePrivate(page))
+ return;
+
/*
* We can get non-dirty pages here due to races between
* set_page_dirty and truncate_complete_page; just spit out a
@@ -178,14 +184,17 @@ static void ceph_invalidatepage(struct page *page, unsigned int offset,
ClearPagePrivate(page);
}

-/* just a sanity check */
static int ceph_releasepage(struct page *page, gfp_t g)
{
struct inode *inode = page->mapping ? page->mapping->host : NULL;
dout("%p releasepage %p idx %lu\n", inode, page, page->index);
WARN_ON(PageDirty(page));
- WARN_ON(PagePrivate(page));
- return 0;
+
+ /* Can we release the page from the cache? */
+ if (!ceph_release_fscache_page(page, g))
+ return 0;
+
+ return !PagePrivate(page);
}

/*
@@ -195,11 +204,16 @@ static int readpage_nounlock(struct file *filp, struct page *page)
{
struct inode *inode = file_inode(filp);
struct ceph_inode_info *ci = ceph_inode(inode);
- struct ceph_osd_client *osdc =
+ struct ceph_osd_client *osdc =
&ceph_inode_to_client(inode)->client->osdc;
int err = 0;
u64 len = PAGE_CACHE_SIZE;

+ err = ceph_readpage_from_fscache(inode, page);
+
+ if (err == 0)
+ goto out;
+
dout("readpage inode %p file %p page %p index %lu\n",
inode, filp, page, page->index);
err = ceph_osdc_readpages(osdc, ceph_vino(inode), &ci->i_layout,
@@ -217,6 +231,9 @@ static int readpage_nounlock(struct file *filp, struct page *page)
}
SetPageUptodate(page);

+ if (err == 0)
+ ceph_readpage_to_fscache(inode, page);
+
out:
return err < 0 ? err : 0;
}
@@ -259,6 +276,7 @@ static void finish_read(struct ceph_osd_request *req, struct ceph_msg *msg)
page->index);
flush_dcache_page(page);
SetPageUptodate(page);
+ ceph_readpage_to_fscache(inode, page);
unlock_page(page);
page_cache_release(page);
bytes -= PAGE_CACHE_SIZE;
@@ -328,7 +346,7 @@ static int start_read(struct inode *inode, struct list_head *page_list, int max)
page = list_entry(page_list->prev, struct page, lru);
BUG_ON(PageLocked(page));
list_del(&page->lru);
-
+
dout("start_read %p adding %p idx %lu\n", inode, page,
page->index);
if (add_to_page_cache_lru(page, &inode->i_data, page->index,
@@ -375,6 +393,12 @@ static int ceph_readpages(struct file *file, struct address_space *mapping,
int rc = 0;
int max = 0;

+ rc = ceph_readpages_from_fscache(mapping->host, mapping, page_list,
+ &nr_pages);
+
+ if (rc == 0)
+ goto out;
+
if (fsc->mount_options->rsize >= PAGE_CACHE_SIZE)
max = (fsc->mount_options->rsize + PAGE_CACHE_SIZE - 1)
>> PAGE_SHIFT;
@@ -494,6 +518,8 @@ static int writepage_nounlock(struct page *page, struct writeback_control *wbc)
CONGESTION_ON_THRESH(fsc->mount_options->congestion_kb))
set_bdi_congested(&fsc->backing_dev_info, BLK_RW_ASYNC);

+ ceph_readpage_to_fscache(inode, page);
+
set_page_writeback(page);
err = ceph_osdc_writepages(osdc, ceph_vino(inode),
&ci->i_layout, snapc,
@@ -549,7 +575,6 @@ static void ceph_release_pages(struct page **pages, int num)
pagevec_release(&pvec);
}

-
/*
* async writeback completion handler.
*
diff --git a/fs/ceph/cache.c b/fs/ceph/cache.c
new file mode 100644
index 0000000..a5ad9c3
--- /dev/null
+++ b/fs/ceph/cache.c
@@ -0,0 +1,311 @@
+/*
+ * Ceph cache definitions.
+ *
+ * Copyright (C) 2013 by Adfin Solutions, Inc. All Rights Reserved.
+ * Written by Milosz Tanski ([email protected])
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2
+ * as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to:
+ * Free Software Foundation
+ * 51 Franklin Street, Fifth Floor
+ * Boston, MA 02111-1301 USA
+ *
+ */
+
+#include "super.h"
+#include "cache.h"
+
+struct ceph_aux_inode {
+ struct timespec mtime;
+ loff_t size;
+};
+
+struct fscache_netfs ceph_cache_netfs = {
+ .name = "ceph",
+ .version = 0,
+};
+
+static uint16_t ceph_fscache_session_get_key(const void *cookie_netfs_data,
+ void *buffer, uint16_t maxbuf)
+{
+ const struct ceph_fs_client* fsc = cookie_netfs_data;
+ uint16_t klen;
+
+ klen = sizeof(fsc->client->fsid);
+ if (klen > maxbuf)
+ return 0;
+
+ memcpy(buffer, &fsc->client->fsid, klen);
+ return klen;
+}
+
+static const struct fscache_cookie_def ceph_fscache_fsid_object_def = {
+ .name = "CEPH.fsid",
+ .type = FSCACHE_COOKIE_TYPE_INDEX,
+ .get_key = ceph_fscache_session_get_key,
+};
+
+void ceph_fscache_register_fsid_cookie(struct ceph_fs_client* fsc)
+{
+ fsc->fscache = fscache_acquire_cookie(ceph_cache_netfs.primary_index,
+ &ceph_fscache_fsid_object_def,
+ fsc);
+}
+
+void ceph_fscache_unregister_fsid_cookie(struct ceph_fs_client* fsc)
+{
+ fscache_relinquish_cookie(fsc->fscache, 0);
+ fsc->fscache = NULL;
+}
+
+static uint16_t ceph_fscache_inode_get_key(const void *cookie_netfs_data,
+ void *buffer, uint16_t maxbuf)
+{
+ const struct ceph_inode_info* ci = cookie_netfs_data;
+ uint16_t klen;
+
+ /* use ceph virtual inode (id + snaphot) */
+ klen = sizeof(ci->i_vino);
+ if (klen > maxbuf)
+ return 0;
+
+ memcpy(buffer, &ci->i_vino, klen);
+ return klen;
+}
+
+static uint16_t ceph_fscache_inode_get_aux(const void *cookie_netfs_data,
+ void *buffer, uint16_t bufmax)
+{
+ struct ceph_aux_inode aux;
+ const struct ceph_inode_info* ci = cookie_netfs_data;
+ const struct inode* inode = &ci->vfs_inode;
+
+ memset(&aux, 0, sizeof(aux));
+ aux.mtime = inode->i_mtime;
+ aux.size = inode->i_size;
+
+ memcpy(buffer, &aux, sizeof(aux));
+
+ return sizeof(aux);
+}
+
+static void ceph_fscache_inode_get_attr(const void *cookie_netfs_data,
+ uint64_t *size)
+{
+ const struct ceph_inode_info* ci = cookie_netfs_data;
+ const struct inode* inode = &ci->vfs_inode;
+
+ *size = inode->i_size;
+}
+
+static enum fscache_checkaux ceph_fscache_inode_check_aux(
+ void *cookie_netfs_data, const void *data, uint16_t dlen)
+{
+ struct ceph_aux_inode aux;
+ struct ceph_inode_info* ci = cookie_netfs_data;
+ struct inode* inode = &ci->vfs_inode;
+
+ if (dlen != sizeof(aux))
+ return FSCACHE_CHECKAUX_OBSOLETE;
+
+ memset(&aux, 0, sizeof(aux));
+ aux.mtime = inode->i_mtime;
+ aux.size = inode->i_size;
+
+ if (memcmp(data, &aux, sizeof(aux)) != 0)
+ return FSCACHE_CHECKAUX_OBSOLETE;
+
+ dout("ceph inode 0x%p cached okay", ci);
+ return FSCACHE_CHECKAUX_OKAY;
+}
+
+static void ceph_fscache_inode_now_uncached(void* cookie_netfs_data)
+{
+ struct ceph_inode_info* ci = cookie_netfs_data;
+ struct pagevec pvec;
+ pgoff_t first;
+ int loop, nr_pages;
+
+ pagevec_init(&pvec, 0);
+ first = 0;
+
+ dout("ceph inode 0x%p now uncached", ci);
+
+ while (1) {
+ nr_pages = pagevec_lookup(&pvec, ci->vfs_inode.i_mapping, first,
+ PAGEVEC_SIZE - pagevec_count(&pvec));
+
+ if (!nr_pages)
+ break;
+
+ for (loop = 0; loop < nr_pages; loop++)
+ ClearPageFsCache(pvec.pages[loop]);
+
+ first = pvec.pages[nr_pages - 1]->index + 1;
+
+ pvec.nr = nr_pages;
+ pagevec_release(&pvec);
+ cond_resched();
+ }
+}
+
+static const struct fscache_cookie_def ceph_fscache_inode_object_def = {
+ .name = "CEPH.inode",
+ .type = FSCACHE_COOKIE_TYPE_DATAFILE,
+ .get_key = ceph_fscache_inode_get_key,
+ .get_attr = ceph_fscache_inode_get_attr,
+ .get_aux = ceph_fscache_inode_get_aux,
+ .check_aux = ceph_fscache_inode_check_aux,
+ .now_uncached = ceph_fscache_inode_now_uncached,
+};
+
+void ceph_fscache_register_inode_cookie(struct ceph_fs_client* fsc,
+ struct ceph_inode_info* ci)
+{
+ struct inode* inode = &ci->vfs_inode;
+
+ /* No caching for filesystem */
+ if (fsc->fscache == NULL)
+ return;
+
+ /* Only cache for regular files that are read only */
+ if ((ci->vfs_inode.i_mode & S_IFREG) == 0)
+ return;
+
+ /* Avoid multiple racing open requests */
+ mutex_lock(&inode->i_mutex);
+
+ if (ci->fscache)
+ goto done;
+
+ ci->fscache = fscache_acquire_cookie(fsc->fscache,
+ &ceph_fscache_inode_object_def,
+ ci);
+done:
+ mutex_unlock(&inode->i_mutex);
+
+}
+
+void ceph_fscache_unregister_inode_cookie(struct ceph_inode_info* ci)
+{
+ struct fscache_cookie* cookie;
+
+ if ((cookie = ci->fscache) == NULL)
+ return;
+
+ ci->fscache = NULL;
+
+ fscache_uncache_all_inode_pages(cookie, &ci->vfs_inode);
+ fscache_relinquish_cookie(cookie, 0);
+}
+
+static void ceph_vfs_readpage_complete(struct page *page, void *data, int error)
+{
+ if (!error)
+ SetPageUptodate(page);
+}
+
+static void ceph_vfs_readpage_complete_unlock(struct page *page, void *data, int error)
+{
+ if (!error)
+ SetPageUptodate(page);
+
+ unlock_page(page);
+}
+
+static inline int cache_valid(struct ceph_inode_info *ci)
+{
+ return ((ceph_caps_issued(ci) & CEPH_CAP_FILE_CACHE) &&
+ (ci->i_fscache_gen == ci->i_rdcache_gen));
+}
+
+
+/* Atempt to read from the fscache,
+ *
+ * This function is called from the readpage_nounlock context. DO NOT attempt to
+ * unlock the page here (or in the callback).
+ */
+int __ceph_readpage_from_fscache(struct inode *inode, struct page *page)
+{
+ struct ceph_inode_info *ci = ceph_inode(inode);
+ int ret;
+
+ if (!cache_valid(ci))
+ return -ENOBUFS;
+
+ ret = fscache_read_or_alloc_page(ci->fscache, page,
+ ceph_vfs_readpage_complete, NULL,
+ GFP_KERNEL);
+
+ switch (ret) {
+ case 0: /* Page found */
+ dout("page read submitted\n");
+ return 0;
+ case -ENOBUFS: /* Pages were not found, and can't be */
+ case -ENODATA: /* Pages were not found */
+ dout("page/inode not in cache\n");
+ return ret;
+ default:
+ dout("%s: unknown error ret = %i\n", __func__, ret);
+ return ret;
+ }
+}
+
+int __ceph_readpages_from_fscache(struct inode *inode,
+ struct address_space *mapping,
+ struct list_head *pages,
+ unsigned *nr_pages)
+{
+ struct ceph_inode_info *ci = ceph_inode(inode);
+ int ret;
+
+ if (!cache_valid(ci))
+ return -ENOBUFS;
+
+ ret = fscache_read_or_alloc_pages(ci->fscache, mapping, pages, nr_pages,
+ ceph_vfs_readpage_complete_unlock,
+ NULL, mapping_gfp_mask(mapping));
+
+ switch (ret) {
+ case 0: /* All pages found */
+ dout("all-page read submitted\n");
+ return 0;
+ case -ENOBUFS: /* Some pages were not found, and can't be */
+ case -ENODATA: /* some pages were not found */
+ dout("page/inode not in cache\n");
+ return ret;
+ default:
+ dout("%s: unknown error ret = %i\n", __func__, ret);
+ return ret;
+ }
+}
+
+void __ceph_readpage_to_fscache(struct inode *inode, struct page *page)
+{
+ struct ceph_inode_info *ci = ceph_inode(inode);
+ int ret;
+
+ if (!cache_valid(ci))
+ return;
+
+ ret = fscache_write_page(ci->fscache, page, GFP_KERNEL);
+ if (ret)
+ fscache_uncache_page(ci->fscache, page);
+}
+
+void __ceph_invalidate_fscache_page(struct inode* inode, struct page *page)
+{
+ struct ceph_inode_info *ci = ceph_inode(inode);
+
+ fscache_wait_on_page_write(ci->fscache, page);
+ fscache_uncache_page(ci->fscache, page);
+}
diff --git a/fs/ceph/cache.h b/fs/ceph/cache.h
new file mode 100644
index 0000000..23f2666
--- /dev/null
+++ b/fs/ceph/cache.h
@@ -0,0 +1,123 @@
+/*
+ * Ceph cache definitions.
+ *
+ * Copyright (C) 2013 by Adfin Solutions, Inc. All Rights Reserved.
+ * Written by Milosz Tanski ([email protected])
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2
+ * as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to:
+ * Free Software Foundation
+ * 51 Franklin Street, Fifth Floor
+ * Boston, MA 02111-1301 USA
+ *
+ */
+
+#ifndef _CEPH_CACHE_H
+#define _CEPH_CACHE_H
+
+#include <linux/fscache.h>
+
+
+extern struct fscache_netfs ceph_cache_netfs;
+
+
+void ceph_fscache_register_fsid_cookie(struct ceph_fs_client* fsc);
+void ceph_fscache_unregister_fsid_cookie(struct ceph_fs_client* fsc);
+void ceph_fscache_register_inode_cookie(struct ceph_fs_client* parent_fsc,
+ struct ceph_inode_info* ci);
+void ceph_fscache_unregister_inode_cookie(struct ceph_inode_info* ci);
+
+int __ceph_readpage_from_fscache(struct inode *inode, struct page *page);
+int __ceph_readpages_from_fscache(struct inode *inode,
+ struct address_space *mapping,
+ struct list_head *pages,
+ unsigned *nr_pages);
+void __ceph_readpage_to_fscache(struct inode *inode, struct page *page);
+void __ceph_invalidate_fscache_page(struct inode* inode, struct page *page);
+
+#ifdef CONFIG_CEPH_FSCACHE
+
+
+static inline int ceph_readpage_from_fscache(struct inode* inode,
+ struct page *page)
+{
+ return __ceph_readpage_from_fscache(inode, page);
+}
+
+static inline int ceph_readpages_from_fscache(struct inode *inode,
+ struct address_space *mapping,
+ struct list_head *pages,
+ unsigned *nr_pages)
+{
+ return __ceph_readpages_from_fscache(inode, mapping, pages,
+ nr_pages);
+}
+
+static inline void ceph_readpage_to_fscache(struct inode *inode,
+ struct page *page)
+{
+ return __ceph_readpage_to_fscache(inode, page);
+}
+
+static inline void ceph_invalidate_fscache_page(struct inode *inode,
+ struct page *page)
+{
+ return __ceph_invalidate_fscache_page(inode, page);
+}
+
+static inline int ceph_release_fscache_page(struct page *page, gfp_t gfp)
+{
+ struct inode* inode = page->mapping->host;
+ struct ceph_inode_info *ci = ceph_inode(inode);
+ return fscache_maybe_release_page(ci->fscache, page, gfp);
+}
+
+#else
+
+static inline int ceph_readpage_from_fscache(struct inode* inode,
+ struct page *page)
+{
+ return -ENOBUFS;
+}
+
+static inline int ceph_readpages_from_fscache(struct inode *inode,
+ struct address_space *mapping,
+ struct list_head *pages,
+ unsigned *nr_pages)
+{
+ return -ENOBUFS;
+}
+
+static inline void ceph_readpage_to_fscache(struct inode *inode,
+ struct page *page)
+{
+}
+
+static inline void ceph_invalidate_fscache_page(struct inode *inode,
+ struct page *page)
+{
+}
+
+static inline int ceph_release_fscache_page(struct page *page, gfp_t gfp)
+{
+ return 1;
+}
+
+static void ceph_fscache_readpages_cancel(struct inode *inode,
+ struct list_head *pages)
+{
+
+}
+
+#endif
+
+#endif
diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c
index 5a26bc1..a94ca4b 100644
--- a/fs/ceph/caps.c
+++ b/fs/ceph/caps.c
@@ -10,6 +10,7 @@

#include "super.h"
#include "mds_client.h"
+#include "cache.h"
#include <linux/ceph/decode.h>
#include <linux/ceph/messenger.h>

@@ -479,8 +480,9 @@ static void __check_cap_issue(struct ceph_inode_info *ci, struct ceph_cap *cap,
* i_rdcache_gen.
*/
if ((issued & (CEPH_CAP_FILE_CACHE|CEPH_CAP_FILE_LAZYIO)) &&
- (had & (CEPH_CAP_FILE_CACHE|CEPH_CAP_FILE_LAZYIO)) == 0)
+ (had & (CEPH_CAP_FILE_CACHE|CEPH_CAP_FILE_LAZYIO)) == 0) {
ci->i_rdcache_gen++;
+ }

/*
* if we are newly issued FILE_SHARED, mark dir not complete; we
@@ -2395,6 +2397,7 @@ static void handle_cap_grant(struct inode *inode, struct ceph_mds_caps *grant,
int writeback = 0;
int queue_invalidate = 0;
int deleted_inode = 0;
+ int queue_revalidate = 0;

dout("handle_cap_grant inode %p cap %p mds%d seq %d %s\n",
inode, cap, mds, seq, ceph_cap_string(newcaps));
@@ -2417,6 +2420,8 @@ static void handle_cap_grant(struct inode *inode, struct ceph_mds_caps *grant,
ci->i_rdcache_revoking = ci->i_rdcache_gen;
}
}
+
+ fscache_invalidate(ci->fscache);
}

/* side effects now are allowed */
@@ -2458,6 +2463,11 @@ static void handle_cap_grant(struct inode *inode, struct ceph_mds_caps *grant,
}
}

+ /* Do we need to revalidate our fscache cookie. Don't bother on the
+ * first cache cap as we already validate at cookie creation time. */
+ if ((issued & CEPH_CAP_FILE_CACHE) && ci->i_rdcache_gen > 1)
+ queue_revalidate = 1;
+
/* size/ctime/mtime/atime? */
ceph_fill_file_size(inode, issued,
le32_to_cpu(grant->truncate_seq),
@@ -2542,6 +2552,7 @@ static void handle_cap_grant(struct inode *inode, struct ceph_mds_caps *grant,
BUG_ON(cap->issued & ~cap->implemented);

spin_unlock(&ci->i_ceph_lock);
+
if (writeback)
/*
* queue inode for writeback: we can't actually call
@@ -2553,6 +2564,8 @@ static void handle_cap_grant(struct inode *inode, struct ceph_mds_caps *grant,
ceph_queue_invalidate(inode);
if (deleted_inode)
invalidate_aliases(inode);
+ if (queue_revalidate)
+ ceph_queue_revalidate(inode);
if (wake)
wake_up_all(&ci->i_cap_wq);

@@ -2709,8 +2722,10 @@ static void handle_cap_trunc(struct inode *inode,
truncate_seq, truncate_size, size);
spin_unlock(&ci->i_ceph_lock);

- if (queue_trunc)
+ if (queue_trunc) {
ceph_queue_vmtruncate(inode);
+ fscache_invalidate(ci->fscache);
+ }
}

/*
diff --git a/fs/ceph/file.c b/fs/ceph/file.c
index 68af489..b81c75f 100644
--- a/fs/ceph/file.c
+++ b/fs/ceph/file.c
@@ -12,6 +12,7 @@

#include "super.h"
#include "mds_client.h"
+#include "cache.h"

/*
* Ceph file operations
@@ -69,9 +70,23 @@ static int ceph_init_file(struct inode *inode, struct file *file, int fmode)
{
struct ceph_file_info *cf;
int ret = 0;
+ struct ceph_inode_info *ci = ceph_inode(inode);
+ struct ceph_fs_client *fsc = ceph_sb_to_client(inode->i_sb);
+ struct ceph_mds_client *mdsc = fsc->mdsc;

switch (inode->i_mode & S_IFMT) {
case S_IFREG:
+ /* First file open request creates the cookie, we want to keep
+ * this cookie around for the filetime of the inode as not to
+ * have to worry about fscache register / revoke / operation
+ * races.
+ *
+ * Also, if we know the operation is going to invalidate data
+ * (non readonly) just nuke the cache right away.
+ */
+ ceph_fscache_register_inode_cookie(mdsc->fsc, ci);
+ if ((fmode & CEPH_FILE_MODE_WR))
+ fscache_invalidate(ci->fscache);
case S_IFDIR:
dout("init_file %p %p 0%o (regular)\n", inode, file,
inode->i_mode);
@@ -182,6 +197,7 @@ int ceph_open(struct inode *inode, struct file *file)
spin_unlock(&ci->i_ceph_lock);
return ceph_init_file(inode, file, fmode);
}
+
spin_unlock(&ci->i_ceph_lock);

dout("open fmode %d wants %s\n", fmode, ceph_cap_string(wanted));
@@ -192,6 +208,7 @@ int ceph_open(struct inode *inode, struct file *file)
}
req->r_inode = inode;
ihold(inode);
+
req->r_num_caps = 1;
if (flags & (O_CREAT|O_TRUNC))
parent_inode = ceph_get_dentry_parent_inode(file->f_dentry);
diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c
index 602ccd8..5daf7f8 100644
--- a/fs/ceph/inode.c
+++ b/fs/ceph/inode.c
@@ -12,6 +12,7 @@

#include "super.h"
#include "mds_client.h"
+#include "cache.h"
#include <linux/ceph/decode.h>

/*
@@ -31,6 +32,7 @@ static const struct inode_operations ceph_symlink_iops;
static void ceph_invalidate_work(struct work_struct *work);
static void ceph_writeback_work(struct work_struct *work);
static void ceph_vmtruncate_work(struct work_struct *work);
+static void ceph_revalidate_work(struct work_struct *work);

/*
* find or create an inode, given the ceph ino number
@@ -386,6 +388,13 @@ struct inode *ceph_alloc_inode(struct super_block *sb)

INIT_WORK(&ci->i_vmtruncate_work, ceph_vmtruncate_work);

+#ifdef CONFIG_CEPH_FSCACHE
+ ci->fscache = NULL;
+ /* The first load is verifed cookie open time */
+ ci->i_fscache_gen = 1;
+ INIT_WORK(&ci->i_revalidate_work, ceph_revalidate_work);
+#endif
+
return &ci->vfs_inode;
}

@@ -405,6 +414,8 @@ void ceph_destroy_inode(struct inode *inode)

dout("destroy_inode %p ino %llx.%llx\n", inode, ceph_vinop(inode));

+ ceph_fscache_unregister_inode_cookie(ci);
+
ceph_queue_caps_release(inode);

/*
@@ -439,7 +450,6 @@ void ceph_destroy_inode(struct inode *inode)
call_rcu(&inode->i_rcu, ceph_i_callback);
}

-
/*
* Helpers to fill in size, ctime, mtime, and atime. We have to be
* careful because either the client or MDS may have more up to date
@@ -491,6 +501,10 @@ int ceph_fill_file_size(struct inode *inode, int issued,
truncate_size);
ci->i_truncate_size = truncate_size;
}
+
+ if (queue_trunc)
+ fscache_invalidate(ci->fscache);
+
return queue_trunc;
}

@@ -1079,7 +1093,7 @@ int ceph_fill_trace(struct super_block *sb, struct ceph_mds_request *req,
* complete.
*/
ceph_set_dentry_offset(req->r_old_dentry);
- dout("dn %p gets new offset %lld\n", req->r_old_dentry,
+ dout("dn %p gets new offset %lld\n", req->r_old_dentry,
ceph_dentry(req->r_old_dentry)->offset);

dn = req->r_old_dentry; /* use old_dentry */
@@ -1494,6 +1508,7 @@ void ceph_queue_vmtruncate(struct inode *inode)
struct ceph_inode_info *ci = ceph_inode(inode);

ihold(inode);
+
if (queue_work(ceph_sb_to_client(inode->i_sb)->trunc_wq,
&ci->i_vmtruncate_work)) {
dout("ceph_queue_vmtruncate %p\n", inode);
@@ -1565,6 +1580,53 @@ retry:
wake_up_all(&ci->i_cap_wq);
}

+static void ceph_revalidate_work(struct work_struct *work)
+{
+ int issued;
+ u32 orig_gen;
+ struct ceph_inode_info *ci = container_of(work, struct ceph_inode_info,
+ i_revalidate_work);
+ struct inode *inode = &ci->vfs_inode;
+
+ spin_lock(&ci->i_ceph_lock);
+ issued = __ceph_caps_issued(ci, NULL);
+ orig_gen = ci->i_rdcache_gen;
+ spin_unlock(&ci->i_ceph_lock);
+
+ if (!(issued & CEPH_CAP_FILE_CACHE)) {
+ dout("revalidate_work lost cache before validation %p\n",
+ inode);
+ goto out;
+ }
+
+ if (!fscache_check_consistency(ci->fscache))
+ fscache_invalidate(ci->fscache);
+
+ spin_lock(&ci->i_ceph_lock);
+ /* Update the new valid generation (backwards sanity check too) */
+ if (orig_gen > ci->i_fscache_gen) {
+ ci->i_fscache_gen = orig_gen;
+ }
+ spin_unlock(&ci->i_ceph_lock);
+
+out:
+ iput(&ci->vfs_inode);
+}
+
+void ceph_queue_revalidate(struct inode *inode)
+{
+ struct ceph_inode_info *ci = ceph_inode(inode);
+
+ ihold(inode);
+
+ if (queue_work(ceph_sb_to_client(inode->i_sb)->revalidate_wq,
+ &ci->i_revalidate_work)) {
+ dout("ceph_queue_revalidate %p\n", inode);
+ } else {
+ dout("ceph_queue_revalidate %p failed\n", inode);
+ iput(inode);
+ }
+}

/*
* symlinks
diff --git a/fs/ceph/super.c b/fs/ceph/super.c
index 6627b26..a56baab 100644
--- a/fs/ceph/super.c
+++ b/fs/ceph/super.c
@@ -17,6 +17,7 @@

#include "super.h"
#include "mds_client.h"
+#include "cache.h"

#include <linux/ceph/ceph_features.h>
#include <linux/ceph/decode.h>
@@ -142,6 +143,8 @@ enum {
Opt_nodcache,
Opt_ino32,
Opt_noino32,
+ Opt_fscache,
+ Opt_nofscache
};

static match_table_t fsopt_tokens = {
@@ -167,6 +170,8 @@ static match_table_t fsopt_tokens = {
{Opt_nodcache, "nodcache"},
{Opt_ino32, "ino32"},
{Opt_noino32, "noino32"},
+ {Opt_fscache, "fsc"},
+ {Opt_nofscache, "nofsc"},
{-1, NULL}
};

@@ -260,6 +265,12 @@ static int parse_fsopt_token(char *c, void *private)
case Opt_noino32:
fsopt->flags &= ~CEPH_MOUNT_OPT_INO32;
break;
+ case Opt_fscache:
+ fsopt->flags |= CEPH_MOUNT_OPT_FSCACHE;
+ break;
+ case Opt_nofscache:
+ fsopt->flags &= ~CEPH_MOUNT_OPT_FSCACHE;
+ break;
default:
BUG_ON(token);
}
@@ -422,6 +433,10 @@ static int ceph_show_options(struct seq_file *m, struct dentry *root)
seq_puts(m, ",dcache");
else
seq_puts(m, ",nodcache");
+ if (fsopt->flags & CEPH_MOUNT_OPT_FSCACHE)
+ seq_puts(m, ",fsc");
+ else
+ seq_puts(m, ",nofsc");

if (fsopt->wsize)
seq_printf(m, ",wsize=%d", fsopt->wsize);
@@ -530,11 +545,24 @@ static struct ceph_fs_client *create_fs_client(struct ceph_mount_options *fsopt,
if (!fsc->wb_pagevec_pool)
goto fail_trunc_wq;

+#ifdef CONFIG_CEPH_FSCACHE
+ if ((fsopt->flags & CEPH_MOUNT_OPT_FSCACHE))
+ ceph_fscache_register_fsid_cookie(fsc);
+
+ fsc->revalidate_wq = alloc_workqueue("ceph-revalidate", 0, 1);
+ if (fsc->revalidate_wq == NULL)
+ goto fail_fscache;
+#endif
+
/* caps */
fsc->min_caps = fsopt->max_readdir;

return fsc;

+#ifdef CONFIG_CEPH_FSCACHE
+fail_fscache:
+ ceph_fscache_unregister_fsid_cookie(fsc);
+#endif
fail_trunc_wq:
destroy_workqueue(fsc->trunc_wq);
fail_pg_inv_wq:
@@ -554,6 +582,10 @@ static void destroy_fs_client(struct ceph_fs_client *fsc)
{
dout("destroy_fs_client %p\n", fsc);

+#ifdef CONFIG_CEPH_FSCACHE
+ ceph_fscache_unregister_fsid_cookie(fsc);
+#endif
+
destroy_workqueue(fsc->wb_wq);
destroy_workqueue(fsc->pg_inv_wq);
destroy_workqueue(fsc->trunc_wq);
@@ -588,6 +620,8 @@ static void ceph_inode_init_once(void *foo)

static int __init init_caches(void)
{
+ int error = -ENOMEM;
+
ceph_inode_cachep = kmem_cache_create("ceph_inode_info",
sizeof(struct ceph_inode_info),
__alignof__(struct ceph_inode_info),
@@ -611,15 +645,19 @@ static int __init init_caches(void)
if (ceph_file_cachep == NULL)
goto bad_file;

- return 0;
+#ifdef CONFIG_CEPH_FSCACHE
+ if ((error = fscache_register_netfs(&ceph_cache_netfs)))
+ goto bad_file;
+#endif

+ return 0;
bad_file:
kmem_cache_destroy(ceph_dentry_cachep);
bad_dentry:
kmem_cache_destroy(ceph_cap_cachep);
bad_cap:
kmem_cache_destroy(ceph_inode_cachep);
- return -ENOMEM;
+ return error;
}

static void destroy_caches(void)
@@ -629,10 +667,15 @@ static void destroy_caches(void)
* destroy cache.
*/
rcu_barrier();
+
kmem_cache_destroy(ceph_inode_cachep);
kmem_cache_destroy(ceph_cap_cachep);
kmem_cache_destroy(ceph_dentry_cachep);
kmem_cache_destroy(ceph_file_cachep);
+
+#ifdef CONFIG_CEPH_FSCACHE
+ fscache_unregister_netfs(&ceph_cache_netfs);
+#endif
}


diff --git a/fs/ceph/super.h b/fs/ceph/super.h
index f1e4e47..72eac24 100644
--- a/fs/ceph/super.h
+++ b/fs/ceph/super.h
@@ -16,6 +16,10 @@

#include <linux/ceph/libceph.h>

+#ifdef CONFIG_CEPH_FSCACHE
+#include <linux/fscache.h>
+#endif
+
/* f_type in struct statfs */
#define CEPH_SUPER_MAGIC 0x00c36400

@@ -29,6 +33,7 @@
#define CEPH_MOUNT_OPT_NOASYNCREADDIR (1<<7) /* no dcache readdir */
#define CEPH_MOUNT_OPT_INO32 (1<<8) /* 32 bit inos */
#define CEPH_MOUNT_OPT_DCACHE (1<<9) /* use dcache for readdir etc */
+#define CEPH_MOUNT_OPT_FSCACHE (1<<10) /* use fscache */

#define CEPH_MOUNT_OPT_DEFAULT (CEPH_MOUNT_OPT_RBYTES)

@@ -90,6 +95,11 @@ struct ceph_fs_client {
struct dentry *debugfs_bdi;
struct dentry *debugfs_mdsc, *debugfs_mdsmap;
#endif
+
+#ifdef CONFIG_CEPH_FSCACHE
+ struct fscache_cookie *fscache;
+ struct workqueue_struct *revalidate_wq;
+#endif
};


@@ -320,6 +330,12 @@ struct ceph_inode_info {

struct work_struct i_vmtruncate_work;

+#ifdef CONFIG_CEPH_FSCACHE
+ struct fscache_cookie *fscache;
+ u32 i_fscache_gen; /* sequence, for delayed fscache validate */
+ struct work_struct i_revalidate_work;
+#endif
+
struct inode vfs_inode; /* at end */
};

@@ -700,6 +716,7 @@ extern void ceph_queue_vmtruncate(struct inode *inode);

extern void ceph_queue_invalidate(struct inode *inode);
extern void ceph_queue_writeback(struct inode *inode);
+extern void ceph_queue_revalidate(struct inode *inode);

extern int ceph_do_getattr(struct inode *inode, int mask);
extern int ceph_permission(struct inode *inode, int mask);
--
1.8.1.2

2013-08-21 21:30:11

by Milosz Tanski

[permalink] [raw]
Subject: [PATCH 4/5] fscache: netfs function for cleanup post readpages

Currently the fscache code expects the netfs to call fscache_readpages_or_alloc
inside the aops readpages callback. It marks all the pages in the list provided
by readahead with PG_private_2. When the netfs fails to read all of the pages
(which is legal) and returns to the readahead code, a BUG is triggered because
the page list still contains marked pages.

This patch implements a simple fscache_readpages_cancel function that the netfs
should call before returning from readpages. It revokes the pages from the
underlying cache backend and unmarks them.

This addresses this BUG being triggered by netfs code:

[12410647.597278] BUG: Bad page state in process petabucket pfn:3d504e
[12410647.597292] page:ffffea000f541380 count:0 mapcount:0 mapping:
(null) index:0x0
[12410647.597298] page flags: 0x200000000001000(private_2)

...

[12410647.597334] Call Trace:
[12410647.597345] [<ffffffff815523f2>] dump_stack+0x19/0x1b
[12410647.597356] [<ffffffff8111def7>] bad_page+0xc7/0x120
[12410647.597359] [<ffffffff8111e49e>] free_pages_prepare+0x10e/0x120
[12410647.597361] [<ffffffff8111fc80>] free_hot_cold_page+0x40/0x170
[12410647.597363] [<ffffffff81123507>] __put_single_page+0x27/0x30
[12410647.597365] [<ffffffff81123df5>] put_page+0x25/0x40
[12410647.597376] [<ffffffffa02bdcf9>] ceph_readpages+0x2e9/0x6e0 [ceph]
[12410647.597379] [<ffffffff81122a8f>] __do_page_cache_readahead+0x1af/0x260
[12410647.597382] [<ffffffff81122ea1>] ra_submit+0x21/0x30
[12410647.597384] [<ffffffff81118f64>] filemap_fault+0x254/0x490
[12410647.597387] [<ffffffff8113a74f>] __do_fault+0x6f/0x4e0
[12410647.597391] [<ffffffff810125bd>] ? __switch_to+0x16d/0x4a0
[12410647.597395] [<ffffffff810865ba>] ? finish_task_switch+0x5a/0xc0
[12410647.597398] [<ffffffff8113d856>] handle_pte_fault+0xf6/0x930
[12410647.597401] [<ffffffff81008c33>] ? pte_mfn_to_pfn+0x93/0x110
[12410647.597403] [<ffffffff81008cce>] ? xen_pmd_val+0xe/0x10
[12410647.597405] [<ffffffff81005469>] ? __raw_callee_save_xen_pmd_val+0x11/0x1e
[12410647.597407] [<ffffffff8113f361>] handle_mm_fault+0x251/0x370
[12410647.597411] [<ffffffff812b0ac4>] ? call_rwsem_down_read_failed+0x14/0x30
[12410647.597414] [<ffffffff8155bffa>] __do_page_fault+0x1aa/0x550
[12410647.597418] [<ffffffff8108011d>] ? up_write+0x1d/0x20
[12410647.597422] [<ffffffff8113141c>] ? vm_mmap_pgoff+0xbc/0xe0
[12410647.597425] [<ffffffff81143bb8>] ? SyS_mmap_pgoff+0xd8/0x240
[12410647.597427] [<ffffffff8155c3ae>] do_page_fault+0xe/0x10
[12410647.597431] [<ffffffff81558818>] page_fault+0x28/0x30

Signed-off-by: Milosz Tanski <[email protected]>
---
fs/fscache/page.c | 16 ++++++++++++++++
include/linux/fscache.h | 22 ++++++++++++++++++++++
2 files changed, 38 insertions(+)

diff --git a/fs/fscache/page.c b/fs/fscache/page.c
index d479ab3..0cc3153 100644
--- a/fs/fscache/page.c
+++ b/fs/fscache/page.c
@@ -1132,3 +1132,19 @@ void __fscache_uncache_all_inode_pages(struct fscache_cookie *cookie,
_leave("");
}
EXPORT_SYMBOL(__fscache_uncache_all_inode_pages);
+
+/**
+ * Unmark pages allocated in the readahead code path (via
+ * fscache_readpages_or_alloc) after delegating to the base filesystem.
+ */
+void __fscache_readpages_cancel(struct fscache_cookie *cookie,
+ struct list_head *pages)
+{
+ struct page *page;
+
+ list_for_each_entry(page, pages, lru) {
+ if (PageFsCache(page))
+ __fscache_uncache_page(cookie, page);
+ }
+}
+EXPORT_SYMBOL(__fscache_readpages_cancel);
diff --git a/include/linux/fscache.h b/include/linux/fscache.h
index 7a49e8f..c324177 100644
--- a/include/linux/fscache.h
+++ b/include/linux/fscache.h
@@ -209,6 +209,8 @@ extern bool __fscache_maybe_release_page(struct fscache_cookie *, struct page *,
gfp_t);
extern void __fscache_uncache_all_inode_pages(struct fscache_cookie *,
struct inode *);
+extern void __fscache_readpages_cancel(struct fscache_cookie *cookie,
+ struct list_head *pages);

/**
* fscache_register_netfs - Register a filesystem as desiring caching services
@@ -719,4 +721,24 @@ void fscache_uncache_all_inode_pages(struct fscache_cookie *cookie,
__fscache_uncache_all_inode_pages(cookie, inode);
}

+/**
+ * fscache_readpages_cancel
+ * @cookie: The cookie representing the inode's cache object.
+ * @pages: The netfs pages that we canceled write on in readpages()
+ *
+ * Uncache/unreserve the pages reserved earlier in readpages() via
+ * fscache_readpages_or_alloc(). In the common case, when readpages()
+ * succeeds, this does nothing. When the underlying netfs read fails, we
+ * need to clean up the page list (unmark and uncache).
+ *
+ * This function may sleep (if it's calling to the cache backend).
+ */
+static inline
+void fscache_readpages_cancel(struct fscache_cookie *cookie,
+ struct list_head *pages)
+{
+ if (fscache_cookie_valid(cookie))
+ __fscache_readpages_cancel(cookie, pages);
+}
+
#endif /* _LINUX_FSCACHE_H */
--
1.8.1.2

2013-08-21 21:30:29

by Milosz Tanski

[permalink] [raw]
Subject: [PATCH 5/5] ceph: clean PgPrivate2 on returning from readpages

In some cases the ceph readpages code bails out without filling all the pages
already marked by fscache. When control returns to the readahead code, this
triggers a BUG.

Signed-off-by: Milosz Tanski <[email protected]>
---
fs/ceph/addr.c | 2 ++
fs/ceph/cache.h | 7 +++++++
2 files changed, 9 insertions(+)

diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
index 632bb48..e8ea383 100644
--- a/fs/ceph/addr.c
+++ b/fs/ceph/addr.c
@@ -413,6 +413,8 @@ static int ceph_readpages(struct file *file, struct address_space *mapping,
BUG_ON(rc == 0);
}
out:
+ ceph_fscache_readpages_cancel(inode, page_list);
+
dout("readpages %p file %p ret %d\n", inode, file, rc);
return rc;
}
diff --git a/fs/ceph/cache.h b/fs/ceph/cache.h
index 23f2666..a0642ee 100644
--- a/fs/ceph/cache.h
+++ b/fs/ceph/cache.h
@@ -81,6 +81,13 @@ static inline int ceph_release_fscache_page(struct page *page, gfp_t gfp)
return fscache_maybe_release_page(ci->fscache, page, gfp);
}

+static inline void ceph_fscache_readpages_cancel(struct inode *inode,
+ struct list_head *pages)
+{
+ struct ceph_inode_info *ci = ceph_inode(inode);
+ return fscache_readpages_cancel(ci->fscache, pages);
+}
+
#else

static inline int ceph_readpage_from_fscache(struct inode* inode,
--
1.8.1.2

2013-08-22 05:05:20

by Sage Weil

[permalink] [raw]
Subject: Re: [PATCHv4 0/5] ceph: persistent caching with fscache

Hi Milosz,

I've pulled this into the ceph testing branch to make sure it holds up
in qa.

David, are the fscache patches here ready for the next merge window? Do
you have a preference for whose tree they go through?

Thanks!
sage


On Wed, 21 Aug 2013, Milosz Tanski wrote:

> This is an updated version of the fscache support for the Ceph filesystem. What's
> changed since the last patchset:
>
> 1. Separated the readpages refactor into its own patches. These were already
> accepted into the testing branch.
>
> 2. Tracked down the BUG in readahead cleanup code. We were returning with pages
> marked as private_2 from readpages(). I added a simple convenience function
> to the fscache netfs interface for cleaning up the page list at the end of
> aops readpages(). I know other filesystems (NFS) have run into this since
> I've seen a few similar traces in Google search and the cachefs mailing list.
> The second patch makes Ceph use this interface.
>
> I've been running this code (minus the BUG fix) on clients for a couple weeks
> with moderate 24/7 use without issues. At this point in time I feel like it's
> solid enough to go into the ceph kclient.
>
> Please pull the code from my repository:
> https://bitbucket.org/adfin/linux-fs.git branch: wip-ceph-fscache
>
> The first two patches I included were not written by me but were written by
> Hongyi Jia. He implemented the cookie re-validation scheme into fscache core.
>
> Finally, I CCed a couple other mailing lists (fsdevel, nfs) because the new
> fscache_readpages_cancel() should be used by other filesystems to avoid the
> same problem.
>
>
> Hongyi Jia (2):
> new cachefiles interface to check cache consistency
> new fscache interface to check cache consistency
>
> Milosz Tanski (3):
> ceph: use fscache as a local presisent cache
> fscache: netfs function for cleanup post readpages
> ceph: clean PgPrivate2 on returning from readpages
>
> fs/cachefiles/interface.c | 19 +++
> fs/cachefiles/internal.h | 1 +
> fs/cachefiles/xattr.c | 39 ++++++
> fs/ceph/Kconfig | 9 ++
> fs/ceph/Makefile | 2 +
> fs/ceph/addr.c | 39 +++++-
> fs/ceph/cache.c | 311 ++++++++++++++++++++++++++++++++++++++++++
> fs/ceph/cache.h | 130 ++++++++++++++++++
> fs/ceph/caps.c | 19 ++-
> fs/ceph/file.c | 17 +++
> fs/ceph/inode.c | 66 ++++++++-
> fs/ceph/super.c | 47 ++++++-
> fs/ceph/super.h | 17 +++
> fs/fscache/cookie.c | 22 +++
> fs/fscache/page.c | 16 +++
> include/linux/fscache-cache.h | 4 +
> include/linux/fscache.h | 39 ++++++
> 17 files changed, 785 insertions(+), 12 deletions(-)
> create mode 100644 fs/ceph/cache.c
> create mode 100644 fs/ceph/cache.h
>
> --
> 1.8.1.2
>
>

2013-08-22 12:58:07

by Milosz Tanski

[permalink] [raw]
Subject: Re: [PATCHv4 0/5] ceph: persistent caching with fscache

Sage,

Thanks. If you run into any problems please let me know and I'll
resolve them quickly, and I'll make sure I'm watching the qa mailing
list.

- Milosz

On Thu, Aug 22, 2013 at 1:05 AM, Sage Weil <[email protected]> wrote:
> Hi Milosz,
>
> I've pulled this into the ceph testing branch to make sure it holds up
> in qa.
>
> David, are the fscache patches here ready for the next merge window? Do
> you have a preference for whose tree they go through?
>
> Thanks!
> sage
>
>

2013-08-27 01:05:20

by Sage Weil

[permalink] [raw]
Subject: Re: [PATCH 3/5] ceph: use fscache as a local presisent cache

On Wed, 21 Aug 2013, Milosz Tanski wrote:
> Adding support for fscache to the Ceph filesystem. This would bring it on par
> with some of the other network filesystems in Linux (like NFS, AFS, etc...)
>
> In order to mount the filesystem with fscache the 'fsc' mount option must be
> passed.
>
> Signed-off-by: Milosz Tanski <[email protected]>

I fixed up a couple build errors when adding this to the tree and realized
a few things need to be cleaned up first. Basically, any #ifdef
CONFIG_CEPH_FSCACHE outside of a header file is a no-no. Everything in
cache.h that is outside of the #ifdef block should be moved in, and no-op
variants added in the #else block.

More below:

> ---
> fs/ceph/Kconfig | 9 ++
> fs/ceph/Makefile | 2 +
> fs/ceph/addr.c | 37 +++++--
> fs/ceph/cache.c | 311 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
> fs/ceph/cache.h | 123 ++++++++++++++++++++++
> fs/ceph/caps.c | 19 +++-
> fs/ceph/file.c | 17 +++
> fs/ceph/inode.c | 66 +++++++++++-
> fs/ceph/super.c | 47 ++++++++-
> fs/ceph/super.h | 17 +++
> 10 files changed, 636 insertions(+), 12 deletions(-)
> create mode 100644 fs/ceph/cache.c
> create mode 100644 fs/ceph/cache.h
>
> diff --git a/fs/ceph/Kconfig b/fs/ceph/Kconfig
> index 49bc782..ac9a2ef 100644
> --- a/fs/ceph/Kconfig
> +++ b/fs/ceph/Kconfig
> @@ -16,3 +16,12 @@ config CEPH_FS
>
> If unsure, say N.
>
> +if CEPH_FS
> +config CEPH_FSCACHE
> + bool "Enable Ceph client caching support"
> + depends on CEPH_FS=m && FSCACHE || CEPH_FS=y && FSCACHE=y
> + help
> + Choose Y here to enable persistent, read-only local
> + caching support for Ceph clients using FS-Cache
> +
> +endif
> diff --git a/fs/ceph/Makefile b/fs/ceph/Makefile
> index bd35212..0af0678 100644
> --- a/fs/ceph/Makefile
> +++ b/fs/ceph/Makefile
> @@ -9,3 +9,5 @@ ceph-y := super.o inode.o dir.o file.o locks.o addr.o ioctl.o \
> mds_client.o mdsmap.o strings.o ceph_frag.o \
> debugfs.o
>
> +ceph-$(CONFIG_CEPH_FSCACHE) += cache.o
> +
> diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
> index cb78ce8..632bb48 100644
> --- a/fs/ceph/addr.c
> +++ b/fs/ceph/addr.c
> @@ -11,6 +11,7 @@
>
> #include "super.h"
> #include "mds_client.h"
> +#include "cache.h"
> #include <linux/ceph/osd_client.h>
>
> /*
> @@ -159,6 +160,11 @@ static void ceph_invalidatepage(struct page *page, unsigned int offset,
> return;
> }
>
> + ceph_invalidate_fscache_page(inode, page);
> +
> + if (!PagePrivate(page))
> + return;
> +
> /*
> * We can get non-dirty pages here due to races between
> * set_page_dirty and truncate_complete_page; just spit out a
> @@ -178,14 +184,17 @@ static void ceph_invalidatepage(struct page *page, unsigned int offset,
> ClearPagePrivate(page);
> }
>
> -/* just a sanity check */
> static int ceph_releasepage(struct page *page, gfp_t g)
> {
> struct inode *inode = page->mapping ? page->mapping->host : NULL;
> dout("%p releasepage %p idx %lu\n", inode, page, page->index);
> WARN_ON(PageDirty(page));
> - WARN_ON(PagePrivate(page));
> - return 0;
> +
> + /* Can we release the page from the cache? */
> + if (!ceph_release_fscache_page(page, g))
> + return 0;
> +
> + return !PagePrivate(page);
> }
>
> /*
> @@ -195,11 +204,16 @@ static int readpage_nounlock(struct file *filp, struct page *page)
> {
> struct inode *inode = file_inode(filp);
> struct ceph_inode_info *ci = ceph_inode(inode);
> - struct ceph_osd_client *osdc =
> + struct ceph_osd_client *osdc =
> &ceph_inode_to_client(inode)->client->osdc;
> int err = 0;
> u64 len = PAGE_CACHE_SIZE;
>
> + err = ceph_readpage_from_fscache(inode, page);
> +
> + if (err == 0)
> + goto out;
> +
> dout("readpage inode %p file %p page %p index %lu\n",
> inode, filp, page, page->index);
> err = ceph_osdc_readpages(osdc, ceph_vino(inode), &ci->i_layout,
> @@ -217,6 +231,9 @@ static int readpage_nounlock(struct file *filp, struct page *page)
> }
> SetPageUptodate(page);
>
> + if (err == 0)
> + ceph_readpage_to_fscache(inode, page);
> +
> out:
> return err < 0 ? err : 0;
> }
> @@ -259,6 +276,7 @@ static void finish_read(struct ceph_osd_request *req, struct ceph_msg *msg)
> page->index);
> flush_dcache_page(page);
> SetPageUptodate(page);
> + ceph_readpage_to_fscache(inode, page);
> unlock_page(page);
> page_cache_release(page);
> bytes -= PAGE_CACHE_SIZE;
> @@ -328,7 +346,7 @@ static int start_read(struct inode *inode, struct list_head *page_list, int max)
> page = list_entry(page_list->prev, struct page, lru);
> BUG_ON(PageLocked(page));
> list_del(&page->lru);
> -
> +
> dout("start_read %p adding %p idx %lu\n", inode, page,
> page->index);
> if (add_to_page_cache_lru(page, &inode->i_data, page->index,
> @@ -375,6 +393,12 @@ static int ceph_readpages(struct file *file, struct address_space *mapping,
> int rc = 0;
> int max = 0;
>
> + rc = ceph_readpages_from_fscache(mapping->host, mapping, page_list,
> + &nr_pages);
> +
> + if (rc == 0)
> + goto out;
> +
> if (fsc->mount_options->rsize >= PAGE_CACHE_SIZE)
> max = (fsc->mount_options->rsize + PAGE_CACHE_SIZE - 1)
> >> PAGE_SHIFT;
> @@ -494,6 +518,8 @@ static int writepage_nounlock(struct page *page, struct writeback_control *wbc)
> CONGESTION_ON_THRESH(fsc->mount_options->congestion_kb))
> set_bdi_congested(&fsc->backing_dev_info, BLK_RW_ASYNC);
>
> + ceph_readpage_to_fscache(inode, page);
> +
> set_page_writeback(page);
> err = ceph_osdc_writepages(osdc, ceph_vino(inode),
> &ci->i_layout, snapc,
> @@ -549,7 +575,6 @@ static void ceph_release_pages(struct page **pages, int num)
> pagevec_release(&pvec);
> }
>
> -
> /*
> * async writeback completion handler.
> *
> diff --git a/fs/ceph/cache.c b/fs/ceph/cache.c
> new file mode 100644
> index 0000000..a5ad9c3
> --- /dev/null
> +++ b/fs/ceph/cache.c
> @@ -0,0 +1,311 @@
> +/*
> + * Ceph cache definitions.
> + *
> + * Copyright (C) 2013 by Adfin Solutions, Inc. All Rights Reserved.
> + * Written by Milosz Tanski ([email protected])
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2
> + * as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write to:
> + * Free Software Foundation
> + * 51 Franklin Street, Fifth Floor
> + * Boston, MA 02111-1301 USA
> + *
> + */
> +
> +#include "super.h"
> +#include "cache.h"
> +
> +struct ceph_aux_inode {
> + struct timespec mtime;
> + loff_t size;
> +};
> +
> +struct fscache_netfs ceph_cache_netfs = {
> + .name = "ceph",
> + .version = 0,
> +};
> +
> +static uint16_t ceph_fscache_session_get_key(const void *cookie_netfs_data,
> + void *buffer, uint16_t maxbuf)
> +{
> + const struct ceph_fs_client* fsc = cookie_netfs_data;
> + uint16_t klen;
> +
> + klen = sizeof(fsc->client->fsid);
> + if (klen > maxbuf)
> + return 0;
> +
> + memcpy(buffer, &fsc->client->fsid, klen);
> + return klen;
> +}
> +
> +static const struct fscache_cookie_def ceph_fscache_fsid_object_def = {
> + .name = "CEPH.fsid",
> + .type = FSCACHE_COOKIE_TYPE_INDEX,
> + .get_key = ceph_fscache_session_get_key,
> +};
> +
> +void ceph_fscache_register_fsid_cookie(struct ceph_fs_client* fsc)
> +{
> + fsc->fscache = fscache_acquire_cookie(ceph_cache_netfs.primary_index,
> + &ceph_fscache_fsid_object_def,
> + fsc);
> +}
> +
> +void ceph_fscache_unregister_fsid_cookie(struct ceph_fs_client* fsc)
> +{
> + fscache_relinquish_cookie(fsc->fscache, 0);
> + fsc->fscache = NULL;
> +}
> +
> +static uint16_t ceph_fscache_inode_get_key(const void *cookie_netfs_data,
> + void *buffer, uint16_t maxbuf)
> +{
> + const struct ceph_inode_info* ci = cookie_netfs_data;
> + uint16_t klen;
> +
> + /* use ceph virtual inode (id + snaphot) */
> + klen = sizeof(ci->i_vino);
> + if (klen > maxbuf)
> + return 0;
> +
> + memcpy(buffer, &ci->i_vino, klen);
> + return klen;
> +}
> +
> +static uint16_t ceph_fscache_inode_get_aux(const void *cookie_netfs_data,
> + void *buffer, uint16_t bufmax)
> +{
> + struct ceph_aux_inode aux;
> + const struct ceph_inode_info* ci = cookie_netfs_data;
> + const struct inode* inode = &ci->vfs_inode;
> +
> + memset(&aux, 0, sizeof(aux));
> + aux.mtime = inode->i_mtime;
> + aux.size = inode->i_size;
> +
> + memcpy(buffer, &aux, sizeof(aux));
> +
> + return sizeof(aux);
> +}
> +
> +static void ceph_fscache_inode_get_attr(const void *cookie_netfs_data,
> + uint64_t *size)
> +{
> + const struct ceph_inode_info* ci = cookie_netfs_data;
> + const struct inode* inode = &ci->vfs_inode;
> +
> + *size = inode->i_size;
> +}
> +
> +static enum fscache_checkaux ceph_fscache_inode_check_aux(
> + void *cookie_netfs_data, const void *data, uint16_t dlen)
> +{
> + struct ceph_aux_inode aux;
> + struct ceph_inode_info* ci = cookie_netfs_data;
> + struct inode* inode = &ci->vfs_inode;
> +
> + if (dlen != sizeof(aux))
> + return FSCACHE_CHECKAUX_OBSOLETE;
> +
> + memset(&aux, 0, sizeof(aux));
> + aux.mtime = inode->i_mtime;
> + aux.size = inode->i_size;
> +
> + if (memcmp(data, &aux, sizeof(aux)) != 0)
> + return FSCACHE_CHECKAUX_OBSOLETE;
> +
> + dout("ceph inode 0x%p cached okay", ci);
> + return FSCACHE_CHECKAUX_OKAY;
> +}
> +
> +static void ceph_fscache_inode_now_uncached(void* cookie_netfs_data)
> +{
> + struct ceph_inode_info* ci = cookie_netfs_data;
> + struct pagevec pvec;
> + pgoff_t first;
> + int loop, nr_pages;
> +
> + pagevec_init(&pvec, 0);
> + first = 0;
> +
> + dout("ceph inode 0x%p now uncached", ci);
> +
> + while (1) {
> + nr_pages = pagevec_lookup(&pvec, ci->vfs_inode.i_mapping, first,
> + PAGEVEC_SIZE - pagevec_count(&pvec));
> +
> + if (!nr_pages)
> + break;
> +
> + for (loop = 0; loop < nr_pages; loop++)
> + ClearPageFsCache(pvec.pages[loop]);
> +
> + first = pvec.pages[nr_pages - 1]->index + 1;
> +
> + pvec.nr = nr_pages;
> + pagevec_release(&pvec);
> + cond_resched();
> + }
> +}
> +
> +static const struct fscache_cookie_def ceph_fscache_inode_object_def = {
> + .name = "CEPH.inode",
> + .type = FSCACHE_COOKIE_TYPE_DATAFILE,
> + .get_key = ceph_fscache_inode_get_key,
> + .get_attr = ceph_fscache_inode_get_attr,
> + .get_aux = ceph_fscache_inode_get_aux,
> + .check_aux = ceph_fscache_inode_check_aux,
> + .now_uncached = ceph_fscache_inode_now_uncached,
> +};
> +
> +void ceph_fscache_register_inode_cookie(struct ceph_fs_client* fsc,
> + struct ceph_inode_info* ci)
> +{
> + struct inode* inode = &ci->vfs_inode;
> +
> + /* No caching for filesystem */
> + if (fsc->fscache == NULL)
> + return;
> +
> + /* Only cache for regular files that are read only */
> + if ((ci->vfs_inode.i_mode & S_IFREG) == 0)
> + return;
> +
> + /* Avoid multiple racing open requests */
> + mutex_lock(&inode->i_mutex);
> +
> + if (ci->fscache)
> + goto done;
> +
> + ci->fscache = fscache_acquire_cookie(fsc->fscache,
> + &ceph_fscache_inode_object_def,
> + ci);
> +done:
> + mutex_unlock(&inode->i_mutex);
> +
> +}
> +
> +void ceph_fscache_unregister_inode_cookie(struct ceph_inode_info* ci)
> +{
> + struct fscache_cookie* cookie;
> +
> + if ((cookie = ci->fscache) == NULL)
> + return;
> +
> + ci->fscache = NULL;
> +
> + fscache_uncache_all_inode_pages(cookie, &ci->vfs_inode);
> + fscache_relinquish_cookie(cookie, 0);
> +}
> +
> +static void ceph_vfs_readpage_complete(struct page *page, void *data, int error)
> +{
> + if (!error)
> + SetPageUptodate(page);
> +}
> +
> +static void ceph_vfs_readpage_complete_unlock(struct page *page, void *data, int error)
> +{
> + if (!error)
> + SetPageUptodate(page);
> +
> + unlock_page(page);
> +}
> +
> +static inline int cache_valid(struct ceph_inode_info *ci)
> +{
> + return ((ceph_caps_issued(ci) & CEPH_CAP_FILE_CACHE) &&
> + (ci->i_fscache_gen == ci->i_rdcache_gen));
> +}
> +
> +
> +/* Attempt to read a page from fscache.
> + *
> + * This function is called from the readpage_nounlock context. DO NOT attempt to
> + * unlock the page here (or in the callback).
> + */
> +int __ceph_readpage_from_fscache(struct inode *inode, struct page *page)
> +{
> + struct ceph_inode_info *ci = ceph_inode(inode);
> + int ret;
> +
> + if (!cache_valid(ci))
> + return -ENOBUFS;
> +
> + ret = fscache_read_or_alloc_page(ci->fscache, page,
> + ceph_vfs_readpage_complete, NULL,
> + GFP_KERNEL);
> +
> + switch (ret) {
> + case 0: /* Page found */
> + dout("page read submitted\n");
> + return 0;
> + case -ENOBUFS: /* Pages were not found, and can't be */
> + case -ENODATA: /* Pages were not found */
> + dout("page/inode not in cache\n");
> + return ret;
> + default:
> + dout("%s: unknown error ret = %i\n", __func__, ret);
> + return ret;
> + }
> +}
> +
> +int __ceph_readpages_from_fscache(struct inode *inode,
> + struct address_space *mapping,
> + struct list_head *pages,
> + unsigned *nr_pages)
> +{
> + struct ceph_inode_info *ci = ceph_inode(inode);
> + int ret;
> +
> + if (!cache_valid(ci))
> + return -ENOBUFS;
> +
> + ret = fscache_read_or_alloc_pages(ci->fscache, mapping, pages, nr_pages,
> + ceph_vfs_readpage_complete_unlock,
> + NULL, mapping_gfp_mask(mapping));
> +
> + switch (ret) {
> + case 0: /* All pages found */
> + dout("all-page read submitted\n");
> + return 0;
> + case -ENOBUFS: /* Some pages were not found, and can't be */
> + case -ENODATA: /* some pages were not found */
> + dout("page/inode not in cache\n");
> + return ret;
> + default:
> + dout("%s: unknown error ret = %i\n", __func__, ret);
> + return ret;
> + }
> +}
> +
> +void __ceph_readpage_to_fscache(struct inode *inode, struct page *page)
> +{
> + struct ceph_inode_info *ci = ceph_inode(inode);
> + int ret;
> +
> + if (!cache_valid(ci))
> + return;
> +
> + ret = fscache_write_page(ci->fscache, page, GFP_KERNEL);
> + if (ret)
> + fscache_uncache_page(ci->fscache, page);
> +}
> +
> +void __ceph_invalidate_fscache_page(struct inode* inode, struct page *page)
> +{
> + struct ceph_inode_info *ci = ceph_inode(inode);
> +
> + fscache_wait_on_page_write(ci->fscache, page);
> + fscache_uncache_page(ci->fscache, page);
> +}
> diff --git a/fs/ceph/cache.h b/fs/ceph/cache.h
> new file mode 100644
> index 0000000..23f2666
> --- /dev/null
> +++ b/fs/ceph/cache.h
> @@ -0,0 +1,123 @@
> +/*
> + * Ceph cache definitions.
> + *
> + * Copyright (C) 2013 by Adfin Solutions, Inc. All Rights Reserved.
> + * Written by Milosz Tanski ([email protected])
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2
> + * as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write to:
> + * Free Software Foundation
> + * 51 Franklin Street, Fifth Floor
> + * Boston, MA 02111-1301 USA
> + *
> + */
> +
> +#ifndef _CEPH_CACHE_H
> +#define _CEPH_CACHE_H
> +
> +#include <linux/fscache.h>
> +
> +
> +extern struct fscache_netfs ceph_cache_netfs;
> +
> +
> +void ceph_fscache_register_fsid_cookie(struct ceph_fs_client* fsc);
> +void ceph_fscache_unregister_fsid_cookie(struct ceph_fs_client* fsc);
> +void ceph_fscache_register_inode_cookie(struct ceph_fs_client* parent_fsc,
> + struct ceph_inode_info* ci);
> +void ceph_fscache_unregister_inode_cookie(struct ceph_inode_info* ci);
> +
> +int __ceph_readpage_from_fscache(struct inode *inode, struct page *page);
> +int __ceph_readpages_from_fscache(struct inode *inode,
> + struct address_space *mapping,
> + struct list_head *pages,
> + unsigned *nr_pages);
> +void __ceph_readpage_to_fscache(struct inode *inode, struct page *page);
> +void __ceph_invalidate_fscache_page(struct inode* inode, struct page *page);

These should all move down. The revalidate_work method should get moved
in here and into cache.c, too.

> +
> +#ifdef CONFIG_CEPH_FSCACHE
> +
> +
> +static inline int ceph_readpage_from_fscache(struct inode* inode,
> + struct page *page)
> +{
> + return __ceph_readpage_from_fscache(inode, page);
> +}
> +
> +static inline int ceph_readpages_from_fscache(struct inode *inode,
> + struct address_space *mapping,
> + struct list_head *pages,
> + unsigned *nr_pages)
> +{
> + return __ceph_readpages_from_fscache(inode, mapping, pages,
> + nr_pages);
> +}
> +
> +static inline void ceph_readpage_to_fscache(struct inode *inode,
> + struct page *page)
> +{
> + return __ceph_readpage_to_fscache(inode, page);
> +}
> +
> +static inline void ceph_invalidate_fscache_page(struct inode *inode,
> + struct page *page)
> +{
> + return __ceph_invalidate_fscache_page(inode, page);
> +}
> +
> +static inline int ceph_release_fscache_page(struct page *page, gfp_t gfp)
> +{
> + struct inode* inode = page->mapping->host;
> + struct ceph_inode_info *ci = ceph_inode(inode);
> + return fscache_maybe_release_page(ci->fscache, page, gfp);
> +}
> +
> +#else
> +
> +static inline int ceph_readpage_from_fscache(struct inode* inode,
> + struct page *page)
> +{
> + return -ENOBUFS;
> +}
> +
> +static inline int ceph_readpages_from_fscache(struct inode *inode,
> + struct address_space *mapping,
> + struct list_head *pages,
> + unsigned *nr_pages)
> +{
> + return -ENOBUFS;
> +}
> +
> +static inline void ceph_readpage_to_fscache(struct inode *inode,
> + struct page *page)
> +{
> +}
> +
> +static inline void ceph_invalidate_fscache_page(struct inode *inode,
> + struct page *page)
> +{
> +}
> +
> +static inline int ceph_release_fscache_page(struct page *page, gfp_t gfp)
> +{
> + return 1;
> +}
> +
> +static inline void ceph_fscache_readpages_cancel(struct inode *inode,
> + struct list_head *pages)
> +{
> +}
> +
> +#endif
> +
> +#endif
> diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c
> index 5a26bc1..a94ca4b 100644
> --- a/fs/ceph/caps.c
> +++ b/fs/ceph/caps.c
> @@ -10,6 +10,7 @@
>
> #include "super.h"
> #include "mds_client.h"
> +#include "cache.h"
> #include <linux/ceph/decode.h>
> #include <linux/ceph/messenger.h>
>
> @@ -479,8 +480,9 @@ static void __check_cap_issue(struct ceph_inode_info *ci, struct ceph_cap *cap,
> * i_rdcache_gen.
> */
> if ((issued & (CEPH_CAP_FILE_CACHE|CEPH_CAP_FILE_LAZYIO)) &&
> - (had & (CEPH_CAP_FILE_CACHE|CEPH_CAP_FILE_LAZYIO)) == 0)
> + (had & (CEPH_CAP_FILE_CACHE|CEPH_CAP_FILE_LAZYIO)) == 0) {
> ci->i_rdcache_gen++;
> + }
>
> /*
> * if we are newly issued FILE_SHARED, mark dir not complete; we
> @@ -2395,6 +2397,7 @@ static void handle_cap_grant(struct inode *inode, struct ceph_mds_caps *grant,
> int writeback = 0;
> int queue_invalidate = 0;
> int deleted_inode = 0;
> + int queue_revalidate = 0;
>
> dout("handle_cap_grant inode %p cap %p mds%d seq %d %s\n",
> inode, cap, mds, seq, ceph_cap_string(newcaps));
> @@ -2417,6 +2420,8 @@ static void handle_cap_grant(struct inode *inode, struct ceph_mds_caps *grant,
> ci->i_rdcache_revoking = ci->i_rdcache_gen;
> }
> }
> +
> + fscache_invalidate(ci->fscache);
> }
>
> /* side effects now are allowed */
> @@ -2458,6 +2463,11 @@ static void handle_cap_grant(struct inode *inode, struct ceph_mds_caps *grant,
> }
> }
>
> + /* Do we need to revalidate our fscache cookie? Don't bother on the
> + * first cache cap, as we already validate at cookie creation time. */
> + if ((issued & CEPH_CAP_FILE_CACHE) && ci->i_rdcache_gen > 1)
> + queue_revalidate = 1;
> +
> /* size/ctime/mtime/atime? */
> ceph_fill_file_size(inode, issued,
> le32_to_cpu(grant->truncate_seq),
> @@ -2542,6 +2552,7 @@ static void handle_cap_grant(struct inode *inode, struct ceph_mds_caps *grant,
> BUG_ON(cap->issued & ~cap->implemented);
>
> spin_unlock(&ci->i_ceph_lock);
> +
> if (writeback)
> /*
> * queue inode for writeback: we can't actually call
> @@ -2553,6 +2564,8 @@ static void handle_cap_grant(struct inode *inode, struct ceph_mds_caps *grant,
> ceph_queue_invalidate(inode);
> if (deleted_inode)
> invalidate_aliases(inode);
> + if (queue_revalidate)
> + ceph_queue_revalidate(inode);
> if (wake)
> wake_up_all(&ci->i_cap_wq);
>
> @@ -2709,8 +2722,10 @@ static void handle_cap_trunc(struct inode *inode,
> truncate_seq, truncate_size, size);
> spin_unlock(&ci->i_ceph_lock);
>
> - if (queue_trunc)
> + if (queue_trunc) {
> ceph_queue_vmtruncate(inode);
> + fscache_invalidate(ci->fscache);

This should call ceph_fscache_invalidate(inode), a wrapper in cache.[ch].
(This is fixed in my patch in the ceph-client.git testing branch.)

> + }
> }
>
> /*
> diff --git a/fs/ceph/file.c b/fs/ceph/file.c
> index 68af489..b81c75f 100644
> --- a/fs/ceph/file.c
> +++ b/fs/ceph/file.c
> @@ -12,6 +12,7 @@
>
> #include "super.h"
> #include "mds_client.h"
> +#include "cache.h"
>
> /*
> * Ceph file operations
> @@ -69,9 +70,23 @@ static int ceph_init_file(struct inode *inode, struct file *file, int fmode)
> {
> struct ceph_file_info *cf;
> int ret = 0;
> + struct ceph_inode_info *ci = ceph_inode(inode);
> + struct ceph_fs_client *fsc = ceph_sb_to_client(inode->i_sb);
> + struct ceph_mds_client *mdsc = fsc->mdsc;
>
> switch (inode->i_mode & S_IFMT) {
> case S_IFREG:
> + /* First file open request creates the cookie, we want to keep
> + * this cookie around for the lifetime of the inode so as not to
> + * have to worry about fscache register / revoke / operation
> + * races.
> + *
> + * Also, if we know the operation is going to invalidate data
> + * (non readonly) just nuke the cache right away.
> + */
> + ceph_fscache_register_inode_cookie(mdsc->fsc, ci);
> + if ((fmode & CEPH_FILE_MODE_WR))
> + fscache_invalidate(ci->fscache);
> case S_IFDIR:
> dout("init_file %p %p 0%o (regular)\n", inode, file,
> inode->i_mode);
> @@ -182,6 +197,7 @@ int ceph_open(struct inode *inode, struct file *file)
> spin_unlock(&ci->i_ceph_lock);
> return ceph_init_file(inode, file, fmode);
> }
> +
> spin_unlock(&ci->i_ceph_lock);
>
> dout("open fmode %d wants %s\n", fmode, ceph_cap_string(wanted));
> @@ -192,6 +208,7 @@ int ceph_open(struct inode *inode, struct file *file)
> }
> req->r_inode = inode;
> ihold(inode);
> +
> req->r_num_caps = 1;
> if (flags & (O_CREAT|O_TRUNC))
> parent_inode = ceph_get_dentry_parent_inode(file->f_dentry);
> diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c
> index 602ccd8..5daf7f8 100644
> --- a/fs/ceph/inode.c
> +++ b/fs/ceph/inode.c
> @@ -12,6 +12,7 @@
>
> #include "super.h"
> #include "mds_client.h"
> +#include "cache.h"
> #include <linux/ceph/decode.h>
>
> /*
> @@ -31,6 +32,7 @@ static const struct inode_operations ceph_symlink_iops;
> static void ceph_invalidate_work(struct work_struct *work);
> static void ceph_writeback_work(struct work_struct *work);
> static void ceph_vmtruncate_work(struct work_struct *work);
> +static void ceph_revalidate_work(struct work_struct *work);

This can go in cache.h, without the 'static'...

>
> /*
> * find or create an inode, given the ceph ino number
> @@ -386,6 +388,13 @@ struct inode *ceph_alloc_inode(struct super_block *sb)
>
> INIT_WORK(&ci->i_vmtruncate_work, ceph_vmtruncate_work);
>
> +#ifdef CONFIG_CEPH_FSCACHE
> + ci->fscache = NULL;
> + /* The first load is verified at cookie open time */
> + ci->i_fscache_gen = 1;
> + INIT_WORK(&ci->i_revalidate_work, ceph_revalidate_work);
> +#endif
> +
> return &ci->vfs_inode;
> }
>
> @@ -405,6 +414,8 @@ void ceph_destroy_inode(struct inode *inode)
>
> dout("destroy_inode %p ino %llx.%llx\n", inode, ceph_vinop(inode));
>
> + ceph_fscache_unregister_inode_cookie(ci);
> +
> ceph_queue_caps_release(inode);
>
> /*
> @@ -439,7 +450,6 @@ void ceph_destroy_inode(struct inode *inode)
> call_rcu(&inode->i_rcu, ceph_i_callback);
> }
>
> -
> /*
> * Helpers to fill in size, ctime, mtime, and atime. We have to be
> * careful because either the client or MDS may have more up to date
> @@ -491,6 +501,10 @@ int ceph_fill_file_size(struct inode *inode, int issued,
> truncate_size);
> ci->i_truncate_size = truncate_size;
> }
> +
> + if (queue_trunc)
> + fscache_invalidate(ci->fscache);
> +
> return queue_trunc;
> }
>
> @@ -1079,7 +1093,7 @@ int ceph_fill_trace(struct super_block *sb, struct ceph_mds_request *req,
> * complete.
> */
> ceph_set_dentry_offset(req->r_old_dentry);
> - dout("dn %p gets new offset %lld\n", req->r_old_dentry,
> + dout("dn %p gets new offset %lld\n", req->r_old_dentry,
> ceph_dentry(req->r_old_dentry)->offset);
>
> dn = req->r_old_dentry; /* use old_dentry */
> @@ -1494,6 +1508,7 @@ void ceph_queue_vmtruncate(struct inode *inode)
> struct ceph_inode_info *ci = ceph_inode(inode);
>
> ihold(inode);
> +
> if (queue_work(ceph_sb_to_client(inode->i_sb)->trunc_wq,
> &ci->i_vmtruncate_work)) {
> dout("ceph_queue_vmtruncate %p\n", inode);
> @@ -1565,6 +1580,53 @@ retry:
> wake_up_all(&ci->i_cap_wq);
> }
>
> +static void ceph_revalidate_work(struct work_struct *work)
> +{
> + int issued;
> + u32 orig_gen;
> + struct ceph_inode_info *ci = container_of(work, struct ceph_inode_info,
> + i_revalidate_work);
> + struct inode *inode = &ci->vfs_inode;
> +
> + spin_lock(&ci->i_ceph_lock);
> + issued = __ceph_caps_issued(ci, NULL);
> + orig_gen = ci->i_rdcache_gen;
> + spin_unlock(&ci->i_ceph_lock);
> +
> + if (!(issued & CEPH_CAP_FILE_CACHE)) {
> + dout("revalidate_work lost cache before validation %p\n",
> + inode);
> + goto out;
> + }
> +
> + if (!fscache_check_consistency(ci->fscache))
> + fscache_invalidate(ci->fscache);
> +
> + spin_lock(&ci->i_ceph_lock);
> + /* Update the new valid generation (backwards sanity check too) */
> + if (orig_gen > ci->i_fscache_gen) {
> + ci->i_fscache_gen = orig_gen;
> + }
> + spin_unlock(&ci->i_ceph_lock);
> +
> +out:
> + iput(&ci->vfs_inode);
> +}
> +
> +void ceph_queue_revalidate(struct inode *inode)
> +{
> + struct ceph_inode_info *ci = ceph_inode(inode);
> +
> + ihold(inode);
> +
> + if (queue_work(ceph_sb_to_client(inode->i_sb)->revalidate_wq,
> + &ci->i_revalidate_work)) {
> + dout("ceph_queue_revalidate %p\n", inode);
> + } else {
> + dout("ceph_queue_revalidate %p failed\n", inode);
> + iput(inode);
> + }
> +}

Move these to cache.c, and put a no-op ceph_queue_revalidate() in
cache.h's #else block...

>
> /*
> * symlinks
> diff --git a/fs/ceph/super.c b/fs/ceph/super.c
> index 6627b26..a56baab 100644
> --- a/fs/ceph/super.c
> +++ b/fs/ceph/super.c
> @@ -17,6 +17,7 @@
>
> #include "super.h"
> #include "mds_client.h"
> +#include "cache.h"
>
> #include <linux/ceph/ceph_features.h>
> #include <linux/ceph/decode.h>
> @@ -142,6 +143,8 @@ enum {
> Opt_nodcache,
> Opt_ino32,
> Opt_noino32,
> + Opt_fscache,
> + Opt_nofscache
> };
>
> static match_table_t fsopt_tokens = {
> @@ -167,6 +170,8 @@ static match_table_t fsopt_tokens = {
> {Opt_nodcache, "nodcache"},
> {Opt_ino32, "ino32"},
> {Opt_noino32, "noino32"},
> + {Opt_fscache, "fsc"},
> + {Opt_nofscache, "nofsc"},
> {-1, NULL}
> };
>
> @@ -260,6 +265,12 @@ static int parse_fsopt_token(char *c, void *private)
> case Opt_noino32:
> fsopt->flags &= ~CEPH_MOUNT_OPT_INO32;
> break;
> + case Opt_fscache:
> + fsopt->flags |= CEPH_MOUNT_OPT_FSCACHE;
> + break;
> + case Opt_nofscache:
> + fsopt->flags &= ~CEPH_MOUNT_OPT_FSCACHE;
> + break;
> default:
> BUG_ON(token);
> }
> @@ -422,6 +433,10 @@ static int ceph_show_options(struct seq_file *m, struct dentry *root)
> seq_puts(m, ",dcache");
> else
> seq_puts(m, ",nodcache");
> + if (fsopt->flags & CEPH_MOUNT_OPT_FSCACHE)
> + seq_puts(m, ",fsc");
> + else
> + seq_puts(m, ",nofsc");
>
> if (fsopt->wsize)
> seq_printf(m, ",wsize=%d", fsopt->wsize);
> @@ -530,11 +545,24 @@ static struct ceph_fs_client *create_fs_client(struct ceph_mount_options *fsopt,
> if (!fsc->wb_pagevec_pool)
> goto fail_trunc_wq;
>
> +#ifdef CONFIG_CEPH_FSCACHE
> + if ((fsopt->flags & CEPH_MOUNT_OPT_FSCACHE))
> + ceph_fscache_register_fsid_cookie(fsc);
> +
> + fsc->revalidate_wq = alloc_workqueue("ceph-revalidate", 0, 1);
> + if (fsc->revalidate_wq == NULL)
> + goto fail_fscache;
> +#endif
> +

Since this is non-trivial, I'd make ceph_fscache_init() and _shutdown()
functions (in cache.[ch]) so the #ifdef's go away here, too.

> /* caps */
> fsc->min_caps = fsopt->max_readdir;
>
> return fsc;
>
> +#ifdef CONFIG_CEPH_FSCACHE
> +fail_fscache:
> + ceph_fscache_unregister_fsid_cookie(fsc);
> +#endif
> fail_trunc_wq:
> destroy_workqueue(fsc->trunc_wq);
> fail_pg_inv_wq:
> @@ -554,6 +582,10 @@ static void destroy_fs_client(struct ceph_fs_client *fsc)
> {
> dout("destroy_fs_client %p\n", fsc);
>
> +#ifdef CONFIG_CEPH_FSCACHE
> + ceph_fscache_unregister_fsid_cookie(fsc);
> +#endif
> +

and here

> destroy_workqueue(fsc->wb_wq);
> destroy_workqueue(fsc->pg_inv_wq);
> destroy_workqueue(fsc->trunc_wq);
> @@ -588,6 +620,8 @@ static void ceph_inode_init_once(void *foo)
>
> static int __init init_caches(void)
> {
> + int error = -ENOMEM;
> +
> ceph_inode_cachep = kmem_cache_create("ceph_inode_info",
> sizeof(struct ceph_inode_info),
> __alignof__(struct ceph_inode_info),
> @@ -611,15 +645,19 @@ static int __init init_caches(void)
> if (ceph_file_cachep == NULL)
> goto bad_file;
>
> - return 0;
> +#ifdef CONFIG_CEPH_FSCACHE
> + if ((error = fscache_register_netfs(&ceph_cache_netfs)))
> + goto bad_file;
> +#endif

ceph_fscache_register() (?) in cache.h?

>
> + return 0;
> bad_file:
> kmem_cache_destroy(ceph_dentry_cachep);
> bad_dentry:
> kmem_cache_destroy(ceph_cap_cachep);
> bad_cap:
> kmem_cache_destroy(ceph_inode_cachep);
> - return -ENOMEM;
> + return error;
> }
>
> static void destroy_caches(void)
> @@ -629,10 +667,15 @@ static void destroy_caches(void)
> * destroy cache.
> */
> rcu_barrier();
> +
> kmem_cache_destroy(ceph_inode_cachep);
> kmem_cache_destroy(ceph_cap_cachep);
> kmem_cache_destroy(ceph_dentry_cachep);
> kmem_cache_destroy(ceph_file_cachep);
> +
> +#ifdef CONFIG_CEPH_FSCACHE
> + fscache_unregister_netfs(&ceph_cache_netfs);
> +#endif

and ceph_fscache_unregister()

We'd also like to make sure this gets tested by our qa suite. That
probably means setting up the fscache stuff on the clients in the
teuthology.git/teuthology/tests/kclient.py task. I'd settle for a
quick run-down of what steps we should take to do that during
mount/umount, though.

Thanks, Milosz!
sage


> }
>
>
> diff --git a/fs/ceph/super.h b/fs/ceph/super.h
> index f1e4e47..72eac24 100644
> --- a/fs/ceph/super.h
> +++ b/fs/ceph/super.h
> @@ -16,6 +16,10 @@
>
> #include <linux/ceph/libceph.h>
>
> +#ifdef CONFIG_CEPH_FSCACHE
> +#include <linux/fscache.h>
> +#endif
> +
> /* f_type in struct statfs */
> #define CEPH_SUPER_MAGIC 0x00c36400
>
> @@ -29,6 +33,7 @@
> #define CEPH_MOUNT_OPT_NOASYNCREADDIR (1<<7) /* no dcache readdir */
> #define CEPH_MOUNT_OPT_INO32 (1<<8) /* 32 bit inos */
> #define CEPH_MOUNT_OPT_DCACHE (1<<9) /* use dcache for readdir etc */
> +#define CEPH_MOUNT_OPT_FSCACHE (1<<10) /* use fscache */
>
> #define CEPH_MOUNT_OPT_DEFAULT (CEPH_MOUNT_OPT_RBYTES)
>
> @@ -90,6 +95,11 @@ struct ceph_fs_client {
> struct dentry *debugfs_bdi;
> struct dentry *debugfs_mdsc, *debugfs_mdsmap;
> #endif
> +
> +#ifdef CONFIG_CEPH_FSCACHE
> + struct fscache_cookie *fscache;
> + struct workqueue_struct *revalidate_wq;
> +#endif
> };
>
>
> @@ -320,6 +330,12 @@ struct ceph_inode_info {
>
> struct work_struct i_vmtruncate_work;
>
> +#ifdef CONFIG_CEPH_FSCACHE
> + struct fscache_cookie *fscache;
> + u32 i_fscache_gen; /* sequence, for delayed fscache validate */
> + struct work_struct i_revalidate_work;
> +#endif
> +
> struct inode vfs_inode; /* at end */
> };
>
> @@ -700,6 +716,7 @@ extern void ceph_queue_vmtruncate(struct inode *inode);
>
> extern void ceph_queue_invalidate(struct inode *inode);
> extern void ceph_queue_writeback(struct inode *inode);
> +extern void ceph_queue_revalidate(struct inode *inode);
>
> extern int ceph_do_getattr(struct inode *inode, int mask);
> extern int ceph_permission(struct inode *inode, int mask);
> --
> 1.8.1.2
>
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
>

2013-09-03 17:24:25

by Milosz Tanski

[permalink] [raw]
Subject: Re: [PATCH 4/5] fscache: netfs function for cleanup post readpages

I just wanted to follow up on this patch (and number 5 in the series).
The backtrace I posted originally is not the correct backtrace for this
particular issue; the new one attached at the bottom of this email is
the right one. The backtrace I posted is one that only Ceph experiences
in ceph_readpages() because it directly returns the pages. However, the
patch itself is still valid and still addresses a real problem; the
only issue was the wrong backtrace.

The fix addresses the interaction between Ceph and fscache when called
from the readahead code path. I also investigated the other filesystems
(CIFS and NFS), and they are susceptible to the same issue.

In any case, the correct backtrace to accompany the patch for review is
below.

- Milosz

Aug 9 16:20:38 betanode2 kernel: [11121126.824267] BUG: Bad page state in process petabucket pfn:407aed
[11121126.824273] page:ffffea00101ebb40 count:0 mapcount:0 mapping: (null) index:0x9cb
[11121126.824278] page flags: 0x200000000001000(private_2)
[11121126.824282] Modules linked in: ceph libceph cachefiles ghash_clmulni_intel aesni_intel ablk_helper cryptd lrw gf128mul glue_helper aes_x86_64 microcode auth_rpcgss oid_registry nfsv4 nfs fscache lockd sunrpc raid10 raid456 async_pq async_xor async_memcpy async_raid6_recov async_tx raid1 raid0 multipath linear btrfs raid6_pq lzo_compress xor zlib_deflate libcrc32c
[11121126.824297] CPU: 1 PID: 32527 Comm: petabucket Tainted: G B 3.10.0-virtual #45
[11121126.824298] 0000000000000001 ffff880424341a48 ffffffff815523f2 ffff880424341a68
[11121126.824300] ffffffff8111def7 0000000000000001 ffffea00101ebb40 ffff880424341aa8
[11121126.824302] ffffffff8111e49e ffffffff81132ce9 ffffea00101ebb40 0200000000001000
[11121126.824304] Call Trace:
[11121126.824307] [<ffffffff815523f2>] dump_stack+0x19/0x1b
[11121126.824309] [<ffffffff8111def7>] bad_page+0xc7/0x120
[11121126.824312] [<ffffffff8111e49e>] free_pages_prepare+0x10e/0x120
[11121126.824314] [<ffffffff81132ce9>] ? zone_statistics+0x99/0xc0
[11121126.824316] [<ffffffff8111fc80>] free_hot_cold_page+0x40/0x170
[11121126.824318] [<ffffffff81123507>] __put_single_page+0x27/0x30
[11121126.824320] [<ffffffff81123df5>] put_page+0x25/0x40
[11121126.824321] [<ffffffff81123e66>] put_pages_list+0x56/0x70
[11121126.824324] [<ffffffff81122a98>] __do_page_cache_readahead+0x1b8/0x260
[11121126.824327] [<ffffffff81122ea1>] ra_submit+0x21/0x30
[11121126.824329] [<ffffffff81118f64>] filemap_fault+0x254/0x490
[11121126.824332] [<ffffffff8113a74f>] __do_fault+0x6f/0x4e0
[11121126.824334] [<ffffffff81004ec2>] ? xen_mc_flush+0xb2/0x1c0
[11121126.824336] [<ffffffff8113d856>] handle_pte_fault+0xf6/0x930
[11121126.824339] [<ffffffff81008c33>] ? pte_mfn_to_pfn+0x93/0x110
[11121126.824341] [<ffffffff81008cce>] ? xen_pmd_val+0xe/0x10
[11121126.824343] [<ffffffff81005469>] ? __raw_callee_save_xen_pmd_val+0x11/0x1e
[11121126.824346] [<ffffffff8113f361>] handle_mm_fault+0x251/0x370
[11121126.824348] [<ffffffff8155bffa>] __do_page_fault+0x1aa/0x550
[11121126.824350] [<ffffffff81004ec2>] ? xen_mc_flush+0xb2/0x1c0
[11121126.824352] [<ffffffff8100483d>] ? xen_clts+0x8d/0x190
[11121126.824354] [<ffffffff8155c3ae>] do_page_fault+0xe/0x10
[11121126.824357] [<ffffffff81558818>] page_fault+0x28/0x30

On Wed, Aug 21, 2013 at 5:30 PM, Milosz Tanski <[email protected]> wrote:
> Currently the fscache code expects the netfs to call fscache_read_or_alloc_pages()
> inside the aops readpages() callback. It marks all the pages in the list provided
> by readahead with PG_private_2. In cases where the netfs fails to read all the
> pages (which is legal), it ends up returning to the readahead code and triggering
> a BUG. This happens because the page list still contains marked pages.
>
> This patch implements a simple fscache_readpages_cancel function that the netfs
> should call before returning from readpages. It will revoke the pages from the
> underlying cache backend and unmark them.
>
> This addresses this BUG being triggered by netfs code:
>
> [12410647.597278] BUG: Bad page state in process petabucket pfn:3d504e
> [12410647.597292] page:ffffea000f541380 count:0 mapcount:0 mapping:
> (null) index:0x0
> [12410647.597298] page flags: 0x200000000001000(private_2)
>
> ...
>
> [12410647.597334] Call Trace:
> [12410647.597345] [<ffffffff815523f2>] dump_stack+0x19/0x1b
> [12410647.597356] [<ffffffff8111def7>] bad_page+0xc7/0x120
> [12410647.597359] [<ffffffff8111e49e>] free_pages_prepare+0x10e/0x120
> [12410647.597361] [<ffffffff8111fc80>] free_hot_cold_page+0x40/0x170
> [12410647.597363] [<ffffffff81123507>] __put_single_page+0x27/0x30
> [12410647.597365] [<ffffffff81123df5>] put_page+0x25/0x40
> [12410647.597376] [<ffffffffa02bdcf9>] ceph_readpages+0x2e9/0x6e0 [ceph]
> [12410647.597379] [<ffffffff81122a8f>] __do_page_cache_readahead+0x1af/0x260
> [12410647.597382] [<ffffffff81122ea1>] ra_submit+0x21/0x30
> [12410647.597384] [<ffffffff81118f64>] filemap_fault+0x254/0x490
> [12410647.597387] [<ffffffff8113a74f>] __do_fault+0x6f/0x4e0
> [12410647.597391] [<ffffffff810125bd>] ? __switch_to+0x16d/0x4a0
> [12410647.597395] [<ffffffff810865ba>] ? finish_task_switch+0x5a/0xc0
> [12410647.597398] [<ffffffff8113d856>] handle_pte_fault+0xf6/0x930
> [12410647.597401] [<ffffffff81008c33>] ? pte_mfn_to_pfn+0x93/0x110
> [12410647.597403] [<ffffffff81008cce>] ? xen_pmd_val+0xe/0x10
> [12410647.597405] [<ffffffff81005469>] ? __raw_callee_save_xen_pmd_val+0x11/0x1e
> [12410647.597407] [<ffffffff8113f361>] handle_mm_fault+0x251/0x370
> [12410647.597411] [<ffffffff812b0ac4>] ? call_rwsem_down_read_failed+0x14/0x30
> [12410647.597414] [<ffffffff8155bffa>] __do_page_fault+0x1aa/0x550
> [12410647.597418] [<ffffffff8108011d>] ? up_write+0x1d/0x20
> [12410647.597422] [<ffffffff8113141c>] ? vm_mmap_pgoff+0xbc/0xe0
> [12410647.597425] [<ffffffff81143bb8>] ? SyS_mmap_pgoff+0xd8/0x240
> [12410647.597427] [<ffffffff8155c3ae>] do_page_fault+0xe/0x10
> [12410647.597431] [<ffffffff81558818>] page_fault+0x28/0x30
>
> Signed-off-by: Milosz Tanski <[email protected]>
> ---
> fs/fscache/page.c | 16 ++++++++++++++++
> include/linux/fscache.h | 22 ++++++++++++++++++++++
> 2 files changed, 38 insertions(+)
>
> diff --git a/fs/fscache/page.c b/fs/fscache/page.c
> index d479ab3..0cc3153 100644
> --- a/fs/fscache/page.c
> +++ b/fs/fscache/page.c
> @@ -1132,3 +1132,19 @@ void __fscache_uncache_all_inode_pages(struct fscache_cookie *cookie,
> _leave("");
> }
> EXPORT_SYMBOL(__fscache_uncache_all_inode_pages);
> +
> +/**
> + * Unmark pages allocated in the readahead code path (via
> + * fscache_read_or_alloc_pages()) after delegating to the base filesystem.
> + */
> +void __fscache_readpages_cancel(struct fscache_cookie *cookie,
> + struct list_head *pages)
> +{
> + struct page *page;
> +
> + list_for_each_entry(page, pages, lru) {
> + if (PageFsCache(page))
> + __fscache_uncache_page(cookie, page);
> + }
> +}
> +EXPORT_SYMBOL(__fscache_readpages_cancel);
> diff --git a/include/linux/fscache.h b/include/linux/fscache.h
> index 7a49e8f..c324177 100644
> --- a/include/linux/fscache.h
> +++ b/include/linux/fscache.h
> @@ -209,6 +209,8 @@ extern bool __fscache_maybe_release_page(struct fscache_cookie *, struct page *,
> gfp_t);
> extern void __fscache_uncache_all_inode_pages(struct fscache_cookie *,
> struct inode *);
> +extern void __fscache_readpages_cancel(struct fscache_cookie *cookie,
> + struct list_head *pages);
>
> /**
> * fscache_register_netfs - Register a filesystem as desiring caching services
> @@ -719,4 +721,24 @@ void fscache_uncache_all_inode_pages(struct fscache_cookie *cookie,
> __fscache_uncache_all_inode_pages(cookie, inode);
> }
>
> +/**
> + * fscache_readpages_cancel
> + * @cookie: The cookie representing the inode's cache object.
> + * @pages: The netfs pages that we canceled write on in readpages()
> + *
> + * Uncache/unreserve the pages reserved earlier in readpages() via
> + * fscache_read_or_alloc_pages(). In most cases, when readpages() succeeds,
> + * this doesn't do anything. When the underlying netfs's readahead fails,
> + * we need to clean up the page list (unmark and uncache).
> + *
> + * This function may sleep (if it's calling to the cache backend).
> + */
> +static inline
> +void fscache_readpages_cancel(struct fscache_cookie *cookie,
> + struct list_head *pages)
> +{
> + if (fscache_cookie_valid(cookie))
> + __fscache_readpages_cancel(cookie, pages);
> +}
> +
> #endif /* _LINUX_FSCACHE_H */
> --
> 1.8.1.2
>

2013-09-04 15:49:25

by David Howells

[permalink] [raw]
Subject: Re: [PATCHv4 0/5] ceph: persistent caching with fscache

Sage Weil <[email protected]> wrote:

> David, are the fscache patches here ready for the next merge window? Do
> you have a preference for whose tree they go through?

There's only one problem - patch 1 needs to come _after_ patch 2 to avoid
breaking git bisect. Plus these patches 2 and 4 extend the fscache API
without adjusting the documentation - but that can be added later.

And I think Milosz deserves a beer (or other poison of his choice;-) for
finding a longstanding irritating bug.

I think AFS, CIFS, NFS and 9P all need patching too, but I can attend to that.

Should I take the patches through my tree? Then I can make the adjustments.

David

2013-09-04 15:54:37

by Sage Weil

[permalink] [raw]
Subject: Re: [PATCHv4 0/5] ceph: persistent caching with fscache

Hi David!

On Wed, 4 Sep 2013, David Howells wrote:
> Sage Weil <[email protected]> wrote:
>
> > David, are the fscache patches here ready for the next merge window? Do
> > you have a preference for whose tree they go through?
>
> There's only one problem - patch 1 needs to come _after_ patch 2 to avoid
> breaking git bisect. Plus these patches 2 and 4 extend the fscache API
> without adjusting the documentation - but that can be added later.
>
> And I think Milosz deserves a beer (or other poison of his choice;-) for
> finding a longstanding irritating bug.
>
> I think AFS, CIFS, NFS and 9P all need patching too, but I can attend to that.
>
> Should I take the patches through my tree? Then I can make the adjustments.

Sure. Do you want the Ceph patches as well, or just the fscache bits?
I'll repost the latest version, as it's gotten several fixes squashed in.

Thanks!
sage

2013-09-04 16:24:13

by David Howells

[permalink] [raw]
Subject: Re: [PATCH 2/5] new fscache interface to check cache consistency

Hongyi Jia <[email protected]> wrote:

> +bool __fscache_check_consistency(struct fscache_cookie *cookie)
> +{
> + struct fscache_object *object;
> +
> + if (cookie->def->type != FSCACHE_COOKIE_TYPE_DATAFILE)
> + return false;
> +
> + if (hlist_empty(&cookie->backing_objects))
> + return false;
> +
> + object = hlist_entry(cookie->backing_objects.first,
> + struct fscache_object, cookie_link);
> +
> + return object->cache->ops->check_consistency(object);
> +}

Hmmm... This isn't actually safe. You have to either:

(1) hold cookie->lock whilst touching the object pointer when coming from the
netfs side, or:

(2) set up an operation to do this (as, say, __fscache_alloc_page() does).

The problem is that you have nothing to defend against the object being
withdrawn by the cache under you.

David

2013-09-04 16:25:37

by Sage Weil

[permalink] [raw]
Subject: Re: [PATCHv4 0/5] ceph: persistent caching with fscache

On Wed, 4 Sep 2013, Sage Weil wrote:
> Hi David!
>
> On Wed, 4 Sep 2013, David Howells wrote:
> > Sage Weil <[email protected]> wrote:
> >
> > > David, are the fscache patches here ready for the next merge window? Do
> > > you have a preference for whose tree they go through?
> >
> > There's only one problem - patch 1 needs to come _after_ patch 2 to avoid
> > breaking git bisect. Plus these patches 2 and 4 extend the fscache API
> > without adjusting the documentation - but that can be added later.
> >
> > And I think Milosz deserves a beer (or other poison of his choice;-) for
> > finding a longstanding irritating bug.
> >
> > I think AFS, CIFS, NFS and 9P all need patching too, but I can attend to that.
> >
> > Should I take the patches through my tree? Then I can make the adjustments.
>
> Sure. Do you want the Ceph patches as well, or just the fscache bits?
> I'll repost the latest version, as it's gotten several fixes squashed in.

The full series is here:

git://github.com/ceph/ceph-client wip-fscache
https://github.com/ceph/ceph-client/commits/wip-fscache

Thanks!
sage

2013-09-04 16:48:48

by Milosz Tanski

[permalink] [raw]
Subject: Re: [PATCH 2/5] new fscache interface to check cache consistency

David,

If the cache is withdrawn and we're starting anew, I would consider
that to be okay. I would consider an empty page cache for a cookie to be
consistent since there's nothing stale that I can read. Unless there's
another synchronization issue that I'm missing in fscache.

Thanks,
- Milosz

On Wed, Sep 4, 2013 at 12:24 PM, David Howells <[email protected]> wrote:
> Hongyi Jia <[email protected]> wrote:
>
>> +bool __fscache_check_consistency(struct fscache_cookie *cookie)
>> +{
>> + struct fscache_object *object;
>> +
>> + if (cookie->def->type != FSCACHE_COOKIE_TYPE_DATAFILE)
>> + return false;
>> +
>> + if (hlist_empty(&cookie->backing_objects))
>> + return false;
>> +
>> + object = hlist_entry(cookie->backing_objects.first,
>> + struct fscache_object, cookie_link);
>> +
>> + return object->cache->ops->check_consistency(object);
>> +}
>
> Hmmm... This isn't actually safe. You have to either:
>
> (1) hold cookie->lock whilst touching the object pointer when coming from the
> netfs side, or:
>
> (2) set up an operation to do this (as, say, __fscache_alloc_page() does).
>
> The problem is that you have nothing to defend against the object being
> withdrawn by the cache under you.
>
> David

2013-09-04 17:26:25

by David Howells

[permalink] [raw]
Subject: Re: [PATCH 2/5] new fscache interface to check cache consistency

Milosz Tanski <[email protected]> wrote:

> If the cache is withdrawn and we're starting anew I would consider
> that to be okay. I would consider an empty page cache for a cookie to be
> consistent since there's nothing stale that I can read. Unless there's
> another synchronization issue that I'm missing in fscache.

The problem is that the fscache_object struct may be deallocated whilst you're
using it.

David

2013-09-04 18:04:27

by Milosz Tanski

[permalink] [raw]
Subject: Re: [PATCH 2/5] new fscache interface to check cache consistency

David,

Is it as simple as sticking a mutex at the top of the
__fscache_check_consistency function before we try to find the object?
This code should be called from a context that can sleep; in the Ceph
code we call it from a delayed work queue (the revalidate queue).

-- Milosz

On Wed, Sep 4, 2013 at 1:26 PM, David Howells <[email protected]> wrote:
> Milosz Tanski <[email protected]> wrote:
>
>> If the cache is withdrawn and we're starting anew I would consider
>> that to be okay. I would consider an empty page cache for a cookie to be
>> consistent since there's nothing stale that I can read. Unless there's
>> another synchronization issue that I'm missing in fscache.
>
> The problem is that the fscache_object struct may be deallocated whilst you're
> using it.
>
> David

2013-09-04 18:13:29

by David Howells

[permalink] [raw]
Subject: Re: [PATCH 2/5] new fscache interface to check cache consistency

Milosz Tanski <[email protected]> wrote:

> Is it as simple as sticking a mutex at the top of the
> __fscache_check_consistency function before we try to find the object?

You can lock a mutex in a function, but where are you actually going to place
the mutex struct? And what other code is going to take it? To do this, you'd
have to place the mutex struct in fscache_cookie. The problem is that you
have to protect the pointer from fscache_cookie to fscache_object - so placing
the mutex in fscache_object doesn't help.

David

2013-09-04 19:41:52

by Milosz Tanski

[permalink] [raw]
Subject: Re: [PATCH 2/5] new fscache interface to check cache consistency

On Wed, Sep 4, 2013 at 2:13 PM, David Howells <[email protected]> wrote:
> Milosz Tanski <[email protected]> wrote:
>
>> Is it as simple as sticking a mutex at the top of the
>> __fscache_check_consistency function before we try to find the object?
>
> You can lock a mutex in a function, but where are you actually going to place
> the mutex struct? And what other code is going to take it? To do this, you'd
> have to place the mutex struct in fscache_cookie. The problem is that you
> have to protect the pointer from fscache_cookie to fscache_object - so placing
> the mutex in fscache_object doesn't help.

David,

I meant taking cookie->lock at the beginning of
__fscache_check_consistency. I don't see the need to push this into the
netfs code.

- Milosz

2013-09-04 22:03:03

by Milosz Tanski

[permalink] [raw]
Subject: Re: [PATCH 2/5] new fscache interface to check cache consistency

David,

Is there any way I can call you (at your desk) during EST hours? I'd
like to talk this part through, since I think we're going a bit in
circles. I'd like to get this fixed so we can submit the fscache code
for Ceph to the upstream kernel in the merge window.

Best,
-- Milosz

On Wed, Sep 4, 2013 at 3:41 PM, Milosz Tanski <[email protected]> wrote:
> On Wed, Sep 4, 2013 at 2:13 PM, David Howells <[email protected]> wrote:
>> Milosz Tanski <[email protected]> wrote:
>>
>>> Is it as simple as sticking a mutex at the top of the
>>> __fscache_check_consistency function before we try to find the object?
>>
>> You can lock a mutex in a function, but where are you actually going to place
>> the mutex struct? And what other code is going to take it? To do this, you'd
>> have to place the mutex struct in fscache_cookie. The problem is that you
>> have to protect the pointer from fscache_cookie to fscache_object - so placing
>> the mutex in fscache_object doesn't help.
>
> David,
>
> I meant taking cookie->lock at the beginning of
> __fscache_check_consistency. I don't see the need to push this into the
> netfs code.
>
> - Milosz

2013-09-05 12:34:14

by David Howells

[permalink] [raw]
Subject: Re: [PATCHv4 0/5] ceph: persistent caching with fscache


I've pushed the non-Ceph bits of the bad-page fix to here:

http://git.kernel.org/cgit/linux/kernel/git/dhowells/linux-fs.git/log/?h=fscache

and added a CIFS fix.

David