2013-09-05 22:20:33

by Milosz Tanski

Subject: [PATCH 0/8] ceph: fscache support & upstream changes

Hey gang, I think this should be the final revision of these changes. The
changes are:

* David rewrote the cookie validity check (that originally was written by
Hongyi Jia). You might have seen some emails flying about doing it the
right way.
* I added a crash fix for Ceph filesystems mounted with nofsc (the default)
when fscache is compiled into Ceph. Previously it would crash trying to
enqueue invalidate checks in the work queue because we didn't initialize
it if the mount had fscache disabled.
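
To illustrate the second item, here is a minimal userspace sketch of the kind of guard involved. It is not the code from the patch: the mock_* names and the fscache_enabled field are hypothetical stand-ins, and it only assumes that the work queue pointer stays NULL when the mount does not use fscache.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures; not the real Ceph types. */
struct mock_workqueue {
	int queued;                           /* number of queued checks */
};

struct mock_fs_client {
	bool fscache_enabled;                 /* false models a "nofsc" mount */
	struct mock_workqueue *revalidate_wq; /* assumed NULL when fscache is off */
};

/*
 * Queue an invalidate check only when the mount actually uses fscache.
 * Returns true if the check was queued and false if it was skipped.
 */
static bool mock_queue_revalidate(struct mock_fs_client *fsc)
{
	assert(fsc != NULL);

	/* Without this guard a nofsc mount would touch a NULL work queue. */
	if (!fsc->fscache_enabled || fsc->revalidate_wq == NULL)
		return false;

	fsc->revalidate_wq->queued++;
	return true;
}
```

Calling mock_queue_revalidate() on a client with fscache_enabled set to false simply returns false and leaves the work queue untouched.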

I've tested both changes on my cluster. You can get these changes from my
branch on Bitbucket. It contains the upstream wip-fscache branch rebased with
David's rewrite of Hongyi Jia's changes.

The branch is located at:

https://bitbucket.org/adfin/linux-fs.git in the wip-fscahce branch

Finally, David requested that this patchset go through the Ceph tree. The tree
should have all the proper sign-offs from David. I also CC'ed him so he can
give his final okay.

Best,
- Milosz

David Howells (2):
FS-Cache: Add interface to check consistency of a cached object
CacheFiles: Implement interface to check cache consistency

Milosz Tanski (6):
fscache: Netfs function for cleanup post readpages
ceph: use fscache as a local persistent cache
ceph: clean PgPrivate2 on returning from readpages
ceph: ceph_readpage_to_fscache didn't check if marked
ceph: page still marked private_2
ceph: Do not do invalidate if the filesystem is mounted nofsc

Documentation/filesystems/caching/backend-api.txt | 9 +
Documentation/filesystems/caching/netfs-api.txt | 35 +-
fs/cachefiles/interface.c | 26 ++
fs/cachefiles/internal.h | 1 +
fs/cachefiles/xattr.c | 36 ++
fs/ceph/Kconfig | 9 +
fs/ceph/Makefile | 1 +
fs/ceph/addr.c | 40 ++-
fs/ceph/cache.c | 400 +++++++++++++++++++++
fs/ceph/cache.h | 157 ++++++++
fs/ceph/caps.c | 19 +-
fs/ceph/file.c | 17 +
fs/ceph/inode.c | 14 +-
fs/ceph/super.c | 35 +-
fs/ceph/super.h | 16 +
fs/fscache/cookie.c | 69 ++++
fs/fscache/internal.h | 6 +
fs/fscache/page.c | 71 ++--
include/linux/fscache-cache.h | 4 +
include/linux/fscache.h | 42 +++
20 files changed, 965 insertions(+), 42 deletions(-)
create mode 100644 fs/ceph/cache.c
create mode 100644 fs/ceph/cache.h

--
1.7.9.5


2013-09-05 22:21:13

by Milosz Tanski

Subject: [PATCH 1/8] FS-Cache: Add interface to check consistency of a cached object

Extend the fscache netfs API so that the netfs can ask whether a cache object
is up to date with respect to its corresponding netfs object:

int fscache_check_consistency(struct fscache_cookie *cookie)

This will call back to the netfs to check whether the auxiliary data associated
with a cookie is correct. It returns 0 if it is and -ESTALE if it isn't; it
may also return -ENOMEM and -ERESTARTSYS.

The backends now have to implement a mandatory operation pointer:

int (*check_consistency)(struct fscache_object *object)

that corresponds to the above API call. FS-Cache takes care of pinning the
object and the cookie in memory and managing this call with respect to the
object state.
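
As a rough illustration of the return codes described above, the following userspace sketch maps a result from fscache_check_consistency() to an action. The enum, the function name and the policy itself (keep on 0, invalidate on -ESTALE, retry on anything else) are assumptions made for illustration, not something this patch prescribes.

```c
#include <assert.h>
#include <errno.h>

/* Kernel-internal error code; userspace errno.h does not define it. */
#ifndef ERESTARTSYS
#define ERESTARTSYS 512
#endif

/* Hypothetical actions a netfs might take after a consistency check. */
enum cache_action {
	CACHE_KEEP,        /* cached data matches the netfs object */
	CACHE_INVALIDATE,  /* cached data is stale and should be dropped */
	CACHE_RETRY_LATER  /* the check could not be completed */
};

/* Map a kernel-style return value (0 or a negative errno) to an action. */
static enum cache_action action_for_check_result(int ret)
{
	assert(ret <= 0);

	switch (ret) {
	case 0:
		return CACHE_KEEP;
	case -ESTALE:
		return CACHE_INVALIDATE;
	case -ENOMEM:
	case -ERESTARTSYS:
	default:
		/* The check did not run to completion, so nothing is known. */
		return CACHE_RETRY_LATER;
	}
}
```

In this sketch only -ESTALE leads to invalidation, so a transient failure such as -ENOMEM does not throw away a cache that may still be valid.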

Original-author: Hongyi Jia <[email protected]>
Signed-off-by: David Howells <[email protected]>
cc: Hongyi Jia <[email protected]>
cc: Milosz Tanski <[email protected]>
---
Documentation/filesystems/caching/backend-api.txt | 9 +++
Documentation/filesystems/caching/netfs-api.txt | 17 +++--
fs/fscache/cookie.c | 69 +++++++++++++++++++++
fs/fscache/internal.h | 6 ++
fs/fscache/page.c | 55 +++++++++-------
include/linux/fscache-cache.h | 4 ++
include/linux/fscache.h | 20 ++++++
7 files changed, 152 insertions(+), 28 deletions(-)

diff --git a/Documentation/filesystems/caching/backend-api.txt b/Documentation/filesystems/caching/backend-api.txt
index d78bab9..277d1e8 100644
--- a/Documentation/filesystems/caching/backend-api.txt
+++ b/Documentation/filesystems/caching/backend-api.txt
@@ -299,6 +299,15 @@ performed on the denizens of the cache. These are held in a structure of type:
enough space in the cache to permit this.


+ (*) Check coherency state of an object [mandatory]:
+
+ int (*check_consistency)(struct fscache_object *object)
+
+ This method is called to have the cache check the saved auxiliary data of
+ the object against the netfs's idea of the state. 0 should be returned
+ if they're consistent and -ESTALE otherwise. -ENOMEM and -ERESTARTSYS
+ may also be returned.
+
(*) Update object [mandatory]:

int (*update_object)(struct fscache_object *object)
diff --git a/Documentation/filesystems/caching/netfs-api.txt b/Documentation/filesystems/caching/netfs-api.txt
index 97e6c0e..12b3442 100644
--- a/Documentation/filesystems/caching/netfs-api.txt
+++ b/Documentation/filesystems/caching/netfs-api.txt
@@ -32,7 +32,7 @@ This document contains the following sections:
(9) Setting the data file size
(10) Page alloc/read/write
(11) Page uncaching
- (12) Index and data file update
+ (12) Index and data file consistency
(13) Miscellaneous cookie operations
(14) Cookie unregistration
(15) Index invalidation
@@ -690,9 +690,18 @@ written to the cache and for the cache to finish with the page generally. No
error is returned.


-==========================
-INDEX AND DATA FILE UPDATE
-==========================
+===============================
+INDEX AND DATA FILE CONSISTENCY
+===============================
+
+To find out whether auxiliary data for an object is up to date within the
+cache, the following function can be called:
+
+ int fscache_check_consistency(struct fscache_cookie *cookie)
+
+This will call back to the netfs to check whether the auxiliary data associated
+with a cookie is correct. It returns 0 if it is and -ESTALE if it isn't; it
+may also return -ENOMEM and -ERESTARTSYS.

To request an update of the index data for an index or other object, the
following function should be called:
diff --git a/fs/fscache/cookie.c b/fs/fscache/cookie.c
index 0e91a3c..2ef4c9d 100644
--- a/fs/fscache/cookie.c
+++ b/fs/fscache/cookie.c
@@ -558,3 +558,72 @@ void __fscache_cookie_put(struct fscache_cookie *cookie)

_leave("");
}
+
+/*
+ * check the consistency between the netfs inode and the backing cache
+ *
+ * NOTE: it only serves no-index type
+ */
+int __fscache_check_consistency(struct fscache_cookie *cookie)
+{
+ struct fscache_operation *op;
+ struct fscache_object *object;
+ int ret;
+
+ _enter("%p,", cookie);
+
+ ASSERTCMP(cookie->def->type, ==, FSCACHE_COOKIE_TYPE_DATAFILE);
+
+ if (fscache_wait_for_deferred_lookup(cookie) < 0)
+ return -ERESTARTSYS;
+
+ if (hlist_empty(&cookie->backing_objects))
+ return 0;
+
+ op = kzalloc(sizeof(*op), GFP_NOIO | __GFP_NOMEMALLOC | __GFP_NORETRY);
+ if (!op)
+ return -ENOMEM;
+
+ fscache_operation_init(op, NULL, NULL);
+ op->flags = FSCACHE_OP_MYTHREAD |
+ (1 << FSCACHE_OP_WAITING);
+
+ spin_lock(&cookie->lock);
+
+ if (hlist_empty(&cookie->backing_objects))
+ goto inconsistent;
+ object = hlist_entry(cookie->backing_objects.first,
+ struct fscache_object, cookie_link);
+ if (test_bit(FSCACHE_IOERROR, &object->cache->flags))
+ goto inconsistent;
+
+ op->debug_id = atomic_inc_return(&fscache_op_debug_id);
+
+ atomic_inc(&cookie->n_active);
+ if (fscache_submit_op(object, op) < 0)
+ goto submit_failed;
+
+ /* the work queue now carries its own ref on the object */
+ spin_unlock(&cookie->lock);
+
+ ret = fscache_wait_for_operation_activation(object, op,
+ NULL, NULL, NULL);
+ if (ret == 0)
+ /* ask the cache to honour the operation */
+ ret = object->cache->ops->check_consistency(op);
+ else if (ret == -ENOBUFS)
+ ret = 0;
+
+ fscache_put_operation(op);
+ _leave(" = %d", ret);
+ return ret;
+
+submit_failed:
+ atomic_dec(&cookie->n_active);
+inconsistent:
+ spin_unlock(&cookie->lock);
+ kfree(op);
+ _leave(" = -ESTALE");
+ return -ESTALE;
+}
+EXPORT_SYMBOL(__fscache_check_consistency);
diff --git a/fs/fscache/internal.h b/fs/fscache/internal.h
index 12d505b..4226f66 100644
--- a/fs/fscache/internal.h
+++ b/fs/fscache/internal.h
@@ -130,6 +130,12 @@ extern void fscache_operation_gc(struct work_struct *);
/*
* page.c
*/
+extern int fscache_wait_for_deferred_lookup(struct fscache_cookie *);
+extern int fscache_wait_for_operation_activation(struct fscache_object *,
+ struct fscache_operation *,
+ atomic_t *,
+ atomic_t *,
+ void (*)(struct fscache_operation *));
extern void fscache_invalidate_writes(struct fscache_cookie *);

/*
diff --git a/fs/fscache/page.c b/fs/fscache/page.c
index d479ab3..793e3d5 100644
--- a/fs/fscache/page.c
+++ b/fs/fscache/page.c
@@ -278,7 +278,7 @@ static struct fscache_retrieval *fscache_alloc_retrieval(
/*
* wait for a deferred lookup to complete
*/
-static int fscache_wait_for_deferred_lookup(struct fscache_cookie *cookie)
+int fscache_wait_for_deferred_lookup(struct fscache_cookie *cookie)
{
unsigned long jif;

@@ -322,42 +322,46 @@ static void fscache_do_cancel_retrieval(struct fscache_operation *_op)
/*
* wait for an object to become active (or dead)
*/
-static int fscache_wait_for_retrieval_activation(struct fscache_object *object,
- struct fscache_retrieval *op,
- atomic_t *stat_op_waits,
- atomic_t *stat_object_dead)
+int fscache_wait_for_operation_activation(struct fscache_object *object,
+ struct fscache_operation *op,
+ atomic_t *stat_op_waits,
+ atomic_t *stat_object_dead,
+ void (*do_cancel)(struct fscache_operation *))
{
int ret;

- if (!test_bit(FSCACHE_OP_WAITING, &op->op.flags))
+ if (!test_bit(FSCACHE_OP_WAITING, &op->flags))
goto check_if_dead;

_debug(">>> WT");
- fscache_stat(stat_op_waits);
- if (wait_on_bit(&op->op.flags, FSCACHE_OP_WAITING,
+ if (stat_op_waits)
+ fscache_stat(stat_op_waits);
+ if (wait_on_bit(&op->flags, FSCACHE_OP_WAITING,
fscache_wait_bit_interruptible,
TASK_INTERRUPTIBLE) != 0) {
- ret = fscache_cancel_op(&op->op, fscache_do_cancel_retrieval);
+ ret = fscache_cancel_op(op, do_cancel);
if (ret == 0)
return -ERESTARTSYS;

/* it's been removed from the pending queue by another party,
* so we should get to run shortly */
- wait_on_bit(&op->op.flags, FSCACHE_OP_WAITING,
+ wait_on_bit(&op->flags, FSCACHE_OP_WAITING,
fscache_wait_bit, TASK_UNINTERRUPTIBLE);
}
_debug("<<< GO");

check_if_dead:
- if (op->op.state == FSCACHE_OP_ST_CANCELLED) {
- fscache_stat(stat_object_dead);
+ if (op->state == FSCACHE_OP_ST_CANCELLED) {
+ if (stat_object_dead)
+ fscache_stat(stat_object_dead);
_leave(" = -ENOBUFS [cancelled]");
return -ENOBUFS;
}
if (unlikely(fscache_object_is_dead(object))) {
- pr_err("%s() = -ENOBUFS [obj dead %d]\n", __func__, op->op.state);
- fscache_cancel_op(&op->op, fscache_do_cancel_retrieval);
- fscache_stat(stat_object_dead);
+ pr_err("%s() = -ENOBUFS [obj dead %d]\n", __func__, op->state);
+ fscache_cancel_op(op, do_cancel);
+ if (stat_object_dead)
+ fscache_stat(stat_object_dead);
return -ENOBUFS;
}
return 0;
@@ -432,10 +436,11 @@ int __fscache_read_or_alloc_page(struct fscache_cookie *cookie,

/* we wait for the operation to become active, and then process it
* *here*, in this thread, and not in the thread pool */
- ret = fscache_wait_for_retrieval_activation(
- object, op,
+ ret = fscache_wait_for_operation_activation(
+ object, &op->op,
__fscache_stat(&fscache_n_retrieval_op_waits),
- __fscache_stat(&fscache_n_retrievals_object_dead));
+ __fscache_stat(&fscache_n_retrievals_object_dead),
+ fscache_do_cancel_retrieval);
if (ret < 0)
goto error;

@@ -557,10 +562,11 @@ int __fscache_read_or_alloc_pages(struct fscache_cookie *cookie,

/* we wait for the operation to become active, and then process it
* *here*, in this thread, and not in the thread pool */
- ret = fscache_wait_for_retrieval_activation(
- object, op,
+ ret = fscache_wait_for_operation_activation(
+ object, &op->op,
__fscache_stat(&fscache_n_retrieval_op_waits),
- __fscache_stat(&fscache_n_retrievals_object_dead));
+ __fscache_stat(&fscache_n_retrievals_object_dead),
+ fscache_do_cancel_retrieval);
if (ret < 0)
goto error;

@@ -658,10 +664,11 @@ int __fscache_alloc_page(struct fscache_cookie *cookie,

fscache_stat(&fscache_n_alloc_ops);

- ret = fscache_wait_for_retrieval_activation(
- object, op,
+ ret = fscache_wait_for_operation_activation(
+ object, &op->op,
__fscache_stat(&fscache_n_alloc_op_waits),
- __fscache_stat(&fscache_n_allocs_object_dead));
+ __fscache_stat(&fscache_n_allocs_object_dead),
+ fscache_do_cancel_retrieval);
if (ret < 0)
goto error;

diff --git a/include/linux/fscache-cache.h b/include/linux/fscache-cache.h
index a9ff9a3..7823e9e 100644
--- a/include/linux/fscache-cache.h
+++ b/include/linux/fscache-cache.h
@@ -251,6 +251,10 @@ struct fscache_cache_ops {
/* unpin an object in the cache */
void (*unpin_object)(struct fscache_object *object);

+ /* check the consistency between the backing cache and the FS-Cache
+ * cookie */
+ bool (*check_consistency)(struct fscache_operation *op);
+
/* store the updated auxiliary data on an object */
void (*update_object)(struct fscache_object *object);

diff --git a/include/linux/fscache.h b/include/linux/fscache.h
index 7a08623..d984aff 100644
--- a/include/linux/fscache.h
+++ b/include/linux/fscache.h
@@ -183,6 +183,7 @@ extern struct fscache_cookie *__fscache_acquire_cookie(
const struct fscache_cookie_def *,
void *);
extern void __fscache_relinquish_cookie(struct fscache_cookie *, int);
+extern int __fscache_check_consistency(struct fscache_cookie *);
extern void __fscache_update_cookie(struct fscache_cookie *);
extern int __fscache_attr_changed(struct fscache_cookie *);
extern void __fscache_invalidate(struct fscache_cookie *);
@@ -326,6 +327,25 @@ void fscache_relinquish_cookie(struct fscache_cookie *cookie, int retire)
}

/**
+ * fscache_check_consistency - Request a consistency check of a cache object
+ * @cookie: The cookie representing the cache object
+ *
+ * Request a consistency check from fscache, which passes the request
+ * to the backing cache.
+ *
+ * Returns 0 if consistent and -ESTALE if inconsistent. May also
+ * return -ENOMEM and -ERESTARTSYS.
+ */
+static inline
+int fscache_check_consistency(struct fscache_cookie *cookie)
+{
+ if (fscache_cookie_valid(cookie))
+ return __fscache_check_consistency(cookie);
+ else
+ return 0;
+}
+
+/**
* fscache_update_cookie - Request that a cache object be updated
* @cookie: The cookie representing the cache object
*
--
1.7.9.5

2013-09-05 22:21:28

by Milosz Tanski

Subject: [PATCH 2/8] CacheFiles: Implement interface to check cache consistency

Implement the FS-Cache interface to check the consistency of a cache object in
CacheFiles.

Original-author: Hongyi Jia <[email protected]>
Signed-off-by: David Howells <[email protected]>
cc: Hongyi Jia <[email protected]>
cc: Milosz Tanski <[email protected]>
---
fs/cachefiles/interface.c | 26 ++++++++++++++++++++++++++
fs/cachefiles/internal.h | 1 +
fs/cachefiles/xattr.c | 36 ++++++++++++++++++++++++++++++++++++
3 files changed, 63 insertions(+)

diff --git a/fs/cachefiles/interface.c b/fs/cachefiles/interface.c
index d4c1206..43eb559 100644
--- a/fs/cachefiles/interface.c
+++ b/fs/cachefiles/interface.c
@@ -378,6 +378,31 @@ static void cachefiles_sync_cache(struct fscache_cache *_cache)
}

/*
+ * check whether the backing cache is up to date with respect to FS-Cache
+ * - called by FS-Cache when it evaluates whether it needs to invalidate the cache
+ */
+static bool cachefiles_check_consistency(struct fscache_operation *op)
+{
+ struct cachefiles_object *object;
+ struct cachefiles_cache *cache;
+ const struct cred *saved_cred;
+ int ret;
+
+ _enter("{OBJ%x}", op->object->debug_id);
+
+ object = container_of(op->object, struct cachefiles_object, fscache);
+ cache = container_of(object->fscache.cache,
+ struct cachefiles_cache, cache);
+
+ cachefiles_begin_secure(cache, &saved_cred);
+ ret = cachefiles_check_auxdata(object);
+ cachefiles_end_secure(cache, saved_cred);
+
+ _leave(" = %d", ret);
+ return ret;
+}
+
+/*
* notification the attributes on an object have changed
* - called with reads/writes excluded by FS-Cache
*/
@@ -522,4 +547,5 @@ const struct fscache_cache_ops cachefiles_cache_ops = {
.write_page = cachefiles_write_page,
.uncache_page = cachefiles_uncache_page,
.dissociate_pages = cachefiles_dissociate_pages,
+ .check_consistency = cachefiles_check_consistency,
};
diff --git a/fs/cachefiles/internal.h b/fs/cachefiles/internal.h
index 4938251..5349473 100644
--- a/fs/cachefiles/internal.h
+++ b/fs/cachefiles/internal.h
@@ -235,6 +235,7 @@ extern int cachefiles_set_object_xattr(struct cachefiles_object *object,
struct cachefiles_xattr *auxdata);
extern int cachefiles_update_object_xattr(struct cachefiles_object *object,
struct cachefiles_xattr *auxdata);
+extern int cachefiles_check_auxdata(struct cachefiles_object *object);
extern int cachefiles_check_object_xattr(struct cachefiles_object *object,
struct cachefiles_xattr *auxdata);
extern int cachefiles_remove_object_xattr(struct cachefiles_cache *cache,
diff --git a/fs/cachefiles/xattr.c b/fs/cachefiles/xattr.c
index 2476e51..34c88b8 100644
--- a/fs/cachefiles/xattr.c
+++ b/fs/cachefiles/xattr.c
@@ -157,6 +157,42 @@ int cachefiles_update_object_xattr(struct cachefiles_object *object,
}

/*
+ * check the consistency between the backing cache and the FS-Cache cookie
+ */
+int cachefiles_check_auxdata(struct cachefiles_object *object)
+{
+ struct cachefiles_xattr *auxbuf;
+ struct dentry *dentry = object->dentry;
+ unsigned int dlen;
+ int ret;
+
+ ASSERT(dentry);
+ ASSERT(dentry->d_inode);
+ ASSERT(object->fscache.cookie->def->check_aux);
+
+ auxbuf = kmalloc(sizeof(struct cachefiles_xattr) + 512, GFP_KERNEL);
+ if (!auxbuf)
+ return -ENOMEM;
+
+ auxbuf->len = vfs_getxattr(dentry, cachefiles_xattr_cache,
+ &auxbuf->type, 512 + 1);
+ if (auxbuf->len < 1)
+ return -ESTALE;
+
+ if (auxbuf->type != object->fscache.cookie->def->type)
+ return -ESTALE;
+
+ dlen = auxbuf->len - 1;
+ ret = fscache_check_aux(&object->fscache, &auxbuf->data, dlen);
+
+ kfree(auxbuf);
+ if (ret != FSCACHE_CHECKAUX_OKAY)
+ return -ESTALE;
+
+ return 0;
+}
+
+/*
* check the state xattr on a cache file
* - return -ESTALE if the object should be deleted
*/
--
1.7.9.5

2013-09-05 22:21:51

by Milosz Tanski

Subject: [PATCH 3/8] fscache: Netfs function for cleanup post readpages

Currently the fscache code expects the netfs to call
fscache_read_or_alloc_pages() inside the aops readpages callback. It marks all
the pages in the list provided by readahead with PG_private_2. In cases where
the netfs fails to read all the pages (which is legal), it ends up returning to
readahead and triggering a BUG. This happens because the page list still
contains marked pages.

This patch implements a simple fscache_readpages_cancel function that the netfs
should call before returning from readpages. It will revoke the pages from the
underlying cache backend and unmark them.
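
The cleanup described above can be sketched in userspace as follows. This is only a model under stated assumptions: struct mock_page, its fscache_mark flag and both mock_* functions are hypothetical, and the real function uncaches each page through the cache backend instead of just clearing a flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct page, linked into a simple list. */
struct mock_page {
	bool fscache_mark;       /* models PG_private_2 (PG_fscache) */
	struct mock_page *next;
};

/*
 * Walk the leftover page list and unmark every page that is still marked.
 * Returns the number of pages that had to be cleaned up.
 */
static size_t mock_readpages_cancel(struct mock_page *pages)
{
	size_t cleaned = 0;

	for (struct mock_page *p = pages; p != NULL; p = p->next) {
		if (p->fscache_mark) {
			/* The kernel function uncaches the page at this point. */
			p->fscache_mark = false;
			cleaned++;
		}
	}
	return cleaned;
}

/* Models the allocator check behind the "Bad page state" report. */
static void mock_free_page(const struct mock_page *p)
{
	assert(!p->fscache_mark);
}
```

Running mock_readpages_cancel() over the list before any page is handed to mock_free_page() keeps the assertion from firing, which mirrors calling the cancel function before returning from readpages.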

The problem was originally worked out in the Ceph devel tree, but it also
occurs in CIFS. It appears that NFS, AFS and 9P are okay as read_cache_pages()
will clean up the unprocessed pages in the case of an error.

This can be used to address the following oops:

[12410647.597278] BUG: Bad page state in process petabucket pfn:3d504e
[12410647.597292] page:ffffea000f541380 count:0 mapcount:0 mapping:
(null) index:0x0
[12410647.597298] page flags: 0x200000000001000(private_2)

...

[12410647.597334] Call Trace:
[12410647.597345] [<ffffffff815523f2>] dump_stack+0x19/0x1b
[12410647.597356] [<ffffffff8111def7>] bad_page+0xc7/0x120
[12410647.597359] [<ffffffff8111e49e>] free_pages_prepare+0x10e/0x120
[12410647.597361] [<ffffffff8111fc80>] free_hot_cold_page+0x40/0x170
[12410647.597363] [<ffffffff81123507>] __put_single_page+0x27/0x30
[12410647.597365] [<ffffffff81123df5>] put_page+0x25/0x40
[12410647.597376] [<ffffffffa02bdcf9>] ceph_readpages+0x2e9/0x6e0 [ceph]
[12410647.597379] [<ffffffff81122a8f>] __do_page_cache_readahead+0x1af/0x260
[12410647.597382] [<ffffffff81122ea1>] ra_submit+0x21/0x30
[12410647.597384] [<ffffffff81118f64>] filemap_fault+0x254/0x490
[12410647.597387] [<ffffffff8113a74f>] __do_fault+0x6f/0x4e0
[12410647.597391] [<ffffffff810125bd>] ? __switch_to+0x16d/0x4a0
[12410647.597395] [<ffffffff810865ba>] ? finish_task_switch+0x5a/0xc0
[12410647.597398] [<ffffffff8113d856>] handle_pte_fault+0xf6/0x930
[12410647.597401] [<ffffffff81008c33>] ? pte_mfn_to_pfn+0x93/0x110
[12410647.597403] [<ffffffff81008cce>] ? xen_pmd_val+0xe/0x10
[12410647.597405] [<ffffffff81005469>] ? __raw_callee_save_xen_pmd_val+0x11/0x1e
[12410647.597407] [<ffffffff8113f361>] handle_mm_fault+0x251/0x370
[12410647.597411] [<ffffffff812b0ac4>] ? call_rwsem_down_read_failed+0x14/0x30
[12410647.597414] [<ffffffff8155bffa>] __do_page_fault+0x1aa/0x550
[12410647.597418] [<ffffffff8108011d>] ? up_write+0x1d/0x20
[12410647.597422] [<ffffffff8113141c>] ? vm_mmap_pgoff+0xbc/0xe0
[12410647.597425] [<ffffffff81143bb8>] ? SyS_mmap_pgoff+0xd8/0x240
[12410647.597427] [<ffffffff8155c3ae>] do_page_fault+0xe/0x10
[12410647.597431] [<ffffffff81558818>] page_fault+0x28/0x30

Signed-off-by: Milosz Tanski <[email protected]>
Signed-off-by: David Howells <[email protected]>
---
Documentation/filesystems/caching/netfs-api.txt | 18 +++++++++++++++++-
fs/fscache/page.c | 16 ++++++++++++++++
include/linux/fscache.h | 22 ++++++++++++++++++++++
3 files changed, 55 insertions(+), 1 deletion(-)

diff --git a/Documentation/filesystems/caching/netfs-api.txt b/Documentation/filesystems/caching/netfs-api.txt
index 12b3442..0c2329d 100644
--- a/Documentation/filesystems/caching/netfs-api.txt
+++ b/Documentation/filesystems/caching/netfs-api.txt
@@ -499,7 +499,7 @@ Else if there's a copy of the page resident in the cache:
(*) An argument that's 0 on success or negative for an error code.

If an error occurs, it should be assumed that the page contains no usable
- data.
+ data. fscache_readpages_cancel() may need to be called.

end_io_func() will be called in process context if the read is results in
an error, but it might be called in interrupt context if the read is
@@ -623,6 +623,22 @@ some of the pages being read and some being allocated. Those pages will have
been marked appropriately and will need uncaching.


+CANCELLATION OF UNREAD PAGES
+----------------------------
+
+If one or more pages are passed to fscache_read_or_alloc_pages() but not then
+read from the cache and also not read from the underlying filesystem then
+those pages will need to have any marks and reservations removed. This can be
+done by calling:
+
+ void fscache_readpages_cancel(struct fscache_cookie *cookie,
+ struct list_head *pages);
+
+prior to returning to the caller. The cookie argument should be as passed to
+fscache_read_or_alloc_pages(). Every page in the pages list will be examined
+and any that have PG_fscache set will be uncached.
+
+
==============
PAGE UNCACHING
==============
diff --git a/fs/fscache/page.c b/fs/fscache/page.c
index 793e3d5..8702b73 100644
--- a/fs/fscache/page.c
+++ b/fs/fscache/page.c
@@ -701,6 +701,22 @@ nobufs:
EXPORT_SYMBOL(__fscache_alloc_page);

/*
+ * Unmark pages allocated in the readahead code path (via
+ * fscache_read_or_alloc_pages()) after delegating to the base filesystem
+ */
+void __fscache_readpages_cancel(struct fscache_cookie *cookie,
+ struct list_head *pages)
+{
+ struct page *page;
+
+ list_for_each_entry(page, pages, lru) {
+ if (PageFsCache(page))
+ __fscache_uncache_page(cookie, page);
+ }
+}
+EXPORT_SYMBOL(__fscache_readpages_cancel);
+
+/*
* release a write op reference
*/
static void fscache_release_write_op(struct fscache_operation *_op)
diff --git a/include/linux/fscache.h b/include/linux/fscache.h
index d984aff..19b4645 100644
--- a/include/linux/fscache.h
+++ b/include/linux/fscache.h
@@ -209,6 +209,8 @@ extern bool __fscache_maybe_release_page(struct fscache_cookie *, struct page *,
gfp_t);
extern void __fscache_uncache_all_inode_pages(struct fscache_cookie *,
struct inode *);
+extern void __fscache_readpages_cancel(struct fscache_cookie *cookie,
+ struct list_head *pages);

/**
* fscache_register_netfs - Register a filesystem as desiring caching services
@@ -590,6 +592,26 @@ int fscache_alloc_page(struct fscache_cookie *cookie,
}

/**
+ * fscache_readpages_cancel - Cancel read/alloc on pages
+ * @cookie: The cookie representing the inode's cache object.
+ * @pages: The netfs pages that we canceled write on in readpages()
+ *
+ * Uncache/unreserve the pages reserved earlier in readpages() via
+ * fscache_read_or_alloc_pages() and similar. In most successful cases in
+ * readpages() this doesn't do anything. In cases when the underlying netfs's
+ * readahead failed we need to clean up the pagelist (unmark and uncache).
+ *
+ * This function may sleep as it may have to clean up disk state.
+ */
+static inline
+void fscache_readpages_cancel(struct fscache_cookie *cookie,
+ struct list_head *pages)
+{
+ if (fscache_cookie_valid(cookie))
+ __fscache_readpages_cancel(cookie, pages);
+}
+
+/**
* fscache_write_page - Request storage of a page in the cache
* @cookie: The cookie representing the cache object
* @page: The netfs page to store
--
1.7.9.5

2013-09-05 22:22:26

by Milosz Tanski

Subject: [PATCH 4/8] ceph: use fscache as a local persistent cache

Add support for fscache to the Ceph filesystem. This brings it on par with
some of the other network filesystems in Linux (such as NFS and AFS).

In order to mount the filesystem with fscache the 'fsc' mount option must be
passed.

Signed-off-by: Milosz Tanski <[email protected]>
Signed-off-by: Sage Weil <[email protected]>
---
fs/ceph/Kconfig | 9 ++
fs/ceph/Makefile | 1 +
fs/ceph/addr.c | 37 ++++-
fs/ceph/cache.c | 393 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
fs/ceph/cache.h | 138 +++++++++++++++++++
fs/ceph/caps.c | 19 ++-
fs/ceph/file.c | 17 +++
fs/ceph/inode.c | 14 +-
fs/ceph/super.c | 35 ++++-
fs/ceph/super.h | 16 +++
10 files changed, 666 insertions(+), 13 deletions(-)
create mode 100644 fs/ceph/cache.c
create mode 100644 fs/ceph/cache.h

diff --git a/fs/ceph/Kconfig b/fs/ceph/Kconfig
index 49bc782..ac9a2ef 100644
--- a/fs/ceph/Kconfig
+++ b/fs/ceph/Kconfig
@@ -16,3 +16,12 @@ config CEPH_FS

If unsure, say N.

+if CEPH_FS
+config CEPH_FSCACHE
+ bool "Enable Ceph client caching support"
+ depends on CEPH_FS=m && FSCACHE || CEPH_FS=y && FSCACHE=y
+ help
+ Choose Y here to enable persistent, read-only local
+ caching support for Ceph clients using FS-Cache
+
+endif
diff --git a/fs/ceph/Makefile b/fs/ceph/Makefile
index bd35212..32e3010 100644
--- a/fs/ceph/Makefile
+++ b/fs/ceph/Makefile
@@ -9,3 +9,4 @@ ceph-y := super.o inode.o dir.o file.o locks.o addr.o ioctl.o \
mds_client.o mdsmap.o strings.o ceph_frag.o \
debugfs.o

+ceph-$(CONFIG_CEPH_FSCACHE) += cache.o
diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
index 3bed7da..3a21a7c 100644
--- a/fs/ceph/addr.c
+++ b/fs/ceph/addr.c
@@ -11,6 +11,7 @@

#include "super.h"
#include "mds_client.h"
+#include "cache.h"
#include <linux/ceph/osd_client.h>

/*
@@ -144,6 +145,11 @@ static void ceph_invalidatepage(struct page *page, unsigned int offset,
return;
}

+ ceph_invalidate_fscache_page(inode, page);
+
+ if (!PagePrivate(page))
+ return;
+
/*
* We can get non-dirty pages here due to races between
* set_page_dirty and truncate_complete_page; just spit out a
@@ -163,14 +169,17 @@ static void ceph_invalidatepage(struct page *page, unsigned int offset,
ClearPagePrivate(page);
}

-/* just a sanity check */
static int ceph_releasepage(struct page *page, gfp_t g)
{
struct inode *inode = page->mapping ? page->mapping->host : NULL;
dout("%p releasepage %p idx %lu\n", inode, page, page->index);
WARN_ON(PageDirty(page));
- WARN_ON(PagePrivate(page));
- return 0;
+
+ /* Can we release the page from the cache? */
+ if (!ceph_release_fscache_page(page, g))
+ return 0;
+
+ return !PagePrivate(page);
}

/*
@@ -180,11 +189,16 @@ static int readpage_nounlock(struct file *filp, struct page *page)
{
struct inode *inode = file_inode(filp);
struct ceph_inode_info *ci = ceph_inode(inode);
- struct ceph_osd_client *osdc =
+ struct ceph_osd_client *osdc =
&ceph_inode_to_client(inode)->client->osdc;
int err = 0;
u64 len = PAGE_CACHE_SIZE;

+ err = ceph_readpage_from_fscache(inode, page);
+
+ if (err == 0)
+ goto out;
+
dout("readpage inode %p file %p page %p index %lu\n",
inode, filp, page, page->index);
err = ceph_osdc_readpages(osdc, ceph_vino(inode), &ci->i_layout,
@@ -202,6 +216,9 @@ static int readpage_nounlock(struct file *filp, struct page *page)
}
SetPageUptodate(page);

+ if (err == 0)
+ ceph_readpage_to_fscache(inode, page);
+
out:
return err < 0 ? err : 0;
}
@@ -244,6 +261,7 @@ static void finish_read(struct ceph_osd_request *req, struct ceph_msg *msg)
page->index);
flush_dcache_page(page);
SetPageUptodate(page);
+ ceph_readpage_to_fscache(inode, page);
unlock_page(page);
page_cache_release(page);
bytes -= PAGE_CACHE_SIZE;
@@ -313,7 +331,7 @@ static int start_read(struct inode *inode, struct list_head *page_list, int max)
page = list_entry(page_list->prev, struct page, lru);
BUG_ON(PageLocked(page));
list_del(&page->lru);
-
+
dout("start_read %p adding %p idx %lu\n", inode, page,
page->index);
if (add_to_page_cache_lru(page, &inode->i_data, page->index,
@@ -360,6 +378,12 @@ static int ceph_readpages(struct file *file, struct address_space *mapping,
int rc = 0;
int max = 0;

+ rc = ceph_readpages_from_fscache(mapping->host, mapping, page_list,
+ &nr_pages);
+
+ if (rc == 0)
+ goto out;
+
if (fsc->mount_options->rsize >= PAGE_CACHE_SIZE)
max = (fsc->mount_options->rsize + PAGE_CACHE_SIZE - 1)
>> PAGE_SHIFT;
@@ -479,6 +503,8 @@ static int writepage_nounlock(struct page *page, struct writeback_control *wbc)
CONGESTION_ON_THRESH(fsc->mount_options->congestion_kb))
set_bdi_congested(&fsc->backing_dev_info, BLK_RW_ASYNC);

+ ceph_readpage_to_fscache(inode, page);
+
set_page_writeback(page);
err = ceph_osdc_writepages(osdc, ceph_vino(inode),
&ci->i_layout, snapc,
@@ -534,7 +560,6 @@ static void ceph_release_pages(struct page **pages, int num)
pagevec_release(&pvec);
}

-
/*
* async writeback completion handler.
*
diff --git a/fs/ceph/cache.c b/fs/ceph/cache.c
new file mode 100644
index 0000000..5c413ec
--- /dev/null
+++ b/fs/ceph/cache.c
@@ -0,0 +1,393 @@
+/*
+ * Ceph cache definitions.
+ *
+ * Copyright (C) 2013 by Adfin Solutions, Inc. All Rights Reserved.
+ * Written by Milosz Tanski ([email protected])
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2
+ * as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to:
+ * Free Software Foundation
+ * 51 Franklin Street, Fifth Floor
+ * Boston, MA 02111-1301 USA
+ *
+ */
+
+#include <linux/fscache.h>
+
+#include "super.h"
+#include "cache.h"
+
+struct ceph_aux_inode {
+ struct timespec mtime;
+ loff_t size;
+};
+
+struct fscache_netfs ceph_cache_netfs = {
+ .name = "ceph",
+ .version = 0,
+};
+
+static uint16_t ceph_fscache_session_get_key(const void *cookie_netfs_data,
+ void *buffer, uint16_t maxbuf)
+{
+ const struct ceph_fs_client* fsc = cookie_netfs_data;
+ uint16_t klen;
+
+ klen = sizeof(fsc->client->fsid);
+ if (klen > maxbuf)
+ return 0;
+
+ memcpy(buffer, &fsc->client->fsid, klen);
+ return klen;
+}
+
+static const struct fscache_cookie_def ceph_fscache_fsid_object_def = {
+ .name = "CEPH.fsid",
+ .type = FSCACHE_COOKIE_TYPE_INDEX,
+ .get_key = ceph_fscache_session_get_key,
+};
+
+int ceph_fscache_register(void)
+{
+ return fscache_register_netfs(&ceph_cache_netfs);
+}
+
+void ceph_fscache_unregister(void)
+{
+ fscache_unregister_netfs(&ceph_cache_netfs);
+}
+
+int ceph_fscache_register_fs(struct ceph_fs_client* fsc)
+{
+ fsc->fscache = fscache_acquire_cookie(ceph_cache_netfs.primary_index,
+ &ceph_fscache_fsid_object_def,
+ fsc);
+
+ if (fsc->fscache == NULL) {
+ pr_err("Unable to register fsid: %p fscache cookie\n", fsc);
+ return 0;
+ }
+
+ fsc->revalidate_wq = alloc_workqueue("ceph-revalidate", 0, 1);
+ if (fsc->revalidate_wq == NULL)
+ return -ENOMEM;
+
+ return 0;
+}
+
+static uint16_t ceph_fscache_inode_get_key(const void *cookie_netfs_data,
+ void *buffer, uint16_t maxbuf)
+{
+ const struct ceph_inode_info* ci = cookie_netfs_data;
+ uint16_t klen;
+
+ /* use ceph virtual inode (id + snapshot) */
+ klen = sizeof(ci->i_vino);
+ if (klen > maxbuf)
+ return 0;
+
+ memcpy(buffer, &ci->i_vino, klen);
+ return klen;
+}
+
+static uint16_t ceph_fscache_inode_get_aux(const void *cookie_netfs_data,
+ void *buffer, uint16_t bufmax)
+{
+ struct ceph_aux_inode aux;
+ const struct ceph_inode_info* ci = cookie_netfs_data;
+ const struct inode* inode = &ci->vfs_inode;
+
+ memset(&aux, 0, sizeof(aux));
+ aux.mtime = inode->i_mtime;
+ aux.size = inode->i_size;
+
+ memcpy(buffer, &aux, sizeof(aux));
+
+ return sizeof(aux);
+}
+
+static void ceph_fscache_inode_get_attr(const void *cookie_netfs_data,
+ uint64_t *size)
+{
+ const struct ceph_inode_info* ci = cookie_netfs_data;
+ const struct inode* inode = &ci->vfs_inode;
+
+ *size = inode->i_size;
+}
+
+static enum fscache_checkaux ceph_fscache_inode_check_aux(
+ void *cookie_netfs_data, const void *data, uint16_t dlen)
+{
+ struct ceph_aux_inode aux;
+ struct ceph_inode_info* ci = cookie_netfs_data;
+ struct inode* inode = &ci->vfs_inode;
+
+ if (dlen != sizeof(aux))
+ return FSCACHE_CHECKAUX_OBSOLETE;
+
+ memset(&aux, 0, sizeof(aux));
+ aux.mtime = inode->i_mtime;
+ aux.size = inode->i_size;
+
+ if (memcmp(data, &aux, sizeof(aux)) != 0)
+ return FSCACHE_CHECKAUX_OBSOLETE;
+
+ dout("ceph inode 0x%p cached okay", ci);
+ return FSCACHE_CHECKAUX_OKAY;
+}
+
+static void ceph_fscache_inode_now_uncached(void* cookie_netfs_data)
+{
+ struct ceph_inode_info* ci = cookie_netfs_data;
+ struct pagevec pvec;
+ pgoff_t first;
+ int loop, nr_pages;
+
+ pagevec_init(&pvec, 0);
+ first = 0;
+
+ dout("ceph inode 0x%p now uncached", ci);
+
+ while (1) {
+ nr_pages = pagevec_lookup(&pvec, ci->vfs_inode.i_mapping, first,
+ PAGEVEC_SIZE - pagevec_count(&pvec));
+
+ if (!nr_pages)
+ break;
+
+ for (loop = 0; loop < nr_pages; loop++)
+ ClearPageFsCache(pvec.pages[loop]);
+
+ first = pvec.pages[nr_pages - 1]->index + 1;
+
+ pvec.nr = nr_pages;
+ pagevec_release(&pvec);
+ cond_resched();
+ }
+}
+
+static const struct fscache_cookie_def ceph_fscache_inode_object_def = {
+ .name = "CEPH.inode",
+ .type = FSCACHE_COOKIE_TYPE_DATAFILE,
+ .get_key = ceph_fscache_inode_get_key,
+ .get_attr = ceph_fscache_inode_get_attr,
+ .get_aux = ceph_fscache_inode_get_aux,
+ .check_aux = ceph_fscache_inode_check_aux,
+ .now_uncached = ceph_fscache_inode_now_uncached,
+};
+
+void ceph_fscache_register_inode_cookie(struct ceph_fs_client* fsc,
+ struct ceph_inode_info* ci)
+{
+ struct inode* inode = &ci->vfs_inode;
+
+ /* No caching for filesystem */
+ if (fsc->fscache == NULL)
+ return;
+
+ /* Only cache for regular files that are read only */
+ if (!S_ISREG(inode->i_mode))
+ return;
+
+ /* Avoid multiple racing open requests */
+ mutex_lock(&inode->i_mutex);
+
+ if (ci->fscache)
+ goto done;
+
+ ci->fscache = fscache_acquire_cookie(fsc->fscache,
+ &ceph_fscache_inode_object_def,
+ ci);
+done:
+ mutex_unlock(&inode->i_mutex);
+
+}
+
+void ceph_fscache_unregister_inode_cookie(struct ceph_inode_info* ci)
+{
+ struct fscache_cookie* cookie;
+
+ if ((cookie = ci->fscache) == NULL)
+ return;
+
+ ci->fscache = NULL;
+
+ fscache_uncache_all_inode_pages(cookie, &ci->vfs_inode);
+ fscache_relinquish_cookie(cookie, 0);
+}
+
+static void ceph_vfs_readpage_complete(struct page *page, void *data, int error)
+{
+ if (!error)
+ SetPageUptodate(page);
+}
+
+static void ceph_vfs_readpage_complete_unlock(struct page *page, void *data, int error)
+{
+ if (!error)
+ SetPageUptodate(page);
+
+ unlock_page(page);
+}
+
+static inline int cache_valid(struct ceph_inode_info *ci)
+{
+ return ((ceph_caps_issued(ci) & CEPH_CAP_FILE_CACHE) &&
+ (ci->i_fscache_gen == ci->i_rdcache_gen));
+}
+
+
+/* Attempt to read from the fscache.
+ *
+ * This function is called from the readpage_nounlock context. DO NOT attempt to
+ * unlock the page here (or in the callback).
+ */
+int ceph_readpage_from_fscache(struct inode *inode, struct page *page)
+{
+ struct ceph_inode_info *ci = ceph_inode(inode);
+ int ret;
+
+ if (!cache_valid(ci))
+ return -ENOBUFS;
+
+ ret = fscache_read_or_alloc_page(ci->fscache, page,
+ ceph_vfs_readpage_complete, NULL,
+ GFP_KERNEL);
+
+ switch (ret) {
+ case 0: /* Page found */
+ dout("page read submitted\n");
+ return 0;
+ case -ENOBUFS: /* Pages were not found, and can't be */
+ case -ENODATA: /* Pages were not found */
+ dout("page/inode not in cache\n");
+ return ret;
+ default:
+ dout("%s: unknown error ret = %i\n", __func__, ret);
+ return ret;
+ }
+}
+
+int ceph_readpages_from_fscache(struct inode *inode,
+ struct address_space *mapping,
+ struct list_head *pages,
+ unsigned *nr_pages)
+{
+ struct ceph_inode_info *ci = ceph_inode(inode);
+ int ret;
+
+ if (!cache_valid(ci))
+ return -ENOBUFS;
+
+ ret = fscache_read_or_alloc_pages(ci->fscache, mapping, pages, nr_pages,
+ ceph_vfs_readpage_complete_unlock,
+ NULL, mapping_gfp_mask(mapping));
+
+ switch (ret) {
+ case 0: /* All pages found */
+ dout("all-page read submitted\n");
+ return 0;
+ case -ENOBUFS: /* Some pages were not found, and can't be */
+ case -ENODATA: /* some pages were not found */
+ dout("page/inode not in cache\n");
+ return ret;
+ default:
+ dout("%s: unknown error ret = %i\n", __func__, ret);
+ return ret;
+ }
+}
+
+void ceph_readpage_to_fscache(struct inode *inode, struct page *page)
+{
+ struct ceph_inode_info *ci = ceph_inode(inode);
+ int ret;
+
+ if (!cache_valid(ci))
+ return;
+
+ ret = fscache_write_page(ci->fscache, page, GFP_KERNEL);
+ if (ret)
+ fscache_uncache_page(ci->fscache, page);
+}
+
+void ceph_invalidate_fscache_page(struct inode* inode, struct page *page)
+{
+ struct ceph_inode_info *ci = ceph_inode(inode);
+
+ fscache_wait_on_page_write(ci->fscache, page);
+ fscache_uncache_page(ci->fscache, page);
+}
+
+void ceph_fscache_unregister_fs(struct ceph_fs_client* fsc)
+{
+ if (fsc->revalidate_wq)
+ destroy_workqueue(fsc->revalidate_wq);
+
+ fscache_relinquish_cookie(fsc->fscache, 0);
+ fsc->fscache = NULL;
+}
+
+static void ceph_revalidate_work(struct work_struct *work)
+{
+ int issued;
+ u32 orig_gen;
+ struct ceph_inode_info *ci = container_of(work, struct ceph_inode_info,
+ i_revalidate_work);
+ struct inode *inode = &ci->vfs_inode;
+
+ spin_lock(&ci->i_ceph_lock);
+ issued = __ceph_caps_issued(ci, NULL);
+ orig_gen = ci->i_rdcache_gen;
+ spin_unlock(&ci->i_ceph_lock);
+
+ if (!(issued & CEPH_CAP_FILE_CACHE)) {
+ dout("revalidate_work lost cache before validation %p\n",
+ inode);
+ goto out;
+ }
+
+ if (!fscache_check_consistency(ci->fscache))
+ fscache_invalidate(ci->fscache);
+
+ spin_lock(&ci->i_ceph_lock);
+ /* Update the new valid generation (backwards sanity check too) */
+ if (orig_gen > ci->i_fscache_gen) {
+ ci->i_fscache_gen = orig_gen;
+ }
+ spin_unlock(&ci->i_ceph_lock);
+
+out:
+ iput(&ci->vfs_inode);
+}
+
+void ceph_queue_revalidate(struct inode *inode)
+{
+ struct ceph_inode_info *ci = ceph_inode(inode);
+
+ ihold(inode);
+
+ if (queue_work(ceph_sb_to_client(inode->i_sb)->revalidate_wq,
+ &ci->i_revalidate_work)) {
+ dout("ceph_queue_revalidate %p\n", inode);
+ } else {
+ dout("ceph_queue_revalidate %p failed\n", inode);
+ iput(inode);
+ }
+}
+
+void ceph_fscache_inode_init(struct ceph_inode_info *ci)
+{
+ ci->fscache = NULL;
+ /* The first load is verified at cookie open time */
+ ci->i_fscache_gen = 1;
+ INIT_WORK(&ci->i_revalidate_work, ceph_revalidate_work);
+}
diff --git a/fs/ceph/cache.h b/fs/ceph/cache.h
new file mode 100644
index 0000000..0ea95cb
--- /dev/null
+++ b/fs/ceph/cache.h
@@ -0,0 +1,138 @@
+/*
+ * Ceph cache definitions.
+ *
+ * Copyright (C) 2013 by Adfin Solutions, Inc. All Rights Reserved.
+ * Written by Milosz Tanski ([email protected])
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2
+ * as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to:
+ * Free Software Foundation
+ * 51 Franklin Street, Fifth Floor
+ * Boston, MA 02111-1301 USA
+ *
+ */
+
+#ifndef _CEPH_CACHE_H
+#define _CEPH_CACHE_H
+
+#ifdef CONFIG_CEPH_FSCACHE
+
+int ceph_fscache_register(void);
+void ceph_fscache_unregister(void);
+
+int ceph_fscache_register_fs(struct ceph_fs_client* fsc);
+void ceph_fscache_unregister_fs(struct ceph_fs_client* fsc);
+
+void ceph_fscache_inode_init(struct ceph_inode_info *ci);
+void ceph_fscache_register_inode_cookie(struct ceph_fs_client* fsc,
+ struct ceph_inode_info* ci);
+void ceph_fscache_unregister_inode_cookie(struct ceph_inode_info* ci);
+
+int ceph_readpage_from_fscache(struct inode *inode, struct page *page);
+int ceph_readpages_from_fscache(struct inode *inode,
+ struct address_space *mapping,
+ struct list_head *pages,
+ unsigned *nr_pages);
+void ceph_readpage_to_fscache(struct inode *inode, struct page *page);
+void ceph_invalidate_fscache_page(struct inode* inode, struct page *page);
+void ceph_queue_revalidate(struct inode *inode);
+
+static inline void ceph_fscache_invalidate(struct inode *inode)
+{
+ fscache_invalidate(ceph_inode(inode)->fscache);
+}
+
+static inline int ceph_release_fscache_page(struct page *page, gfp_t gfp)
+{
+ struct inode* inode = page->mapping->host;
+ struct ceph_inode_info *ci = ceph_inode(inode);
+ return fscache_maybe_release_page(ci->fscache, page, gfp);
+}
+
+#else
+
+static inline int ceph_fscache_register(void)
+{
+ return 0;
+}
+
+static inline void ceph_fscache_unregister(void)
+{
+}
+
+static inline int ceph_fscache_register_fs(struct ceph_fs_client* fsc)
+{
+ return 0;
+}
+
+static inline void ceph_fscache_unregister_fs(struct ceph_fs_client* fsc)
+{
+}
+
+static inline void ceph_fscache_inode_init(struct ceph_inode_info *ci)
+{
+}
+
+static inline void ceph_fscache_register_inode_cookie(struct ceph_fs_client* parent_fsc,
+ struct ceph_inode_info* ci)
+{
+}
+
+static inline void ceph_fscache_unregister_inode_cookie(struct ceph_inode_info* ci)
+{
+}
+
+static inline int ceph_readpage_from_fscache(struct inode* inode,
+ struct page *page)
+{
+ return -ENOBUFS;
+}
+
+static inline int ceph_readpages_from_fscache(struct inode *inode,
+ struct address_space *mapping,
+ struct list_head *pages,
+ unsigned *nr_pages)
+{
+ return -ENOBUFS;
+}
+
+static inline void ceph_readpage_to_fscache(struct inode *inode,
+ struct page *page)
+{
+}
+
+static inline void ceph_fscache_invalidate(struct inode *inode)
+{
+}
+
+static inline void ceph_invalidate_fscache_page(struct inode *inode,
+ struct page *page)
+{
+}
+
+static inline int ceph_release_fscache_page(struct page *page, gfp_t gfp)
+{
+ return 1;
+}
+
+static inline void ceph_fscache_readpages_cancel(struct inode *inode,
+ struct list_head *pages)
+{
+}
+
+static inline void ceph_queue_revalidate(struct inode *inode)
+{
+}
+
+#endif
+
+#endif
diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c
index 5a26bc1..7b451eb 100644
--- a/fs/ceph/caps.c
+++ b/fs/ceph/caps.c
@@ -10,6 +10,7 @@

#include "super.h"
#include "mds_client.h"
+#include "cache.h"
#include <linux/ceph/decode.h>
#include <linux/ceph/messenger.h>

@@ -479,8 +480,9 @@ static void __check_cap_issue(struct ceph_inode_info *ci, struct ceph_cap *cap,
* i_rdcache_gen.
*/
if ((issued & (CEPH_CAP_FILE_CACHE|CEPH_CAP_FILE_LAZYIO)) &&
- (had & (CEPH_CAP_FILE_CACHE|CEPH_CAP_FILE_LAZYIO)) == 0)
+ (had & (CEPH_CAP_FILE_CACHE|CEPH_CAP_FILE_LAZYIO)) == 0) {
ci->i_rdcache_gen++;
+ }

/*
* if we are newly issued FILE_SHARED, mark dir not complete; we
@@ -2395,6 +2397,7 @@ static void handle_cap_grant(struct inode *inode, struct ceph_mds_caps *grant,
int writeback = 0;
int queue_invalidate = 0;
int deleted_inode = 0;
+ int queue_revalidate = 0;

dout("handle_cap_grant inode %p cap %p mds%d seq %d %s\n",
inode, cap, mds, seq, ceph_cap_string(newcaps));
@@ -2417,6 +2420,8 @@ static void handle_cap_grant(struct inode *inode, struct ceph_mds_caps *grant,
ci->i_rdcache_revoking = ci->i_rdcache_gen;
}
}
+
+ ceph_fscache_invalidate(inode);
}

/* side effects now are allowed */
@@ -2458,6 +2463,11 @@ static void handle_cap_grant(struct inode *inode, struct ceph_mds_caps *grant,
}
}

+ /* Do we need to revalidate our fscache cookie. Don't bother on the
+ * first cache cap as we already validate at cookie creation time. */
+ if ((issued & CEPH_CAP_FILE_CACHE) && ci->i_rdcache_gen > 1)
+ queue_revalidate = 1;
+
/* size/ctime/mtime/atime? */
ceph_fill_file_size(inode, issued,
le32_to_cpu(grant->truncate_seq),
@@ -2542,6 +2552,7 @@ static void handle_cap_grant(struct inode *inode, struct ceph_mds_caps *grant,
BUG_ON(cap->issued & ~cap->implemented);

spin_unlock(&ci->i_ceph_lock);
+
if (writeback)
/*
* queue inode for writeback: we can't actually call
@@ -2553,6 +2564,8 @@ static void handle_cap_grant(struct inode *inode, struct ceph_mds_caps *grant,
ceph_queue_invalidate(inode);
if (deleted_inode)
invalidate_aliases(inode);
+ if (queue_revalidate)
+ ceph_queue_revalidate(inode);
if (wake)
wake_up_all(&ci->i_cap_wq);

@@ -2709,8 +2722,10 @@ static void handle_cap_trunc(struct inode *inode,
truncate_seq, truncate_size, size);
spin_unlock(&ci->i_ceph_lock);

- if (queue_trunc)
+ if (queue_trunc) {
ceph_queue_vmtruncate(inode);
+ ceph_fscache_invalidate(inode);
+ }
}

/*
diff --git a/fs/ceph/file.c b/fs/ceph/file.c
index 20d0222..3de8982 100644
--- a/fs/ceph/file.c
+++ b/fs/ceph/file.c
@@ -12,6 +12,7 @@

#include "super.h"
#include "mds_client.h"
+#include "cache.h"

/*
* Ceph file operations
@@ -69,9 +70,23 @@ static int ceph_init_file(struct inode *inode, struct file *file, int fmode)
{
struct ceph_file_info *cf;
int ret = 0;
+ struct ceph_inode_info *ci = ceph_inode(inode);
+ struct ceph_fs_client *fsc = ceph_sb_to_client(inode->i_sb);
+ struct ceph_mds_client *mdsc = fsc->mdsc;

switch (inode->i_mode & S_IFMT) {
case S_IFREG:
+ /* First file open request creates the cookie, we want to keep
+ * this cookie around for the lifetime of the inode so as not to
+ * have to worry about fscache register / revoke / operation
+ * races.
+ *
+ * Also, if we know the operation is going to invalidate data
+ * (non readonly) just nuke the cache right away.
+ */
+ ceph_fscache_register_inode_cookie(mdsc->fsc, ci);
+ if ((fmode & CEPH_FILE_MODE_WR))
+ ceph_fscache_invalidate(inode);
case S_IFDIR:
dout("init_file %p %p 0%o (regular)\n", inode, file,
inode->i_mode);
@@ -182,6 +197,7 @@ int ceph_open(struct inode *inode, struct file *file)
spin_unlock(&ci->i_ceph_lock);
return ceph_init_file(inode, file, fmode);
}
+
spin_unlock(&ci->i_ceph_lock);

dout("open fmode %d wants %s\n", fmode, ceph_cap_string(wanted));
@@ -192,6 +208,7 @@ int ceph_open(struct inode *inode, struct file *file)
}
req->r_inode = inode;
ihold(inode);
+
req->r_num_caps = 1;
if (flags & (O_CREAT|O_TRUNC))
parent_inode = ceph_get_dentry_parent_inode(file->f_dentry);
diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c
index 602ccd8..eae41cd 100644
--- a/fs/ceph/inode.c
+++ b/fs/ceph/inode.c
@@ -12,6 +12,7 @@

#include "super.h"
#include "mds_client.h"
+#include "cache.h"
#include <linux/ceph/decode.h>

/*
@@ -386,6 +387,8 @@ struct inode *ceph_alloc_inode(struct super_block *sb)

INIT_WORK(&ci->i_vmtruncate_work, ceph_vmtruncate_work);

+ ceph_fscache_inode_init(ci);
+
return &ci->vfs_inode;
}

@@ -405,6 +408,8 @@ void ceph_destroy_inode(struct inode *inode)

dout("destroy_inode %p ino %llx.%llx\n", inode, ceph_vinop(inode));

+ ceph_fscache_unregister_inode_cookie(ci);
+
ceph_queue_caps_release(inode);

/*
@@ -439,7 +444,6 @@ void ceph_destroy_inode(struct inode *inode)
call_rcu(&inode->i_rcu, ceph_i_callback);
}

-
/*
* Helpers to fill in size, ctime, mtime, and atime. We have to be
* careful because either the client or MDS may have more up to date
@@ -491,6 +495,10 @@ int ceph_fill_file_size(struct inode *inode, int issued,
truncate_size);
ci->i_truncate_size = truncate_size;
}
+
+ if (queue_trunc)
+ ceph_fscache_invalidate(inode);
+
return queue_trunc;
}

@@ -1079,7 +1087,7 @@ int ceph_fill_trace(struct super_block *sb, struct ceph_mds_request *req,
* complete.
*/
ceph_set_dentry_offset(req->r_old_dentry);
- dout("dn %p gets new offset %lld\n", req->r_old_dentry,
+ dout("dn %p gets new offset %lld\n", req->r_old_dentry,
ceph_dentry(req->r_old_dentry)->offset);

dn = req->r_old_dentry; /* use old_dentry */
@@ -1494,6 +1502,7 @@ void ceph_queue_vmtruncate(struct inode *inode)
struct ceph_inode_info *ci = ceph_inode(inode);

ihold(inode);
+
if (queue_work(ceph_sb_to_client(inode->i_sb)->trunc_wq,
&ci->i_vmtruncate_work)) {
dout("ceph_queue_vmtruncate %p\n", inode);
@@ -1565,7 +1574,6 @@ retry:
wake_up_all(&ci->i_cap_wq);
}

-
/*
* symlinks
*/
diff --git a/fs/ceph/super.c b/fs/ceph/super.c
index 6627b26..6a0951e 100644
--- a/fs/ceph/super.c
+++ b/fs/ceph/super.c
@@ -17,6 +17,7 @@

#include "super.h"
#include "mds_client.h"
+#include "cache.h"

#include <linux/ceph/ceph_features.h>
#include <linux/ceph/decode.h>
@@ -142,6 +143,8 @@ enum {
Opt_nodcache,
Opt_ino32,
Opt_noino32,
+ Opt_fscache,
+ Opt_nofscache
};

static match_table_t fsopt_tokens = {
@@ -167,6 +170,8 @@ static match_table_t fsopt_tokens = {
{Opt_nodcache, "nodcache"},
{Opt_ino32, "ino32"},
{Opt_noino32, "noino32"},
+ {Opt_fscache, "fsc"},
+ {Opt_nofscache, "nofsc"},
{-1, NULL}
};

@@ -260,6 +265,12 @@ static int parse_fsopt_token(char *c, void *private)
case Opt_noino32:
fsopt->flags &= ~CEPH_MOUNT_OPT_INO32;
break;
+ case Opt_fscache:
+ fsopt->flags |= CEPH_MOUNT_OPT_FSCACHE;
+ break;
+ case Opt_nofscache:
+ fsopt->flags &= ~CEPH_MOUNT_OPT_FSCACHE;
+ break;
default:
BUG_ON(token);
}
@@ -422,6 +433,10 @@ static int ceph_show_options(struct seq_file *m, struct dentry *root)
seq_puts(m, ",dcache");
else
seq_puts(m, ",nodcache");
+ if (fsopt->flags & CEPH_MOUNT_OPT_FSCACHE)
+ seq_puts(m, ",fsc");
+ else
+ seq_puts(m, ",nofsc");

if (fsopt->wsize)
seq_printf(m, ",wsize=%d", fsopt->wsize);
@@ -530,11 +545,18 @@ static struct ceph_fs_client *create_fs_client(struct ceph_mount_options *fsopt,
if (!fsc->wb_pagevec_pool)
goto fail_trunc_wq;

+ /* setup fscache */
+ if ((fsopt->flags & CEPH_MOUNT_OPT_FSCACHE) &&
+ (ceph_fscache_register_fs(fsc) != 0))
+ goto fail_fscache;
+
/* caps */
fsc->min_caps = fsopt->max_readdir;

return fsc;

+fail_fscache:
+ ceph_fscache_unregister_fs(fsc);
fail_trunc_wq:
destroy_workqueue(fsc->trunc_wq);
fail_pg_inv_wq:
@@ -554,6 +576,8 @@ static void destroy_fs_client(struct ceph_fs_client *fsc)
{
dout("destroy_fs_client %p\n", fsc);

+ ceph_fscache_unregister_fs(fsc);
+
destroy_workqueue(fsc->wb_wq);
destroy_workqueue(fsc->pg_inv_wq);
destroy_workqueue(fsc->trunc_wq);
@@ -588,6 +612,8 @@ static void ceph_inode_init_once(void *foo)

static int __init init_caches(void)
{
+ int error = -ENOMEM;
+
ceph_inode_cachep = kmem_cache_create("ceph_inode_info",
sizeof(struct ceph_inode_info),
__alignof__(struct ceph_inode_info),
@@ -611,15 +637,17 @@ static int __init init_caches(void)
if (ceph_file_cachep == NULL)
goto bad_file;

- return 0;
+ if ((error = ceph_fscache_register()))
+ goto bad_file;

+ return 0;
bad_file:
kmem_cache_destroy(ceph_dentry_cachep);
bad_dentry:
kmem_cache_destroy(ceph_cap_cachep);
bad_cap:
kmem_cache_destroy(ceph_inode_cachep);
- return -ENOMEM;
+ return error;
}

static void destroy_caches(void)
@@ -629,10 +657,13 @@ static void destroy_caches(void)
* destroy cache.
*/
rcu_barrier();
+
kmem_cache_destroy(ceph_inode_cachep);
kmem_cache_destroy(ceph_cap_cachep);
kmem_cache_destroy(ceph_dentry_cachep);
kmem_cache_destroy(ceph_file_cachep);
+
+ ceph_fscache_unregister();
}


diff --git a/fs/ceph/super.h b/fs/ceph/super.h
index f1e4e47..bb23ef6 100644
--- a/fs/ceph/super.h
+++ b/fs/ceph/super.h
@@ -16,6 +16,10 @@

#include <linux/ceph/libceph.h>

+#ifdef CONFIG_CEPH_FSCACHE
+#include <linux/fscache.h>
+#endif
+
/* f_type in struct statfs */
#define CEPH_SUPER_MAGIC 0x00c36400

@@ -29,6 +33,7 @@
#define CEPH_MOUNT_OPT_NOASYNCREADDIR (1<<7) /* no dcache readdir */
#define CEPH_MOUNT_OPT_INO32 (1<<8) /* 32 bit inos */
#define CEPH_MOUNT_OPT_DCACHE (1<<9) /* use dcache for readdir etc */
+#define CEPH_MOUNT_OPT_FSCACHE (1<<10) /* use fscache */

#define CEPH_MOUNT_OPT_DEFAULT (CEPH_MOUNT_OPT_RBYTES)

@@ -90,6 +95,11 @@ struct ceph_fs_client {
struct dentry *debugfs_bdi;
struct dentry *debugfs_mdsc, *debugfs_mdsmap;
#endif
+
+#ifdef CONFIG_CEPH_FSCACHE
+ struct fscache_cookie *fscache;
+ struct workqueue_struct *revalidate_wq;
+#endif
};


@@ -320,6 +330,12 @@ struct ceph_inode_info {

struct work_struct i_vmtruncate_work;

+#ifdef CONFIG_CEPH_FSCACHE
+ struct fscache_cookie *fscache;
+ u32 i_fscache_gen; /* sequence, for delayed fscache validate */
+ struct work_struct i_revalidate_work;
+#endif
+
struct inode vfs_inode; /* at end */
};

--
1.7.9.5
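
The generation check used by cache_valid() and the tail of ceph_revalidate_work() in the patch above can be modelled in plain userspace C. This is only a minimal sketch, not the kernel code: struct inode_sketch and both sketch_* functions are hypothetical names invented for illustration, and the boolean field stands in for the CEPH_CAP_FILE_CACHE capability test.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the ceph_inode_info fields the check uses. */
struct inode_sketch {
	bool     has_cache_cap; /* models CEPH_CAP_FILE_CACHE being issued */
	uint32_t rdcache_gen;   /* bumped when the cache cap is newly issued */
	uint32_t fscache_gen;   /* last generation the local cache was verified for */
};

/* Models cache_valid(): the local cache may be used only while the cap is
 * held and the verified generation matches the current one. */
static bool sketch_cache_valid(const struct inode_sketch *ci)
{
	return ci->has_cache_cap && ci->fscache_gen == ci->rdcache_gen;
}

/* Models the end of ceph_revalidate_work(): record the generation that was
 * sampled before the consistency check, and never move backwards. */
static void sketch_revalidate_done(struct inode_sketch *ci, uint32_t orig_gen)
{
	if (orig_gen > ci->fscache_gen)
		ci->fscache_gen = orig_gen;
}
```

In this model, a newly issued cap bumps rdcache_gen and the cache stays unusable until a revalidation for that generation completes, which appears to be the purpose of i_fscache_gen in the patch.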

2013-09-05 22:22:40

by Milosz Tanski

Subject: [PATCH 5/8] ceph: clean PgPrivate2 on returning from readpages

In some cases the ceph readpages code bails without filling all the pages
already marked by fscache. When we return to the readahead code, this causes
a BUG.

Signed-off-by: Milosz Tanski <[email protected]>
---
fs/ceph/addr.c | 2 ++
fs/ceph/cache.h | 7 +++++++
2 files changed, 9 insertions(+)

diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
index 3a21a7c..1fda9cf 100644
--- a/fs/ceph/addr.c
+++ b/fs/ceph/addr.c
@@ -398,6 +398,8 @@ static int ceph_readpages(struct file *file, struct address_space *mapping,
BUG_ON(rc == 0);
}
out:
+ ceph_fscache_readpages_cancel(inode, page_list);
+
dout("readpages %p file %p ret %d\n", inode, file, rc);
return rc;
}
diff --git a/fs/ceph/cache.h b/fs/ceph/cache.h
index 0ea95cb..fb326fd 100644
--- a/fs/ceph/cache.h
+++ b/fs/ceph/cache.h
@@ -58,6 +58,13 @@ static inline int ceph_release_fscache_page(struct page *page, gfp_t gfp)
return fscache_maybe_release_page(ci->fscache, page, gfp);
}

+static inline void ceph_fscache_readpages_cancel(struct inode *inode,
+ struct list_head *pages)
+{
+ struct ceph_inode_info *ci = ceph_inode(inode);
+ return fscache_readpages_cancel(ci->fscache, pages);
+}
+
#else

static inline int ceph_fscache_register(void)
--
1.7.9.5
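
The idea behind calling the cancel helper on the leftover page list can be sketched in userspace C as follows. This is an illustrative model under stated assumptions, not the fscache implementation: struct rpage_sketch and sketch_readpages_cancel() are hypothetical names, and the boolean stands in for the PG_private_2 (PageFsCache) mark.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a page left on the readahead list. */
struct rpage_sketch {
	bool marked_cached;        /* models PG_private_2 / PageFsCache */
	struct rpage_sketch *next; /* next leftover page, or NULL */
};

/* Clear the cache mark on every page still on the list, so the caller can
 * free those pages without tripping a bad-page-state check.
 * Returns how many marks were cleared. */
static size_t sketch_readpages_cancel(struct rpage_sketch *leftover)
{
	size_t cleared = 0;

	for (struct rpage_sketch *p = leftover; p != NULL; p = p->next) {
		if (p->marked_cached) {
			p->marked_cached = false;
			cleared++;
		}
	}
	return cleared;
}
```

Calling the helper a second time on the same list clears nothing, which is why it is safe to run it unconditionally at the out: label.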

2013-09-05 22:22:54

by Milosz Tanski

Subject: [PATCH 6/8] ceph: ceph_readpage_to_fscache didn't check if marked

Previously ceph_readpage_to_fscache did not check whether the page was marked as
cached before calling fscache_write_page, resulting in a BUG inside fscache.

FS-Cache: Assertion failed
------------[ cut here ]------------
kernel BUG at fs/fscache/page.c:874!
invalid opcode: 0000 [#1] SMP
Call Trace:
[<ffffffffa02e6566>] __ceph_readpage_to_fscache+0x66/0x80 [ceph]
[<ffffffffa02caf84>] readpage_nounlock+0x124/0x210 [ceph]
[<ffffffffa02cb08d>] ceph_readpage+0x1d/0x40 [ceph]
[<ffffffff81126db6>] generic_file_aio_read+0x1f6/0x700
[<ffffffffa02c6fcc>] ceph_aio_read+0x5fc/0xab0 [ceph]

Signed-off-by: Milosz Tanski <[email protected]>
Signed-off-by: Sage Weil <[email protected]>
---
fs/ceph/cache.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/fs/ceph/cache.c b/fs/ceph/cache.c
index 5c413ec..c737ae9 100644
--- a/fs/ceph/cache.c
+++ b/fs/ceph/cache.c
@@ -311,6 +311,9 @@ void ceph_readpage_to_fscache(struct inode *inode, struct page *page)
struct ceph_inode_info *ci = ceph_inode(inode);
int ret;

+ if (!PageFsCache(page))
+ return;
+
if (!cache_valid(ci))
return;

--
1.7.9.5
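
The order of the checks after this patch can be illustrated with a small userspace C model. It is a sketch under assumptions rather than the kernel function: struct wpage_sketch, enum sketch_result and sketch_readpage_to_cache() are hypothetical, and write_ret is passed in to stand for whatever fscache_write_page() would return.

```c
#include <stdbool.h>

enum sketch_result { SKETCH_SKIPPED, SKETCH_WRITTEN, SKETCH_UNCACHED };

/* Hypothetical model of a page; the flag stands in for PageFsCache(). */
struct wpage_sketch {
	bool marked_cached;
};

/* Only pages already marked for the cache may be written to it; an
 * unmarked page is skipped first, before the validity check. */
static enum sketch_result sketch_readpage_to_cache(struct wpage_sketch *page,
						   bool cache_valid,
						   int write_ret)
{
	if (!page->marked_cached)
		return SKETCH_SKIPPED;
	if (!cache_valid)
		return SKETCH_SKIPPED;
	if (write_ret != 0) {
		/* models fscache_uncache_page() after a failed write */
		page->marked_cached = false;
		return SKETCH_UNCACHED;
	}
	return SKETCH_WRITTEN;
}
```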

2013-09-05 22:23:09

by Milosz Tanski

Subject: [PATCH 7/8] ceph: page still marked private_2

The previous patch cleaned up most of the issues with pages marked as
private_2 when calling ceph_readpages. However, there seems to be a case in
the error-path cleanup in start_read that still triggers this from time to
time. I've only seen this a couple of times.

BUG: Bad page state in process petabucket pfn:335b82
page:ffffea000cd6e080 count:0 mapcount:0 mapping: (null) index:0x0
page flags: 0x200000000001000(private_2)
Call Trace:
[<ffffffff81563442>] dump_stack+0x46/0x58
[<ffffffff8112c7f7>] bad_page+0xc7/0x120
[<ffffffff8112cd9e>] free_pages_prepare+0x10e/0x120
[<ffffffff8112e580>] free_hot_cold_page+0x40/0x160
[<ffffffff81132427>] __put_single_page+0x27/0x30
[<ffffffff81132d95>] put_page+0x25/0x40
[<ffffffffa02cb409>] ceph_readpages+0x2e9/0x6f0 [ceph]
[<ffffffff811313cf>] __do_page_cache_readahead+0x1af/0x260

Signed-off-by: Milosz Tanski <[email protected]>
Signed-off-by: Sage Weil <[email protected]>
---
fs/ceph/addr.c | 1 +
fs/ceph/cache.h | 14 +++++++++++++-
2 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
index 1fda9cf..6df8bd4 100644
--- a/fs/ceph/addr.c
+++ b/fs/ceph/addr.c
@@ -336,6 +336,7 @@ static int start_read(struct inode *inode, struct list_head *page_list, int max)
page->index);
if (add_to_page_cache_lru(page, &inode->i_data, page->index,
GFP_NOFS)) {
+ ceph_fscache_uncache_page(inode, page);
page_cache_release(page);
dout("start_read %p add_to_page_cache failed %p\n",
inode, page);
diff --git a/fs/ceph/cache.h b/fs/ceph/cache.h
index fb326fd..bf48695 100644
--- a/fs/ceph/cache.h
+++ b/fs/ceph/cache.h
@@ -51,6 +51,13 @@ static inline void ceph_fscache_invalidate(struct inode *inode)
fscache_invalidate(ceph_inode(inode)->fscache);
}

+static inline void ceph_fscache_uncache_page(struct inode *inode,
+ struct page *page)
+{
+ struct ceph_inode_info *ci = ceph_inode(inode);
+ return fscache_uncache_page(ci->fscache, page);
+}
+
static inline int ceph_release_fscache_page(struct page *page, gfp_t gfp)
{
struct inode* inode = page->mapping->host;
@@ -94,7 +101,8 @@ static inline void ceph_fscache_register_inode_cookie(struct ceph_fs_client* par
{
}

-static inline void ceph_fscache_unregister_inode_cookie(struct ceph_inode_info* ci)
+static inline void ceph_fscache_uncache_page(struct inode *inode,
+ struct page *pages)
{
}

@@ -126,6 +134,10 @@ static inline void ceph_invalidate_fscache_page(struct inode *inode,
{
}

+static inline void ceph_fscache_unregister_inode_cookie(struct ceph_inode_info* ci)
+{
+}
+
static inline int ceph_release_fscache_page(struct page *page, gfp_t gfp)
{
return 1;
--
1.7.9.5

2013-09-05 22:23:23

by Milosz Tanski

Subject: [PATCH 8/8] ceph: Do not do invalidate if the filesystem is mounted nofsc

Previously we would always try to enqueue work even if the filesystem was not
mounted with fscache enabled (or the file had no cookie). When the filesystem
is mounted nofsc (but with fscache compiled in), this would lead to a
crash.

Signed-off-by: Milosz Tanski <[email protected]>
---
fs/ceph/cache.c | 4 ++++
1 file changed, 4 insertions(+)

diff --git a/fs/ceph/cache.c b/fs/ceph/cache.c
index c737ae9..d3b88c7 100644
--- a/fs/ceph/cache.c
+++ b/fs/ceph/cache.c
@@ -374,8 +374,12 @@ out:

void ceph_queue_revalidate(struct inode *inode)
{
+ struct ceph_fs_client *fsc = ceph_sb_to_client(inode->i_sb);
struct ceph_inode_info *ci = ceph_inode(inode);

+ if (fsc->revalidate_wq == NULL || ci->fscache == NULL)
+ return;
+
ihold(inode);

if (queue_work(ceph_sb_to_client(inode->i_sb)->revalidate_wq,
--
1.7.9.5
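
The guard added here, together with the existing ihold()/iput() pairing around queue_work(), can be modelled in userspace C. This is a hedged sketch, not the kernel code: struct qinode_sketch and sketch_queue_revalidate() are hypothetical, have_wq stands in for fsc->revalidate_wq being non-NULL, and work_pending models queue_work() returning false when the work item is already queued.

```c
#include <stdbool.h>

/* Hypothetical model of the inode state involved in queueing. */
struct qinode_sketch {
	int  refcount;     /* models the inode reference count */
	bool has_cookie;   /* models ci->fscache != NULL */
	bool work_pending; /* models the work item already being queued */
};

/* Returns true if revalidation work was queued. No reference is taken when
 * the mount has no workqueue (nofsc) or the inode has no cookie. */
static bool sketch_queue_revalidate(struct qinode_sketch *inode, bool have_wq)
{
	if (!have_wq || !inode->has_cookie)
		return false;

	inode->refcount++;             /* models ihold() */
	if (inode->work_pending) {
		inode->refcount--;     /* models iput() on queue failure */
		return false;
	}
	inode->work_pending = true;    /* the worker drops the reference later */
	return true;
}
```

The early return sits before the reference is taken, so the nofsc path leaves the reference count untouched in this model.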

2013-09-05 23:00:10

by Sage Weil

Subject: Re: [PATCH 0/8] ceph: fscache support & upstream changes

On Thu, 5 Sep 2013, Milosz Tanski wrote:
> Hey gang I think this should be final revision of these changes. The changes
> are:
>
> * David rewrote the cookie validity check (that originally was written by
> Hongyi Jia). You might have seen some emails flying about doing it the
> right way.
> * I added a crash fix for Ceph filesystems mounted with nofsc (default)
> when fscache is compiled into Ceph. Previously it would crash trying to
> enqueue invalidate checks in the work queue because we didn't initialize
> it if the mount had fscache disabled.
>
> I've tested both changes on my cluster. You can get these changes from my
> branch in bitbucket. It contains the upstream wip-fscache branch rebased with
> David's rewrite of Hongyi Jia's changes.
>
> The branch is located at.
>
> https://bitbucket.org/adfin/linux-fs.git in the wip-fscahce branch
>
> Finally, David requested that this patchset go through the Ceph tree. The tree
> should have all the proper sign off from David. I also CC'ed him so he can give
> his final okay.
>
> Best,
> - Milosz

I've pulled this into ceph-client.git master. If this looks good to you,
David, I'll send it all to Linus (along with the current set of RBD fixes,
once they are ready).

Thanks!
sage


>
> David Howells (2):
> FS-Cache: Add interface to check consistency of a cached object
> CacheFiles: Implement interface to check cache consistency
>
> Milosz Tanski (6):
> fscache: Netfs function for cleanup post readpages
> ceph: use fscache as a local presisent cache
> ceph: clean PgPrivate2 on returning from readpages
> ceph: ceph_readpage_to_fscache didn't check if marked
> ceph: page still marked private_2
> ceph: Do not do invalidate if the filesystem is mounted nofsc
>
> Documentation/filesystems/caching/backend-api.txt | 9 +
> Documentation/filesystems/caching/netfs-api.txt | 35 +-
> fs/cachefiles/interface.c | 26 ++
> fs/cachefiles/internal.h | 1 +
> fs/cachefiles/xattr.c | 36 ++
> fs/ceph/Kconfig | 9 +
> fs/ceph/Makefile | 1 +
> fs/ceph/addr.c | 40 ++-
> fs/ceph/cache.c | 400 +++++++++++++++++++++
> fs/ceph/cache.h | 157 ++++++++
> fs/ceph/caps.c | 19 +-
> fs/ceph/file.c | 17 +
> fs/ceph/inode.c | 14 +-
> fs/ceph/super.c | 35 +-
> fs/ceph/super.h | 16 +
> fs/fscache/cookie.c | 69 ++++
> fs/fscache/internal.h | 6 +
> fs/fscache/page.c | 71 ++--
> include/linux/fscache-cache.h | 4 +
> include/linux/fscache.h | 42 +++
> 20 files changed, 965 insertions(+), 42 deletions(-)
> create mode 100644 fs/ceph/cache.c
> create mode 100644 fs/ceph/cache.h
>
> --
> 1.7.9.5
>

2013-09-06 04:41:51

by Milosz Tanski

Subject: Re: [PATCH 0/8] ceph: fscache support & upstream changes

David,

After running this for a day on some loaded machines I ran into what
looks like an old issue with the new code. I remember you saw an issue
that manifested itself in a similar way a while back.

[13837253.462779] FS-Cache: Assertion failed
[13837253.462782] 3 == 5 is false
[13837253.462807] ------------[ cut here ]------------
[13837253.462811] kernel BUG at fs/fscache/operation.c:414!
[13837253.462815] invalid opcode: 0000 [#1] SMP
[13837253.462820] Modules linked in: cachefiles microcode auth_rpcgss
oid_registry nfsv4 nfs lockd ceph sunrpc libceph fscache raid10
raid456 async_pq async_xor async_memcpy async_raid6_recov async_tx
raid1 raid0 multipath linear btrfs raid6_pq lzo_compress xor
zlib_deflate libcrc32c
[13837253.462851] CPU: 1 PID: 1848 Comm: kworker/1:2 Not tainted
3.11.0-rc5-virtual #55
[13837253.462870] Workqueue: ceph-revalidate ceph_revalidate_work [ceph]
[13837253.462875] task: ffff8804251f16f0 ti: ffff8804047fa000 task.ti:
ffff8804047fa000
[13837253.462879] RIP: e030:[<ffffffffa0171bad>] [<ffffffffa0171bad>]
fscache_put_operation+0x2ad/0x330 [fscache]
[13837253.462893] RSP: e02b:ffff8804047fbd58 EFLAGS: 00010296
[13837253.462896] RAX: 000000000000000f RBX: ffff880424049d80 RCX:
0000000000000006
[13837253.462901] RDX: 0000000000000007 RSI: 0000000000000007 RDI:
ffff8804047f0218
[13837253.462906] RBP: ffff8804047fbd68 R08: 0000000000000000 R09:
0000000000000000
[13837253.462910] R10: 0000000000000108 R11: 0000000000000107 R12:
ffff8804251cf928
[13837253.462915] R13: ffff8804253c7370 R14: 0000000000000000 R15:
0000000000000000
[13837253.462923] FS: 00007f5c56e43700(0000)
GS:ffff880443500000(0000) knlGS:0000000000000000
[13837253.462928] CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b
[13837253.462932] CR2: 00007fc08b7ee000 CR3: 00000004259a4000 CR4:
0000000000002660
[13837253.462936] Stack:
[13837253.462939] ffff880424049d80 ffff8804251cf928 ffff8804047fbda8
ffffffffa016def1
[13837253.462946] ffff88042b462b20 ffff88040701c750 ffff88040701c730
ffff88040701c3f0
[13837253.462953] 0000000000000003 0000000000000000 ffff8804047fbde8
ffffffffa025ba3f
[13837253.462959] Call Trace:
[13837253.462966] [<ffffffffa016def1>]
__fscache_check_consistency+0x1a1/0x2c0 [fscache]
[13837253.462977] [<ffffffffa025ba3f>] ceph_revalidate_work+0x8f/0x120 [ceph]
[13837253.462987] [<ffffffff8107aa59>] process_one_work+0x179/0x490
[13837253.462992] [<ffffffff8107bf5b>] worker_thread+0x11b/0x370
[13837253.462998] [<ffffffff8107be40>] ? manage_workers.isra.21+0x2e0/0x2e0
[13837253.463004] [<ffffffff81082a80>] kthread+0xc0/0xd0
[13837253.463011] [<ffffffff81010000>] ? perf_trace_xen_mmu_pmd_clear+0x50/0xc0
[13837253.463017] [<ffffffff810829c0>] ? flush_kthread_worker+0xb0/0xb0
[13837253.463024] [<ffffffff8157262c>] ret_from_fork+0x7c/0xb0
[13837253.463029] [<ffffffff810829c0>] ? flush_kthread_worker+0xb0/0xb0
[13837253.463033] Code: 31 c0 e8 5d e6 3e e1 48 c7 c7 04 8e 17 a0 31
c0 e8 4f e6 3e e1 8b 73 40 ba 05 00 00 00 48 c7 c7 62 8e 17 a0 31 c0
e8 39 e6 3e e1 <0f> 0b 65 48 8b 34 25 80 c7 00 00 48 c7 c7 4f 8e 17 a0
48 81 c6
[13837253.463071] RIP [<ffffffffa0171bad>]
fscache_put_operation+0x2ad/0x330 [fscache]
[13837253.463079] RSP <ffff8804047fbd58>
[13837253.463085] ---[ end trace 2972d68e8efd961e ]---
[13837253.463130] BUG: unable to handle kernel paging request at
ffffffffffffffd8
[13837253.463136] IP: [<ffffffff81082d71>] kthread_data+0x11/0x20
[13837253.463142] PGD 1a0f067 PUD 1a11067 PMD 0
[13837253.463146] Oops: 0000 [#2] SMP
[13837253.463150] Modules linked in: cachefiles microcode auth_rpcgss
oid_registry nfsv4 nfs lockd ceph sunrpc libceph fscache raid10
raid456 async_pq async_xor async_memcpy async_raid6_recov async_tx
raid1 raid0 multipath linear btrfs raid6_pq lzo_compress xor
zlib_deflate libcrc32c
[13837253.463176] CPU: 1 PID: 1848 Comm: kworker/1:2 Tainted: G D
3.11.0-rc5-virtual #55
[13837253.463190] task: ffff8804251f16f0 ti: ffff8804047fa000 task.ti:
ffff8804047fa000
[13837253.463194] RIP: e030:[<ffffffff81082d71>] [<ffffffff81082d71>]
kthread_data+0x11/0x20
[13837253.463201] RSP: e02b:ffff8804047fba00 EFLAGS: 00010046
[13837253.463204] RAX: 0000000000000000 RBX: 0000000000000000 RCX:
ffffffff81c30d00
[13837253.463209] RDX: 0000000000000001 RSI: 0000000000000001 RDI:
ffff8804251f16f0
[13837253.463213] RBP: ffff8804047fba18 R08: 0000000027bf1216 R09:
0000000000000000
[13837253.463217] R10: ffff88044360cec0 R11: 000000000000000e R12:
0000000000000001
[13837253.463222] R13: ffff8804251f1ac8 R14: ffff88042c498000 R15:
0000000000000000
[13837253.463228] FS: 00007f5c56e43700(0000)
GS:ffff880443500000(0000) knlGS:0000000000000000
[13837253.463233] CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b
[13837253.463237] CR2: 0000000000000028 CR3: 00000004259a4000 CR4:
0000000000002660
[13837253.463241] Stack:
[13837253.463243] ffffffff8107c3d6 ffff880443513fc0 0000000000000001
ffff8804047fba98
[13837253.463249] ffffffff81568308 0000000000000003 ffff8804251f1ce8
ffff8804251f16f0
[13837253.463255] ffff8804047fbfd8 ffff8804047fbfd8 ffff8804047fbfd8
ffff8804047fba78
[13837253.463261] Call Trace:
[13837253.463265] [<ffffffff8107c3d6>] ? wq_worker_sleeping+0x16/0x90
[13837253.463272] [<ffffffff81568308>] __schedule+0x5c8/0x820
[13837253.463276] [<ffffffff81568619>] schedule+0x29/0x70
[13837253.662186] [<ffffffff81062600>] do_exit+0x6e0/0xa60
[13837253.662193] [<ffffffff81560233>] ? printk+0x4d/0x4f
[13837253.662199] [<ffffffff8156adc0>] oops_end+0xb0/0xf0
[13837253.662206] [<ffffffff81016b28>] die+0x58/0x90
[13837253.662211] [<ffffffff8156a6db>] do_trap+0xcb/0x170
[13837253.662217] [<ffffffff8100b432>] ? check_events+0x12/0x20
[13837253.662222] [<ffffffff81013f45>] do_invalid_op+0x95/0xb0
[13837253.662231] [<ffffffffa0171bad>] ?
fscache_put_operation+0x2ad/0x330 [fscache]
[13837253.662239] [<ffffffff810a92b4>] ? wake_up_klogd+0x34/0x40
[13837253.662244] [<ffffffff810a9545>] ? console_unlock+0x285/0x3c0
[13837253.662249] [<ffffffff810a9acd>] ? vprintk_emit+0x1cd/0x490
[13837253.662255] [<ffffffff81573d9e>] invalid_op+0x1e/0x30
[13837253.662261] [<ffffffffa0171bad>] ?
fscache_put_operation+0x2ad/0x330 [fscache]
[13837253.662269] [<ffffffffa0171bad>] ?
fscache_put_operation+0x2ad/0x330 [fscache]
[13837253.662276] [<ffffffffa016def1>]
__fscache_check_consistency+0x1a1/0x2c0 [fscache]
[13837253.662288] [<ffffffffa025ba3f>] ceph_revalidate_work+0x8f/0x120 [ceph]
[13837253.662294] [<ffffffff8107aa59>] process_one_work+0x179/0x490
[13837253.662300] [<ffffffff8107bf5b>] worker_thread+0x11b/0x370
[13837253.662305] [<ffffffff8107be40>] ? manage_workers.isra.21+0x2e0/0x2e0
[13837253.662310] [<ffffffff81082a80>] kthread+0xc0/0xd0
[13837253.662314] [<ffffffff81010000>] ? perf_trace_xen_mmu_pmd_clear+0x50/0xc0
[13837253.662320] [<ffffffff810829c0>] ? flush_kthread_worker+0xb0/0xb0
[13837253.662325] [<ffffffff8157262c>] ret_from_fork+0x7c/0xb0
[13837253.662330] [<ffffffff810829c0>] ? flush_kthread_worker+0xb0/0xb0
[13837253.662333] Code: 48 89 e5 5d 48 8b 40 c8 48 c1 e8 02 83 e0 01
c3 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 48 8b 87 80 03 00 00
55 48 89 e5 5d <48> 8b 40 d8 c3 66 2e 0f 1f 84 00 00 00 00 00 66 66 66
66 90 55
[13837253.662373] RIP [<ffffffff81082d71>] kthread_data+0x11/0x20
[13837253.662378] RSP <ffff8804047fba00>
[13837253.662381] CR2: ffffffffffffffd8
[13837253.662385] ---[ end trace 2972d68e8efd961f ]---
[13837253.662389] Fixing recursive fault but reboot is needed!


- Milosz


2013-09-06 15:59:18

by David Howells

[permalink] [raw]
Subject: Re: [PATCH 0/8] ceph: fscache support & upstream changes

Milosz Tanski <[email protected]> wrote:

> After running this for a day on some loaded machines I ran into what
> looks like an old issue with the new code. I remember you saw an issue
> that manifested itself in a similar way a while back.
>
> [13837253.462779] FS-Cache: Assertion failed
> [13837253.462782] 3 == 5 is false
> [13837253.462807] ------------[ cut here ]------------
> [13837253.462811] kernel BUG at fs/fscache/operation.c:414!

Bah.

I forgot to call fscache_op_complete(). Patch updated and repushed.
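
[Note for readers following the trace: the "3 == 5 is false" assertion reads
as an operation being released while it is still in progress. The sketch below
is a simplified userspace model, not the fscache code. Every model_ name is
hypothetical, and the numeric values assume the operation-state enum order
used by kernels of that period, where 3 would be "in progress" and 5 would be
"cancelled".]

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mirror of the operation states; the order is an assumption. */
enum model_op_state {
	MODEL_OP_ST_BLANK,       /* 0: not yet submitted */
	MODEL_OP_ST_INITIALISED, /* 1: initialised */
	MODEL_OP_ST_PENDING,     /* 2: blocked from running */
	MODEL_OP_ST_IN_PROGRESS, /* 3: running */
	MODEL_OP_ST_COMPLETE,    /* 4: finished normally */
	MODEL_OP_ST_CANCELLED,   /* 5: cancelled before finishing */
	MODEL_OP_ST_DEAD         /* 6: being torn down */
};

struct model_op {
	enum model_op_state state;
};

static void model_op_init(struct model_op *op)
{
	op->state = MODEL_OP_ST_INITIALISED;
}

static void model_op_run(struct model_op *op)
{
	op->state = MODEL_OP_ST_IN_PROGRESS;
}

/* The step that was reported missing: mark the op as complete. */
static void model_op_complete(struct model_op *op)
{
	assert(op->state == MODEL_OP_ST_IN_PROGRESS);
	op->state = MODEL_OP_ST_COMPLETE;
}

/*
 * Models the check made when the last reference is dropped: the op must
 * be either complete or cancelled, otherwise the assertion would fire.
 */
static bool model_op_put_ok(const struct model_op *op)
{
	return op->state == MODEL_OP_ST_COMPLETE ||
	       op->state == MODEL_OP_ST_CANCELLED;
}
```

[In this model, calling model_op_put_ok() straight after model_op_run()
returns false, which corresponds to the reported failure. Calling
model_op_complete() first makes the same check pass.]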

Btw, I've reordered the patches to put the CIFS patch last. Can you merge the
patches prior to the CIFS commit from my branch rather than cherry picking
them so that if they go via two different routes, GIT will handle the merge
correctly? I've stuck a tag on it (fscache-fixes-for-ceph) to make that
easier for you.

I've also asked another RH engineer to try doing some basic testing on the
CIFS stuff - which may validate the fscache_readpages_cancel patch.

David

2013-09-06 19:02:49

by Milosz Tanski

[permalink] [raw]
Subject: Re: [PATCH 0/8] ceph: fscache support & upstream changes

Sage,

I've taken David's latest changes and per his request merged his
'fscache-fixes-for-ceph' tag then applied my changes on top of that.
In addition to the previous changes I also added a fix for the
warnings the linux-next build bot found.

I've given the results a quick test to make sure it builds, boots and
runs okay. The code is located in my repository:

https://[email protected]/adfin/linux-fs.git in the wip-fscache-v2 branch

I hope that this is the final go for now and thanks for everyone's patience.

- Milosz


2013-09-06 20:03:06

by Sage Weil

[permalink] [raw]
Subject: Re: [PATCH 0/8] ceph: fscache support & upstream changes

On Fri, 6 Sep 2013, Milosz Tanski wrote:
> Sage,
>
> I've taken David's latest changes and per his request merged his
> 'fscache-fixes-for-ceph' tag then applied my changes on top of that.
> In addition to the previous changes I also added a fix for the
> warnings the linux-next build bot found.
>
> I've given the results a quick test to make sure it builds, boots and
> runs okay. The code is located in my repository:
>
> https://[email protected]/adfin/linux-fs.git in the wip-fscache-v2 branch
>
> I hope that this is the final go for now and thanks for everyone's patience.

Looks good; I'll send this to Linus along with the other ceph patches
shortly.

Thanks, everyone!
sage



2013-09-08 03:07:44

by Milosz Tanski

[permalink] [raw]
Subject: Re: [PATCH 0/8] ceph: fscache support & upstream changes

David,

I ran into another issue that caused one of my machines to hang on a
bunch of tasks and then hard lock. Here's the backtrace of the hang:

INFO: task kworker/1:2:4214 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kworker/1:2 D ffff880443513fc0 0 4214 2 0x00000000
Workqueue: ceph-msgr con_work [libceph]
ffff88042b093868 0000000000000246 ffff88042b8e5bc0 ffffffff81569fc6
ffff88042c51dbc0 ffff88042b093fd8 ffff88042b093fd8 ffff88042b093fd8
ffff88042c518000 ffff88042c51dbc0 ffff8804266b8d10 ffff8804439d7188
Call Trace:
[<ffffffff81569fc6>] ? _raw_spin_unlock_irqrestore+0x16/0x20
[<ffffffffa0016570>] ? fscache_wait_bit_interruptible+0x30/0x30 [fscache]
[<ffffffff81568d09>] schedule+0x29/0x70
[<ffffffffa001657e>] fscache_wait_atomic_t+0xe/0x20 [fscache]
[<ffffffff815665cf>] out_of_line_wait_on_atomic_t+0x9f/0xe0
[<ffffffff81083560>] ? autoremove_wake_function+0x40/0x40
[<ffffffffa0015a9c>] __fscache_relinquish_cookie+0x15c/0x310 [fscache]
[<ffffffffa00a4fae>] ceph_fscache_unregister_inode_cookie+0x3e/0x50 [ceph]
[<ffffffffa007e373>] ceph_destroy_inode+0x33/0x200 [ceph]
[<ffffffff811c13ae>] ? __fsnotify_inode_delete+0xe/0x10
[<ffffffff8119ba1c>] destroy_inode+0x3c/0x70
[<ffffffff8119bb69>] evict+0x119/0x1b0
[<ffffffff8119c3f3>] iput+0x103/0x190
[<ffffffffa009aaed>] iterate_session_caps+0x7d/0x240 [ceph]
[<ffffffffa009b170>] ? remove_session_caps_cb+0x270/0x270 [ceph]
[<ffffffffa00a1fc5>] dispatch+0x725/0x1b40 [ceph]
[<ffffffff81459466>] ? kernel_recvmsg+0x46/0x60
[<ffffffffa002c0e8>] ? ceph_tcp_recvmsg+0x48/0x60 [libceph]
[<ffffffffa002ecbe>] try_read+0xc1e/0x1e70 [libceph]
[<ffffffffa0030015>] con_work+0x105/0x1920 [libceph]
[<ffffffff8100349e>] ? xen_end_context_switch+0x1e/0x30
[<ffffffff8108dbca>] ? finish_task_switch+0x5a/0xc0
[<ffffffff8107aa59>] process_one_work+0x179/0x490
[<ffffffff8107bf5b>] worker_thread+0x11b/0x370
[<ffffffff8107be40>] ? manage_workers.isra.21+0x2e0/0x2e0
[<ffffffff81082a80>] kthread+0xc0/0xd0
[<ffffffff81010000>] ? perf_trace_xen_mmu_set_pud+0xd0/0xd0
[<ffffffff810829c0>] ? flush_kthread_worker+0xb0/0xb0
[<ffffffff81572cec>] ret_from_fork+0x7c/0xb0
[<ffffffff810829c0>] ? flush_kthread_worker+0xb0/0xb0

It looks like it's waiting for the cookie's n_active to drop down
to 0 ... but it never does. After spending a bunch of hours reading the
code, then having some beers (it is Saturday night after all), then
looking at the code again... I think that the
__fscache_check_consistency() function increments the n_active counter
but never lowers it. I think the solution to this is the diff below
but I'm not 100% sure. Can you let me know if I'm on the right
track... or if it's beer goggles?

diff --git a/fs/fscache/cookie.c b/fs/fscache/cookie.c
index 318e843..b2a86e3 100644
--- a/fs/fscache/cookie.c
+++ b/fs/fscache/cookie.c
@@ -586,7 +586,8 @@ int __fscache_check_consistency(struct fscache_cookie *cookie)

fscache_operation_init(op, NULL, NULL);
op->flags = FSCACHE_OP_MYTHREAD |
- (1 << FSCACHE_OP_WAITING);
+ (1 << FSCACHE_OP_WAITING) |
+ (1 << FSCACHE_OP_UNUSE_COOKIE);

spin_lock(&cookie->lock);

Thanks,
- Milosz


2013-09-08 21:21:41

by Milosz Tanski

[permalink] [raw]
Subject: Re: [PATCH 0/8] ceph: fscache support & upstream changes

David,

I think that change does the trick. I had it running on the same
machine for 5 hours and had the kernel forcefully drop some of the
inodes in the cache (via drop caches) without a crash. I'll send a
proper patch email after you take a look and make sure I did the right
thing.

Thanks,
- Milosz


2013-09-09 10:18:08

by David Howells

[permalink] [raw]
Subject: Re: [PATCH 0/8] ceph: fscache support & upstream changes

Milosz Tanski <[email protected]> wrote:

> - (1 << FSCACHE_OP_WAITING);
> + (1 << FSCACHE_OP_WAITING) |
> + (1 << FSCACHE_OP_UNUSE_COOKIE);

Yeah... That'll do it. We could just decrement n_active directly after
calling into the backend - after all, we cannot reduce n_active to 0 here
because the netfs cannot be calling fscache_relinquish_cookie() whilst also
calling this function - but this will do too.
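
[Note: the counting problem discussed above can be sketched in userspace.
This is a simplified model under stated assumptions, not the fscache
implementation. The model_ names and the bit numbers are hypothetical, and the
model assumes a live cookie carries a baseline n_active of one that
relinquishing drops before waiting for zero.]

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit numbers; the real values live in the kernel headers. */
enum {
	MODEL_OP_WAITING = 0,
	MODEL_OP_UNUSE_COOKIE = 1
};

struct model_cookie {
	int n_active; /* baseline of 1 while the cookie is live (assumption) */
};

struct model_check_op {
	unsigned long flags;
	struct model_cookie *cookie;
};

/* Starting a consistency check marks the cookie as in use. */
static void model_begin_check(struct model_cookie *cookie,
			      struct model_check_op *op,
			      unsigned long flags)
{
	cookie->n_active++;
	op->flags = flags;
	op->cookie = cookie;
}

/* Releasing the op only gives the count back if the flag asks for it. */
static void model_put_op(struct model_check_op *op)
{
	if (op->flags & (1UL << MODEL_OP_UNUSE_COOKIE))
		op->cookie->n_active--;
}

/* Relinquishing drops the baseline and then has to wait for zero. */
static bool model_relinquish_would_block(const struct model_cookie *cookie)
{
	return cookie->n_active - 1 != 0;
}
```

[Without MODEL_OP_UNUSE_COOKIE in the flags, each check leaves n_active one
higher than before, so model_relinquish_would_block() stays true forever,
which matches the hung task in the backtrace. With the flag set the count
returns to its baseline. Decrementing directly after the backend call, as
suggested above, would have the same effect on the count.]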

David

2013-09-09 10:19:00

by David Howells

[permalink] [raw]
Subject: Re: [PATCH 0/8] ceph: fscache support & upstream changes

Milosz Tanski <[email protected]> wrote:

> I think that change does the trick. I had it running on the same
> machine for 5 hours and had the kernel forcefully drop some of the
> inodes in the cache (via drop caches) without a crash. I'll send a
> proper patch email after you take a look and make sure I did the right
> thing.

Do you mind if I roll your change directly into my patch and reissue the set?
Or would you rather have an extra patch at this time?

David

2013-09-09 14:53:18

by Milosz Tanski

[permalink] [raw]
Subject: Re: [PATCH 0/8] ceph: fscache support & upstream changes

David,

I guess that's really a better question for Sage. He sent my branch
(which includes your changes) plus a whole slew of things over to
Linus. I'm going to guess that a small follow-on patch is simplest but
I'll let him comment.

Here's the original pull request:
http://marc.info/?l=linux-kernel&m=137849853203101&w=2

Also, so far after making this change everything is peachy and there are
no other regressions.

P.S.: This is a resend because I did not hit reply to all, sorry for the
spam David.


2013-09-09 17:44:56

by Sage Weil

[permalink] [raw]
Subject: Re: [PATCH 0/8] ceph: fscache support & upstream changes

On Mon, 9 Sep 2013, Milosz Tanski wrote:
> David,
>
> I guess that's really a better question for Sage. He sent my branch
> (which includes your changes) plus a whole slew of things over to
> Linus. I'm going to guess that a small follow-on patch is simplest but
> I'll let him comment.
>
> Here's the original pull request:
> http://marc.info/?l=linux-kernel&m=137849853203101&w=2

...and Linus just merged it a few minutes ago. This'll have to be a
separate patch. Sorry!

I have another pile of Ceph fixes that I will be sending in a week or so;
let me know if you want me to include the fix there.

Thanks!
sage


2013-09-09 18:29:01

by Milosz Tanski

[permalink] [raw]
Subject: [PATCH] fscache: check consistency does not decrement refcount

__fscache_check_consistency() does not decrement the count of active
operations after it finishes in the success case. This leads to hung tasks on
cookie de-registration (commonly in inode eviction).

INFO: task kworker/1:2:4214 blocked for more than 120 seconds.
kworker/1:2 D ffff880443513fc0 0 4214 2 0x00000000
Workqueue: ceph-msgr con_work [libceph]
...
Call Trace:
[<ffffffff81569fc6>] ? _raw_spin_unlock_irqrestore+0x16/0x20
[<ffffffffa0016570>] ? fscache_wait_bit_interruptible+0x30/0x30 [fscache]
[<ffffffff81568d09>] schedule+0x29/0x70
[<ffffffffa001657e>] fscache_wait_atomic_t+0xe/0x20 [fscache]
[<ffffffff815665cf>] out_of_line_wait_on_atomic_t+0x9f/0xe0
[<ffffffff81083560>] ? autoremove_wake_function+0x40/0x40
[<ffffffffa0015a9c>] __fscache_relinquish_cookie+0x15c/0x310 [fscache]
[<ffffffffa00a4fae>] ceph_fscache_unregister_inode_cookie+0x3e/0x50 [ceph]
[<ffffffffa007e373>] ceph_destroy_inode+0x33/0x200 [ceph]
[<ffffffff811c13ae>] ? __fsnotify_inode_delete+0xe/0x10
[<ffffffff8119ba1c>] destroy_inode+0x3c/0x70
[<ffffffff8119bb69>] evict+0x119/0x1b0

Signed-off-by: Milosz Tanski <[email protected]>
---
fs/fscache/cookie.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/fscache/cookie.c b/fs/fscache/cookie.c
index 318e843..b2a86e3 100644
--- a/fs/fscache/cookie.c
+++ b/fs/fscache/cookie.c
@@ -586,7 +586,8 @@ int __fscache_check_consistency(struct fscache_cookie *cookie)

fscache_operation_init(op, NULL, NULL);
op->flags = FSCACHE_OP_MYTHREAD |
- (1 << FSCACHE_OP_WAITING);
+ (1 << FSCACHE_OP_WAITING) |
+ (1 << FSCACHE_OP_UNUSE_COOKIE);

spin_lock(&cookie->lock);

--
1.7.9.5

2013-09-09 18:54:27

by Milosz Tanski

[permalink] [raw]
Subject: Re: [PATCH] fscache: check consistency does not decrement refcount

David,

Can I get a sign off on this?

Thanks,
- Milosz


2013-09-10 12:34:43

by David Howells

Subject: Re: [PATCH] fscache: check consistency does not decrement refcount

Milosz Tanski <[email protected]> wrote:

> __fscache_check_consistency() does not decrement the count of operations
> active after it finishes in the success case. This leads to hung tasks on
> cookie de-registration (commonly in inode eviction).
>
> INFO: task kworker/1:2:4214 blocked for more than 120 seconds.
> kworker/1:2 D ffff880443513fc0 0 4214 2 0x00000000
> Workqueue: ceph-msgr con_work [libceph]
> ...
> Call Trace:
> [<ffffffff81569fc6>] ? _raw_spin_unlock_irqrestore+0x16/0x20
> [<ffffffffa0016570>] ? fscache_wait_bit_interruptible+0x30/0x30 [fscache]
> [<ffffffff81568d09>] schedule+0x29/0x70
> [<ffffffffa001657e>] fscache_wait_atomic_t+0xe/0x20 [fscache]
> [<ffffffff815665cf>] out_of_line_wait_on_atomic_t+0x9f/0xe0
> [<ffffffff81083560>] ? autoremove_wake_function+0x40/0x40
> [<ffffffffa0015a9c>] __fscache_relinquish_cookie+0x15c/0x310 [fscache]
> [<ffffffffa00a4fae>] ceph_fscache_unregister_inode_cookie+0x3e/0x50 [ceph]
> [<ffffffffa007e373>] ceph_destroy_inode+0x33/0x200 [ceph]
> [<ffffffff811c13ae>] ? __fsnotify_inode_delete+0xe/0x10
> [<ffffffff8119ba1c>] destroy_inode+0x3c/0x70
> [<ffffffff8119bb69>] evict+0x119/0x1b0
>
> Signed-off-by: Milosz Tanski <[email protected]>

Acked-by: David Howells <[email protected]>

2013-09-10 16:45:15

by Milosz Tanski

Subject: Re: [PATCH] fscache: check consistency does not decrement refcount

Sage,

Can you submit this upstream in your next round of fixes (with
David's ack)?

Thanks,
- Milosz

P.S: Thanks David.

On Tue, Sep 10, 2013 at 8:34 AM, David Howells <[email protected]> wrote:
> Milosz Tanski <[email protected]> wrote:
>
>> __fscache_check_consistency() does not decrement the count of operations
>> active after it finishes in the success case. This leads to hung tasks on
>> cookie de-registration (commonly in inode eviction).
>>
>> INFO: task kworker/1:2:4214 blocked for more than 120 seconds.
>> kworker/1:2 D ffff880443513fc0 0 4214 2 0x00000000
>> Workqueue: ceph-msgr con_work [libceph]
>> ...
>> Call Trace:
>> [<ffffffff81569fc6>] ? _raw_spin_unlock_irqrestore+0x16/0x20
>> [<ffffffffa0016570>] ? fscache_wait_bit_interruptible+0x30/0x30 [fscache]
>> [<ffffffff81568d09>] schedule+0x29/0x70
>> [<ffffffffa001657e>] fscache_wait_atomic_t+0xe/0x20 [fscache]
>> [<ffffffff815665cf>] out_of_line_wait_on_atomic_t+0x9f/0xe0
>> [<ffffffff81083560>] ? autoremove_wake_function+0x40/0x40
>> [<ffffffffa0015a9c>] __fscache_relinquish_cookie+0x15c/0x310 [fscache]
>> [<ffffffffa00a4fae>] ceph_fscache_unregister_inode_cookie+0x3e/0x50 [ceph]
>> [<ffffffffa007e373>] ceph_destroy_inode+0x33/0x200 [ceph]
>> [<ffffffff811c13ae>] ? __fsnotify_inode_delete+0xe/0x10
>> [<ffffffff8119ba1c>] destroy_inode+0x3c/0x70
>> [<ffffffff8119bb69>] evict+0x119/0x1b0
>>
>> Signed-off-by: Milosz Tanski <[email protected]>
>
> Acked-by: David Howells <[email protected]>

2013-09-10 16:48:40

by Sage Weil

Subject: Re: [PATCH] fscache: check consistency does not decrement refcount

On Tue, 10 Sep 2013, Milosz Tanski wrote:
> Sage,
>
> Can you submit this upstream in your next round of fixes (with
> David's ack)?

Yep; it's in the queue. Thanks!

sage

>
> Thanks,
> - Milosz
>
> P.S: Thanks David.
>
> On Tue, Sep 10, 2013 at 8:34 AM, David Howells <[email protected]> wrote:
> > Milosz Tanski <[email protected]> wrote:
> >
> >> __fscache_check_consistency() does not decrement the count of operations
> >> active after it finishes in the success case. This leads to hung tasks on
> >> cookie de-registration (commonly in inode eviction).
> >>
> >> INFO: task kworker/1:2:4214 blocked for more than 120 seconds.
> >> kworker/1:2 D ffff880443513fc0 0 4214 2 0x00000000
> >> Workqueue: ceph-msgr con_work [libceph]
> >> ...
> >> Call Trace:
> >> [<ffffffff81569fc6>] ? _raw_spin_unlock_irqrestore+0x16/0x20
> >> [<ffffffffa0016570>] ? fscache_wait_bit_interruptible+0x30/0x30 [fscache]
> >> [<ffffffff81568d09>] schedule+0x29/0x70
> >> [<ffffffffa001657e>] fscache_wait_atomic_t+0xe/0x20 [fscache]
> >> [<ffffffff815665cf>] out_of_line_wait_on_atomic_t+0x9f/0xe0
> >> [<ffffffff81083560>] ? autoremove_wake_function+0x40/0x40
> >> [<ffffffffa0015a9c>] __fscache_relinquish_cookie+0x15c/0x310 [fscache]
> >> [<ffffffffa00a4fae>] ceph_fscache_unregister_inode_cookie+0x3e/0x50 [ceph]
> >> [<ffffffffa007e373>] ceph_destroy_inode+0x33/0x200 [ceph]
> >> [<ffffffff811c13ae>] ? __fsnotify_inode_delete+0xe/0x10
> >> [<ffffffff8119ba1c>] destroy_inode+0x3c/0x70
> >> [<ffffffff8119bb69>] evict+0x119/0x1b0
> >>
> >> Signed-off-by: Milosz Tanski <[email protected]>
> >
> > Acked-by: David Howells <[email protected]>