2007-08-09 16:07:52

by David Howells

[permalink] [raw]
Subject: [PATCH 00/14] Permit filesystem local caching [try #2]



These patches add local caching for network filesystems such as NFS and AFS.

FS-Cache now runs fully asynchronously as required by Trond Myklebust for NFS.

--
Changes:

(*) The CacheFiles module no longer accepts directory fds in its cull and
inuse commands from cachefilesd. Instead it uses the current working
directory of the calling process as the basis for looking up the object.
Corollary to this, fget_light() no longer needs to be exported.

--
A tarball of the patches is available at:

http://people.redhat.com/~dhowells/fscache/patches/nfs+fscache-21.tar.bz2


To use this version of CacheFiles, the cachefilesd-0.9 is also required. It
is available as an SRPM:

http://people.redhat.com/~dhowells/fscache/cachefilesd-0.9-1.fc7.src.rpm

Or as individual bits:

http://people.redhat.com/~dhowells/fscache/cachefilesd-0.9.tar.bz2
http://people.redhat.com/~dhowells/fscache/cachefilesd.fc
http://people.redhat.com/~dhowells/fscache/cachefilesd.if
http://people.redhat.com/~dhowells/fscache/cachefilesd.te
http://people.redhat.com/~dhowells/fscache/cachefilesd.spec

The .fc, .if and .te files are for manipulating SELinux.

David


2007-08-09 16:06:02

by David Howells

[permalink] [raw]
Subject: [PATCH 02/14] FS-Cache: Recruit a couple of page flags for cache management [try #2]

Recruit a couple of page flags to aid in cache management. The following extra
flags are defined:

(1) PG_fscache (PG_owner_priv_2)

The marked page is backed by a local cache and is pinning resources in the
cache driver.

(2) PG_fscache_write (PG_owner_priv_3)

The marked page is being written to the local cache. The page may not be
modified whilst this is in progress.

If PG_fscache is set, then things that checked for PG_private will now also
check for that. This includes things like truncation and page invalidation.
The function page_has_private() had been added to detect this.

Signed-off-by: David Howells <[email protected]>
---

fs/splice.c | 2 +-
include/linux/page-flags.h | 30 +++++++++++++++++++++++++++++-
include/linux/pagemap.h | 11 +++++++++++
mm/filemap.c | 16 ++++++++++++++++
mm/migrate.c | 2 +-
mm/page_alloc.c | 3 +++
mm/readahead.c | 9 +++++----
mm/swap.c | 4 ++--
mm/swap_state.c | 4 ++--
mm/truncate.c | 10 +++++-----
mm/vmscan.c | 2 +-
11 files changed, 76 insertions(+), 17 deletions(-)

diff --git a/fs/splice.c b/fs/splice.c
index c010a72..ae4f5b7 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -58,7 +58,7 @@ static int page_cache_pipe_buf_steal(struct pipe_inode_info *pipe,
*/
wait_on_page_writeback(page);

- if (PagePrivate(page))
+ if (page_has_private(page))
try_to_release_page(page, GFP_KERNEL);

/*
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 209d3a4..eaf9854 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -83,19 +83,24 @@
#define PG_private 11 /* If pagecache, has fs-private data */

#define PG_writeback 12 /* Page is under writeback */
+#define PG_owner_priv_2 13 /* Owner use. If pagecache, fs may use */
#define PG_compound 14 /* Part of a compound page */
#define PG_swapcache 15 /* Swap page: swp_entry_t in private */

#define PG_mappedtodisk 16 /* Has blocks allocated on-disk */
#define PG_reclaim 17 /* To be reclaimed asap */
+#define PG_owner_priv_3 18 /* Owner use. If pagecache, fs may use */
#define PG_buddy 19 /* Page is free, on buddy lists */

/* PG_readahead is only used for file reads; PG_reclaim is only for writes */
#define PG_readahead PG_reclaim /* Reminder to do async read-ahead */

-/* PG_owner_priv_1 users should have descriptive aliases */
+/* PG_owner_priv_1/2/3 users should have descriptive aliases */
#define PG_checked PG_owner_priv_1 /* Used by some filesystems */
#define PG_pinned PG_owner_priv_1 /* Xen pinned pagetable */
+#define PG_fscache PG_owner_priv_2 /* Backed by local cache */
+#define PG_fscache_write PG_owner_priv_3 /* Writing to local cache */
+

#if (BITS_PER_LONG > 32)
/*
@@ -199,6 +204,18 @@ static inline void SetPageUptodate(struct page *page)
#define TestClearPageWriteback(page) test_and_clear_bit(PG_writeback, \
&(page)->flags)

+#define PageFsCache(page) test_bit(PG_fscache, &(page)->flags)
+#define SetPageFsCache(page) set_bit(PG_fscache, &(page)->flags)
+#define ClearPageFsCache(page) clear_bit(PG_fscache, &(page)->flags)
+#define TestSetPageFsCache(page) test_and_set_bit(PG_fscache, &(page)->flags)
+#define TestClearPageFsCache(page) test_and_clear_bit(PG_fscache, &(page)->flags)
+
+#define PageFsCacheWrite(page) test_bit(PG_fscache_write, &(page)->flags)
+#define SetPageFsCacheWrite(page) set_bit(PG_fscache_write, &(page)->flags)
+#define ClearPageFsCacheWrite(page) clear_bit(PG_fscache_write, &(page)->flags)
+#define TestSetPageFsCacheWrite(page) test_and_set_bit(PG_fscache_write, &(page)->flags)
+#define TestClearPageFsCacheWrite(page) test_and_clear_bit(PG_fscache_write, &(page)->flags)
+
#define PageBuddy(page) test_bit(PG_buddy, &(page)->flags)
#define __SetPageBuddy(page) __set_bit(PG_buddy, &(page)->flags)
#define __ClearPageBuddy(page) __clear_bit(PG_buddy, &(page)->flags)
@@ -272,4 +289,15 @@ static inline void set_page_writeback(struct page *page)
test_set_page_writeback(page);
}

+/**
+ * page_has_private - Determine if page has private stuff
+ * @page: The page to be checked
+ *
+ * Determine if a page has private stuff, indicating that release routines
+ * should be invoked upon it.
+ */
+#define page_has_private(page) \
+ ((page)->flags & ((1 << PG_private) | \
+ (1 << PG_fscache)))
+
#endif /* PAGE_FLAGS_H */
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 8a83537..d1049b6 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -208,6 +208,17 @@ static inline void wait_on_page_writeback(struct page *page)

extern void end_page_writeback(struct page *page);

+/*
+ * Wait for a page to finish being written to a local cache
+ */
+static inline void wait_on_page_fscache_write(struct page *page)
+{
+ if (PageFsCacheWrite(page))
+ wait_on_page_bit(page, PG_fscache_write);
+}
+
+extern void end_page_fscache_write(struct page *page);
+
/*
* Fault a userspace page into pagetables. Return non-zero on a fault.
*
diff --git a/mm/filemap.c b/mm/filemap.c
index 6cf700d..7b96487 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -557,6 +557,19 @@ void end_page_writeback(struct page *page)
EXPORT_SYMBOL(end_page_writeback);

/**
+ * end_page_fscache_write - End writing page to cache
+ * @page: the page
+ */
+void end_page_fscache_write(struct page *page)
+{
+ if (!TestClearPageFsCacheWrite(page))
+ BUG();
+ smp_mb__after_clear_bit();
+ wake_up_page(page, PG_fscache_write);
+}
+EXPORT_SYMBOL(end_page_fscache_write);
+
+/**
* __lock_page - get a lock on the page, assuming we need to sleep to get it
* @page: the page to lock
*
@@ -2243,6 +2256,9 @@ out:
* (presumably at page->private). If the release was successful, return `1'.
* Otherwise return zero.
*
+ * This may also be called if PG_fscache is set on a page, indicating that the
+ * page is known to the local caching routines.
+ *
* The @gfp_mask argument specifies whether I/O may be performed to release
* this page (__GFP_IO), and whether the call may block (__GFP_WAIT).
*
diff --git a/mm/migrate.c b/mm/migrate.c
index 37c73b9..828cacf 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -544,7 +544,7 @@ static int fallback_migrate_page(struct address_space *mapping,
* Buffers may be managed in a filesystem specific way.
* We must have no buffers or drop them.
*/
- if (PagePrivate(page) &&
+ if (page_has_private(page) &&
!try_to_release_page(page, GFP_KERNEL))
return -EAGAIN;

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 3da85b8..96a42d1 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -208,6 +208,7 @@ static void bad_page(struct page *page)
dump_stack();
page->flags &= ~(1 << PG_lru |
1 << PG_private |
+ 1 << PG_fscache |
1 << PG_locked |
1 << PG_active |
1 << PG_dirty |
@@ -445,6 +446,7 @@ static inline int free_pages_check(struct page *page)
(page->flags & (
1 << PG_lru |
1 << PG_private |
+ 1 << PG_fscache |
1 << PG_locked |
1 << PG_active |
1 << PG_slab |
@@ -593,6 +595,7 @@ static int prep_new_page(struct page *page, int order, gfp_t gfp_flags)
(page->flags & (
1 << PG_lru |
1 << PG_private |
+ 1 << PG_fscache |
1 << PG_locked |
1 << PG_active |
1 << PG_dirty |
diff --git a/mm/readahead.c b/mm/readahead.c
index 12d1378..8b5e017 100644
--- a/mm/readahead.c
+++ b/mm/readahead.c
@@ -54,14 +54,15 @@ EXPORT_SYMBOL_GPL(file_ra_state_init);

/*
* see if a page needs releasing upon read_cache_pages() failure
- * - the caller of read_cache_pages() may have set PG_private before calling,
- * such as the NFS fs marking pages that are cached locally on disk, thus we
- * need to give the fs a chance to clean up in the event of an error
+ * - the caller of read_cache_pages() may have set PG_private or PG_fscache
+ * before calling, such as the NFS fs marking pages that are cached locally
+ * on disk, thus we need to give the fs a chance to clean up in the event of
+ * an error
*/
static void read_cache_pages_invalidate_page(struct address_space *mapping,
struct page *page)
{
- if (PagePrivate(page)) {
+ if (page_has_private(page)) {
if (TestSetPageLocked(page))
BUG();
page->mapping = mapping;
diff --git a/mm/swap.c b/mm/swap.c
index d3cb966..498d0c1 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -412,8 +412,8 @@ void pagevec_strip(struct pagevec *pvec)
for (i = 0; i < pagevec_count(pvec); i++) {
struct page *page = pvec->pages[i];

- if (PagePrivate(page) && !TestSetPageLocked(page)) {
- if (PagePrivate(page))
+ if (page_has_private(page) && !TestSetPageLocked(page)) {
+ if (page_has_private(page))
try_to_release_page(page, 0);
unlock_page(page);
}
diff --git a/mm/swap_state.c b/mm/swap_state.c
index 67daecb..93667ef 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -75,7 +75,7 @@ static int __add_to_swap_cache(struct page *page, swp_entry_t entry,
int error;

BUG_ON(PageSwapCache(page));
- BUG_ON(PagePrivate(page));
+ BUG_ON(page_has_private(page));
error = radix_tree_preload(gfp_mask);
if (!error) {
write_lock_irq(&swapper_space.tree_lock);
@@ -126,7 +126,7 @@ void __delete_from_swap_cache(struct page *page)
BUG_ON(!PageLocked(page));
BUG_ON(!PageSwapCache(page));
BUG_ON(PageWriteback(page));
- BUG_ON(PagePrivate(page));
+ BUG_ON(page_has_private(page));

radix_tree_delete(&swapper_space.page_tree, page_private(page));
set_page_private(page, 0);
diff --git a/mm/truncate.c b/mm/truncate.c
index 5cdfbc1..5555cb0 100644
--- a/mm/truncate.c
+++ b/mm/truncate.c
@@ -48,7 +48,7 @@ void do_invalidatepage(struct page *page, unsigned long offset)
static inline void truncate_partial_page(struct page *page, unsigned partial)
{
zero_user_page(page, partial, PAGE_CACHE_SIZE - partial, KM_USER0);
- if (PagePrivate(page))
+ if (page_has_private(page))
do_invalidatepage(page, partial);
}

@@ -97,7 +97,7 @@ truncate_complete_page(struct address_space *mapping, struct page *page)

cancel_dirty_page(page, PAGE_CACHE_SIZE);

- if (PagePrivate(page))
+ if (page_has_private(page))
do_invalidatepage(page, 0);

remove_from_page_cache(page);
@@ -122,7 +122,7 @@ invalidate_complete_page(struct address_space *mapping, struct page *page)
if (page->mapping != mapping)
return 0;

- if (PagePrivate(page) && !try_to_release_page(page, 0))
+ if (page_has_private(page) && !try_to_release_page(page, 0))
return 0;

ret = remove_mapping(mapping, page);
@@ -344,14 +344,14 @@ invalidate_complete_page2(struct address_space *mapping, struct page *page)
if (page->mapping != mapping)
return 0;

- if (PagePrivate(page) && !try_to_release_page(page, GFP_KERNEL))
+ if (page_has_private(page) && !try_to_release_page(page, GFP_KERNEL))
return 0;

write_lock_irq(&mapping->tree_lock);
if (PageDirty(page))
goto failed;

- BUG_ON(PagePrivate(page));
+ BUG_ON(page_has_private(page));
__remove_from_page_cache(page);
write_unlock_irq(&mapping->tree_lock);
ClearPageUptodate(page);
diff --git a/mm/vmscan.c b/mm/vmscan.c
index d419e10..de851f4 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -548,7 +548,7 @@ static unsigned long shrink_page_list(struct list_head *page_list,
* process address space (page_count == 1) it can be freed.
* Otherwise, leave the page on the LRU so it is swappable.
*/
- if (PagePrivate(page)) {
+ if (page_has_private(page)) {
if (!try_to_release_page(page, sc->gfp_mask))
goto activate_locked;
if (!mapping && page_count(page) == 1)

2007-08-09 16:06:35

by David Howells

[permalink] [raw]
Subject: [PATCH 05/14] CacheFiles: Add missing copy_page export for ia64 [try #2]

This one-line patch fixes the missing export of copy_page introduced
by the cachefile patches. This patch is not yet upstream, but is required
for cachefile on ia64. It will be pushed upstream when cachefile goes
upstream.

Signed-off-by: Prarit Bhargava <[email protected]>
Signed-Off-By: David Howells <[email protected]>
---

arch/ia64/kernel/ia64_ksyms.c | 1 +
1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/arch/ia64/kernel/ia64_ksyms.c b/arch/ia64/kernel/ia64_ksyms.c
index bd17190..20c3546 100644
--- a/arch/ia64/kernel/ia64_ksyms.c
+++ b/arch/ia64/kernel/ia64_ksyms.c
@@ -43,6 +43,7 @@ EXPORT_SYMBOL(__do_clear_user);
EXPORT_SYMBOL(__strlen_user);
EXPORT_SYMBOL(__strncpy_from_user);
EXPORT_SYMBOL(__strnlen_user);
+EXPORT_SYMBOL(copy_page);

/* from arch/ia64/lib */
extern void __divsi3(void);

2007-08-09 16:08:25

by David Howells

[permalink] [raw]
Subject: [PATCH 01/14] FS-Cache: Release page->private after failed readahead [try #2]

The attached patch causes read_cache_pages() to release page-private data on a
page for which add_to_page_cache() fails or the filler function fails. This
permits pages with caching references associated with them to be cleaned up.

The invalidatepage() address space op is called (indirectly) to do the honours.

Signed-Off-By: David Howells <[email protected]>
---

mm/readahead.c | 40 ++++++++++++++++++++++++++++++++++++++--
1 files changed, 38 insertions(+), 2 deletions(-)

diff --git a/mm/readahead.c b/mm/readahead.c
index 39bf45d..12d1378 100644
--- a/mm/readahead.c
+++ b/mm/readahead.c
@@ -15,6 +15,7 @@
#include <linux/backing-dev.h>
#include <linux/task_io_accounting_ops.h>
#include <linux/pagevec.h>
+#include <linux/buffer_head.h>

void default_unplug_io_fn(struct backing_dev_info *bdi, struct page *page)
{
@@ -51,6 +52,41 @@ EXPORT_SYMBOL_GPL(file_ra_state_init);

#define list_to_page(head) (list_entry((head)->prev, struct page, lru))

+/*
+ * see if a page needs releasing upon read_cache_pages() failure
+ * - the caller of read_cache_pages() may have set PG_private before calling,
+ * such as the NFS fs marking pages that are cached locally on disk, thus we
+ * need to give the fs a chance to clean up in the event of an error
+ */
+static void read_cache_pages_invalidate_page(struct address_space *mapping,
+ struct page *page)
+{
+ if (PagePrivate(page)) {
+ if (TestSetPageLocked(page))
+ BUG();
+ page->mapping = mapping;
+ do_invalidatepage(page, 0);
+ page->mapping = NULL;
+ unlock_page(page);
+ }
+ page_cache_release(page);
+}
+
+/*
+ * release a list of pages, invalidating them first if need be
+ */
+static void read_cache_pages_invalidate_pages(struct address_space *mapping,
+ struct list_head *pages)
+{
+ struct page *victim;
+
+ while (!list_empty(pages)) {
+ victim = list_to_page(pages);
+ list_del(&victim->lru);
+ read_cache_pages_invalidate_page(mapping, victim);
+ }
+}
+
/**
* read_cache_pages - populate an address space with some pages & start reads against them
* @mapping: the address_space
@@ -74,14 +110,14 @@ int read_cache_pages(struct address_space *mapping, struct list_head *pages,
page = list_to_page(pages);
list_del(&page->lru);
if (add_to_page_cache(page, mapping, page->index, GFP_KERNEL)) {
- page_cache_release(page);
+ read_cache_pages_invalidate_page(mapping, page);
continue;
}
ret = filler(data, page);
if (!pagevec_add(&lru_pvec, page))
__pagevec_lru_add(&lru_pvec);
if (ret) {
- put_pages_list(pages);
+ read_cache_pages_invalidate_pages(mapping, pages);
break;
}
task_io_account_read(PAGE_CACHE_SIZE);

2007-08-09 16:08:48

by David Howells

[permalink] [raw]
Subject: [PATCH 03/14] FS-Cache: Provide an add_wait_queue_tail() function [try #2]

Provide an add_wait_queue_tail() function to add a waiter to the back of a
wait queue instead of the front.

Signed-off-by: David Howells <[email protected]>
---

include/linux/wait.h | 1 +
kernel/wait.c | 18 ++++++++++++++++++
2 files changed, 19 insertions(+), 0 deletions(-)

diff --git a/include/linux/wait.h b/include/linux/wait.h
index 0e68628..4cae7db 100644
--- a/include/linux/wait.h
+++ b/include/linux/wait.h
@@ -118,6 +118,7 @@ static inline int waitqueue_active(wait_queue_head_t *q)
#define is_sync_wait(wait) (!(wait) || ((wait)->private))

extern void FASTCALL(add_wait_queue(wait_queue_head_t *q, wait_queue_t * wait));
+extern void FASTCALL(add_wait_queue_tail(wait_queue_head_t *q, wait_queue_t * wait));
extern void FASTCALL(add_wait_queue_exclusive(wait_queue_head_t *q, wait_queue_t * wait));
extern void FASTCALL(remove_wait_queue(wait_queue_head_t *q, wait_queue_t * wait));

diff --git a/kernel/wait.c b/kernel/wait.c
index 444ddbf..7acc9cc 100644
--- a/kernel/wait.c
+++ b/kernel/wait.c
@@ -29,6 +29,24 @@ void fastcall add_wait_queue(wait_queue_head_t *q, wait_queue_t *wait)
}
EXPORT_SYMBOL(add_wait_queue);

+/**
+ * add_wait_queue_tail - Add a waiter to the back of a waitqueue
+ * @q: the wait queue to append the waiter to
+ * @wait: the waiter to be queued
+ *
+ * Add a waiter to the back of a waitqueue so that it gets woken up last.
+ */
+void fastcall add_wait_queue_tail(wait_queue_head_t *q, wait_queue_t *wait)
+{
+ unsigned long flags;
+
+ wait->flags &= ~WQ_FLAG_EXCLUSIVE;
+ spin_lock_irqsave(&q->lock, flags);
+ __add_wait_queue_tail(q, wait);
+ spin_unlock_irqrestore(&q->lock, flags);
+}
+EXPORT_SYMBOL(add_wait_queue_tail);
+
void fastcall add_wait_queue_exclusive(wait_queue_head_t *q, wait_queue_t *wait)
{
unsigned long flags;

2007-08-09 16:09:16

by David Howells

[permalink] [raw]
Subject: [PATCH 14/14] NFS: Use local caching [try #2]

The attached patch makes it possible for the NFS filesystem to make use of the
network filesystem local caching service (FS-Cache).

To be able to use this, an updated mount program is required. This can be
obtained from:

http://people.redhat.com/steved/cachefs/util-linux/

To mount an NFS filesystem to use caching, add an "fsc" option to the mount:

mount warthog:/ /a -o fsc

Signed-Off-By: David Howells <[email protected]>
---

fs/Kconfig | 8 +
fs/nfs/Makefile | 1
fs/nfs/client.c | 11 +
fs/nfs/file.c | 38 +++-
fs/nfs/fscache.c | 303 +++++++++++++++++++++++++++++
fs/nfs/fscache.h | 464 ++++++++++++++++++++++++++++++++++++++++++++
fs/nfs/inode.c | 16 ++
fs/nfs/internal.h | 8 +
fs/nfs/read.c | 27 ++-
fs/nfs/super.c | 1
fs/nfs/sysctl.c | 43 ++++
fs/nfs/write.c | 3
include/linux/nfs4_mount.h | 3
include/linux/nfs_fs.h | 6 +
include/linux/nfs_fs_sb.h | 5
include/linux/nfs_mount.h | 3
16 files changed, 931 insertions(+), 9 deletions(-)

diff --git a/fs/Kconfig b/fs/Kconfig
index 7feb4cb..76d5d16 100644
--- a/fs/Kconfig
+++ b/fs/Kconfig
@@ -1600,6 +1600,14 @@ config NFS_V4

If unsure, say N.

+config NFS_FSCACHE
+ bool "Provide NFS client caching support (EXPERIMENTAL)"
+ depends on EXPERIMENTAL
+ depends on NFS_FS=m && FSCACHE || NFS_FS=y && FSCACHE=y
+ help
+ Say Y here if you want NFS data to be cached locally on disc through
+ the general filesystem cache manager
+
config NFS_DIRECTIO
bool "Allow direct I/O on NFS files"
depends on NFS_FS
diff --git a/fs/nfs/Makefile b/fs/nfs/Makefile
index b55cb23..c9e7c43 100644
--- a/fs/nfs/Makefile
+++ b/fs/nfs/Makefile
@@ -16,4 +16,5 @@ nfs-$(CONFIG_NFS_V4) += nfs4proc.o nfs4xdr.o nfs4state.o nfs4renewd.o \
nfs4namespace.o
nfs-$(CONFIG_NFS_DIRECTIO) += direct.o
nfs-$(CONFIG_SYSCTL) += sysctl.o
+nfs-$(CONFIG_NFS_FSCACHE) += fscache.o
nfs-objs := $(nfs-y)
diff --git a/fs/nfs/client.c b/fs/nfs/client.c
index a49f9fe..7be7807 100644
--- a/fs/nfs/client.c
+++ b/fs/nfs/client.c
@@ -137,6 +137,8 @@ static struct nfs_client *nfs_alloc_client(const char *hostname,
clp->cl_state = 1 << NFS4CLNT_LEASE_EXPIRED;
#endif

+ nfs_fscache_get_client_cookie(clp);
+
return clp;

error_3:
@@ -168,6 +170,8 @@ static void nfs_free_client(struct nfs_client *clp)

nfs4_shutdown_client(clp);

+ nfs_fscache_release_client_cookie(clp);
+
/* -EIO all pending I/O */
if (!IS_ERR(clp->cl_rpcclient))
rpc_shutdown_client(clp->cl_rpcclient);
@@ -1308,7 +1312,7 @@ static int nfs_volume_list_show(struct seq_file *m, void *v)

/* display header on line 1 */
if (v == &nfs_volume_list) {
- seq_puts(m, "NV SERVER PORT DEV FSID\n");
+ seq_puts(m, "NV SERVER PORT DEV FSID FSC\n");
return 0;
}
/* display one transport per line on subsequent lines */
@@ -1322,12 +1326,13 @@ static int nfs_volume_list_show(struct seq_file *m, void *v)
(unsigned long long) server->fsid.major,
(unsigned long long) server->fsid.minor);

- seq_printf(m, "v%d %02x%02x%02x%02x %4hx %-7s %-17s\n",
+ seq_printf(m, "v%d %02x%02x%02x%02x %4hx %-7s %-17s %s\n",
clp->cl_nfsversion,
NIPQUAD(clp->cl_addr.sin_addr),
ntohs(clp->cl_addr.sin_port),
dev,
- fsid);
+ fsid,
+ nfs_server_fscache_state(server));

return 0;
}
diff --git a/fs/nfs/file.c b/fs/nfs/file.c
index c87dc71..dfd36e0 100644
--- a/fs/nfs/file.c
+++ b/fs/nfs/file.c
@@ -34,6 +34,7 @@

#include "delegation.h"
#include "iostat.h"
+#include "internal.h"

#define NFSDBG_FACILITY NFSDBG_FILE

@@ -259,6 +260,10 @@ nfs_file_mmap(struct file * file, struct vm_area_struct * vma)
status = nfs_revalidate_mapping(inode, file->f_mapping);
if (!status)
status = generic_file_mmap(file, vma);
+
+ if (status == 0)
+ nfs_fscache_install_vm_ops(inode, vma);
+
return status;
}

@@ -311,22 +316,51 @@ static int nfs_commit_write(struct file *file, struct page *page, unsigned offse
return status;
}

+/*
+ * partially or wholly invalidate a page
+ * - release the private state associated with a page if undergoing complete
+ * page invalidation
+ * - caller holds page lock
+ */
static void nfs_invalidate_page(struct page *page, unsigned long offset)
{
if (offset != 0)
return;
/* Cancel any unstarted writes on this page */
nfs_wb_page_priority(page->mapping->host, page, FLUSH_INVALIDATE);
+
+ nfs_fscache_invalidate_page(page, page->mapping->host, offset);
+
+ /* we can do this here as the bits are only set with the page lock
+ * held, and our caller is holding that */
+ if (!page->private)
+ ClearPagePrivate(page);
}

+/*
+ * release the private state associated with a page, if the page isn't busy
+ * - caller holds page lock
+ * - return true (may release) or false (may not)
+ */
static int nfs_release_page(struct page *page, gfp_t gfp)
{
- /* If PagePrivate() is set, then the page is not freeable */
- return 0;
+ /*
+ * Avoid deadlock on nfs_wait_on_request().
+ */
+ if (!(gfp & __GFP_FS))
+ return 0;
+
+ if (!nfs_fscache_release_page(page, gfp))
+ return 0;
+
+ BUG_ON(page->private != 0);
+ ClearPagePrivate(page);
+ return 1;
}

static int nfs_launder_page(struct page *page)
{
+ nfs_fscache_invalidate_page(page, page->mapping->host, 0);
return nfs_wb_page(page->mapping->host, page);
}

diff --git a/fs/nfs/fscache.c b/fs/nfs/fscache.c
new file mode 100644
index 0000000..2b2785a
--- /dev/null
+++ b/fs/nfs/fscache.c
@@ -0,0 +1,303 @@
+/* fscache.c: NFS filesystem cache interface
+ *
+ * Copyright (C) 2006 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells ([email protected])
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+
+#include <linux/init.h>
+#include <linux/kernel.h>
+#include <linux/sched.h>
+#include <linux/mm.h>
+#include <linux/nfs_fs.h>
+#include <linux/nfs_fs_sb.h>
+#include <linux/in6.h>
+
+#include "internal.h"
+
+/*
+ * Sysctl variables
+ */
+atomic_t nfs_fscache_to_pages;
+atomic_t nfs_fscache_from_pages;
+atomic_t nfs_fscache_uncache_page;
+int nfs_fscache_from_error;
+int nfs_fscache_to_error;
+
+#define NFSDBG_FACILITY NFSDBG_FSCACHE
+
+/* the auxiliary data in the cache (used for coherency management) */
+struct nfs_fh_auxdata {
+ struct timespec i_mtime;
+ struct timespec i_ctime;
+ loff_t i_size;
+};
+
+static struct fscache_netfs_operations nfs_cache_ops = {
+};
+
+struct fscache_netfs nfs_cache_netfs = {
+ .name = "nfs",
+ .version = 0,
+ .ops = &nfs_cache_ops,
+};
+
+static const uint8_t nfs_cache_ipv6_wrapper_for_ipv4[12] = {
+ [0 ... 9] = 0x00,
+ [10 ... 11] = 0xff
+};
+
+struct nfs_server_key {
+ uint16_t nfsversion;
+ uint16_t port;
+ union {
+ struct {
+ uint8_t ipv6wrapper[12];
+ struct in_addr addr;
+ } ipv4_addr;
+ struct in6_addr ipv6_addr;
+ };
+};
+
+static uint16_t nfs_server_get_key(const void *cookie_netfs_data,
+ void *buffer, uint16_t bufmax)
+{
+ const struct nfs_client *clp = cookie_netfs_data;
+ struct nfs_server_key *key = buffer;
+ uint16_t len = 0;
+
+ key->nfsversion = clp->cl_nfsversion;
+
+ switch (clp->cl_addr.sin_family) {
+ case AF_INET:
+ key->port = clp->cl_addr.sin_port;
+
+ memcpy(&key->ipv4_addr.ipv6wrapper,
+ &nfs_cache_ipv6_wrapper_for_ipv4,
+ sizeof(key->ipv4_addr.ipv6wrapper));
+ memcpy(&key->ipv4_addr.addr,
+ &clp->cl_addr.sin_addr,
+ sizeof(key->ipv4_addr.addr));
+ len = sizeof(struct nfs_server_key);
+ break;
+
+ case AF_INET6:
+ key->port = clp->cl_addr.sin_port;
+
+ memcpy(&key->ipv6_addr,
+ &clp->cl_addr.sin_addr,
+ sizeof(key->ipv6_addr));
+ len = sizeof(struct nfs_server_key);
+ break;
+
+ default:
+ len = 0;
+ printk(KERN_WARNING "NFS: Unknown network family '%d'\n",
+ clp->cl_addr.sin_family);
+ break;
+ }
+
+ return len;
+}
+
+/*
+ * the root index for the filesystem is defined by nfsd IP address and ports
+ */
+struct fscache_cookie_def nfs_cache_server_index_def = {
+ .name = "NFS.servers",
+ .type = FSCACHE_COOKIE_TYPE_INDEX,
+ .get_key = nfs_server_get_key,
+};
+
+static uint16_t nfs_fh_get_key(const void *cookie_netfs_data,
+ void *buffer, uint16_t bufmax)
+{
+ const struct nfs_inode *nfsi = cookie_netfs_data;
+ uint16_t nsize;
+
+ /* set the file handle */
+ nsize = nfsi->fh.size;
+ memcpy(buffer, nfsi->fh.data, nsize);
+ return nsize;
+}
+
+/*
+ * get an extra reference on a read context
+ * - this function can be absent if the completion function doesn't
+ * require a context
+ */
+static void nfs_fh_get_context(void *cookie_netfs_data, void *context)
+{
+ get_nfs_open_context(context);
+}
+
+/*
+ * release an extra reference on a read context
+ * - this function can be absent if the completion function doesn't
+ * require a context
+ */
+static void nfs_fh_put_context(void *cookie_netfs_data, void *context)
+{
+ if (context)
+ put_nfs_open_context(context);
+}
+
+/*
+ * indication the cookie is no longer uncached
+ * - this function is called when the backing store currently caching a cookie
+ * is removed
+ * - the netfs should use this to clean up any markers indicating cached pages
+ * - this is mandatory for any object that may have data
+ */
+static void nfs_fh_now_uncached(void *cookie_netfs_data)
+{
+ struct nfs_inode *nfsi = cookie_netfs_data;
+ struct pagevec pvec;
+ pgoff_t first;
+ int loop, nr_pages;
+
+ pagevec_init(&pvec, 0);
+ first = 0;
+
+ dprintk("NFS: nfs_fh_now_uncached: nfs_inode 0x%p\n", nfsi);
+
+ for (;;) {
+ /* grab a bunch of pages to clean */
+ nr_pages = pagevec_lookup(&pvec,
+ nfsi->vfs_inode.i_mapping,
+ first,
+ PAGEVEC_SIZE - pagevec_count(&pvec));
+ if (!nr_pages)
+ break;
+
+ for (loop = 0; loop < nr_pages; loop++)
+ ClearPageFsCache(pvec.pages[loop]);
+
+ first = pvec.pages[nr_pages - 1]->index + 1;
+
+ pvec.nr = nr_pages;
+ pagevec_release(&pvec);
+ cond_resched();
+ }
+}
+
+/*
+ * get certain file attributes from the netfs data
+ * - this function can be absent for an index
+ * - not permitted to return an error
+ * - the netfs data from the cookie being used as the source is
+ * presented
+ */
+static void nfs_fh_get_attr(const void *cookie_netfs_data, uint64_t *size)
+{
+ const struct nfs_inode *nfsi = cookie_netfs_data;
+
+ *size = nfsi->vfs_inode.i_size;
+}
+
+/*
+ * get the auxilliary data from netfs data
+ * - this function can be absent if the index carries no state data
+ * - should store the auxilliary data in the buffer
+ * - should return the amount of amount stored
+ * - not permitted to return an error
+ * - the netfs data from the cookie being used as the source is
+ * presented
+ */
+static uint16_t nfs_fh_get_aux(const void *cookie_netfs_data,
+ void *buffer, uint16_t bufmax)
+{
+ struct nfs_fh_auxdata auxdata;
+ const struct nfs_inode *nfsi = cookie_netfs_data;
+
+ auxdata.i_size = nfsi->vfs_inode.i_size;
+ auxdata.i_mtime = nfsi->vfs_inode.i_mtime;
+ auxdata.i_ctime = nfsi->vfs_inode.i_ctime;
+
+ if (bufmax > sizeof(auxdata))
+ bufmax = sizeof(auxdata);
+
+ memcpy(buffer, &auxdata, bufmax);
+ return bufmax;
+}
+
+/*
+ * consult the netfs about the state of an object
+ * - this function can be absent if the index carries no state data
+ * - the netfs data from the cookie being used as the target is
+ * presented, as is the auxilliary data
+ */
+static fscache_checkaux_t nfs_fh_check_aux(void *cookie_netfs_data,
+ const void *data, uint16_t datalen)
+{
+ struct nfs_fh_auxdata auxdata;
+ struct nfs_inode *nfsi = cookie_netfs_data;
+
+ if (datalen > sizeof(auxdata))
+ return FSCACHE_CHECKAUX_OBSOLETE;
+
+ auxdata.i_size = nfsi->vfs_inode.i_size;
+ auxdata.i_mtime = nfsi->vfs_inode.i_mtime;
+ auxdata.i_ctime = nfsi->vfs_inode.i_ctime;
+
+ if (memcmp(data, &auxdata, datalen) != 0)
+ return FSCACHE_CHECKAUX_OBSOLETE;
+
+ return FSCACHE_CHECKAUX_OKAY;
+}
+
+/*
+ * the primary index for each server is simply made up of a series of NFS file
+ * handles
+ */
+struct fscache_cookie_def nfs_cache_fh_index_def = {
+ .name = "NFS.fh",
+ .type = FSCACHE_COOKIE_TYPE_DATAFILE,
+ .get_key = nfs_fh_get_key,
+ .get_attr = nfs_fh_get_attr,
+ .get_aux = nfs_fh_get_aux,
+ .check_aux = nfs_fh_check_aux,
+ .get_context = nfs_fh_get_context,
+ .put_context = nfs_fh_put_context,
+ .now_uncached = nfs_fh_now_uncached,
+};
+
+static int nfs_file_page_mkwrite(struct vm_area_struct *vma, struct page *page)
+{
+ wait_on_page_fscache_write(page);
+ return 0;
+}
+
+struct vm_operations_struct nfs_fs_vm_operations = {
+ .fault = filemap_fault,
+ .page_mkwrite = nfs_file_page_mkwrite,
+};
+
+/*
+ * handle completion of a page being read from the cache
+ * - called in process (keventd) context
+ */
+void nfs_readpage_from_fscache_complete(struct page *page,
+ void *context,
+ int error)
+{
+ dfprintk(FSCACHE,
+ "NFS: readpage_from_fscache_complete (0x%p/0x%p/%d)\n",
+ page, context, error);
+
+ /* if the read completes with an error, we just unlock the page and let
+ * the VM reissue the readpage */
+ if (!error) {
+ SetPageUptodate(page);
+ unlock_page(page);
+ } else {
+ error = nfs_readpage_async(context, page->mapping->host, page);
+ if (error)
+ unlock_page(page);
+ }
+}
diff --git a/fs/nfs/fscache.h b/fs/nfs/fscache.h
new file mode 100644
index 0000000..152309c
--- /dev/null
+++ b/fs/nfs/fscache.h
@@ -0,0 +1,464 @@
+/* fscache.h: NFS filesystem cache interface definitions
+ *
+ * Copyright (C) 2006 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells ([email protected])
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#ifndef _NFS_FSCACHE_H
+#define _NFS_FSCACHE_H
+
+#include <linux/nfs_fs.h>
+#include <linux/nfs_mount.h>
+#include <linux/nfs4_mount.h>
+
+#ifdef CONFIG_NFS_FSCACHE
+#include <linux/fscache.h>
+
+extern struct fscache_netfs nfs_cache_netfs;
+extern struct fscache_cookie_def nfs_cache_server_index_def;
+extern struct fscache_cookie_def nfs_cache_fh_index_def;
+extern struct vm_operations_struct nfs_fs_vm_operations;
+
+extern void nfs_invalidatepage(struct page *, unsigned long);
+extern int nfs_releasepage(struct page *, gfp_t);
+
+extern atomic_t nfs_fscache_to_pages;
+extern atomic_t nfs_fscache_from_pages;
+extern atomic_t nfs_fscache_uncache_page;
+extern int nfs_fscache_from_error;
+extern int nfs_fscache_to_error;
+
+/*
+ * register NFS for caching
+ */
+static inline int nfs_fscache_register(void)
+{
+ return fscache_register_netfs(&nfs_cache_netfs);
+}
+
+/*
+ * unregister NFS for caching
+ */
+static inline void nfs_fscache_unregister(void)
+{
+ fscache_unregister_netfs(&nfs_cache_netfs);
+}
+
+/*
+ * get the per-client index cookie for an NFS client if the appropriate mount
+ * flag was set
+ * - we always try and get an index cookie for the client, but get filehandle
+ * cookies on a per-superblock basis, depending on the mount flags
+ */
+static inline void nfs_fscache_get_client_cookie(struct nfs_client *clp)
+{
+ /* create a cache index for looking up filehandles */
+ clp->fscache = fscache_acquire_cookie(nfs_cache_netfs.primary_index,
+ &nfs_cache_server_index_def,
+ clp);
+ dfprintk(FSCACHE,"NFS: get client cookie (0x%p/0x%p)\n",
+ clp, clp->fscache);
+}
+
+/*
+ * dispose of a per-client cookie
+ */
+static inline void nfs_fscache_release_client_cookie(struct nfs_client *clp)
+{
+ dfprintk(FSCACHE,"NFS: releasing client cookie (0x%p/0x%p)\n",
+ clp, clp->fscache);
+
+ fscache_relinquish_cookie(clp->fscache, 0);
+ clp->fscache = NULL;
+}
+
+/*
+ * indicate the client caching state as readable text
+ */
+static inline const char *nfs_server_fscache_state(struct nfs_server *server)
+{
+ if (server->nfs_client->fscache && (server->flags & NFS_MOUNT_FSCACHE))
+ return "yes";
+ return "no ";
+}
+
+/*
+ * get the per-filehandle cookie for an NFS inode
+ */
+static inline void nfs_fscache_init_fh_cookie(struct inode *inode)
+{
+ NFS_I(inode)->fscache = NULL;
+ if (S_ISREG(inode->i_mode))
+ set_bit(NFS_INO_FSCACHE, &NFS_I(inode)->flags);
+}
+
+/*
+ * get the per-filehandle cookie for an NFS inode
+ */
+static inline void nfs_fscache_enable_fh_cookie(struct inode *inode)
+{
+ struct super_block *sb = inode->i_sb;
+ struct nfs_inode *nfsi = NFS_I(inode);
+
+ if (nfsi->fscache || !NFS_FSCACHE(inode))
+ return;
+
+ if ((NFS_SB(sb)->flags & NFS_MOUNT_FSCACHE)) {
+ nfsi->fscache = fscache_acquire_cookie(
+ NFS_SB(sb)->nfs_client->fscache,
+ &nfs_cache_fh_index_def,
+ nfsi);
+
+ dfprintk(FSCACHE, "NFS: get FH cookie (0x%p/0x%p/0x%p)\n",
+ sb, nfsi, nfsi->fscache);
+ }
+}
+
+/*
+ * change the filesize associated with a per-filehandle cookie
+ */
+static inline void nfs_fscache_attr_changed(struct inode *inode)
+{
+ fscache_attr_changed(NFS_I(inode)->fscache);
+}
+
+/*
+ * replace a per-filehandle cookie due to revalidation detecting a file having
+ * changed on the server
+ */
+static inline void nfs_fscache_renew_fh_cookie(struct inode *inode)
+{
+ struct nfs_inode *nfsi = NFS_I(inode);
+ struct nfs_server *server = NFS_SERVER(inode);
+ struct fscache_cookie *old = nfsi->fscache;
+
+ if (nfsi->fscache) {
+ /* retire the current fscache cache and get a new one */
+ fscache_relinquish_cookie(nfsi->fscache, 1);
+
+ nfsi->fscache = fscache_acquire_cookie(
+ server->nfs_client->fscache,
+ &nfs_cache_fh_index_def,
+ nfsi);
+
+ dfprintk(FSCACHE,
+ "NFS: revalidation new cookie (0x%p/0x%p/0x%p/0x%p)\n",
+ server, nfsi, old, nfsi->fscache);
+ }
+}
+
+/*
+ * release a per-filehandle cookie
+ */
+static inline void nfs_fscache_release_fh_cookie(struct inode *inode)
+{
+ struct nfs_inode *nfsi = NFS_I(inode);
+
+ dfprintk(FSCACHE, "NFS: clear cookie (0x%p/0x%p)\n",
+ nfsi, nfsi->fscache);
+
+ fscache_relinquish_cookie(nfsi->fscache, 0);
+ nfsi->fscache = NULL;
+}
+
+/*
+ * retire a per-filehandle cookie, destroying the data attached to it
+ */
+static inline void nfs_fscache_zap_fh_cookie(struct inode *inode)
+{
+ struct nfs_inode *nfsi = NFS_I(inode);
+
+ dfprintk(FSCACHE,"NFS: zapping cookie (0x%p/0x%p)\n",
+ nfsi, nfsi->fscache);
+
+ fscache_relinquish_cookie(nfsi->fscache, 1);
+ nfsi->fscache = NULL;
+}
+
+/*
+ * turn off the cache with regard to a filehandle cookie if opened for writing,
+ * invalidating all the pages in the page cache relating to the associated
+ * inode to clear the per-page caching
+ */
+static inline void nfs_fscache_disable_fh_cookie(struct inode *inode)
+{
+ clear_bit(NFS_INO_FSCACHE, &NFS_I(inode)->flags);
+
+ if (NFS_I(inode)->fscache) {
+ dfprintk(FSCACHE,
+ "NFS: nfsi 0x%p turning cache off\n", NFS_I(inode));
+
+ /* Need to invalided any mapped pages that were read in before
+ * turning off the cache.
+ */
+ if (inode->i_mapping && inode->i_mapping->nrpages)
+ invalidate_inode_pages2(inode->i_mapping);
+
+ nfs_fscache_zap_fh_cookie(inode);
+ }
+}
+
+/*
+ * decide if we should enable or disable the FS cache for this inode
+ * - for now, only regular files that are open read-only will be able to use
+ * the cache
+ */
+static inline void nfs_fscache_set_fh_cookie(struct inode *inode,
+ struct file *filp)
+{
+ if (NFS_FSCACHE(inode)) {
+ if ((filp->f_flags & O_ACCMODE) != O_RDONLY)
+ nfs_fscache_disable_fh_cookie(inode);
+ else
+ nfs_fscache_enable_fh_cookie(inode);
+ }
+}
+
+/*
+ * install the VM ops for mmap() of an NFS file so that we can hold up writes
+ * to pages on shared writable mappings until the store to the cache is
+ * complete
+ */
+static inline void nfs_fscache_install_vm_ops(struct inode *inode,
+ struct vm_area_struct *vma)
+{
+ if (NFS_I(inode)->fscache)
+ vma->vm_ops = &nfs_fs_vm_operations;
+}
+
+/*
+ * release the caching state associated with a page, if the page isn't busy
+ * interacting with the cache
+ * - returns true (can release page) or false (page busy)
+ */
+static inline int nfs_fscache_release_page(struct page *page, gfp_t gfp)
+{
+ if (PageFsCacheWrite(page)) {
+ if (!(gfp & __GFP_WAIT))
+ return 0;
+ wait_on_page_fscache_write(page);
+ }
+
+ if (PageFsCache(page)) {
+ struct nfs_inode *nfsi = NFS_I(page->mapping->host);
+
+ BUG_ON(!nfsi->fscache);
+
+ dfprintk(FSCACHE, "NFS: fscache releasepage (0x%p/0x%p/0x%p)\n",
+ nfsi->fscache, page, nfsi);
+
+ fscache_uncache_page(nfsi->fscache, page);
+ atomic_inc(&nfs_fscache_uncache_page);
+ ClearPageFsCache(page);
+ }
+
+ return 1;
+}
+
+/*
+ * release the caching state associated with a page if undergoing complete page
+ * invalidation
+ */
+static inline void nfs_fscache_invalidate_page(struct page *page,
+ struct inode *inode,
+ unsigned long offset)
+{
+ struct nfs_inode *nfsi = NFS_I(page->mapping->host);
+
+ if (PageFsCache(page)) {
+ BUG_ON(!nfsi->fscache);
+
+ dfprintk(FSCACHE,
+ "NFS: fscache invalidatepage (0x%p/0x%p/0x%p)\n",
+ nfsi->fscache, page, nfsi);
+
+ wait_on_page_fscache_write(page);
+
+ if (offset == 0) {
+ BUG_ON(!PageLocked(page));
+ if (!PageWriteback(page)) {
+ fscache_uncache_page(nfsi->fscache, page);
+ atomic_inc(&nfs_fscache_uncache_page);
+ ClearPageFsCache(page);
+ }
+ }
+ }
+}
+
+/*
+ * store a newly fetched page in fscache
+ */
+static inline void nfs_readpage_to_fscache(struct inode *inode,
+ struct page *page,
+ int sync)
+{
+ int ret;
+
+ if (PageFsCache(page)) {
+ dfprintk(FSCACHE,
+ "NFS: "
+ "readpage_to_fscache(fsc:%p/p:%p(i:%lx f:%lx)/%d)\n",
+ NFS_I(inode)->fscache, page, page->index, page->flags,
+ sync);
+
+ ret = fscache_write_page(NFS_I(inode)->fscache, page,
+ GFP_KERNEL);
+ dfprintk(FSCACHE,
+ "NFS: "
+ "readpage_to_fscache: p:%p(i:%lu f:%lx) ret %d\n",
+ page, page->index, page->flags, ret);
+
+ if (ret != 0) {
+ fscache_uncache_page(NFS_I(inode)->fscache, page);
+ atomic_inc(&nfs_fscache_uncache_page);
+ ClearPageFsCache(page);
+ nfs_fscache_to_error = ret;
+ } else {
+ atomic_inc(&nfs_fscache_to_pages);
+ }
+ }
+}
+
+/*
+ * retrieve a page from fscache
+ */
+extern void nfs_readpage_from_fscache_complete(struct page *, void *, int);
+
+static inline
+int nfs_readpage_from_fscache(struct nfs_open_context *ctx,
+ struct inode *inode,
+ struct page *page)
+{
+ int ret;
+
+ if (!NFS_I(inode)->fscache)
+ return 1;
+
+ dfprintk(FSCACHE,
+ "NFS: readpage_from_fscache(fsc:%p/p:%p(i:%lx f:%lx)/0x%p)\n",
+ NFS_I(inode)->fscache, page, page->index, page->flags, inode);
+
+ ret = fscache_read_or_alloc_page(NFS_I(inode)->fscache,
+ page,
+ nfs_readpage_from_fscache_complete,
+ ctx,
+ GFP_KERNEL);
+
+ switch (ret) {
+ case 0: /* read BIO submitted (page in fscache) */
+ dfprintk(FSCACHE,
+ "NFS: readpage_from_fscache: BIO submitted\n");
+ atomic_inc(&nfs_fscache_from_pages);
+ return ret;
+
+ case -ENOBUFS: /* inode not in cache */
+ case -ENODATA: /* page not in cache */
+ dfprintk(FSCACHE,
+ "NFS: readpage_from_fscache error %d\n", ret);
+ return 1;
+
+ default:
+ dfprintk(FSCACHE, "NFS: readpage_from_fscache %d\n", ret);
+ nfs_fscache_from_error = ret;
+ }
+ return ret;
+}
+
+/*
+ * retrieve a set of pages from fscache
+ */
+static inline int nfs_readpages_from_fscache(struct nfs_open_context *ctx,
+ struct inode *inode,
+ struct address_space *mapping,
+ struct list_head *pages,
+ unsigned *nr_pages)
+{
+ int ret, npages = *nr_pages;
+
+ if (!NFS_I(inode)->fscache)
+ return 1;
+
+ dfprintk(FSCACHE,
+ "NFS: nfs_getpages_from_fscache (0x%p/%u/0x%p)\n",
+ NFS_I(inode)->fscache, *nr_pages, inode);
+
+ ret = fscache_read_or_alloc_pages(NFS_I(inode)->fscache,
+ mapping, pages, nr_pages,
+ nfs_readpage_from_fscache_complete,
+ ctx,
+ mapping_gfp_mask(mapping));
+
+
+ switch (ret) {
+ case 0: /* read BIO submitted (page in fscache) */
+ BUG_ON(!list_empty(pages));
+ BUG_ON(*nr_pages != 0);
+ dfprintk(FSCACHE,
+ "NFS: nfs_getpages_from_fscache: BIO submitted\n");
+
+ atomic_add(npages, &nfs_fscache_from_pages);
+ return ret;
+
+ case -ENOBUFS: /* inode not in cache */
+ case -ENODATA: /* page not in cache */
+ dfprintk(FSCACHE,
+ "NFS: nfs_getpages_from_fscache: no page: %d\n", ret);
+ return 1;
+
+ default:
+ dfprintk(FSCACHE,
+ "NFS: nfs_getpages_from_fscache: ret %d\n", ret);
+ nfs_fscache_from_error = ret;
+ }
+
+ return ret;
+}
+
+#else /* CONFIG_NFS_FSCACHE */
+static inline int nfs_fscache_register(void) { return 0; }
+static inline void nfs_fscache_unregister(void) {}
+static inline void nfs_fscache_get_client_cookie(struct nfs_client *clp) {}
+static inline void nfs4_fscache_get_client_cookie(struct nfs_client *clp) {}
+static inline void nfs_fscache_release_client_cookie(struct nfs_client *clp) {}
+static inline const char *nfs_server_fscache_state(struct nfs_server *server) { return "no "; }
+
+static inline void nfs_fscache_init_fh_cookie(struct inode *inode) {}
+static inline void nfs_fscache_enable_fh_cookie(struct inode *inode) {}
+static inline void nfs_fscache_attr_changed(struct inode *inode) {}
+static inline void nfs_fscache_release_fh_cookie(struct inode *inode) {}
+static inline void nfs_fscache_zap_fh_cookie(struct inode *inode) {}
+static inline void nfs_fscache_renew_fh_cookie(struct inode *inode) {}
+static inline void nfs_fscache_disable_fh_cookie(struct inode *inode) {}
+static inline void nfs_fscache_set_fh_cookie(struct inode *inode, struct file *filp) {}
+static inline void nfs_fscache_install_vm_ops(struct inode *inode, struct vm_area_struct *vma) {}
+static inline int nfs_fscache_release_page(struct page *page, gfp_t gfp)
+{
+ return 1; /* True: may release page */
+}
+static inline void nfs_fscache_invalidate_page(struct page *page,
+ struct inode *inode,
+ unsigned long offset)
+{
+}
+static inline void nfs_readpage_to_fscache(struct inode *inode, struct page *page, int sync) {}
+static inline int nfs_readpage_from_fscache(struct nfs_open_context *ctx,
+ struct inode *inode, struct page *page)
+{
+ return -ENOBUFS;
+}
+static inline int nfs_readpages_from_fscache(struct nfs_open_context *ctx,
+ struct inode *inode,
+ struct address_space *mapping,
+ struct list_head *pages,
+ unsigned *nr_pages)
+{
+ return -ENOBUFS;
+}
+
+#endif /* CONFIG_NFS_FSCACHE */
+#endif /* _NFS_FSCACHE_H */
diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
index bca6cdc..8b7a39c 100644
--- a/fs/nfs/inode.c
+++ b/fs/nfs/inode.c
@@ -88,6 +88,7 @@ void nfs_clear_inode(struct inode *inode)
BUG_ON(atomic_read(&NFS_I(inode)->data_updates) != 0);
nfs_zap_acl_cache(inode);
nfs_access_zap_cache(inode);
+ nfs_fscache_release_fh_cookie(inode);
}

/**
@@ -220,6 +221,7 @@ nfs_fhget(struct super_block *sb, struct nfs_fh *fh, struct nfs_fattr *fattr)
};
struct inode *inode = ERR_PTR(-ENOENT);
unsigned long hash;
+ int maycache = 1;

if ((fattr->valid & NFS_ATTR_FATTR) == 0)
goto out_no_inode;
@@ -269,6 +271,7 @@ nfs_fhget(struct super_block *sb, struct nfs_fh *fh, struct nfs_fattr *fattr)
else
inode->i_op = &nfs_mountpoint_inode_operations;
inode->i_fop = NULL;
+ maycache = 0;
}
} else if (S_ISLNK(inode->i_mode))
inode->i_op = &nfs_symlink_inode_operations;
@@ -300,6 +303,8 @@ nfs_fhget(struct super_block *sb, struct nfs_fh *fh, struct nfs_fattr *fattr)
memset(nfsi->cookieverf, 0, sizeof(nfsi->cookieverf));
nfsi->access_cache = RB_ROOT;

+ nfs_fscache_init_fh_cookie(inode);
+
unlock_new_inode(inode);
} else
nfs_refresh_inode(inode, fattr);
@@ -573,6 +578,7 @@ int nfs_open(struct inode *inode, struct file *filp)
ctx->mode = filp->f_mode;
nfs_file_set_open_context(filp, ctx);
put_nfs_open_context(ctx);
+ nfs_fscache_set_fh_cookie(inode, filp);
return 0;
}

@@ -698,6 +704,7 @@ static int nfs_invalidate_mapping_nolock(struct inode *inode, struct address_spa
}
spin_unlock(&inode->i_lock);
nfs_inc_stats(inode, NFSIOS_DATAINVALIDATE);
+ nfs_fscache_renew_fh_cookie(inode);
dfprintk(PAGECACHE, "NFS: (%s/%Ld) data cache invalidated\n",
inode->i_sb->s_id, (long long)NFS_FILEID(inode));
return 0;
@@ -1000,11 +1007,13 @@ static int nfs_update_inode(struct inode *inode, struct nfs_fattr *fattr)
if (data_stable) {
inode->i_size = new_isize;
invalid |= NFS_INO_INVALID_DATA;
+ nfs_fscache_attr_changed(inode);
}
invalid |= NFS_INO_INVALID_ATTR;
} else if (new_isize > cur_isize) {
inode->i_size = new_isize;
invalid |= NFS_INO_INVALID_ATTR|NFS_INO_INVALID_DATA;
+ nfs_fscache_attr_changed(inode);
}
nfsi->cache_change_attribute = now;
dprintk("NFS: isize change on server for file %s/%ld\n",
@@ -1191,6 +1200,10 @@ static int __init init_nfs_fs(void)
{
int err;

+ err = nfs_fscache_register();
+ if (err < 0)
+ goto out6;
+
err = nfs_fs_proc_init();
if (err)
goto out5;
@@ -1237,6 +1250,8 @@ out3:
out4:
nfs_fs_proc_exit();
out5:
+ nfs_fscache_unregister();
+out6:
return err;
}

@@ -1247,6 +1262,7 @@ static void __exit exit_nfs_fs(void)
nfs_destroy_readpagecache();
nfs_destroy_inodecache();
nfs_destroy_nfspagecache();
+ nfs_fscache_unregister();
#ifdef CONFIG_PROC_FS
rpc_proc_unregister("nfs");
#endif
diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
index 76cf55d..fc75251 100644
--- a/fs/nfs/internal.h
+++ b/fs/nfs/internal.h
@@ -27,6 +27,11 @@ struct nfs_clone_mount {
rpc_authflavor_t authflavor;
};

+/*
+ * include filesystem caching stuff here
+ */
+#include "fscache.h"
+
/* client.c */
extern struct rpc_program nfs_program;

@@ -149,6 +154,9 @@ extern int nfs4_path_walk(struct nfs_server *server,
const char *path);
#endif

+/* read.c */
+extern int nfs_readpage_async(struct nfs_open_context *, struct inode *, struct page *);
+
/*
* Determine the device name as a string
*/
diff --git a/fs/nfs/read.c b/fs/nfs/read.c
index 19e0563..ad6f516 100644
--- a/fs/nfs/read.c
+++ b/fs/nfs/read.c
@@ -18,6 +18,7 @@
#include <linux/sunrpc/clnt.h>
#include <linux/nfs_fs.h>
#include <linux/nfs_page.h>
+#include <linux/nfs_mount.h>
#include <linux/smp_lock.h>

#include <asm/system.h>
@@ -114,8 +115,8 @@ static void nfs_readpage_truncate_uninitialised_page(struct nfs_read_data *data)
}
}

-static int nfs_readpage_async(struct nfs_open_context *ctx, struct inode *inode,
- struct page *page)
+int nfs_readpage_async(struct nfs_open_context *ctx, struct inode *inode,
+ struct page *page)
{
LIST_HEAD(one_request);
struct nfs_page *new;
@@ -142,6 +143,11 @@ static int nfs_readpage_async(struct nfs_open_context *ctx, struct inode *inode,

static void nfs_readpage_release(struct nfs_page *req)
{
+ struct inode *d_inode = req->wb_context->path.dentry->d_inode;
+
+ if (PageUptodate(req->wb_page))
+ nfs_readpage_to_fscache(d_inode, req->wb_page, 0);
+
unlock_page(req->wb_page);

dprintk("NFS: read done (%s/%Ld %d@%Ld)\n",
@@ -500,8 +506,15 @@ int nfs_readpage(struct file *file, struct page *page)
ctx = get_nfs_open_context((struct nfs_open_context *)
file->private_data);

+ if (!IS_SYNC(inode)) {
+ error = nfs_readpage_from_fscache(ctx, inode, page);
+ if (error == 0)
+ goto out;
+ }
+
error = nfs_readpage_async(ctx, inode, page);

+out:
put_nfs_open_context(ctx);
return error;
out_unlock:
@@ -578,6 +591,15 @@ int nfs_readpages(struct file *filp, struct address_space *mapping,
} else
desc.ctx = get_nfs_open_context((struct nfs_open_context *)
filp->private_data);
+
+ /* attempt to read as many of the pages as possible from the cache
+ * - this returns -ENOBUFS immediately if the cookie is negative
+ */
+ ret = nfs_readpages_from_fscache(desc.ctx, inode, mapping,
+ pages, &nr_pages);
+ if (ret == 0)
+ goto read_complete; /* all pages were read */
+
if (rsize < PAGE_CACHE_SIZE)
nfs_pageio_init(&pgio, inode, nfs_pagein_multi, rsize, 0);
else
@@ -588,6 +610,7 @@ int nfs_readpages(struct file *filp, struct address_space *mapping,
nfs_pageio_complete(&pgio);
npages = (pgio.pg_bytes_written + PAGE_CACHE_SIZE - 1) >> PAGE_CACHE_SHIFT;
nfs_add_stats(inode, NFSIOS_READPAGES, npages);
+read_complete:
put_nfs_open_context(desc.ctx);
out:
return ret;
diff --git a/fs/nfs/super.c b/fs/nfs/super.c
index b2a851c..7a74677 100644
--- a/fs/nfs/super.c
+++ b/fs/nfs/super.c
@@ -456,6 +456,7 @@ static void nfs_show_mount_options(struct seq_file *m, struct nfs_server *nfss,
{ NFS_MOUNT_NOACL, ",noacl", "" },
{ NFS_MOUNT_NORDIRPLUS, ",nordirplus", "" },
{ NFS_MOUNT_UNSHARED, ",nosharecache", ""},
+ { NFS_MOUNT_FSCACHE, ",fsc", "" },
{ 0, NULL, NULL }
};
const struct proc_nfs_info *nfs_infop;
diff --git a/fs/nfs/sysctl.c b/fs/nfs/sysctl.c
index b62481d..e522177 100644
--- a/fs/nfs/sysctl.c
+++ b/fs/nfs/sysctl.c
@@ -14,6 +14,7 @@
#include <linux/nfs_fs.h>

#include "callback.h"
+#include "internal.h"

static const int nfs_set_port_min = 0;
static const int nfs_set_port_max = 65535;
@@ -58,6 +59,48 @@ static ctl_table nfs_cb_sysctls[] = {
.mode = 0644,
.proc_handler = &proc_dointvec,
},
+#ifdef CONFIG_NFS_FSCACHE
+ {
+ .ctl_name = CTL_UNNUMBERED,
+ .procname = "fscache_from_error",
+ .data = &nfs_fscache_from_error,
+ .maxlen = sizeof(int),
+ .mode = 0644,
+ .proc_handler = &proc_dointvec,
+ },
+ {
+ .ctl_name = CTL_UNNUMBERED,
+ .procname = "fscache_to_error",
+ .data = &nfs_fscache_to_error,
+ .maxlen = sizeof(int),
+ .mode = 0644,
+ .proc_handler = &proc_dointvec,
+ },
+ {
+ .ctl_name = CTL_UNNUMBERED,
+ .procname = "fscache_uncache_page",
+ .data = &nfs_fscache_uncache_page,
+ .maxlen = sizeof(int),
+ .mode = 0644,
+ .proc_handler = &proc_dointvec,
+ },
+ {
+ .ctl_name = CTL_UNNUMBERED,
+ .procname = "fscache_to_pages",
+ .data = &nfs_fscache_to_pages,
+ .maxlen = sizeof(int),
+ .mode = 0644,
+ .proc_handler = &proc_dointvec_minmax,
+ },
+ {
+ .ctl_name = CTL_UNNUMBERED,
+ .procname = "fscache_from_pages",
+ .data = &nfs_fscache_from_pages,
+ .maxlen = sizeof(int),
+ .mode = 0644,
+ .proc_handler = &proc_dointvec,
+ },
+#endif
{ .ctl_name = 0 }
};

diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index ef97e0c..6576738 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -147,6 +147,9 @@ static void nfs_grow_file(struct page *page, unsigned int offset, unsigned int c
return;
nfs_inc_stats(inode, NFSIOS_EXTENDWRITE);
i_size_write(inode, end);
+#ifdef FSCACHE_WRITE_SUPPORT
+ nfs_set_fscsize(NFS_SERVER(inode), NFS_I(inode), end);
+#endif
}

/* A writeback failed: mark the page as bad, and invalidate the page cache */
diff --git a/include/linux/nfs4_mount.h b/include/linux/nfs4_mount.h
index a0dcf66..40595d0 100644
--- a/include/linux/nfs4_mount.h
+++ b/include/linux/nfs4_mount.h
@@ -66,6 +66,7 @@ struct nfs4_mount_data {
#define NFS4_MOUNT_NOAC 0x0020 /* 1 */
#define NFS4_MOUNT_STRICTLOCK 0x1000 /* 1 */
#define NFS4_MOUNT_UNSHARED 0x8000 /* 1 */
-#define NFS4_MOUNT_FLAGMASK 0x9033
+#define NFS4_MOUNT_FSCACHE 0x10000 /* 1 */
+#define NFS4_MOUNT_FLAGMASK 0x19033

#endif
diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h
index 9ba4aec..d1d6192 100644
--- a/include/linux/nfs_fs.h
+++ b/include/linux/nfs_fs.h
@@ -172,6 +172,9 @@ struct nfs_inode {
int delegation_state;
struct rw_semaphore rwsem;
#endif /* CONFIG_NFS_V4*/
+#ifdef CONFIG_NFS_FSCACHE
+ struct fscache_cookie *fscache;
+#endif
struct inode vfs_inode;
};

@@ -193,6 +196,7 @@ struct nfs_inode {
#define NFS_INO_ADVISE_RDPLUS (1) /* advise readdirplus */
#define NFS_INO_STALE (2) /* possible stale inode */
#define NFS_INO_ACL_LRU_SET (3) /* Inode is on the LRU list */
+#define NFS_INO_FSCACHE (4) /* inode can be cached by FS-Cache */

static inline struct nfs_inode *NFS_I(struct inode *inode)
{
@@ -218,6 +222,7 @@ static inline struct nfs_inode *NFS_I(struct inode *inode)

#define NFS_FLAGS(inode) (NFS_I(inode)->flags)
#define NFS_STALE(inode) (test_bit(NFS_INO_STALE, &NFS_FLAGS(inode)))
+#define NFS_FSCACHE(inode) (test_bit(NFS_INO_FSCACHE, &NFS_FLAGS(inode)))

#define NFS_FILEID(inode) (NFS_I(inode)->fileid)

@@ -553,6 +558,7 @@ extern void * nfs_root_data(void);
#define NFSDBG_CALLBACK 0x0100
#define NFSDBG_CLIENT 0x0200
#define NFSDBG_MOUNT 0x0400
+#define NFSDBG_FSCACHE 0x0800
#define NFSDBG_ALL 0xFFFF

#ifdef __KERNEL__
diff --git a/include/linux/nfs_fs_sb.h b/include/linux/nfs_fs_sb.h
index 0cac49b..abe1043 100644
--- a/include/linux/nfs_fs_sb.h
+++ b/include/linux/nfs_fs_sb.h
@@ -3,6 +3,7 @@

#include <linux/list.h>
#include <linux/backing-dev.h>
+#include <linux/fscache.h>

struct nfs_iostats;

@@ -65,6 +66,10 @@ struct nfs_client {
char cl_ipaddr[16];
unsigned char cl_id_uniquifier;
#endif
+
+#ifdef CONFIG_NFS_FSCACHE
+ struct fscache_cookie *fscache; /* client index cache cookie */
+#endif
};

/*
diff --git a/include/linux/nfs_mount.h b/include/linux/nfs_mount.h
index a3ade89..36971cd 100644
--- a/include/linux/nfs_mount.h
+++ b/include/linux/nfs_mount.h
@@ -63,6 +63,7 @@ struct nfs_mount_data {
#define NFS_MOUNT_SECFLAVOUR 0x2000 /* 5 */
#define NFS_MOUNT_NORDIRPLUS 0x4000 /* 5 */
#define NFS_MOUNT_UNSHARED 0x8000 /* 5 */
-#define NFS_MOUNT_FLAGMASK 0xFFFF
+#define NFS_MOUNT_FSCACHE 0x10000 /* 5 */
+#define NFS_MOUNT_FLAGMASK 0x1FFFF

#endif

2007-08-09 16:32:56

by David Howells

[permalink] [raw]
Subject: [PATCH 08/14] CacheFiles: Export things for CacheFiles [try #2]

Export a number of functions for CacheFiles's use.

Signed-Off-By: David Howells <[email protected]>
---

fs/super.c | 2 ++
kernel/auditsc.c | 2 ++
2 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/fs/super.c b/fs/super.c
index fc8ebed..c0d99dd 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -270,6 +270,8 @@ int fsync_super(struct super_block *sb)
return sync_blockdev(sb->s_bdev);
}

+EXPORT_SYMBOL_GPL(fsync_super);
+
/**
* generic_shutdown_super - common helper for ->kill_sb()
* @sb: superblock to kill
diff --git a/kernel/auditsc.c b/kernel/auditsc.c
index a777d37..1c068ec 100644
--- a/kernel/auditsc.c
+++ b/kernel/auditsc.c
@@ -1526,6 +1526,8 @@ add_names:
}
}

+EXPORT_SYMBOL_GPL(__audit_inode_child);
+
/**
* auditsc_get_stamp - get local copies of audit_context values
* @ctx: audit_context for the task

2007-08-09 16:33:59

by David Howells

[permalink] [raw]
Subject: [PATCH 07/14] CacheFiles: Permit the page lock state to be monitored [try #2]

Add a function to install a monitor on the page lock waitqueue for a particular
page, thus allowing the page being unlocked to be detected.

This is used by CacheFiles to detect read completion on a page in the backing
filesystem so that it can then copy the data to the waiting netfs page.

Signed-Off-By: David Howells <[email protected]>
---

include/linux/pagemap.h | 5 +++++
mm/filemap.c | 19 +++++++++++++++++++
2 files changed, 24 insertions(+), 0 deletions(-)

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index d1049b6..452fdcf 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -220,6 +220,11 @@ static inline void wait_on_page_fscache_write(struct page *page)
extern void end_page_fscache_write(struct page *page);

/*
+ * Add an arbitrary waiter to a page's wait queue
+ */
+extern void add_page_wait_queue(struct page *page, wait_queue_t *waiter);
+
+/*
* Fault a userspace page into pagetables. Return non-zero on a fault.
*
* This assumes that two userspace pages are always sufficient. That's
diff --git a/mm/filemap.c b/mm/filemap.c
index 5e419a2..c60c24e 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -518,6 +518,25 @@ void fastcall wait_on_page_bit(struct page *page, int bit_nr)
EXPORT_SYMBOL(wait_on_page_bit);

/**
+ * add_page_wait_queue - Add an arbitrary waiter to a page's wait queue
+ * @page - Page defining the wait queue of interest
+ * @waiter - Waiter to add to the queue
+ *
+ * Add an arbitrary @waiter to the wait queue for the nominated @page.
+ */
+void add_page_wait_queue(struct page *page, wait_queue_t *waiter)
+{
+ wait_queue_head_t *q = page_waitqueue(page);
+ unsigned long flags;
+
+ spin_lock_irqsave(&q->lock, flags);
+ __add_wait_queue(q, waiter);
+ spin_unlock_irqrestore(&q->lock, flags);
+}
+
+EXPORT_SYMBOL_GPL(add_page_wait_queue);
+
+/**
* unlock_page - unlock a locked page
* @page: the page
*

2007-08-09 16:35:17

by David Howells

[permalink] [raw]
Subject: [PATCH 11/14] CacheFiles: Permit an inode's security ID to be obtained [try #2]

Permit an inode's security ID to be obtained by the CacheFiles module. This is
then used as the SID with which files and directories will be created in the
cache.

Signed-Off-By: David Howells <[email protected]>
---

include/linux/security.h | 13 +++++++++++++
security/dummy.c | 6 ++++++
security/selinux/hooks.c | 8 ++++++++
3 files changed, 27 insertions(+), 0 deletions(-)

diff --git a/include/linux/security.h b/include/linux/security.h
index 422015d..21cadea 100644
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -1252,6 +1252,7 @@ struct security_operations {
int (*inode_getsecurity)(const struct inode *inode, const char *name, void *buffer, size_t size, int err);
int (*inode_setsecurity)(struct inode *inode, const char *name, const void *value, size_t size, int flags);
int (*inode_listsecurity)(struct inode *inode, char *buffer, size_t buffer_size);
+ u32 (*inode_get_secid)(struct inode *inode);

int (*file_permission) (struct file * file, int mask);
int (*file_alloc_security) (struct file * file);
@@ -1814,6 +1815,13 @@ static inline int security_inode_listsecurity(struct inode *inode, char *buffer,
return security_ops->inode_listsecurity(inode, buffer, buffer_size);
}

+static inline u32 security_inode_get_secid(struct inode *inode)
+{
+ if (unlikely(IS_PRIVATE(inode)))
+ return 0;
+ return security_ops->inode_get_secid(inode);
+}
+
static inline int security_file_permission (struct file *file, int mask)
{
return security_ops->file_permission (file, mask);
@@ -2514,6 +2522,11 @@ static inline int security_inode_listsecurity(struct inode *inode, char *buffer,
return 0;
}

+static inline u32 security_inode_get_secid(struct inode *inode)
+{
+ return 0;
+}
+
static inline int security_file_permission (struct file *file, int mask)
{
return 0;
diff --git a/security/dummy.c b/security/dummy.c
index 77ec75d..6a7a317 100644
--- a/security/dummy.c
+++ b/security/dummy.c
@@ -392,6 +392,11 @@ static int dummy_inode_listsecurity(struct inode *inode, char *buffer, size_t bu
return 0;
}

+static u32 dummy_inode_get_secid(struct inode *inode)
+{
+ return 0;
+}
+
static const char *dummy_inode_xattr_getsuffix(void)
{
return NULL;
@@ -1042,6 +1047,7 @@ void security_fixup_ops (struct security_operations *ops)
set_to_dummy_if_null(ops, inode_getsecurity);
set_to_dummy_if_null(ops, inode_setsecurity);
set_to_dummy_if_null(ops, inode_listsecurity);
+ set_to_dummy_if_null(ops, inode_get_secid);
set_to_dummy_if_null(ops, file_permission);
set_to_dummy_if_null(ops, file_alloc_security);
set_to_dummy_if_null(ops, file_free_security);
diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index 66af819..c05d662 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -2464,6 +2464,13 @@ static int selinux_inode_listsecurity(struct inode *inode, char *buffer, size_t
return len;
}

+static u32 selinux_inode_get_secid(struct inode *inode)
+{
+ struct inode_security_struct *isec = inode->i_security;
+
+ return isec->sid;
+}
+
/* file security operations */

static int selinux_file_permission(struct file *file, int mask)
@@ -4822,6 +4829,7 @@ static struct security_operations selinux_ops = {
.inode_getsecurity = selinux_inode_getsecurity,
.inode_setsecurity = selinux_inode_setsecurity,
.inode_listsecurity = selinux_inode_listsecurity,
+ .inode_get_secid = selinux_inode_get_secid,

.file_permission = selinux_file_permission,
.file_alloc_security = selinux_file_alloc_security,

2007-08-09 16:36:26

by David Howells

[permalink] [raw]
Subject: [PATCH 12/14] CacheFiles: Get the SID under which the CacheFiles module should operate [try #2]

Get the SID under which the CacheFiles module should operate so that the
SELinux security system can control the accesses it makes.

Signed-Off-By: David Howells <[email protected]>
---

include/linux/security.h | 20 ++++++++++++++++++++
security/dummy.c | 7 +++++++
security/selinux/hooks.c | 7 +++++++
3 files changed, 34 insertions(+), 0 deletions(-)

diff --git a/include/linux/security.h b/include/linux/security.h
index 21cadea..9cb417e 100644
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -1164,6 +1164,14 @@ struct request_sock;
* owning security ID, and return the security ID as which the process was
* previously acting.
*
+ * @cachefiles_get_secid:
+ * Determine the security ID for the CacheFiles module to use when
+ * accessing the filesystem containing the cache.
+ * @secid contains the security ID under which cachefiles daemon is
+ * running.
+ * @modsecid contains the pointer to where the security ID for the module
+ * is to be stored.
+ *
* This is the main security structure.
*/
struct security_operations {
@@ -1352,6 +1360,7 @@ struct security_operations {
u32 (*set_fscreate_secid)(u32 secid);
u32 (*act_as_secid)(u32 secid);
u32 (*act_as_self)(void);
+ int (*cachefiles_get_secid)(u32 secid, u32 *modsecid);

#ifdef CONFIG_SECURITY_NETWORK
int (*unix_stream_connect) (struct socket * sock,
@@ -2176,6 +2185,11 @@ static inline u32 security_act_as_self(void)
return security_ops->act_as_self();
}

+static inline int security_cachefiles_get_secid(u32 secid, u32 *modsecid)
+{
+ return security_ops->cachefiles_get_secid(secid, modsecid);
+}
+
/* prototypes */
extern int security_init (void);
extern int register_security (struct security_operations *ops);
@@ -2883,6 +2897,12 @@ static inline u32 security_act_as_self(void)
return 0;
}

+static inline int security_cachefiles_get_secid(u32 secid, u32 *modsecid)
+{
+ *modsecid = 0;
+ return 0;
+}
+
#endif /* CONFIG_SECURITY */

#ifdef CONFIG_SECURITY_NETWORK
diff --git a/security/dummy.c b/security/dummy.c
index 6a7a317..2c1fd16 100644
--- a/security/dummy.c
+++ b/security/dummy.c
@@ -955,6 +955,12 @@ static u32 dummy_act_as_self(void)
return 0;
}

+static int dummy_cachefiles_get_secid(u32 secid, u32 *modsecid)
+{
+ *modsecid = 0;
+ return 0;
+}
+
#ifdef CONFIG_KEYS
static inline int dummy_key_alloc(struct key *key, struct task_struct *ctx,
unsigned long flags)
@@ -1114,6 +1120,7 @@ void security_fixup_ops (struct security_operations *ops)
set_to_dummy_if_null(ops, set_fscreate_secid);
set_to_dummy_if_null(ops, act_as_secid);
set_to_dummy_if_null(ops, act_as_self);
+ set_to_dummy_if_null(ops, cachefiles_get_secid);
#ifdef CONFIG_SECURITY_NETWORK
set_to_dummy_if_null(ops, unix_stream_connect);
set_to_dummy_if_null(ops, unix_may_send);
diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index c05d662..725f657 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -4718,6 +4718,12 @@ static u32 selinux_act_as_self(void)
return oldactor_sid;
}

+static int selinux_cachefiles_get_secid(u32 secid, u32 *modsecid)
+{
+ return security_transition_sid(secid, SECINITSID_KERNEL,
+ SECCLASS_PROCESS, modsecid);
+}
+
#ifdef CONFIG_KEYS

static int selinux_key_alloc(struct key *k, struct task_struct *tsk,
@@ -4905,6 +4911,7 @@ static struct security_operations selinux_ops = {
.set_fscreate_secid = selinux_set_fscreate_secid,
.act_as_secid = selinux_act_as_secid,
.act_as_self = selinux_act_as_self,
+ .cachefiles_get_secid = selinux_cachefiles_get_secid,

.unix_stream_connect = selinux_socket_unix_stream_connect,
.unix_may_send = selinux_socket_unix_may_send,

2007-08-09 16:37:36

by David Howells

[permalink] [raw]
Subject: [PATCH 09/14] CacheFiles: Permit a process's create SID to be overridden [try #2]

Make it possible for a process's file creation SID to be temporarily overridden
by CacheFiles so that files created in the cache have the right label attached.

Without this facility, files created in the cache will be given the current
file creation SID of whatever process happens to have invoked CacheFiles
indirectly by means of opening a netfs file at the time the cache file is
created.

Signed-Off-By: David Howells <[email protected]>
---

include/linux/security.h | 35 +++++++++++++++++++++++++++++++++++
security/dummy.c | 12 ++++++++++++
security/selinux/hooks.c | 18 ++++++++++++++++++
3 files changed, 65 insertions(+), 0 deletions(-)

diff --git a/include/linux/security.h b/include/linux/security.h
index c11dc8a..92d3da0 100644
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -1147,6 +1147,13 @@ struct request_sock;
* @secdata contains the security context.
* @seclen contains the length of the security context.
*
+ * @get_fscreate_secid:
+ * Get the current FS security ID.
+ *
+ * @set_fscreate_secid:
+ * Set the current FS security ID.
+ * @secid contains the security ID to set.
+ *
* This is the main security structure.
*/
struct security_operations {
@@ -1330,6 +1337,8 @@ struct security_operations {
int (*setprocattr)(struct task_struct *p, char *name, void *value, size_t size);
int (*secid_to_secctx)(u32 secid, char **secdata, u32 *seclen);
void (*release_secctx)(char *secdata, u32 seclen);
+ u32 (*get_fscreate_secid)(void);
+ u32 (*set_fscreate_secid)(u32 secid);

#ifdef CONFIG_SECURITY_NETWORK
int (*unix_stream_connect) (struct socket * sock,
@@ -2127,6 +2136,16 @@ static inline void security_release_secctx(char *secdata, u32 seclen)
return security_ops->release_secctx(secdata, seclen);
}

+static inline u32 security_get_fscreate_secid(void)
+{
+ return security_ops->get_fscreate_secid();
+}
+
+static inline u32 security_set_fscreate_secid(u32 secid)
+{
+ return security_ops->set_fscreate_secid(secid);
+}
+
/* prototypes */
extern int security_init (void);
extern int register_security (struct security_operations *ops);
@@ -2795,6 +2814,11 @@ static inline void securityfs_remove(struct dentry *dentry)
{
}

+static inline int security_to_secctx_secid(char *secdata, u32 seclen, u32 *secid)
+{
+ return -EOPNOTSUPP;
+}
+
static inline int security_secid_to_secctx(u32 secid, char **secdata, u32 *seclen)
{
return -EOPNOTSUPP;
@@ -2803,6 +2827,17 @@ static inline int security_secid_to_secctx(u32 secid, char **secdata, u32 *secle
static inline void security_release_secctx(char *secdata, u32 seclen)
{
}
+
+static inline u32 security_get_fscreate_secid(void)
+{
+ return 0;
+}
+
+static inline u32 security_set_fscreate_secid(u32 secid)
+{
+ return 0;
+}
+
#endif /* CONFIG_SECURITY */

#ifdef CONFIG_SECURITY_NETWORK
diff --git a/security/dummy.c b/security/dummy.c
index 19d813d..d463e6f 100644
--- a/security/dummy.c
+++ b/security/dummy.c
@@ -930,6 +930,16 @@ static void dummy_release_secctx(char *secdata, u32 seclen)
{
}

+static u32 dummy_get_fscreate_secid(void)
+{
+ return 0;
+}
+
+static u32 dummy_set_fscreate_secid(u32 secid)
+{
+ return 0;
+}
+
#ifdef CONFIG_KEYS
static inline int dummy_key_alloc(struct key *key, struct task_struct *ctx,
unsigned long flags)
@@ -1084,6 +1094,8 @@ void security_fixup_ops (struct security_operations *ops)
set_to_dummy_if_null(ops, setprocattr);
set_to_dummy_if_null(ops, secid_to_secctx);
set_to_dummy_if_null(ops, release_secctx);
+ set_to_dummy_if_null(ops, get_fscreate_secid);
+ set_to_dummy_if_null(ops, set_fscreate_secid);
#ifdef CONFIG_SECURITY_NETWORK
set_to_dummy_if_null(ops, unix_stream_connect);
set_to_dummy_if_null(ops, unix_may_send);
diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index 6237933..c5905b0 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -4661,6 +4661,22 @@ static void selinux_release_secctx(char *secdata, u32 seclen)
kfree(secdata);
}

+static u32 selinux_get_fscreate_secid(void)
+{
+ struct task_security_struct *tsec = current->security;
+
+ return tsec->create_sid;
+}
+
+static u32 selinux_set_fscreate_secid(u32 secid)
+{
+ struct task_security_struct *tsec = current->security;
+ u32 oldsid = tsec->create_sid;
+
+ tsec->create_sid = secid;
+ return oldsid;
+}
+
#ifdef CONFIG_KEYS

static int selinux_key_alloc(struct key *k, struct task_struct *tsk,
@@ -4843,6 +4859,8 @@ static struct security_operations selinux_ops = {

.secid_to_secctx = selinux_secid_to_secctx,
.release_secctx = selinux_release_secctx,
+ .get_fscreate_secid = selinux_get_fscreate_secid,
+ .set_fscreate_secid = selinux_set_fscreate_secid,

.unix_stream_connect = selinux_socket_unix_stream_connect,
.unix_may_send = selinux_socket_unix_may_send,

2007-08-09 16:38:27

by David Howells

[permalink] [raw]
Subject: [PATCH 06/14] CacheFiles: Add a hook to write a single page of data to an inode [try #2]

Add an address space operation to write one single page of data to an inode at
a page-aligned location (thus permitting the implementation to be highly
optimised).

This is used by CacheFiles to store the contents of netfs pages into their
backing file pages.

Supply a generic implementation for this that uses the prepare_write() and
commit_write() address_space operations to bound a copy directly into the page
cache.

Hook the Ext2 and Ext3 operations to the generic implementation.

Signed-Off-By: David Howells <[email protected]>
---

fs/ext2/inode.c | 2 +
fs/ext3/inode.c | 3 ++
include/linux/fs.h | 7 ++++
mm/filemap.c | 95 ++++++++++++++++++++++++++++++++++++++++++++++++++++
4 files changed, 107 insertions(+), 0 deletions(-)

diff --git a/fs/ext2/inode.c b/fs/ext2/inode.c
index 0079b2c..b3e4b50 100644
--- a/fs/ext2/inode.c
+++ b/fs/ext2/inode.c
@@ -695,6 +695,7 @@ const struct address_space_operations ext2_aops = {
.direct_IO = ext2_direct_IO,
.writepages = ext2_writepages,
.migratepage = buffer_migrate_page,
+ .write_one_page = generic_file_buffered_write_one_page,
};

const struct address_space_operations ext2_aops_xip = {
@@ -713,6 +714,7 @@ const struct address_space_operations ext2_nobh_aops = {
.direct_IO = ext2_direct_IO,
.writepages = ext2_writepages,
.migratepage = buffer_migrate_page,
+ .write_one_page = generic_file_buffered_write_one_page,
};

/*
diff --git a/fs/ext3/inode.c b/fs/ext3/inode.c
index de4e316..93809eb 100644
--- a/fs/ext3/inode.c
+++ b/fs/ext3/inode.c
@@ -1713,6 +1713,7 @@ static const struct address_space_operations ext3_ordered_aops = {
.releasepage = ext3_releasepage,
.direct_IO = ext3_direct_IO,
.migratepage = buffer_migrate_page,
+ .write_one_page = generic_file_buffered_write_one_page,
};

static const struct address_space_operations ext3_writeback_aops = {
@@ -1727,6 +1728,7 @@ static const struct address_space_operations ext3_writeback_aops = {
.releasepage = ext3_releasepage,
.direct_IO = ext3_direct_IO,
.migratepage = buffer_migrate_page,
+ .write_one_page = generic_file_buffered_write_one_page,
};

static const struct address_space_operations ext3_journalled_aops = {
@@ -1740,6 +1742,7 @@ static const struct address_space_operations ext3_journalled_aops = {
.bmap = ext3_bmap,
.invalidatepage = ext3_invalidatepage,
.releasepage = ext3_releasepage,
+ .write_one_page = generic_file_buffered_write_one_page,
};

void ext3_set_aops(struct inode *inode)
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 6bf1395..1b1f288 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -433,6 +433,11 @@ struct address_space_operations {
int (*migratepage) (struct address_space *,
struct page *, struct page *);
int (*launder_page) (struct page *);
+ /* write the contents of the source page over the page at the specified
+ * index in the target address space (the source page does not need to
+ * be related to the target address space) */
+ int (*write_one_page)(struct address_space *, pgoff_t, struct page *);
+
};

struct backing_dev_info;
@@ -1669,6 +1674,8 @@ extern ssize_t generic_file_direct_write(struct kiocb *, const struct iovec *,
unsigned long *, loff_t, loff_t *, size_t, size_t);
extern ssize_t generic_file_buffered_write(struct kiocb *, const struct iovec *,
unsigned long, loff_t, loff_t *, size_t, ssize_t);
+extern int generic_file_buffered_write_one_page(struct address_space *,
+ pgoff_t, struct page *);
extern ssize_t do_sync_read(struct file *filp, char __user *buf, size_t len, loff_t *ppos);
extern ssize_t do_sync_write(struct file *filp, const char __user *buf, size_t len, loff_t *ppos);
extern void do_generic_mapping_read(struct address_space *mapping,
diff --git a/mm/filemap.c b/mm/filemap.c
index 7b96487..5e419a2 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2032,6 +2032,101 @@ zero_length_segment:
}
EXPORT_SYMBOL(generic_file_buffered_write);

+/**
+ * generic_file_buffered_write_one_page - Write a single page of data to an
+ * inode
+ * @mapping - The address space of the target inode
+ * @index - The target page in the target inode to fill
+ * @source - The data to write into the target page
+ *
+ * Write the data from the source page to the page in the nominated address
+ * space at the @index specified. Note that the file will not be extended if
+ * the page crosses the EOF marker, in which case only the first part of the
+ * page will be written.
+ *
+ * The @source page does not need to have any association with the file or the
+ * target page offset.
+ */
+int generic_file_buffered_write_one_page(struct address_space *mapping,
+ pgoff_t index,
+ struct page *source)
+{
+ const struct address_space_operations *a_ops = mapping->a_ops;
+ struct pagevec lru_pvec;
+ struct page *page, *cached_page = NULL;
+ unsigned psize;
+ loff_t isize;
+ long status = 0;
+
+ pagevec_init(&lru_pvec, 0);
+
+ page = __grab_cache_page(mapping, index, &cached_page, &lru_pvec);
+ if (!page) {
+ BUG_ON(cached_page);
+ return -ENOMEM;
+ }
+
+ psize = PAGE_CACHE_SIZE;
+ isize = i_size_read(mapping->host);
+ if ((isize >> PAGE_SHIFT) == index)
+ psize = isize & (PAGE_SIZE - 1);
+
+ status = a_ops->prepare_write(NULL, page, 0, psize);
+ if (unlikely(status)) {
+ isize = i_size_read(mapping->host);
+
+ if (status != AOP_TRUNCATED_PAGE)
+ unlock_page(page);
+ page_cache_release(page);
+ if (status == AOP_TRUNCATED_PAGE)
+ goto sync;
+
+ /* prepare_write() may have instantiated a few blocks outside
+ * i_size. Trim these off again.
+ */
+ if ((1ULL << (index + 1)) > isize)
+ vmtruncate(mapping->host, isize);
+ goto sync;
+ }
+
+ copy_highpage(page, source);
+ flush_dcache_page(page);
+
+ status = a_ops->commit_write(NULL, page, 0, psize);
+ if (status == AOP_TRUNCATED_PAGE) {
+ page_cache_release(page);
+ goto sync;
+ }
+
+ if (status > 0)
+ status = 0;
+
+ unlock_page(page);
+ mark_page_accessed(page);
+ page_cache_release(page);
+ if (status < 0)
+ return status;
+
+ balance_dirty_pages_ratelimited(mapping);
+ cond_resched();
+
+sync:
+ if (cached_page)
+ page_cache_release(cached_page);
+
+ /* the caller must handle O_SYNC themselves, but we handle S_SYNC and
+ * MS_SYNCHRONOUS here */
+ if (unlikely(IS_SYNC(mapping->host)) && !a_ops->writepage)
+ status = generic_osync_inode(mapping->host, mapping,
+ OSYNC_METADATA | OSYNC_DATA);
+
+ /* the caller must handle O_DIRECT for themselves */
+
+ pagevec_lru_add(&lru_pvec);
+ return status;
+}
+EXPORT_SYMBOL(generic_file_buffered_write_one_page);
+
static ssize_t
__generic_file_aio_write_nolock(struct kiocb *iocb, const struct iovec *iov,
unsigned long nr_segs, loff_t *ppos)

2007-08-09 16:39:30

by David Howells

[permalink] [raw]
Subject: [PATCH 10/14] CacheFiles: Add an act-as SID override in task_security_struct [try #2]

Add an act-as SID to task_security_struct that is equivalent to fsuid/fsgid in
task_struct. This permits a task to perform operations as if it is the
overriding SID, without changing its own SID as that might be needed to control
access to the process by ptrace, signals, /proc, etc.

This is useful for CacheFiles in that it allows CacheFiles to access the cache
files and directories using the cache's security context rather than the
security context of the process on whose behalf it is working, and in the
context of which it is running.

Signed-Off-By: David Howells <[email protected]>
---

include/linux/security.h | 32 +++++++
security/dummy.c | 12 +++
security/selinux/exports.c | 2
security/selinux/hooks.c | 160 +++++++++++++++++++++++--------------
security/selinux/include/objsec.h | 1
security/selinux/selinuxfs.c | 2
security/selinux/xfrm.c | 6 +
7 files changed, 148 insertions(+), 67 deletions(-)

diff --git a/include/linux/security.h b/include/linux/security.h
index 92d3da0..422015d 100644
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -1154,6 +1154,16 @@ struct request_sock;
* Set the current FS security ID.
* @secid contains the security ID to set.
*
+ * @act_as_secid:
+ * Set the security ID as which to act, returning the security ID as which
+ * the process was previously acting.
+ * @secid contains the security ID to act as.
+ *
+ * @act_as_self:
+ * Reset the security ID as which to act to be the same as the process's
+ * owning security ID, and return the security ID as which the process was
+ * previously acting.
+ *
* This is the main security structure.
*/
struct security_operations {
@@ -1339,6 +1349,8 @@ struct security_operations {
void (*release_secctx)(char *secdata, u32 seclen);
u32 (*get_fscreate_secid)(void);
u32 (*set_fscreate_secid)(u32 secid);
+ u32 (*act_as_secid)(u32 secid);
+ u32 (*act_as_self)(void);

#ifdef CONFIG_SECURITY_NETWORK
int (*unix_stream_connect) (struct socket * sock,
@@ -2146,6 +2158,16 @@ static inline u32 security_set_fscreate_secid(u32 secid)
return security_ops->set_fscreate_secid(secid);
}

+static inline u32 security_act_as_secid(u32 secid)
+{
+ return security_ops->act_as_secid(secid);
+}
+
+static inline u32 security_act_as_self(void)
+{
+ return security_ops->act_as_self();
+}
+
/* prototypes */
extern int security_init (void);
extern int register_security (struct security_operations *ops);
@@ -2838,6 +2860,16 @@ static inline u32 security_set_fscreate_secid(u32 secid)
return 0;
}

+static inline u32 security_act_as_secid(u32 secid)
+{
+ return 0;
+}
+
+static inline u32 security_act_as_self(void)
+{
+ return 0;
+}
+
#endif /* CONFIG_SECURITY */

#ifdef CONFIG_SECURITY_NETWORK
diff --git a/security/dummy.c b/security/dummy.c
index d463e6f..77ec75d 100644
--- a/security/dummy.c
+++ b/security/dummy.c
@@ -940,6 +940,16 @@ static u32 dummy_set_fscreate_secid(u32 secid)
return 0;
}

+static u32 dummy_act_as_secid(u32 secid)
+{
+ return 0;
+}
+
+static u32 dummy_act_as_self(void)
+{
+ return 0;
+}
+
#ifdef CONFIG_KEYS
static inline int dummy_key_alloc(struct key *key, struct task_struct *ctx,
unsigned long flags)
@@ -1096,6 +1106,8 @@ void security_fixup_ops (struct security_operations *ops)
set_to_dummy_if_null(ops, release_secctx);
set_to_dummy_if_null(ops, get_fscreate_secid);
set_to_dummy_if_null(ops, set_fscreate_secid);
+ set_to_dummy_if_null(ops, act_as_secid);
+ set_to_dummy_if_null(ops, act_as_self);
#ifdef CONFIG_SECURITY_NETWORK
set_to_dummy_if_null(ops, unix_stream_connect);
set_to_dummy_if_null(ops, unix_may_send);
diff --git a/security/selinux/exports.c b/security/selinux/exports.c
index b6f9694..b559699 100644
--- a/security/selinux/exports.c
+++ b/security/selinux/exports.c
@@ -79,7 +79,7 @@ int selinux_relabel_packet_permission(u32 sid)
if (selinux_enabled) {
struct task_security_struct *tsec = current->security;

- return avc_has_perm(tsec->sid, sid, SECCLASS_PACKET,
+ return avc_has_perm(tsec->actor_sid, sid, SECCLASS_PACKET,
PACKET__RELABELTO, NULL);
}
return 0;
diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index c5905b0..66af819 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -162,7 +162,8 @@ static int task_alloc_security(struct task_struct *task)
return -ENOMEM;

tsec->task = task;
- tsec->osid = tsec->sid = tsec->ptrace_sid = SECINITSID_UNLABELED;
+ tsec->osid = tsec->actor_sid = tsec->sid = tsec->ptrace_sid =
+ SECINITSID_UNLABELED;
task->security = tsec;

return 0;
@@ -189,7 +190,7 @@ static int inode_alloc_security(struct inode *inode)
isec->inode = inode;
isec->sid = SECINITSID_UNLABELED;
isec->sclass = SECCLASS_FILE;
- isec->task_sid = tsec->sid;
+ isec->task_sid = tsec->actor_sid;
inode->i_security = isec;

return 0;
@@ -219,8 +220,8 @@ static int file_alloc_security(struct file *file)
return -ENOMEM;

fsec->file = file;
- fsec->sid = tsec->sid;
- fsec->fown_sid = tsec->sid;
+ fsec->sid = tsec->actor_sid;
+ fsec->fown_sid = tsec->actor_sid;
file->f_security = fsec;

return 0;
@@ -337,12 +338,12 @@ static int may_context_mount_sb_relabel(u32 sid,
{
int rc;

- rc = avc_has_perm(tsec->sid, sbsec->sid, SECCLASS_FILESYSTEM,
+ rc = avc_has_perm(tsec->actor_sid, sbsec->sid, SECCLASS_FILESYSTEM,
FILESYSTEM__RELABELFROM, NULL);
if (rc)
return rc;

- rc = avc_has_perm(tsec->sid, sid, SECCLASS_FILESYSTEM,
+ rc = avc_has_perm(tsec->actor_sid, sid, SECCLASS_FILESYSTEM,
FILESYSTEM__RELABELTO, NULL);
return rc;
}
@@ -352,7 +353,7 @@ static int may_context_mount_inode_relabel(u32 sid,
struct task_security_struct *tsec)
{
int rc;
- rc = avc_has_perm(tsec->sid, sbsec->sid, SECCLASS_FILESYSTEM,
+ rc = avc_has_perm(tsec->actor_sid, sbsec->sid, SECCLASS_FILESYSTEM,
FILESYSTEM__RELABELFROM, NULL);
if (rc)
return rc;
@@ -1031,7 +1032,7 @@ static int task_has_perm(struct task_struct *tsk1,

tsec1 = tsk1->security;
tsec2 = tsk2->security;
- return avc_has_perm(tsec1->sid, tsec2->sid,
+ return avc_has_perm(tsec1->actor_sid, tsec2->sid,
SECCLASS_PROCESS, perms, NULL);
}

@@ -1048,7 +1049,7 @@ static int task_has_capability(struct task_struct *tsk,
ad.tsk = tsk;
ad.u.cap = cap;

- return avc_has_perm(tsec->sid, tsec->sid,
+ return avc_has_perm(tsec->actor_sid, tsec->actor_sid,
SECCLASS_CAPABILITY, CAP_TO_MASK(cap), &ad);
}

@@ -1060,7 +1061,7 @@ static int task_has_system(struct task_struct *tsk,

tsec = tsk->security;

- return avc_has_perm(tsec->sid, SECINITSID_KERNEL,
+ return avc_has_perm(tsec->actor_sid, SECINITSID_KERNEL,
SECCLASS_SYSTEM, perms, NULL);
}

@@ -1088,7 +1089,8 @@ static int inode_has_perm(struct task_struct *tsk,
ad.u.fs.inode = inode;
}

- return avc_has_perm(tsec->sid, isec->sid, isec->sclass, perms, adp);
+ return avc_has_perm(tsec->actor_sid, isec->sid, isec->sclass, perms,
+ adp);
}

/* Same as inode_has_perm, but pass explicit audit data containing
@@ -1131,8 +1133,8 @@ static int file_has_perm(struct task_struct *tsk,
ad.u.fs.mnt = mnt;
ad.u.fs.dentry = dentry;

- if (tsec->sid != fsec->sid) {
- rc = avc_has_perm(tsec->sid, fsec->sid,
+ if (tsec->actor_sid != fsec->sid) {
+ rc = avc_has_perm(tsec->actor_sid, fsec->sid,
SECCLASS_FD,
FD__USE,
&ad);
@@ -1166,7 +1168,7 @@ static int may_create(struct inode *dir,
AVC_AUDIT_DATA_INIT(&ad, FS);
ad.u.fs.dentry = dentry;

- rc = avc_has_perm(tsec->sid, dsec->sid, SECCLASS_DIR,
+ rc = avc_has_perm(tsec->actor_sid, dsec->sid, SECCLASS_DIR,
DIR__ADD_NAME | DIR__SEARCH,
&ad);
if (rc)
@@ -1175,13 +1177,13 @@ static int may_create(struct inode *dir,
if (tsec->create_sid && sbsec->behavior != SECURITY_FS_USE_MNTPOINT) {
newsid = tsec->create_sid;
} else {
- rc = security_transition_sid(tsec->sid, dsec->sid, tclass,
- &newsid);
+ rc = security_transition_sid(tsec->actor_sid, dsec->sid,
+ tclass, &newsid);
if (rc)
return rc;
}

- rc = avc_has_perm(tsec->sid, newsid, tclass, FILE__CREATE, &ad);
+ rc = avc_has_perm(tsec->actor_sid, newsid, tclass, FILE__CREATE, &ad);
if (rc)
return rc;

@@ -1198,7 +1200,8 @@ static int may_create_key(u32 ksid,

tsec = ctx->security;

- return avc_has_perm(tsec->sid, ksid, SECCLASS_KEY, KEY__CREATE, NULL);
+ return avc_has_perm(tsec->actor_sid, ksid, SECCLASS_KEY, KEY__CREATE,
+ NULL);
}

#define MAY_LINK 0
@@ -1226,7 +1229,7 @@ static int may_link(struct inode *dir,

av = DIR__SEARCH;
av |= (kind ? DIR__REMOVE_NAME : DIR__ADD_NAME);
- rc = avc_has_perm(tsec->sid, dsec->sid, SECCLASS_DIR, av, &ad);
+ rc = avc_has_perm(tsec->actor_sid, dsec->sid, SECCLASS_DIR, av, &ad);
if (rc)
return rc;

@@ -1245,7 +1248,7 @@ static int may_link(struct inode *dir,
return 0;
}

- rc = avc_has_perm(tsec->sid, isec->sid, isec->sclass, av, &ad);
+ rc = avc_has_perm(tsec->actor_sid, isec->sid, isec->sclass, av, &ad);
return rc;
}

@@ -1270,16 +1273,16 @@ static inline int may_rename(struct inode *old_dir,
AVC_AUDIT_DATA_INIT(&ad, FS);

ad.u.fs.dentry = old_dentry;
- rc = avc_has_perm(tsec->sid, old_dsec->sid, SECCLASS_DIR,
+ rc = avc_has_perm(tsec->actor_sid, old_dsec->sid, SECCLASS_DIR,
DIR__REMOVE_NAME | DIR__SEARCH, &ad);
if (rc)
return rc;
- rc = avc_has_perm(tsec->sid, old_isec->sid,
+ rc = avc_has_perm(tsec->actor_sid, old_isec->sid,
old_isec->sclass, FILE__RENAME, &ad);
if (rc)
return rc;
if (old_is_dir && new_dir != old_dir) {
- rc = avc_has_perm(tsec->sid, old_isec->sid,
+ rc = avc_has_perm(tsec->actor_sid, old_isec->sid,
old_isec->sclass, DIR__REPARENT, &ad);
if (rc)
return rc;
@@ -1289,15 +1292,17 @@ static inline int may_rename(struct inode *old_dir,
av = DIR__ADD_NAME | DIR__SEARCH;
if (new_dentry->d_inode)
av |= DIR__REMOVE_NAME;
- rc = avc_has_perm(tsec->sid, new_dsec->sid, SECCLASS_DIR, av, &ad);
+ rc = avc_has_perm(tsec->actor_sid, new_dsec->sid, SECCLASS_DIR, av,
+ &ad);
if (rc)
return rc;
if (new_dentry->d_inode) {
new_isec = new_dentry->d_inode->i_security;
new_is_dir = S_ISDIR(new_dentry->d_inode->i_mode);
- rc = avc_has_perm(tsec->sid, new_isec->sid,
+ rc = avc_has_perm(tsec->actor_sid, new_isec->sid,
new_isec->sclass,
- (new_is_dir ? DIR__RMDIR : FILE__UNLINK), &ad);
+ (new_is_dir ? DIR__RMDIR : FILE__UNLINK),
+ &ad);
if (rc)
return rc;
}
@@ -1316,7 +1321,7 @@ static int superblock_has_perm(struct task_struct *tsk,

tsec = tsk->security;
sbsec = sb->s_security;
- return avc_has_perm(tsec->sid, sbsec->sid, SECCLASS_FILESYSTEM,
+ return avc_has_perm(tsec->actor_sid, sbsec->sid, SECCLASS_FILESYSTEM,
perms, ad);
}

@@ -1380,7 +1385,7 @@ static int selinux_ptrace(struct task_struct *parent, struct task_struct *child)
rc = task_has_perm(parent, child, PROCESS__PTRACE);
/* Save the SID of the tracing process for later use in apply_creds. */
if (!(child->ptrace & PT_PTRACED) && !rc)
- csec->ptrace_sid = psec->sid;
+ csec->ptrace_sid = psec->actor_sid;
return rc;
}

@@ -1490,7 +1495,7 @@ static int selinux_sysctl(ctl_table *table, int op)
/* The op values are "defined" in sysctl.c, thereby creating
* a bad coupling between this module and sysctl.c */
if(op == 001) {
- error = avc_has_perm(tsec->sid, tsid,
+ error = avc_has_perm(tsec->actor_sid, tsid,
SECCLASS_DIR, DIR__SEARCH, NULL);
} else {
av = 0;
@@ -1499,7 +1504,7 @@ static int selinux_sysctl(ctl_table *table, int op)
if (op & 002)
av |= FILE__WRITE;
if (av)
- error = avc_has_perm(tsec->sid, tsid,
+ error = avc_has_perm(tsec->actor_sid, tsid,
SECCLASS_FILE, av, NULL);
}

@@ -1591,7 +1596,7 @@ static int selinux_vm_enough_memory(long pages)

rc = secondary_ops->capable(current, CAP_SYS_ADMIN);
if (rc == 0)
- rc = avc_has_perm_noaudit(tsec->sid, tsec->sid,
+ rc = avc_has_perm_noaudit(tsec->actor_sid, tsec->actor_sid,
SECCLASS_CAPABILITY,
CAP_TO_MASK(CAP_SYS_ADMIN),
0,
@@ -1644,7 +1649,7 @@ static int selinux_bprm_set_security(struct linux_binprm *bprm)
isec = inode->i_security;

/* Default to the current task SID. */
- bsec->sid = tsec->sid;
+ bsec->sid = tsec->actor_sid;

/* Reset fs, key, and sock SIDs on execve. */
tsec->create_sid = 0;
@@ -1657,7 +1662,7 @@ static int selinux_bprm_set_security(struct linux_binprm *bprm)
tsec->exec_sid = 0;
} else {
/* Check for a default transition on this program. */
- rc = security_transition_sid(tsec->sid, isec->sid,
+ rc = security_transition_sid(tsec->actor_sid, isec->sid,
SECCLASS_PROCESS, &newsid);
if (rc)
return rc;
@@ -1668,16 +1673,16 @@ static int selinux_bprm_set_security(struct linux_binprm *bprm)
ad.u.fs.dentry = bprm->file->f_path.dentry;

if (bprm->file->f_path.mnt->mnt_flags & MNT_NOSUID)
- newsid = tsec->sid;
+ newsid = tsec->actor_sid;

- if (tsec->sid == newsid) {
- rc = avc_has_perm(tsec->sid, isec->sid,
+ if (tsec->actor_sid == newsid) {
+ rc = avc_has_perm(tsec->actor_sid, isec->sid,
SECCLASS_FILE, FILE__EXECUTE_NO_TRANS, &ad);
if (rc)
return rc;
} else {
/* Check permissions for the transition. */
- rc = avc_has_perm(tsec->sid, newsid,
+ rc = avc_has_perm(tsec->actor_sid, newsid,
SECCLASS_PROCESS, PROCESS__TRANSITION, &ad);
if (rc)
return rc;
@@ -1859,6 +1864,8 @@ static void selinux_bprm_apply_creds(struct linux_binprm *bprm, int unsafe)
return;
}
}
+ if (tsec->actor_sid == tsec->sid)
+ tsec->actor_sid = sid;
tsec->sid = sid;
}
}
@@ -2133,7 +2140,7 @@ static int selinux_inode_init_security(struct inode *inode, struct inode *dir,
if (tsec->create_sid && sbsec->behavior != SECURITY_FS_USE_MNTPOINT) {
newsid = tsec->create_sid;
} else {
- rc = security_transition_sid(tsec->sid, dsec->sid,
+ rc = security_transition_sid(tsec->actor_sid, dsec->sid,
inode_mode_to_security_class(inode->i_mode),
&newsid);
if (rc) {
@@ -2324,7 +2331,7 @@ static int selinux_inode_setxattr(struct dentry *dentry, char *name, void *value
AVC_AUDIT_DATA_INIT(&ad,FS);
ad.u.fs.dentry = dentry;

- rc = avc_has_perm(tsec->sid, isec->sid, isec->sclass,
+ rc = avc_has_perm(tsec->actor_sid, isec->sid, isec->sclass,
FILE__RELABELFROM, &ad);
if (rc)
return rc;
@@ -2333,12 +2340,12 @@ static int selinux_inode_setxattr(struct dentry *dentry, char *name, void *value
if (rc)
return rc;

- rc = avc_has_perm(tsec->sid, newsid, isec->sclass,
+ rc = avc_has_perm(tsec->actor_sid, newsid, isec->sclass,
FILE__RELABELTO, &ad);
if (rc)
return rc;

- rc = security_validate_transition(isec->sid, newsid, tsec->sid,
+ rc = security_validate_transition(isec->sid, newsid, tsec->actor_sid,
isec->sclass);
if (rc)
return rc;
@@ -2746,7 +2753,7 @@ static int selinux_task_alloc_security(struct task_struct *tsk)
tsec2 = tsk->security;

tsec2->osid = tsec1->osid;
- tsec2->sid = tsec1->sid;
+ tsec2->actor_sid = tsec2->sid = tsec1->sid;

/* Retain the exec, fs, key, and sock SIDs across fork */
tsec2->exec_sid = tsec1->exec_sid;
@@ -2925,6 +2932,8 @@ static void selinux_task_reparent_to_init(struct task_struct *p)

tsec = p->security;
tsec->osid = tsec->sid;
+ if (tsec->actor_sid == tsec->sid)
+ tsec->actor_sid = SECINITSID_KERNEL;
tsec->sid = SECINITSID_KERNEL;
return;
}
@@ -3172,7 +3181,8 @@ static int socket_has_perm(struct task_struct *task, struct socket *sock,

AVC_AUDIT_DATA_INIT(&ad,NET);
ad.u.net.sk = sock->sk;
- err = avc_has_perm(tsec->sid, isec->sid, isec->sclass, perms, &ad);
+ err = avc_has_perm(tsec->actor_sid, isec->sid, isec->sclass, perms,
+ &ad);

out:
return err;
@@ -3189,8 +3199,8 @@ static int selinux_socket_create(int family, int type,
goto out;

tsec = current->security;
- newsid = tsec->sockcreate_sid ? : tsec->sid;
- err = avc_has_perm(tsec->sid, newsid,
+ newsid = tsec->sockcreate_sid ? : tsec->actor_sid;
+ err = avc_has_perm(tsec->actor_sid, newsid,
socket_type_to_security_class(family, type,
protocol), SOCKET__CREATE, NULL);

@@ -3210,7 +3220,7 @@ static int selinux_socket_post_create(struct socket *sock, int family,
isec = SOCK_INODE(sock)->i_security;

tsec = current->security;
- newsid = tsec->sockcreate_sid ? : tsec->sid;
+ newsid = tsec->sockcreate_sid ? : tsec->actor_sid;
isec->sclass = socket_type_to_security_class(family, type, protocol);
isec->sid = kern ? SECINITSID_KERNEL : newsid;
isec->initialized = 1;
@@ -4033,7 +4043,7 @@ static int ipc_alloc_security(struct task_struct *task,

isec->sclass = sclass;
isec->ipc_perm = perm;
- isec->sid = tsec->sid;
+ isec->sid = tsec->actor_sid;
perm->security = isec;

return 0;
@@ -4082,7 +4092,8 @@ static int ipc_has_perm(struct kern_ipc_perm *ipc_perms,
AVC_AUDIT_DATA_INIT(&ad, IPC);
ad.u.ipc_id = ipc_perms->key;

- return avc_has_perm(tsec->sid, isec->sid, isec->sclass, perms, &ad);
+ return avc_has_perm(tsec->actor_sid, isec->sid, isec->sclass, perms,
+ &ad);
}

static int selinux_msg_msg_alloc_security(struct msg_msg *msg)
@@ -4113,7 +4124,7 @@ static int selinux_msg_queue_alloc_security(struct msg_queue *msq)
AVC_AUDIT_DATA_INIT(&ad, IPC);
ad.u.ipc_id = msq->q_perm.key;

- rc = avc_has_perm(tsec->sid, isec->sid, SECCLASS_MSGQ,
+ rc = avc_has_perm(tsec->actor_sid, isec->sid, SECCLASS_MSGQ,
MSGQ__CREATE, &ad);
if (rc) {
ipc_free_security(&msq->q_perm);
@@ -4139,7 +4150,7 @@ static int selinux_msg_queue_associate(struct msg_queue *msq, int msqflg)
AVC_AUDIT_DATA_INIT(&ad, IPC);
ad.u.ipc_id = msq->q_perm.key;

- return avc_has_perm(tsec->sid, isec->sid, SECCLASS_MSGQ,
+ return avc_has_perm(tsec->actor_sid, isec->sid, SECCLASS_MSGQ,
MSGQ__ASSOCIATE, &ad);
}

@@ -4191,7 +4202,7 @@ static int selinux_msg_queue_msgsnd(struct msg_queue *msq, struct msg_msg *msg,
* Compute new sid based on current process and
* message queue this message will be stored in
*/
- rc = security_transition_sid(tsec->sid,
+ rc = security_transition_sid(tsec->actor_sid,
isec->sid,
SECCLASS_MSG,
&msec->sid);
@@ -4203,11 +4214,11 @@ static int selinux_msg_queue_msgsnd(struct msg_queue *msq, struct msg_msg *msg,
ad.u.ipc_id = msq->q_perm.key;

/* Can this process write to the queue? */
- rc = avc_has_perm(tsec->sid, isec->sid, SECCLASS_MSGQ,
+ rc = avc_has_perm(tsec->actor_sid, isec->sid, SECCLASS_MSGQ,
MSGQ__WRITE, &ad);
if (!rc)
/* Can this process send the message */
- rc = avc_has_perm(tsec->sid, msec->sid,
+ rc = avc_has_perm(tsec->actor_sid, msec->sid,
SECCLASS_MSG, MSG__SEND, &ad);
if (!rc)
/* Can the message be put in the queue? */
@@ -4234,10 +4245,10 @@ static int selinux_msg_queue_msgrcv(struct msg_queue *msq, struct msg_msg *msg,
AVC_AUDIT_DATA_INIT(&ad, IPC);
ad.u.ipc_id = msq->q_perm.key;

- rc = avc_has_perm(tsec->sid, isec->sid,
+ rc = avc_has_perm(tsec->actor_sid, isec->sid,
SECCLASS_MSGQ, MSGQ__READ, &ad);
if (!rc)
- rc = avc_has_perm(tsec->sid, msec->sid,
+ rc = avc_has_perm(tsec->actor_sid, msec->sid,
SECCLASS_MSG, MSG__RECEIVE, &ad);
return rc;
}
@@ -4260,7 +4271,7 @@ static int selinux_shm_alloc_security(struct shmid_kernel *shp)
AVC_AUDIT_DATA_INIT(&ad, IPC);
ad.u.ipc_id = shp->shm_perm.key;

- rc = avc_has_perm(tsec->sid, isec->sid, SECCLASS_SHM,
+ rc = avc_has_perm(tsec->actor_sid, isec->sid, SECCLASS_SHM,
SHM__CREATE, &ad);
if (rc) {
ipc_free_security(&shp->shm_perm);
@@ -4286,7 +4297,7 @@ static int selinux_shm_associate(struct shmid_kernel *shp, int shmflg)
AVC_AUDIT_DATA_INIT(&ad, IPC);
ad.u.ipc_id = shp->shm_perm.key;

- return avc_has_perm(tsec->sid, isec->sid, SECCLASS_SHM,
+ return avc_has_perm(tsec->actor_sid, isec->sid, SECCLASS_SHM,
SHM__ASSOCIATE, &ad);
}

@@ -4359,7 +4370,7 @@ static int selinux_sem_alloc_security(struct sem_array *sma)
AVC_AUDIT_DATA_INIT(&ad, IPC);
ad.u.ipc_id = sma->sem_perm.key;

- rc = avc_has_perm(tsec->sid, isec->sid, SECCLASS_SEM,
+ rc = avc_has_perm(tsec->actor_sid, isec->sid, SECCLASS_SEM,
SEM__CREATE, &ad);
if (rc) {
ipc_free_security(&sma->sem_perm);
@@ -4385,7 +4396,7 @@ static int selinux_sem_associate(struct sem_array *sma, int semflg)
AVC_AUDIT_DATA_INIT(&ad, IPC);
ad.u.ipc_id = sma->sem_perm.key;

- return avc_has_perm(tsec->sid, isec->sid, SECCLASS_SEM,
+ return avc_has_perm(tsec->actor_sid, isec->sid, SECCLASS_SEM,
SEM__ASSOCIATE, &ad);
}

@@ -4633,14 +4644,19 @@ static int selinux_setprocattr(struct task_struct *p,
error = avc_has_perm_noaudit(tsec->ptrace_sid, sid,
SECCLASS_PROCESS,
PROCESS__PTRACE, 0, &avd);
- if (!error)
+ if (!error) {
+ if (tsec->actor_sid == tsec->sid)
+ tsec->actor_sid = sid;
tsec->sid = sid;
+ }
task_unlock(p);
avc_audit(tsec->ptrace_sid, sid, SECCLASS_PROCESS,
PROCESS__PTRACE, &avd, error, NULL);
if (error)
return error;
} else {
+ if (tsec->actor_sid == tsec->sid)
+ tsec->actor_sid = sid;
tsec->sid = sid;
task_unlock(p);
}
@@ -4677,6 +4693,24 @@ static u32 selinux_set_fscreate_secid(u32 secid)
return oldsid;
}

+static u32 selinux_act_as_secid(u32 secid)
+{
+ struct task_security_struct *tsec = current->security;
+ u32 oldactor_sid = tsec->actor_sid;
+
+ tsec->actor_sid = secid;
+ return oldactor_sid;
+}
+
+static u32 selinux_act_as_self(void)
+{
+ struct task_security_struct *tsec = current->security;
+ u32 oldactor_sid = tsec->actor_sid;
+
+ tsec->actor_sid = tsec->sid;
+ return oldactor_sid;
+}
+
#ifdef CONFIG_KEYS

static int selinux_key_alloc(struct key *k, struct task_struct *tsk,
@@ -4726,7 +4760,7 @@ static int selinux_key_permission(key_ref_t key_ref,
if (perm == 0)
return 0;

- return avc_has_perm(tsec->sid, ksec->sid,
+ return avc_has_perm(tsec->actor_sid, ksec->sid,
SECCLASS_KEY, perm, NULL);
}

@@ -4861,6 +4895,8 @@ static struct security_operations selinux_ops = {
.release_secctx = selinux_release_secctx,
.get_fscreate_secid = selinux_get_fscreate_secid,
.set_fscreate_secid = selinux_set_fscreate_secid,
+ .act_as_secid = selinux_act_as_secid,
+ .act_as_self = selinux_act_as_self,

.unix_stream_connect = selinux_socket_unix_stream_connect,
.unix_may_send = selinux_socket_unix_may_send,
@@ -4926,7 +4962,7 @@ static __init int selinux_init(void)
if (task_alloc_security(current))
panic("SELinux: Failed to initialize initial task.\n");
tsec = current->security;
- tsec->osid = tsec->sid = SECINITSID_KERNEL;
+ tsec->osid = tsec->actor_sid = tsec->sid = SECINITSID_KERNEL;

sel_inode_cache = kmem_cache_create("selinux_inode_security",
sizeof(struct inode_security_struct),
diff --git a/security/selinux/include/objsec.h b/security/selinux/include/objsec.h
index 91b88f0..884df98 100644
--- a/security/selinux/include/objsec.h
+++ b/security/selinux/include/objsec.h
@@ -31,6 +31,7 @@ struct task_security_struct {
struct task_struct *task; /* back pointer to task object */
u32 osid; /* SID prior to last execve */
u32 sid; /* current SID */
+ u32 actor_sid; /* act-as SID (normally == sid) */
u32 exec_sid; /* exec SID */
u32 create_sid; /* fscreate SID */
u32 keycreate_sid; /* keycreate SID */
diff --git a/security/selinux/selinuxfs.c b/security/selinux/selinuxfs.c
index c9e92da..8df8e6a 100644
--- a/security/selinux/selinuxfs.c
+++ b/security/selinux/selinuxfs.c
@@ -83,7 +83,7 @@ static int task_has_security(struct task_struct *tsk,
if (!tsec)
return -EACCES;

- return avc_has_perm(tsec->sid, SECINITSID_SECURITY,
+ return avc_has_perm(tsec->actor_sid, SECINITSID_SECURITY,
SECCLASS_SECURITY, perms, NULL);
}

diff --git a/security/selinux/xfrm.c b/security/selinux/xfrm.c
index ba715f4..873b6ba 100644
--- a/security/selinux/xfrm.c
+++ b/security/selinux/xfrm.c
@@ -240,7 +240,7 @@ static int selinux_xfrm_sec_ctx_alloc(struct xfrm_sec_ctx **ctxp,
/*
* Does the subject have permission to set security context?
*/
- rc = avc_has_perm(tsec->sid, ctx->ctx_sid,
+ rc = avc_has_perm(tsec->actor_sid, ctx->ctx_sid,
SECCLASS_ASSOCIATION,
ASSOCIATION__SETCONTEXT, NULL);
if (rc)
@@ -341,7 +341,7 @@ int selinux_xfrm_policy_delete(struct xfrm_policy *xp)
int rc = 0;

if (ctx)
- rc = avc_has_perm(tsec->sid, ctx->ctx_sid,
+ rc = avc_has_perm(tsec->actor_sid, ctx->ctx_sid,
SECCLASS_ASSOCIATION,
ASSOCIATION__SETCONTEXT, NULL);

@@ -383,7 +383,7 @@ int selinux_xfrm_state_delete(struct xfrm_state *x)
int rc = 0;

if (ctx)
- rc = avc_has_perm(tsec->sid, ctx->ctx_sid,
+ rc = avc_has_perm(tsec->actor_sid, ctx->ctx_sid,
SECCLASS_ASSOCIATION,
ASSOCIATION__SETCONTEXT, NULL);


2007-08-09 16:42:08

by David Howells

[permalink] [raw]
Subject: [PATCH 13/14] CacheFiles: A cache that backs onto a mounted filesystem [try #2]

Add an FS-Cache cache-backend that permits a mounted filesystem to be used as a
backing store for the cache.


CacheFiles uses a userspace daemon to do some of the cache management - such as
reaping stale nodes and culling. This is called cachefilesd and lives in
/sbin. The source for the daemon can be downloaded from:

http://people.redhat.com/~dhowells/cachefs/cachefilesd.c

And an example configuration from:

http://people.redhat.com/~dhowells/cachefs/cachefilesd.conf

The filesystem and data integrity of the cache are only as good as those of the
filesystem providing the backing services. Note that CacheFiles does not
attempt to journal anything since the journalling interfaces of the various
filesystems are very specific in nature.

CacheFiles creates a proc-file - "/proc/fs/cachefiles" - that is used for
communication with the daemon. Only one thing may have this open at once, and
whilst it is open, a cache is at least partially in existence. The daemon
opens this and sends commands down it to control the cache.

CacheFiles is currently limited to a single cache.

CacheFiles attempts to maintain at least a certain percentage of free space on
the filesystem, shrinking the cache by culling the objects it contains to make
space if necessary - see the "Cache Culling" section. This means it can be
placed on the same medium as a live set of data, and will expand to make use of
spare space and automatically contract when the set of data requires more
space.


============
REQUIREMENTS
============

The use of CacheFiles and its daemon requires the following features to be
available in the system and in the cache filesystem:

- dnotify.

- extended attributes (xattrs).

- openat() and friends.

- bmap() support on files in the filesystem (FIBMAP ioctl).

- The use of bmap() to detect a partial page at the end of the file.

It is strongly recommended that the "dir_index" option is enabled on Ext3
filesystems being used as a cache.


=============
CONFIGURATION
=============

The cache is configured by a script in /etc/cachefilesd.conf. These commands
set up cache ready for use. The following script commands are available:

(*) brun <N>%
(*) bcull <N>%
(*) bstop <N>%

Configure the culling limits. Optional. See the section on culling
The defaults are 7%, 5% and 1% respectively.

(*) dir <path>

Specify the directory containing the root of the cache. Mandatory.

(*) tag <name>

Specify a tag to FS-Cache to use in distinguishing multiple caches.
Optional. The default is "CacheFiles".

(*) debug <mask>

Specify a numeric bitmask to control debugging in the kernel module.
Optional. The default is zero (all off).


==================
STARTING THE CACHE
==================

The cache is started by running the daemon. The daemon opens the cache proc
file, configures the cache and tells it to begin caching. At that point the
cache binds to fscache and the cache becomes live.

The daemon is run as follows:

/sbin/cachefilesd [-d]* [-s] [-n] [-f <configfile>]

The flags are:

(*) -d

Increase the debugging level. This can be specified multiple times and
is cumulative with itself.

(*) -s

Send messages to stderr instead of syslog.

(*) -n

Don't daemonise and go into background.

(*) -f <configfile>

Use an alternative configuration file rather than the default one.


===============
THINGS TO AVOID
===============

Do not mount other things within the cache as this will cause problems. The
kernel module contains its own very cut-down path walking facility that ignores
mountpoints, but the daemon can't avoid them.

Do not create, rename or unlink files and directories in the cache whilst the
cache is active, as this may cause the state to become uncertain.

Renaming files in the cache might make objects appear to be other objects (the
filename is part of the lookup key).

Do not change or remove the extended attributes attached to cache files by the
cache as this will cause the cache state management to get confused.

Do not create files or directories in the cache, lest the cache get confused or
serve incorrect data.

Do not chmod files in the cache. The module creates things with minimal
permissions to prevent random users being able to access them directly.


=============
CACHE CULLING
=============

The cache may need culling occasionally to make space. This involves
discarding objects from the cache that have been used less recently than
anything else. Culling is based on the access time of data objects. Empty
directories are culled if not in use.

Cache culling is done on the basis of the percentage of blocks available in the
underlying filesystem. There are three "limits":

(*) brun

If the amount of available space in the cache rises above this limit, then
culling is turned off.

(*) bcull

If the amount of available space in the cache falls below this limit, then
culling is started.

(*) bstop

If the amount of available space in the cache falls below this limit, then
no further allocation of disk space is permitted until culling has raised
the amount above this limit again.

These must be configured thusly:

0 <= bstop < bcull < brun < 100

Note that these are percentages of available space, and do _not_ appear as 100
minus the percentage displayed by the "df" program.

The userspace daemon scans the cache to build up a table of cullable objects.
These are then culled in least recently used order. A new scan of the cache is
started as soon as space is made in the table. Objects will be skipped if
their atimes have changed or if the kernel module says it is still using them.


===============
CACHE STRUCTURE
===============

The CacheFiles module will create two directories in the directory it was
given:

(*) cache/

(*) graveyard/

The active cache objects all reside in the first directory. The CacheFiles
kernel module moves any retired or culled objects that it can't simply unlink
to the graveyard from which the daemon will actually delete them.

The daemon uses dnotify to monitor the graveyard directory, and will delete
anything that appears therein.


The module represents index objects as directories with the filename "I..." or
"J...". Note that the "cache/" directory is itself a special index.

Data objects are represented as files if they have no children, or directories
if they do. Their filenames all begin "D..." or "E...". If represented as a
directory, data objects will have a file in the directory called "data" that
actually holds the data.

Special objects are similar to data objects, except their filenames begin
"S..." or "T...".


If an object has children, then it will be represented as a directory.
Immediately in the representative directory are a collection of directories
named for hash values of the child object keys with an '@' prepended. Into
this directory, if possible, will be placed the representations of the child
objects:

INDEX INDEX INDEX DATA FILES
========= ========== ================================= ================
cache/@4a/I03nfs/@30/Ji000000000000000--fHg8hi8400
cache/@4a/I03nfs/@30/Ji000000000000000--fHg8hi8400/@75/Es0g000w...DB1ry
cache/@4a/I03nfs/@30/Ji000000000000000--fHg8hi8400/@75/Es0g000w...N22ry
cache/@4a/I03nfs/@30/Ji000000000000000--fHg8hi8400/@75/Es0g000w...FP1ry


If the key is so long that it exceeds NAME_MAX with the decorations added on to
it, then it will be cut into pieces, the first few of which will be used to
make a nest of directories, and the last one of which will be the objects
inside the last directory. The names of the intermediate directories will have
'+' prepended:

J1223/@23/+xy...z/+kl...m/Epqr


Note that keys are raw data, and not only may they exceed NAME_MAX in size,
they may also contain things like '/' and NUL characters, and so they may not
be suitable for turning directly into a filename.

To handle this, CacheFiles will use a suitably printable filename directly and
"base-64" encode ones that aren't directly suitable. The two versions of
object filenames indicate the encoding:

OBJECT TYPE PRINTABLE ENCODED
=============== =============== ===============
Index "I..." "J..."
Data "D..." "E..."
Special "S..." "T..."

Intermediate directories are always "@" or "+" as appropriate.


Each object in the cache has an extended attribute label that holds the object
type ID (required to distinguish special objects) and the auxiliary data from
the netfs. The latter is used to detect stale objects in the cache and update
or retire them.


Note that CacheFiles will erase from the cache any file it doesn't recognise or
any file of an incorrect type (such as a FIFO file or a device file).


This documentation is added by the patch to:

Documentation/filesystems/caching/cachefiles.txt

Signed-Off-By: David Howells <[email protected]>
---

Documentation/filesystems/caching/cachefiles.txt | 395 ++++++++++++
fs/Kconfig | 1
fs/Makefile | 1
fs/cachefiles/Kconfig | 33 +
fs/cachefiles/Makefile | 18 +
fs/cachefiles/cf-bind.c | 288 +++++++++
fs/cachefiles/cf-daemon.c | 720 +++++++++++++++++++++
fs/cachefiles/cf-interface.c | 442 +++++++++++++
fs/cachefiles/cf-internal.h | 414 ++++++++++++
fs/cachefiles/cf-key.c | 159 +++++
fs/cachefiles/cf-main.c | 109 +++
fs/cachefiles/cf-namei.c | 743 ++++++++++++++++++++++
fs/cachefiles/cf-proc.c | 166 +++++
fs/cachefiles/cf-security.c | 107 +++
fs/cachefiles/cf-xattr.c | 292 +++++++++
15 files changed, 3888 insertions(+), 0 deletions(-)

diff --git a/Documentation/filesystems/caching/cachefiles.txt b/Documentation/filesystems/caching/cachefiles.txt
new file mode 100644
index 0000000..b502cff
--- /dev/null
+++ b/Documentation/filesystems/caching/cachefiles.txt
@@ -0,0 +1,395 @@
+ ===============================================
+ CacheFiles: CACHE ON ALREADY MOUNTED FILESYSTEM
+ ===============================================
+
+Contents:
+
+ (*) Overview.
+
+ (*) Requirements.
+
+ (*) Configuration.
+
+ (*) Starting the cache.
+
+ (*) Things to avoid.
+
+ (*) Cache culling.
+
+ (*) Cache structure.
+
+ (*) Security model and SELinux.
+
+========
+OVERVIEW
+========
+
+CacheFiles is a caching backend that's meant to use as a cache a directory on
+an already mounted filesystem of a local type (such as Ext3).
+
+CacheFiles uses a userspace daemon to do some of the cache management - such as
+reaping stale nodes and culling. This is called cachefilesd and lives in
+/sbin.
+
+The filesystem and data integrity of the cache are only as good as those of the
+filesystem providing the backing services. Note that CacheFiles does not
+attempt to journal anything since the journalling interfaces of the various
+filesystems are very specific in nature.
+
+CacheFiles creates a misc character device - "/dev/cachefiles" - that is used
+to communication with the daemon. Only one thing may have this open at once,
+and whilst it is open, a cache is at least partially in existence. The daemon
+opens this and sends commands down it to control the cache.
+
+CacheFiles is currently limited to a single cache.
+
+CacheFiles attempts to maintain at least a certain percentage of free space on
+the filesystem, shrinking the cache by culling the objects it contains to make
+space if necessary - see the "Cache Culling" section. This means it can be
+placed on the same medium as a live set of data, and will expand to make use of
+spare space and automatically contract when the set of data requires more
+space.
+
+
+============
+REQUIREMENTS
+============
+
+The use of CacheFiles and its daemon requires the following features to be
+available in the system and in the cache filesystem:
+
+ - dnotify.
+
+ - extended attributes (xattrs).
+
+ - openat() and friends.
+
+ - bmap() support on files in the filesystem (FIBMAP ioctl).
+
+ - The use of bmap() to detect a partial page at the end of the file.
+
+It is strongly recommended that the "dir_index" option is enabled on Ext3
+filesystems being used as a cache.
+
+
+=============
+CONFIGURATION
+=============
+
+The cache is configured by a script in /etc/cachefilesd.conf. These commands
+set up cache ready for use. The following script commands are available:
+
+ (*) brun <N>%
+ (*) bcull <N>%
+ (*) bstop <N>%
+ (*) frun <N>%
+ (*) fcull <N>%
+ (*) fstop <N>%
+
+ Configure the culling limits. Optional. See the section on culling
+ The defaults are 7% (run), 5% (cull) and 1% (stop) respectively.
+
+ The commands beginning with a 'b' are file space (block) limits, those
+ beginning with an 'f' are file count limits.
+
+ (*) dir <path>
+
+ Specify the directory containing the root of the cache. Mandatory.
+
+ (*) tag <name>
+
+ Specify a tag to FS-Cache to use in distinguishing multiple caches.
+ Optional. The default is "CacheFiles".
+
+ (*) debug <mask>
+
+ Specify a numeric bitmask to control debugging in the kernel module.
+ Optional. The default is zero (all off). The following values can be
+ OR'd into the mask to collect various information:
+
+ 1 Turn on trace of function entry (_enter() macros)
+ 2 Turn on trace of function exit (_leave() macros)
+ 4 Turn on trace of internal debug points (_debug())
+
+ This mask can also be set through sysfs, eg:
+
+ echo 5 >/sys/modules/cachefiles/parameters/debug
+
+
+==================
+STARTING THE CACHE
+==================
+
+The cache is started by running the daemon. The daemon opens the cache device,
+configures the cache and tells it to begin caching. At that point the cache
+binds to fscache and the cache becomes live.
+
+The daemon is run as follows:
+
+ /sbin/cachefilesd [-d]* [-s] [-n] [-f <configfile>]
+
+The flags are:
+
+ (*) -d
+
+ Increase the debugging level. This can be specified multiple times and
+ is cumulative with itself.
+
+ (*) -s
+
+ Send messages to stderr instead of syslog.
+
+ (*) -n
+
+ Don't daemonise and go into background.
+
+ (*) -f <configfile>
+
+ Use an alternative configuration file rather than the default one.
+
+
+===============
+THINGS TO AVOID
+===============
+
+Do not mount other things within the cache as this will cause problems. The
+kernel module contains its own very cut-down path walking facility that ignores
+mountpoints, but the daemon can't avoid them.
+
+Do not create, rename or unlink files and directories in the cache whilst the
+cache is active, as this may cause the state to become uncertain.
+
+Renaming files in the cache might make objects appear to be other objects (the
+filename is part of the lookup key).
+
+Do not change or remove the extended attributes attached to cache files by the
+cache as this will cause the cache state management to get confused.
+
+Do not create files or directories in the cache, lest the cache get confused or
+serve incorrect data.
+
+Do not chmod files in the cache. The module creates things with minimal
+permissions to prevent random users being able to access them directly.
+
+
+=============
+CACHE CULLING
+=============
+
+The cache may need culling occasionally to make space. This involves
+discarding objects from the cache that have been used less recently than
+anything else. Culling is based on the access time of data objects. Empty
+directories are culled if not in use.
+
+Cache culling is done on the basis of the percentage of blocks and the
+percentage of files available in the underlying filesystem. There are six
+"limits":
+
+ (*) brun
+ (*) frun
+
+ If the amount of free space and the number of available files in the cache
+ rises above both these limits, then culling is turned off.
+
+ (*) bcull
+ (*) fcull
+
+ If the amount of available space or the number of available files in the
+ cache falls below either of these limits, then culling is started.
+
+ (*) bstop
+ (*) fstop
+
+ If the amount of available space or the number of available files in the
+ cache falls below either of these limits, then no further allocation of
+ disk space or files is permitted until culling has raised things above
+ these limits again.
+
+These must be configured thusly:
+
+ 0 <= bstop < bcull < brun < 100
+ 0 <= fstop < fcull < frun < 100
+
+Note that these are percentages of available space and available files, and do
+_not_ appear as 100 minus the percentage displayed by the "df" program.
+
+The userspace daemon scans the cache to build up a table of cullable objects.
+These are then culled in least recently used order. A new scan of the cache is
+started as soon as space is made in the table. Objects will be skipped if
+their atimes have changed or if the kernel module says it is still using them.
+
+
+===============
+CACHE STRUCTURE
+===============
+
+The CacheFiles module will create two directories in the directory it was
+given:
+
+ (*) cache/
+
+ (*) graveyard/
+
+The active cache objects all reside in the first directory. The CacheFiles
+kernel module moves any retired or culled objects that it can't simply unlink
+to the graveyard from which the daemon will actually delete them.
+
+The daemon uses dnotify to monitor the graveyard directory, and will delete
+anything that appears therein.
+
+
+The module represents index objects as directories with the filename "I..." or
+"J...". Note that the "cache/" directory is itself a special index.
+
+Data objects are represented as files if they have no children, or directories
+if they do. Their filenames all begin "D..." or "E...". If represented as a
+directory, data objects will have a file in the directory called "data" that
+actually holds the data.
+
+Special objects are similar to data objects, except their filenames begin
+"S..." or "T...".
+
+
+If an object has children, then it will be represented as a directory.
+Immediately in the representative directory are a collection of directories
+named for hash values of the child object keys with an '@' prepended. Into
+this directory, if possible, will be placed the representations of the child
+objects:
+
+ INDEX INDEX INDEX DATA FILES
+ ========= ========== ================================= ================
+ cache/@4a/I03nfs/@30/Ji000000000000000--fHg8hi8400
+ cache/@4a/I03nfs/@30/Ji000000000000000--fHg8hi8400/@75/Es0g000w...DB1ry
+ cache/@4a/I03nfs/@30/Ji000000000000000--fHg8hi8400/@75/Es0g000w...N22ry
+ cache/@4a/I03nfs/@30/Ji000000000000000--fHg8hi8400/@75/Es0g000w...FP1ry
+
+
+If the key is so long that it exceeds NAME_MAX with the decorations added on to
+it, then it will be cut into pieces, the first few of which will be used to
+make a nest of directories, and the last one of which will be the objects
+inside the last directory. The names of the intermediate directories will have
+'+' prepended:
+
+ J1223/@23/+xy...z/+kl...m/Epqr
+
+
+Note that keys are raw data, and not only may they exceed NAME_MAX in size,
+they may also contain things like '/' and NUL characters, and so they may not
+be suitable for turning directly into a filename.
+
+To handle this, CacheFiles will use a suitably printable filename directly and
+"base-64" encode ones that aren't directly suitable. The two versions of
+object filenames indicate the encoding:
+
+ OBJECT TYPE PRINTABLE ENCODED
+ =============== =============== ===============
+ Index "I..." "J..."
+ Data "D..." "E..."
+ Special "S..." "T..."
+
+Intermediate directories are always "@" or "+" as appropriate.
+
+
+Each object in the cache has an extended attribute label that holds the object
+type ID (required to distinguish special objects) and the auxiliary data from
+the netfs. The latter is used to detect stale objects in the cache and update
+or retire them.
+
+
+Note that CacheFiles will erase from the cache any file it doesn't recognise or
+any file of an incorrect type (such as a FIFO file or a device file).
+
+
+==========================
+SECURITY MODEL AND SELINUX
+==========================
+
+CacheFiles is implemented to deal properly with the LSM security features of
+the Linux kernel and the SELinux facility.
+
+One of the problems that CacheFiles faces is that it is generally acting on
+behalf of a process, and running in that process's context, and that includes a
+security context that is not appropriate for accessing the cache - either
+because the files in the cache are inaccessible to that process, or because if
+the process creates a file in the cache, that file may be inaccessible to other
+processes.
+
+The way CacheFiles works is to temporarily change the security context (fsuid,
+fsgid and actor security label) that the process acts as - without changing the
+security context of the process when it the target of an operation performed by
+some other process (so signalling and suchlike still work correctly).
+
+
+When the CacheFiles module is asked to bind to its cache, it:
+
+ (1) Finds the security label attached to the root cache directory and uses
+ that as the security label with which it will create files. By default,
+ this is:
+
+ cachefiles_var_t
+
+ (2) Finds the security label of the process which issued the bind request
+ (presumed to be the cachefilesd daemon), which by default will be:
+
+ cachefilesd_t
+
+ and asks LSM to supply a security ID as which it should act given the
+ daemon's label. By default, this will be:
+
+ cachefiles_kernel_t
+
+ SELinux transitions the daemon's security ID to the module's security ID
+ based on a rule of this form in the policy.
+
+ type_transition <daemon's-ID> kernel_t : process <module's-ID>;
+
+ For instance:
+
+ type_transition cachefilesd_t kernel_t : process cachefiles_kernel_t;
+
+
+The module's security ID gives it permission to create, move and remove files
+and directories in the cache, to find and access directories and files in the
+cache, to set and access extended attributes on cache objects, and to read and
+write files in the cache.
+
+The daemon's security ID gives it only a very restricted set of permissions: it
+may scan directories, stat files and erase files and directories. It may
+not read or write files in the cache, and so it is precluded from accessing the
+data cached therein; nor is it permitted to create new files in the cache.
+
+
+There are policy source files available in:
+
+ http://people.redhat.com/~dhowells/fscache/cachefilesd-0.8.tar.bz2
+
+and later versions. In that tarball, see the files:
+
+ cachefilesd.te
+ cachefilesd.fc
+ cachefilesd.if
+
+They are built and installed directly by the RPM.
+
+If a non-RPM based system is being used, then copy the above files to their own
+directory and run:
+
+ make -f /usr/share/selinux/devel/Makefile
+ semodule -i cachefilesd.pp
+
+You will need checkpolicy and selinux-policy-devel installed prior to the
+build.
+
+
+By default, the cache is located in /var/fscache, but if it is desirable that
+it should be elsewhere, than either the above policy files must be altered, or
+an auxiliary policy must be installed to label the alternate location of the
+cache.
+
+For instructions on how to add an auxiliary policy to enable the cache to be
+located elsewhere when SELinux is in enforcing mode, please see:
+
+ /usr/share/doc/cachefilesd-*/move-cache.txt
+
+When the cachefilesd rpm is installed; alternatively, the document can be found
+in the sources.
diff --git a/fs/Kconfig b/fs/Kconfig
index 3e1c276..7feb4cb 100644
--- a/fs/Kconfig
+++ b/fs/Kconfig
@@ -634,6 +634,7 @@ config GENERIC_ACL
menu "Caches"

source "fs/fscache/Kconfig"
+source "fs/cachefiles/Kconfig"

endmenu

diff --git a/fs/Makefile b/fs/Makefile
index 4eecc9a..becdb4f 100644
--- a/fs/Makefile
+++ b/fs/Makefile
@@ -116,6 +116,7 @@ obj-$(CONFIG_AFS_FS) += afs/
obj-$(CONFIG_BEFS_FS) += befs/
obj-$(CONFIG_HOSTFS) += hostfs/
obj-$(CONFIG_HPPFS) += hppfs/
+obj-$(CONFIG_CACHEFILES) += cachefiles/
obj-$(CONFIG_DEBUG_FS) += debugfs/
obj-$(CONFIG_OCFS2_FS) += ocfs2/
obj-$(CONFIG_GFS2_FS) += gfs2/
diff --git a/fs/cachefiles/Kconfig b/fs/cachefiles/Kconfig
new file mode 100644
index 0000000..ddbdd85
--- /dev/null
+++ b/fs/cachefiles/Kconfig
@@ -0,0 +1,33 @@
+
+config CACHEFILES
+ tristate "Filesystem caching on files"
+ depends on FSCACHE
+ help
+ This permits use of a mounted filesystem as a cache for other
+ filesystems - primarily networking filesystems - thus allowing fast
+ local disk to enhance the speed of slower devices.
+
+ See Documentation/filesystems/caching/cachefiles.txt for more
+ information.
+
+config CACHEFILES_DEBUG
+ bool "Debug CacheFiles"
+ depends on CACHEFILES
+ help
+ This permits debugging to be dynamically enabled in the filesystem
+ caching on files module. If this is set, the debugging output may be
+ enabled by setting bits in /sys/modules/cachefiles/parameter/debug or
+ by including a debugging specifier in /etc/cachefilesd.conf.
+
+config CACHEFILES_HISTOGRAM
+ bool "Gather latency information on CacheFiles"
+ depends on CACHEFILES && FSCACHE_PROC
+ help
+
+ This option causes latency information to be gathered on CacheFiles
+ operation and exported through file:
+
+ /proc/fs/fscache/cachefiles/histogram
+
+ See Documentation/filesystems/caching/cachefiles.txt for more
+ information.
diff --git a/fs/cachefiles/Makefile b/fs/cachefiles/Makefile
new file mode 100644
index 0000000..31186a4
--- /dev/null
+++ b/fs/cachefiles/Makefile
@@ -0,0 +1,18 @@
+#
+# Makefile for caching in a mounted filesystem
+#
+
+cachefiles-y := \
+ cf-bind.o \
+ cf-daemon.o \
+ cf-interface.o \
+ cf-key.o \
+ cf-main.o \
+ cf-namei.o \
+ cf-rdwr.o \
+ cf-xattr.o
+
+cachefiles-$(CONFIG_SECURITY) += cf-security.o
+cachefiles-$(CONFIG_CACHEFILES_HISTOGRAM) += cf-proc.o
+
+obj-$(CONFIG_CACHEFILES) := cachefiles.o
diff --git a/fs/cachefiles/cf-bind.c b/fs/cachefiles/cf-bind.c
new file mode 100644
index 0000000..90904e6
--- /dev/null
+++ b/fs/cachefiles/cf-bind.c
@@ -0,0 +1,288 @@
+/* Bind and unbind a cache from the filesystem backing it
+ *
+ * Copyright (C) 2007 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells ([email protected])
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public Licence
+ * as published by the Free Software Foundation; either version
+ * 2 of the Licence, or (at your option) any later version.
+ */
+
+#include <linux/module.h>
+#include <linux/init.h>
+#include <linux/sched.h>
+#include <linux/completion.h>
+#include <linux/slab.h>
+#include <linux/fs.h>
+#include <linux/file.h>
+#include <linux/namei.h>
+#include <linux/mount.h>
+#include <linux/statfs.h>
+#include <linux/ctype.h>
+#include "cf-internal.h"
+
+static int cachefiles_daemon_add_cache(struct cachefiles_cache *caches);
+
+/*
+ * bind a directory as a cache
+ */
+int cachefiles_daemon_bind(struct cachefiles_cache *cache, char *args)
+{
+ _enter("{%u,%u,%u,%u,%u,%u},%s",
+ cache->frun_percent,
+ cache->fcull_percent,
+ cache->fstop_percent,
+ cache->brun_percent,
+ cache->bcull_percent,
+ cache->bstop_percent,
+ args);
+
+ /* start by checking things over */
+ ASSERT(cache->fstop_percent >= 0 &&
+ cache->fstop_percent < cache->fcull_percent &&
+ cache->fcull_percent < cache->frun_percent &&
+ cache->frun_percent < 100);
+
+ ASSERT(cache->bstop_percent >= 0 &&
+ cache->bstop_percent < cache->bcull_percent &&
+ cache->bcull_percent < cache->brun_percent &&
+ cache->brun_percent < 100);
+
+ if (*args) {
+ kerror("'bind' command doesn't take an argument");
+ return -EINVAL;
+ }
+
+ if (!cache->rootdirname) {
+ kerror("No cache directory specified");
+ return -EINVAL;
+ }
+
+ /* don't permit already bound caches to be re-bound */
+ if (test_bit(CACHEFILES_READY, &cache->flags)) {
+ kerror("Cache already bound");
+ return -EBUSY;
+ }
+
+ /* make sure we have copies of the tag and dirname strings */
+ if (!cache->tag) {
+ /* the tag string is released by the fops->release()
+ * function, so we don't release it on error here */
+ cache->tag = kstrdup("CacheFiles", GFP_KERNEL);
+ if (!cache->tag)
+ return -ENOMEM;
+ }
+
+ /* add the cache */
+ return cachefiles_daemon_add_cache(cache);
+}
+
+/*
+ * add a cache
+ */
+static int cachefiles_daemon_add_cache(struct cachefiles_cache *cache)
+{
+ struct cachefiles_secctx secctx;
+ struct cachefiles_object *fsdef;
+ struct nameidata nd;
+ struct kstatfs stats;
+ struct dentry *graveyard, *cachedir, *root;
+ int ret;
+
+ _enter("");
+
+ /* we want to work under the module's security ID */
+ ret = cachefiles_get_security_ID(cache);
+ if (ret < 0)
+ return ret;
+
+ cachefiles_begin_secure_nofs(cache, &secctx);
+
+ /* allocate the root index object */
+ ret = -ENOMEM;
+
+ fsdef = kmem_cache_alloc(cachefiles_object_jar, GFP_KERNEL);
+ if (!fsdef)
+ goto error_root_object;
+
+ ASSERTCMP(fsdef->backer, ==, NULL);
+
+ atomic_set(&fsdef->usage, 1);
+ fsdef->type = FSCACHE_COOKIE_TYPE_INDEX;
+
+ _debug("- fsdef %p", fsdef);
+
+ /* look up the directory at the root of the cache */
+ memset(&nd, 0, sizeof(nd));
+
+ ret = path_lookup(cache->rootdirname, LOOKUP_DIRECTORY, &nd);
+ if (ret < 0)
+ goto error_open_root;
+
+ cache->mnt = mntget(nd.mnt);
+ root = dget(nd.dentry);
+ path_release(&nd);
+
+ /* check parameters */
+ ret = -EOPNOTSUPP;
+ if (!root->d_inode ||
+ !root->d_inode->i_op ||
+ !root->d_inode->i_op->lookup ||
+ !root->d_inode->i_op->mkdir ||
+ !root->d_inode->i_op->setxattr ||
+ !root->d_inode->i_op->getxattr ||
+ !root->d_sb ||
+ !root->d_sb->s_op ||
+ !root->d_sb->s_op->statfs ||
+ !root->d_sb->s_op->sync_fs)
+ goto error_unsupported;
+
+ ret = -EROFS;
+ if (root->d_sb->s_flags & MS_RDONLY)
+ goto error_unsupported;
+
+ /* determine the security of the on-disk cache as this governs
+ * security ID of files we create */
+ ret = cachefiles_determine_cache_secid(cache, root);
+ if (ret < 0)
+ goto error_unsupported;
+
+ cachefiles_set_fscreate_secid(cache);
+
+ /* get the cache size and blocksize */
+ ret = vfs_statfs(root, &stats);
+ if (ret < 0)
+ goto error_unsupported;
+
+ ret = -ERANGE;
+ if (stats.f_bsize <= 0)
+ goto error_unsupported;
+
+ ret = -EOPNOTSUPP;
+ if (stats.f_bsize > PAGE_SIZE)
+ goto error_unsupported;
+
+ cache->bsize = stats.f_bsize;
+ cache->bshift = 0;
+ if (stats.f_bsize < PAGE_SIZE)
+ cache->bshift = PAGE_SHIFT - ilog2(stats.f_bsize);
+
+ _debug("blksize %u (shift %u)",
+ cache->bsize, cache->bshift);
+
+ _debug("size %llu, avail %llu",
+ (unsigned long long) stats.f_blocks,
+ (unsigned long long) stats.f_bavail);
+
+ /* set up caching limits */
+ do_div(stats.f_files, 100);
+ cache->fstop = stats.f_files * cache->fstop_percent;
+ cache->fcull = stats.f_files * cache->fcull_percent;
+ cache->frun = stats.f_files * cache->frun_percent;
+
+ _debug("limits {%llu,%llu,%llu} files",
+ (unsigned long long) cache->frun,
+ (unsigned long long) cache->fcull,
+ (unsigned long long) cache->fstop);
+
+ stats.f_blocks >>= cache->bshift;
+ do_div(stats.f_blocks, 100);
+ cache->bstop = stats.f_blocks * cache->bstop_percent;
+ cache->bcull = stats.f_blocks * cache->bcull_percent;
+ cache->brun = stats.f_blocks * cache->brun_percent;
+
+ _debug("limits {%llu,%llu,%llu} blocks",
+ (unsigned long long) cache->brun,
+ (unsigned long long) cache->bcull,
+ (unsigned long long) cache->bstop);
+
+ /* get the cache directory and check its type */
+ cachedir = cachefiles_get_directory(cache, root, "cache");
+ if (IS_ERR(cachedir)) {
+ ret = PTR_ERR(cachedir);
+ goto error_unsupported;
+ }
+
+ fsdef->dentry = cachedir;
+
+ ret = cachefiles_check_object_type(fsdef);
+ if (ret < 0)
+ goto error_unsupported;
+
+ /* get the graveyard directory */
+ graveyard = cachefiles_get_directory(cache, root, "graveyard");
+ if (IS_ERR(graveyard)) {
+ ret = PTR_ERR(graveyard);
+ goto error_unsupported;
+ }
+
+ cache->graveyard = graveyard;
+
+ /* publish the cache */
+ fscache_init_cache(&cache->cache,
+ &cachefiles_cache_ops,
+ "%02x:%02x",
+ MAJOR(fsdef->dentry->d_sb->s_dev),
+ MINOR(fsdef->dentry->d_sb->s_dev));
+
+ ret = fscache_add_cache(&cache->cache, &fsdef->fscache, cache->tag);
+ if (ret < 0)
+ goto error_add_cache;
+
+ /* done */
+ set_bit(CACHEFILES_READY, &cache->flags);
+ dput(root);
+
+ printk(KERN_INFO "CacheFiles:"
+ " File cache on %s registered\n",
+ cache->cache.identifier);
+
+ /* check how much space the cache has */
+ cachefiles_has_space(cache, 0, 0);
+ cachefiles_end_secure(cache, &secctx);
+ return 0;
+
+error_add_cache:
+ dput(cache->graveyard);
+ cache->graveyard = NULL;
+error_unsupported:
+ mntput(cache->mnt);
+ cache->mnt = NULL;
+ dput(fsdef->dentry);
+ fsdef->dentry = NULL;
+ dput(root);
+error_open_root:
+ kmem_cache_free(cachefiles_object_jar, fsdef);
+error_root_object:
+ cachefiles_end_secure(cache, &secctx);
+ kerror("Failed to register: %d", ret);
+ return ret;
+}
+
+/*
+ * unbind a cache on fd release
+ */
+void cachefiles_daemon_unbind(struct cachefiles_cache *cache)
+{
+ _enter("");
+
+ if (test_bit(CACHEFILES_READY, &cache->flags)) {
+ printk(KERN_INFO "CacheFiles:"
+ " File cache on %s unregistering\n",
+ cache->cache.identifier);
+
+ fscache_withdraw_cache(&cache->cache);
+ }
+
+ if (cache->cache.fsdef)
+ cache->cache.ops->put_object(cache->cache.fsdef);
+
+ dput(cache->graveyard);
+ mntput(cache->mnt);
+
+ kfree(cache->rootdirname);
+ kfree(cache->tag);
+
+ _leave("");
+}
diff --git a/fs/cachefiles/cf-daemon.c b/fs/cachefiles/cf-daemon.c
new file mode 100644
index 0000000..1939ff8
--- /dev/null
+++ b/fs/cachefiles/cf-daemon.c
@@ -0,0 +1,720 @@
+/* Daemon interface
+ *
+ * Copyright (C) 2007 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells ([email protected])
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public Licence
+ * as published by the Free Software Foundation; either version
+ * 2 of the Licence, or (at your option) any later version.
+ */
+
+#include <linux/module.h>
+#include <linux/init.h>
+#include <linux/sched.h>
+#include <linux/completion.h>
+#include <linux/slab.h>
+#include <linux/fs.h>
+#include <linux/file.h>
+#include <linux/namei.h>
+#include <linux/poll.h>
+#include <linux/mount.h>
+#include <linux/statfs.h>
+#include <linux/ctype.h>
+#include "cf-internal.h"
+
+static int cachefiles_daemon_open(struct inode *, struct file *);
+static int cachefiles_daemon_release(struct inode *, struct file *);
+static ssize_t cachefiles_daemon_read(struct file *, char __user *, size_t, loff_t *);
+static ssize_t cachefiles_daemon_write(struct file *, const char __user *, size_t, loff_t *);
+static unsigned int cachefiles_daemon_poll(struct file *, struct poll_table_struct *);
+static int cachefiles_daemon_frun(struct cachefiles_cache *cache, char *args);
+static int cachefiles_daemon_fcull(struct cachefiles_cache *cache, char *args);
+static int cachefiles_daemon_fstop(struct cachefiles_cache *cache, char *args);
+static int cachefiles_daemon_brun(struct cachefiles_cache *cache, char *args);
+static int cachefiles_daemon_bcull(struct cachefiles_cache *cache, char *args);
+static int cachefiles_daemon_bstop(struct cachefiles_cache *cache, char *args);
+static int cachefiles_daemon_cull(struct cachefiles_cache *cache, char *args);
+static int cachefiles_daemon_debug(struct cachefiles_cache *cache, char *args);
+static int cachefiles_daemon_dir(struct cachefiles_cache *cache, char *args);
+static int cachefiles_daemon_tag(struct cachefiles_cache *cache, char *args);
+static int cachefiles_daemon_inuse(struct cachefiles_cache *cache, char *args);
+
+static unsigned long cachefiles_open;
+
+const struct file_operations cachefiles_daemon_fops = {
+ .owner = THIS_MODULE,
+ .open = cachefiles_daemon_open,
+ .release = cachefiles_daemon_release,
+ .read = cachefiles_daemon_read,
+ .write = cachefiles_daemon_write,
+ .poll = cachefiles_daemon_poll,
+};
+
+struct cachefiles_daemon_cmd {
+ char name[8];
+ int (*handler)(struct cachefiles_cache *cache, char *args);
+};
+
+static const struct cachefiles_daemon_cmd cachefiles_daemon_cmds[] = {
+ { "bind", cachefiles_daemon_bind },
+ { "brun", cachefiles_daemon_brun },
+ { "bcull", cachefiles_daemon_bcull },
+ { "bstop", cachefiles_daemon_bstop },
+ { "cull", cachefiles_daemon_cull },
+ { "debug", cachefiles_daemon_debug },
+ { "dir", cachefiles_daemon_dir },
+ { "frun", cachefiles_daemon_frun },
+ { "fcull", cachefiles_daemon_fcull },
+ { "fstop", cachefiles_daemon_fstop },
+ { "inuse", cachefiles_daemon_inuse },
+ { "tag", cachefiles_daemon_tag },
+ { "", NULL }
+};
+
+
+/*
+ * do various checks
+ */
+static int cachefiles_daemon_open(struct inode *inode, struct file *file)
+{
+ struct cachefiles_cache *cache;
+
+ _enter("");
+
+ /* only the superuser may do this */
+ if (!capable(CAP_SYS_ADMIN))
+ return -EPERM;
+
+ /* the cachefiles device may only be open once at a time */
+ if (xchg(&cachefiles_open, 1) == 1)
+ return -EBUSY;
+
+ /* allocate a cache record */
+ cache = kzalloc(sizeof(struct cachefiles_cache), GFP_KERNEL);
+ if (!cache) {
+ cachefiles_open = 0;
+ return -ENOMEM;
+ }
+
+ mutex_init(&cache->daemon_mutex);
+ cache->active_nodes = RB_ROOT;
+ rwlock_init(&cache->active_lock);
+ init_waitqueue_head(&cache->daemon_pollwq);
+
+ /* set default caching limits
+ * - limit at 1% free space and/or free files
+ * - cull below 5% free space and/or free files
+ * - cease culling above 7% free space and/or free files
+ */
+ cache->frun_percent = 7;
+ cache->fcull_percent = 5;
+ cache->fstop_percent = 1;
+ cache->brun_percent = 7;
+ cache->bcull_percent = 5;
+ cache->bstop_percent = 1;
+
+ file->private_data = cache;
+ cache->cachefilesd = file;
+ return 0;
+}
+
+/*
+ * release a cache
+ */
+static int cachefiles_daemon_release(struct inode *inode, struct file *file)
+{
+ struct cachefiles_cache *cache = file->private_data;
+
+ _enter("");
+
+ ASSERT(cache);
+
+ set_bit(CACHEFILES_DEAD, &cache->flags);
+
+ cachefiles_daemon_unbind(cache);
+
+ ASSERT(!cache->active_nodes.rb_node);
+
+ /* clean up the control file interface */
+ cache->cachefilesd = NULL;
+ file->private_data = NULL;
+ cachefiles_open = 0;
+
+ kfree(cache);
+
+ _leave("");
+ return 0;
+}
+
+/*
+ * read the cache state
+ */
+static ssize_t cachefiles_daemon_read(struct file *file, char __user *_buffer,
+ size_t buflen, loff_t *pos)
+{
+ struct cachefiles_cache *cache = file->private_data;
+ char buffer[256];
+ int n;
+
+ _enter(",,%zu,", buflen);
+
+ if (!test_bit(CACHEFILES_READY, &cache->flags))
+ return 0;
+
+ /* check how much space the cache has */
+ cachefiles_has_space(cache, 0, 0);
+
+ /* summarise */
+ clear_bit(CACHEFILES_STATE_CHANGED, &cache->flags);
+
+ n = snprintf(buffer, sizeof(buffer),
+ "cull=%c"
+ " frun=%llx"
+ " fcull=%llx"
+ " fstop=%llx"
+ " brun=%llx"
+ " bcull=%llx"
+ " bstop=%llx",
+ test_bit(CACHEFILES_CULLING, &cache->flags) ? '1' : '0',
+ (unsigned long long) cache->frun,
+ (unsigned long long) cache->fcull,
+ (unsigned long long) cache->fstop,
+ (unsigned long long) cache->brun,
+ (unsigned long long) cache->bcull,
+ (unsigned long long) cache->bstop
+ );
+
+ if (n > buflen)
+ return -EMSGSIZE;
+
+ if (copy_to_user(_buffer, buffer, n) != 0)
+ return -EFAULT;
+
+ return n;
+}
+
+/*
+ * command the cache
+ */
+static ssize_t cachefiles_daemon_write(struct file *file,
+ const char __user *_data,
+ size_t datalen,
+ loff_t *pos)
+{
+ const struct cachefiles_daemon_cmd *cmd;
+ struct cachefiles_cache *cache = file->private_data;
+ ssize_t ret;
+ char *data, *args, *cp;
+
+ _enter(",,%zu,", datalen);
+
+ ASSERT(cache);
+
+ if (test_bit(CACHEFILES_DEAD, &cache->flags))
+ return -EIO;
+
+ if (datalen < 0 || datalen > PAGE_SIZE - 1)
+ return -EOPNOTSUPP;
+
+ /* drag the command string into the kernel so we can parse it */
+ data = kmalloc(datalen + 1, GFP_KERNEL);
+ if (!data)
+ return -ENOMEM;
+
+ ret = -EFAULT;
+ if (copy_from_user(data, _data, datalen) != 0)
+ goto error;
+
+ data[datalen] = '\0';
+
+ ret = -EINVAL;
+ if (memchr(data, '\0', datalen))
+ goto error;
+
+ /* strip any newline */
+ cp = memchr(data, '\n', datalen);
+ if (cp) {
+ if (cp == data)
+ goto error;
+
+ *cp = '\0';
+ }
+
+ /* parse the command */
+ ret = -EOPNOTSUPP;
+
+ for (args = data; *args; args++)
+ if (isspace(*args))
+ break;
+ if (*args) {
+ if (args == data)
+ goto error;
+ *args = '\0';
+ for (args++; isspace(*args); args++)
+ continue;
+ }
+
+ /* run the appropriate command handler */
+ for (cmd = cachefiles_daemon_cmds; cmd->name[0]; cmd++)
+ if (strcmp(cmd->name, data) == 0)
+ goto found_command;
+
+error:
+ kfree(data);
+ _leave(" = %zd", ret);
+ return ret;
+
+found_command:
+ mutex_lock(&cache->daemon_mutex);
+
+ ret = -EIO;
+ if (!test_bit(CACHEFILES_DEAD, &cache->flags))
+ ret = cmd->handler(cache, args);
+
+ mutex_unlock(&cache->daemon_mutex);
+
+ if (ret == 0)
+ ret = datalen;
+ goto error;
+}
+
+/*
+ * poll for culling state
+ * - use POLLOUT to indicate culling state
+ */
+static unsigned int cachefiles_daemon_poll(struct file *file,
+ struct poll_table_struct *poll)
+{
+ struct cachefiles_cache *cache = file->private_data;
+ unsigned int mask;
+
+ poll_wait(file, &cache->daemon_pollwq, poll);
+ mask = 0;
+
+ if (test_bit(CACHEFILES_STATE_CHANGED, &cache->flags))
+ mask |= POLLIN;
+
+ if (test_bit(CACHEFILES_CULLING, &cache->flags))
+ mask |= POLLOUT;
+
+ return mask;
+}
+
+/*
+ * give a range error for cache space constraints
+ * - can be tail-called
+ */
+static int cachefiles_daemon_range_error(struct cachefiles_cache *cache, char *args)
+{
+ kerror("Free space limits must be in range"
+ " 0%%<=stop<cull<run<100%%");
+
+ return -EINVAL;
+}
+
+/*
+ * set the percentage of files at which to stop culling
+ * - command: "frun <N>%"
+ */
+static int cachefiles_daemon_frun(struct cachefiles_cache *cache, char *args)
+{
+ unsigned long frun;
+
+ _enter(",%s", args);
+
+ if (!*args)
+ return -EINVAL;
+
+ frun = simple_strtoul(args, &args, 10);
+ if (args[0] != '%' || args[1] != '\0')
+ return -EINVAL;
+
+ if (frun <= cache->fcull_percent || frun >= 100)
+ return cachefiles_daemon_range_error(cache, args);
+
+ cache->frun_percent = frun;
+ return 0;
+}
+
+/*
+ * set the percentage of files at which to start culling
+ * - command: "fcull <N>%"
+ */
+static int cachefiles_daemon_fcull(struct cachefiles_cache *cache, char *args)
+{
+ unsigned long fcull;
+
+ _enter(",%s", args);
+
+ if (!*args)
+ return -EINVAL;
+
+ fcull = simple_strtoul(args, &args, 10);
+ if (args[0] != '%' || args[1] != '\0')
+ return -EINVAL;
+
+ if (fcull <= cache->fstop_percent || fcull >= cache->frun_percent)
+ return cachefiles_daemon_range_error(cache, args);
+
+ cache->fcull_percent = fcull;
+ return 0;
+}
+
+/*
+ * set the percentage of files at which to stop allocating
+ * - command: "fstop <N>%"
+ */
+static int cachefiles_daemon_fstop(struct cachefiles_cache *cache, char *args)
+{
+ unsigned long fstop;
+
+ _enter(",%s", args);
+
+ if (!*args)
+ return -EINVAL;
+
+ fstop = simple_strtoul(args, &args, 10);
+ if (args[0] != '%' || args[1] != '\0')
+ return -EINVAL;
+
+ if (fstop < 0 || fstop >= cache->fcull_percent)
+ return cachefiles_daemon_range_error(cache, args);
+
+ cache->fstop_percent = fstop;
+ return 0;
+}
+
+/*
+ * set the percentage of blocks at which to stop culling
+ * - command: "brun <N>%"
+ */
+static int cachefiles_daemon_brun(struct cachefiles_cache *cache, char *args)
+{
+ unsigned long brun;
+
+ _enter(",%s", args);
+
+ if (!*args)
+ return -EINVAL;
+
+ brun = simple_strtoul(args, &args, 10);
+ if (args[0] != '%' || args[1] != '\0')
+ return -EINVAL;
+
+ if (brun <= cache->bcull_percent || brun >= 100)
+ return cachefiles_daemon_range_error(cache, args);
+
+ cache->brun_percent = brun;
+ return 0;
+}
+
+/*
+ * set the percentage of blocks at which to start culling
+ * - command: "bcull <N>%"
+ */
+static int cachefiles_daemon_bcull(struct cachefiles_cache *cache, char *args)
+{
+ unsigned long bcull;
+
+ _enter(",%s", args);
+
+ if (!*args)
+ return -EINVAL;
+
+ bcull = simple_strtoul(args, &args, 10);
+ if (args[0] != '%' || args[1] != '\0')
+ return -EINVAL;
+
+ if (bcull <= cache->bstop_percent || bcull >= cache->brun_percent)
+ return cachefiles_daemon_range_error(cache, args);
+
+ cache->bcull_percent = bcull;
+ return 0;
+}
+
+/*
+ * set the percentage of blocks at which to stop allocating
+ * - command: "bstop <N>%"
+ */
+static int cachefiles_daemon_bstop(struct cachefiles_cache *cache, char *args)
+{
+ unsigned long bstop;
+
+ _enter(",%s", args);
+
+ if (!*args)
+ return -EINVAL;
+
+ bstop = simple_strtoul(args, &args, 10);
+ if (args[0] != '%' || args[1] != '\0')
+ return -EINVAL;
+
+ if (bstop < 0 || bstop >= cache->bcull_percent)
+ return cachefiles_daemon_range_error(cache, args);
+
+ cache->bstop_percent = bstop;
+ return 0;
+}
+
+/*
+ * set the cache directory
+ * - command: "dir <name>"
+ */
+static int cachefiles_daemon_dir(struct cachefiles_cache *cache, char *args)
+{
+ char *dir;
+
+ _enter(",%s", args);
+
+ if (!*args) {
+ kerror("Empty directory specified");
+ return -EINVAL;
+ }
+
+ if (cache->rootdirname) {
+ kerror("Second cache directory specified");
+ return -EEXIST;
+ }
+
+ dir = kstrdup(args, GFP_KERNEL);
+ if (!dir)
+ return -ENOMEM;
+
+ cache->rootdirname = dir;
+ return 0;
+}
+
+/*
+ * set the cache tag
+ * - command: "tag <name>"
+ */
+static int cachefiles_daemon_tag(struct cachefiles_cache *cache, char *args)
+{
+ char *tag;
+
+ _enter(",%s", args);
+
+ if (!*args) {
+ kerror("Empty tag specified");
+ return -EINVAL;
+ }
+
+ if (cache->tag)
+ return -EEXIST;
+
+ tag = kstrdup(args, GFP_KERNEL);
+ if (!tag)
+ return -ENOMEM;
+
+ cache->tag = tag;
+ return 0;
+}
+
+/*
+ * request a node in the cache be culled from the current working directory
+ * - command: "cull <name>"
+ */
+static int cachefiles_daemon_cull(struct cachefiles_cache *cache, char *args)
+{
+ struct cachefiles_secctx secctx;
+ struct fs_struct *fs;
+ struct dentry *dir;
+ int ret;
+
+ _enter(",%s", args);
+
+ if (strchr(args, '/'))
+ goto inval;
+
+ if (!test_bit(CACHEFILES_READY, &cache->flags)) {
+ kerror("cull applied to unready cache");
+ return -EIO;
+ }
+
+ if (test_bit(CACHEFILES_DEAD, &cache->flags)) {
+ kerror("cull applied to dead cache");
+ return -EIO;
+ }
+
+ /* extract the directory dentry from the cwd */
+ fs = current->fs;
+ read_lock(&fs->lock);
+ dir = dget(fs->pwd);
+ read_unlock(&fs->lock);
+
+ if (!S_ISDIR(dir->d_inode->i_mode))
+ goto notdir;
+
+ cachefiles_begin_secure(cache, &secctx);
+ ret = cachefiles_cull(cache, dir, args);
+ cachefiles_end_secure(cache, &secctx);
+
+ dput(dir);
+ _leave(" = %d", ret);
+ return ret;
+
+notdir:
+ dput(dir);
+ kerror("cull command requires dirfd to be a directory");
+ return -ENOTDIR;
+
+inval:
+ kerror("cull command requires dirfd and filename");
+ return -EINVAL;
+}
+
+/*
+ * set debugging mode
+ * - command: "debug <mask>"
+ */
+static int cachefiles_daemon_debug(struct cachefiles_cache *cache, char *args)
+{
+ unsigned long mask;
+
+ _enter(",%s", args);
+
+ mask = simple_strtoul(args, &args, 0);
+ if (args[0] != '\0')
+ goto inval;
+
+ cachefiles_debug = mask;
+ _leave(" = 0");
+ return 0;
+
+inval:
+ kerror("debug command requires mask");
+ return -EINVAL;
+}
+
+/*
+ * find out whether an object in the current working directory is in use or not
+ * - command: "inuse <name>"
+ */
+static int cachefiles_daemon_inuse(struct cachefiles_cache *cache, char *args)
+{
+ struct cachefiles_secctx secctx;
+ struct fs_struct *fs;
+ struct dentry *dir;
+ int ret;
+
+ _enter(",%s", args);
+
+ if (strchr(args, '/'))
+ goto inval;
+
+ if (!test_bit(CACHEFILES_READY, &cache->flags)) {
+ kerror("inuse applied to unready cache");
+ return -EIO;
+ }
+
+ if (test_bit(CACHEFILES_DEAD, &cache->flags)) {
+ kerror("inuse applied to dead cache");
+ return -EIO;
+ }
+
+ /* extract the directory dentry from the cwd */
+ fs = current->fs;
+ read_lock(&fs->lock);
+ dir = dget(fs->pwd);
+ read_unlock(&fs->lock);
+
+ if (!S_ISDIR(dir->d_inode->i_mode))
+ goto notdir;
+
+ cachefiles_begin_secure(cache, &secctx);
+ ret = cachefiles_check_in_use(cache, dir, args);
+ cachefiles_end_secure(cache, &secctx);
+
+ dput(dir);
+ _leave(" = %d", ret);
+ return ret;
+
+notdir:
+ dput(dir);
+ kerror("inuse command requires dirfd to be a directory");
+ return -ENOTDIR;
+
+inval:
+ kerror("inuse command requires dirfd and filename");
+ return -EINVAL;
+}
+
+/*
+ * see if we have space for a number of pages and/or a number of files in the
+ * cache
+ */
+int cachefiles_has_space(struct cachefiles_cache *cache,
+ unsigned fnr, unsigned bnr)
+{
+ struct kstatfs stats;
+ int ret;
+
+ _enter("{%llu,%llu,%llu,%llu,%llu,%llu},%u,%u",
+ (unsigned long long) cache->frun,
+ (unsigned long long) cache->fcull,
+ (unsigned long long) cache->fstop,
+ (unsigned long long) cache->brun,
+ (unsigned long long) cache->bcull,
+ (unsigned long long) cache->bstop,
+ fnr, bnr);
+
+ /* find out how many pages of blockdev are available */
+ memset(&stats, 0, sizeof(stats));
+
+ ret = vfs_statfs(cache->mnt->mnt_root, &stats);
+ if (ret < 0) {
+ if (ret == -EIO)
+ cachefiles_io_error(cache, "statfs failed");
+ _leave(" = %d", ret);
+ return ret;
+ }
+
+ stats.f_bavail >>= cache->bshift;
+
+ _debug("avail %llu,%llu",
+ (unsigned long long) stats.f_ffree,
+ (unsigned long long) stats.f_bavail);
+
+ /* see if there is sufficient space */
+ if (stats.f_ffree > fnr)
+ stats.f_ffree -= fnr;
+ else
+ stats.f_ffree = 0;
+
+ if (stats.f_bavail > bnr)
+ stats.f_bavail -= bnr;
+ else
+ stats.f_bavail = 0;
+
+ ret = -ENOBUFS;
+ if (stats.f_ffree < cache->fstop ||
+ stats.f_bavail < cache->bstop)
+ goto begin_cull;
+
+ ret = 0;
+ if (stats.f_ffree < cache->fcull ||
+ stats.f_bavail < cache->bcull)
+ goto begin_cull;
+
+ if (test_bit(CACHEFILES_CULLING, &cache->flags) &&
+ stats.f_ffree >= cache->frun &&
+ stats.f_bavail >= cache->brun &&
+ test_and_clear_bit(CACHEFILES_CULLING, &cache->flags)
+ ) {
+ _debug("cease culling");
+ cachefiles_state_changed(cache);
+ }
+
+ _leave(" = 0");
+ return 0;
+
+begin_cull:
+ if (!test_and_set_bit(CACHEFILES_CULLING, &cache->flags)) {
+ _debug("### CULL CACHE ###");
+ cachefiles_state_changed(cache);
+ }
+
+ _leave(" = %d", ret);
+ return ret;
+}
diff --git a/fs/cachefiles/cf-interface.c b/fs/cachefiles/cf-interface.c
new file mode 100644
index 0000000..4f29f21
--- /dev/null
+++ b/fs/cachefiles/cf-interface.c
@@ -0,0 +1,442 @@
+/* FS-Cache interface to CacheFiles
+ *
+ * Copyright (C) 2007 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells ([email protected])
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public Licence
+ * as published by the Free Software Foundation; either version
+ * 2 of the Licence, or (at your option) any later version.
+ */
+
+#include <linux/mount.h>
+#include <linux/buffer_head.h>
+#include "cf-internal.h"
+
+#define list_to_page(head) (list_entry((head)->prev, struct page, lru))
+
+struct cachefiles_lookup_data {
+ struct cachefiles_xattr *auxdata; /* auxiliary data */
+ char *key; /* key path */
+};
+
+static int cachefiles_attr_changed(struct fscache_object *_object);
+
+/*
+ * allocate an object record for a cookie lookup and prepare the lookup data
+ */
+static struct fscache_object *cachefiles_alloc_object(
+ struct fscache_cache *_cache,
+ struct fscache_cookie *cookie)
+{
+ struct cachefiles_lookup_data *lookup_data;
+ struct cachefiles_object *object;
+ struct cachefiles_cache *cache;
+ struct cachefiles_xattr *auxdata;
+ unsigned keylen, auxlen;
+ void *buffer;
+ char *key;
+
+ cache = container_of(_cache, struct cachefiles_cache, cache);
+
+ _enter("{%s},%p,", cache->cache.identifier, cookie);
+
+ lookup_data = kmalloc(sizeof(lookup_data), GFP_KERNEL);
+ if (!lookup_data)
+ goto nomem_lookup_data;
+
+ /* create a new object record and a temporary leaf image */
+ object = kmem_cache_alloc(cachefiles_object_jar, GFP_KERNEL);
+ if (!object)
+ goto nomem_object;
+
+ ASSERTCMP(object->backer, ==, NULL);
+
+ BUG_ON(test_bit(CACHEFILES_OBJECT_ACTIVE, &object->flags));
+ atomic_set(&object->usage, 1);
+
+ fscache_object_init(&object->fscache);
+ object->fscache.cookie = cookie;
+ object->fscache.cache = &cache->cache;
+
+ object->type = cookie->def->type;
+
+ /* get hold of the raw key
+ * - stick the length on the front and leave space on the back for the
+ * encoder
+ */
+ buffer = kmalloc((2 + 512) + 3, GFP_KERNEL);
+ if (!buffer)
+ goto nomem_buffer;
+
+ keylen = cookie->def->get_key(cookie->netfs_data, buffer + 2, 512);
+ ASSERTCMP(keylen, <, 512);
+
+ *(uint16_t *)buffer = keylen;
+ ((char *)buffer)[keylen + 2] = 0;
+ ((char *)buffer)[keylen + 3] = 0;
+ ((char *)buffer)[keylen + 4] = 0;
+
+ /* turn the raw key into something that can work with as a filename */
+ key = cachefiles_cook_key(buffer, keylen + 2, object->type);
+ if (!key)
+ goto nomem_key;
+
+ /* get hold of the auxiliary data and prepend the object type */
+ auxdata = buffer;
+ auxlen = 0;
+ if (cookie->def->get_aux) {
+ auxlen = cookie->def->get_aux(cookie->netfs_data,
+ auxdata->data, 511);
+ ASSERTCMP(auxlen, <, 511);
+ }
+
+ auxdata->len = auxlen + 1;
+ auxdata->type = cookie->def->type;
+
+ lookup_data->auxdata = auxdata;
+ lookup_data->key = key;
+ object->lookup_data = lookup_data;
+
+ _leave(" = %p [%p]", &object->fscache, lookup_data);
+ return &object->fscache;
+
+nomem_key:
+ kfree(buffer);
+nomem_buffer:
+ BUG_ON(test_bit(CACHEFILES_OBJECT_ACTIVE, &object->flags));
+ kmem_cache_free(cachefiles_object_jar, object);
+nomem_object:
+ kfree(lookup_data);
+nomem_lookup_data:
+ _leave(" = -ENOMEM");
+ return ERR_PTR(-ENOMEM);
+}
+
+/*
+ * attempt to look up the nominated node in this cache
+ */
+static void cachefiles_lookup_object(struct fscache_object *_object)
+{
+ struct cachefiles_lookup_data *lookup_data;
+ struct cachefiles_object *parent, *object;
+ struct cachefiles_secctx secctx;
+ struct cachefiles_cache *cache;
+ int ret;
+
+ _enter("{OBJ%x}", _object->debug_id);
+
+ cache = container_of(_object->cache, struct cachefiles_cache, cache);
+ parent = container_of(_object->parent,
+ struct cachefiles_object, fscache);
+ object = container_of(_object, struct cachefiles_object, fscache);
+ lookup_data = object->lookup_data;
+
+ ASSERTCMP(lookup_data, !=, NULL);
+
+ /* look up the key, creating any missing bits */
+ cachefiles_begin_secure(cache, &secctx);
+ ret = cachefiles_walk_to_object(parent, object,
+ lookup_data->key,
+ lookup_data->auxdata);
+ cachefiles_end_secure(cache, &secctx);
+
+ /* polish off by setting the attributes of non-index files */
+ if (ret == 0 &&
+ object->fscache.cookie->def->type != FSCACHE_COOKIE_TYPE_INDEX)
+ cachefiles_attr_changed(&object->fscache);
+
+ if (ret < 0)
+ fscache_object_lookup_error(&object->fscache);
+
+ _leave(" [%d]", ret);
+}
+
+/*
+ * indication of lookup completion
+ */
+static void cachefiles_lookup_complete(struct fscache_object *_object)
+{
+ struct cachefiles_object *object;
+
+ object = container_of(_object, struct cachefiles_object, fscache);
+
+ _enter("{OBJ%x,%p}", object->fscache.debug_id, object->lookup_data);
+
+ if (object->lookup_data) {
+ kfree(object->lookup_data->key);
+ kfree(object->lookup_data->auxdata);
+ kfree(object->lookup_data);
+ object->lookup_data = NULL;
+ }
+}
+
+/*
+ * increment the usage count on an inode object (may fail if unmounting)
+ */
+static struct fscache_object *cachefiles_grab_object(struct fscache_object *_object)
+{
+ struct cachefiles_object *object;
+
+ _enter("{OBJ%x}", _object->debug_id);
+
+ object = container_of(_object, struct cachefiles_object, fscache);
+
+#ifdef CACHEFILES_DEBUG_SLAB
+ ASSERT((atomic_read(&object->usage) & 0xffff0000) != 0x6b6b0000);
+#endif
+
+ atomic_inc(&object->usage);
+ return &object->fscache;
+}
+
+/*
+ * update the auxilliary data for an object object on disk
+ */
+static void cachefiles_update_object(struct fscache_object *_object)
+{
+ struct cachefiles_secctx secctx;
+ struct cachefiles_object *object;
+ struct cachefiles_xattr *auxdata;
+ struct cachefiles_cache *cache;
+ struct fscache_cookie *cookie;
+ unsigned auxlen;
+
+ _enter("{OBJ%x}", _object->debug_id);
+
+ object = container_of(_object, struct cachefiles_object, fscache);
+ cache = container_of(object->fscache.cache, struct cachefiles_cache,
+ cache);
+ cookie = object->fscache.cookie;
+
+ if (!cookie->def->get_aux) {
+ _leave(" [no aux]");
+ return;
+ }
+
+ auxdata = kmalloc(2 + 512 + 3, GFP_KERNEL);
+ if (!auxdata) {
+ _leave(" [nomem]");
+ return;
+ }
+
+ auxlen = cookie->def->get_aux(cookie->netfs_data, auxdata->data, 511);
+ ASSERTCMP(auxlen, <, 511);
+
+ auxdata->len = auxlen + 1;
+ auxdata->type = cookie->def->type;
+
+ cachefiles_begin_secure(cache, &secctx);
+ cachefiles_update_object_xattr(object, auxdata);
+ cachefiles_end_secure(cache, &secctx);
+ kfree(auxdata);
+ _leave("");
+}
+
+/*
+ * discard the resources pinned by an object and effect retirement if
+ * requested
+ */
+static void cachefiles_drop_object(struct fscache_object *_object)
+{
+ struct cachefiles_secctx secctx;
+ struct cachefiles_object *object;
+ struct cachefiles_cache *cache;
+
+ ASSERT(_object);
+
+ object = container_of(_object, struct cachefiles_object, fscache);
+
+ _enter("{OBJ%x,%d}",
+ object->fscache.debug_id, atomic_read(&object->usage));
+
+ cache = container_of(object->fscache.cache,
+ struct cachefiles_cache, cache);
+
+#ifdef CACHEFILES_DEBUG_SLAB
+ ASSERT((atomic_read(&object->usage) & 0xffff0000) != 0x6b6b0000);
+#endif
+
+ /* delete retired objects */
+ if (object->fscache.state == FSCACHE_OBJECT_RECYCLING &&
+ _object != cache->cache.fsdef
+ ) {
+ _debug("- retire object OBJ%x", object->fscache.debug_id);
+ cachefiles_begin_secure(cache, &secctx);
+ cachefiles_delete_object(cache, object);
+ cachefiles_end_secure(cache, &secctx);
+ }
+
+ /* close the filesystem stuff attached to the object */
+ if (object->backer != object->dentry)
+ dput(object->backer);
+ object->backer = NULL;
+
+ /* note that the object is now inactive */
+ if (test_bit(CACHEFILES_OBJECT_ACTIVE, &object->flags)) {
+ write_lock(&cache->active_lock);
+ if (!test_and_clear_bit(CACHEFILES_OBJECT_ACTIVE,
+ &object->flags))
+ BUG();
+ rb_erase(&object->active_node, &cache->active_nodes);
+ wake_up_bit(&object->flags, CACHEFILES_OBJECT_ACTIVE);
+ write_unlock(&cache->active_lock);
+ }
+
+ dput(object->dentry);
+ object->dentry = NULL;
+
+ _leave("");
+}
+
+/*
+ * dispose of a reference to an object
+ */
+static void cachefiles_put_object(struct fscache_object *_object)
+{
+ struct cachefiles_object *object;
+
+ ASSERT(_object);
+
+ object = container_of(_object, struct cachefiles_object, fscache);
+
+ _enter("{OBJ%x,%d}",
+ object->fscache.debug_id, atomic_read(&object->usage));
+
+#ifdef CACHEFILES_DEBUG_SLAB
+ ASSERT((atomic_read(&object->usage) & 0xffff0000) != 0x6b6b0000);
+#endif
+
+ ASSERTIFCMP(object->fscache.parent,
+ object->fscache.parent->n_children, >, 0);
+
+ if (atomic_dec_and_test(&object->usage)) {
+ _debug("- kill object OBJ%x", object->fscache.debug_id);
+
+ ASSERT(!test_bit(CACHEFILES_OBJECT_ACTIVE, &object->flags));
+ ASSERTCMP(object->fscache.parent, ==, NULL);
+ ASSERTCMP(object->backer, ==, NULL);
+ ASSERTCMP(object->dentry, ==, NULL);
+ ASSERTCMP(object->fscache.n_ops, ==, 0);
+ ASSERTCMP(object->fscache.n_children, ==, 0);
+
+ if (object->lookup_data) {
+ kfree(object->lookup_data->key);
+ kfree(object->lookup_data->auxdata);
+ kfree(object->lookup_data);
+ object->lookup_data = NULL;
+ }
+
+ kmem_cache_free(cachefiles_object_jar, object);
+ }
+
+ _leave("");
+}
+
+/*
+ * sync a cache
+ */
+static void cachefiles_sync_cache(struct fscache_cache *_cache)
+{
+ struct cachefiles_secctx secctx;
+ struct cachefiles_cache *cache;
+ int ret;
+
+ _enter("%p", _cache);
+
+ cache = container_of(_cache, struct cachefiles_cache, cache);
+
+ /* make sure all pages pinned by operations on behalf of the netfs are
+ * written to disc */
+ cachefiles_begin_secure(cache, &secctx);
+ ret = fsync_super(cache->mnt->mnt_sb);
+ cachefiles_end_secure(cache, &secctx);
+
+ if (ret == -EIO)
+ cachefiles_io_error(cache,
+ "Attempt to sync backing fs superblock"
+ " returned error %d",
+ ret);
+}
+
+/*
+ * notification the attributes on an object have changed
+ * - called with reads/writes excluded by FS-Cache
+ */
+static int cachefiles_attr_changed(struct fscache_object *_object)
+{
+ struct cachefiles_secctx secctx;
+ struct cachefiles_object *object;
+ struct cachefiles_cache *cache;
+ struct iattr newattrs;
+ loff_t ni_size, oi_size;
+ int ret;
+
+ _object->cookie->def->get_attr(_object->cookie->netfs_data, &ni_size);
+
+ _enter("{OBJ%x},[%llu]", _object->debug_id, ni_size);
+
+ object = container_of(_object, struct cachefiles_object, fscache);
+ cache = container_of(object->fscache.cache,
+ struct cachefiles_cache, cache);
+
+ if (ni_size == object->i_size)
+ return 0;
+
+ if (!object->backer)
+ return -ENOBUFS;
+
+ ASSERT(S_ISREG(object->backer->d_inode->i_mode));
+
+ fscache_set_store_limit(&object->fscache, ni_size);
+
+ oi_size = i_size_read(object->backer->d_inode);
+ if (oi_size == ni_size)
+ return 0;
+
+ newattrs.ia_size = ni_size;
+ newattrs.ia_valid = ATTR_SIZE;
+
+ cachefiles_begin_secure(cache, &secctx);
+ mutex_lock(&object->backer->d_inode->i_mutex);
+ ret = notify_change(object->backer, &newattrs);
+ mutex_unlock(&object->backer->d_inode->i_mutex);
+ cachefiles_end_secure(cache, &secctx);
+
+ if (ret == -EIO) {
+ fscache_set_store_limit(&object->fscache, 0);
+ cachefiles_io_error_obj(object, "Size set failed");
+ ret = -ENOBUFS;
+ }
+
+ _leave(" = %d", ret);
+ return ret;
+}
+
+/*
+ * dissociate a cache from all the pages it was backing
+ */
+static void cachefiles_dissociate_pages(struct fscache_cache *cache)
+{
+ _enter("");
+}
+
+const struct fscache_cache_ops cachefiles_cache_ops = {
+ .name = "cachefiles",
+ .alloc_object = cachefiles_alloc_object,
+ .lookup_object = cachefiles_lookup_object,
+ .lookup_complete = cachefiles_lookup_complete,
+ .grab_object = cachefiles_grab_object,
+ .update_object = cachefiles_update_object,
+ .drop_object = cachefiles_drop_object,
+ .put_object = cachefiles_put_object,
+ .sync_cache = cachefiles_sync_cache,
+ .attr_changed = cachefiles_attr_changed,
+ .read_or_alloc_page = cachefiles_read_or_alloc_page,
+ .read_or_alloc_pages = cachefiles_read_or_alloc_pages,
+ .allocate_page = cachefiles_allocate_page,
+ .allocate_pages = cachefiles_allocate_pages,
+ .write_page = cachefiles_write_page,
+ .uncache_page = cachefiles_uncache_page,
+ .dissociate_pages = cachefiles_dissociate_pages,
+};
diff --git a/fs/cachefiles/cf-internal.h b/fs/cachefiles/cf-internal.h
new file mode 100644
index 0000000..1fe6d87
--- /dev/null
+++ b/fs/cachefiles/cf-internal.h
@@ -0,0 +1,414 @@
+/* General netfs cache on cache files internal defs
+ *
+ * Copyright (C) 2007 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells ([email protected])
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public Licence
+ * as published by the Free Software Foundation; either version
+ * 2 of the Licence, or (at your option) any later version.
+ */
+
+#include <linux/fscache-cache.h>
+#include <linux/timer.h>
+#include <linux/wait.h>
+#include <linux/workqueue.h>
+#include <linux/security.h>
+
+struct cachefiles_cache;
+struct cachefiles_object;
+
+extern unsigned cachefiles_debug;
+#define CACHEFILES_DEBUG_KENTER 1
+#define CACHEFILES_DEBUG_KLEAVE 2
+#define CACHEFILES_DEBUG_KDEBUG 4
+
+/*
+ * node records
+ */
+struct cachefiles_object {
+ struct fscache_object fscache; /* fscache handle */
+ struct cachefiles_lookup_data *lookup_data; /* cached lookup data */
+ struct dentry *dentry; /* the file/dir representing this object */
+ struct dentry *backer; /* backing file */
+ loff_t i_size; /* object size */
+ unsigned long flags;
+#define CACHEFILES_OBJECT_ACTIVE 0 /* T if marked active */
+ atomic_t usage; /* object usage count */
+ uint8_t type; /* object type */
+ uint8_t new; /* T if object new */
+ spinlock_t work_lock;
+ struct rb_node active_node; /* link in active tree (dentry is key) */
+};
+
+extern struct kmem_cache *cachefiles_object_jar;
+
+/*
+ * Cache files cache definition
+ */
+struct cachefiles_cache {
+ struct fscache_cache cache; /* FS-Cache record */
+ struct vfsmount *mnt; /* mountpoint holding the cache */
+ struct dentry *graveyard; /* directory into which dead objects go */
+ struct file *cachefilesd; /* manager daemon handle */
+ struct mutex daemon_mutex; /* command serialisation mutex */
+ wait_queue_head_t daemon_pollwq; /* poll waitqueue for daemon */
+ struct rb_root active_nodes; /* active nodes (can't be culled) */
+ rwlock_t active_lock; /* lock for active_nodes */
+ atomic_t gravecounter; /* graveyard uniquifier */
+ u32 access_secid; /* cache access security ID */
+ u32 cache_secid; /* cache fs object security ID */
+ unsigned frun_percent; /* when to stop culling (% files) */
+ unsigned fcull_percent; /* when to start culling (% files) */
+ unsigned fstop_percent; /* when to stop allocating (% files) */
+ unsigned brun_percent; /* when to stop culling (% blocks) */
+ unsigned bcull_percent; /* when to start culling (% blocks) */
+ unsigned bstop_percent; /* when to stop allocating (% blocks) */
+ unsigned bsize; /* cache's block size */
+ unsigned bshift; /* min(ilog2(PAGE_SIZE / bsize), 0) */
+ uint64_t frun; /* when to stop culling */
+ uint64_t fcull; /* when to start culling */
+ uint64_t fstop; /* when to stop allocating */
+ sector_t brun; /* when to stop culling */
+ sector_t bcull; /* when to start culling */
+ sector_t bstop; /* when to stop allocating */
+ unsigned long flags;
+#define CACHEFILES_READY 0 /* T if cache prepared */
+#define CACHEFILES_DEAD 1 /* T if cache dead */
+#define CACHEFILES_CULLING 2 /* T if cull engaged */
+#define CACHEFILES_STATE_CHANGED 3 /* T if state changed (poll trigger) */
+ char *rootdirname; /* name of cache root directory */
+ char *tag; /* cache binding tag */
+};
+
+/*
+ * backing file read tracking
+ */
+struct cachefiles_one_read {
+ wait_queue_t monitor; /* link into monitored waitqueue */
+ struct page *back_page; /* backing file page we're waiting for */
+ struct page *netfs_page; /* netfs page we're going to fill */
+ struct fscache_retrieval *op; /* retrieval op covering this */
+ struct list_head op_link; /* link in op's todo list */
+};
+
+/*
+ * backing file write tracking
+ */
+struct cachefiles_one_write {
+ struct page *netfs_page; /* netfs page to copy */
+ struct cachefiles_object *object;
+ struct list_head obj_link; /* link in object's lists */
+ fscache_rw_complete_t end_io_func;
+ void *context;
+};
+
+/*
+ * auxiliary data xattr buffer
+ */
+struct cachefiles_xattr {
+ uint16_t len;
+ uint8_t type;
+ uint8_t data[];
+};
+
+/*
+ * note change of state for daemon
+ */
+static inline void cachefiles_state_changed(struct cachefiles_cache *cache)
+{
+ set_bit(CACHEFILES_STATE_CHANGED, &cache->flags);
+ wake_up_all(&cache->daemon_pollwq);
+}
+
+/*
+ * cf-bind.c
+ */
+extern int cachefiles_daemon_bind(struct cachefiles_cache *cache, char *args);
+extern void cachefiles_daemon_unbind(struct cachefiles_cache *cache);
+
+/*
+ * cf-daemon.c
+ */
+extern const struct file_operations cachefiles_daemon_fops;
+
+extern int cachefiles_has_space(struct cachefiles_cache *cache,
+ unsigned fnr, unsigned bnr);
+
+/*
+ * cf-interface.c
+ */
+extern const struct fscache_cache_ops cachefiles_cache_ops;
+
+/*
+ * cf-key.c
+ */
+extern char *cachefiles_cook_key(const u8 *raw, int keylen, uint8_t type);
+
+/*
+ * cf-namei.c
+ */
+extern int cachefiles_delete_object(struct cachefiles_cache *cache,
+ struct cachefiles_object *object);
+extern int cachefiles_walk_to_object(struct cachefiles_object *parent,
+ struct cachefiles_object *object,
+ const char *key,
+ struct cachefiles_xattr *auxdata);
+extern struct dentry *cachefiles_get_directory(struct cachefiles_cache *cache,
+ struct dentry *dir,
+ const char *name);
+
+extern int cachefiles_cull(struct cachefiles_cache *cache, struct dentry *dir,
+ char *filename);
+
+extern int cachefiles_check_in_use(struct cachefiles_cache *cache,
+ struct dentry *dir, char *filename);
+
+/*
+ * cf-proc.c
+ */
+#ifdef CONFIG_CACHEFILES_HISTOGRAM
+extern atomic_t cachefiles_lookup_histogram[HZ];
+extern atomic_t cachefiles_mkdir_histogram[HZ];
+extern atomic_t cachefiles_create_histogram[HZ];
+
+extern int __init cachefiles_proc_init(void);
+extern void cachefiles_proc_cleanup(void);
+static inline
+void cachefiles_hist(atomic_t histogram[], unsigned long start_jif)
+{
+ unsigned long jif = jiffies - start_jif;
+ if (jif >= HZ)
+ jif = HZ - 1;
+ atomic_inc(&histogram[jif]);
+}
+
+#else
+#define cachefiles_proc_init() (0)
+#define cachefiles_proc_cleanup() do {} while(0)
+#define cachefiles_hist(hist, start_jif) do {} while(0)
+#endif
+
+/*
+ * cf-rdwr.c
+ */
+extern int cachefiles_read_or_alloc_page(struct fscache_retrieval *,
+ struct page *, gfp_t);
+extern int cachefiles_read_or_alloc_pages(struct fscache_retrieval *,
+ struct list_head *, unsigned *,
+ gfp_t);
+extern int cachefiles_allocate_page(struct fscache_retrieval *, struct page *,
+ gfp_t);
+extern int cachefiles_allocate_pages(struct fscache_retrieval *,
+ struct list_head *, unsigned *, gfp_t);
+extern int cachefiles_write_page(struct fscache_storage *, struct page *);
+extern void cachefiles_uncache_page(struct fscache_object *, struct page *);
+
+/*
+ * cf-security.c
+ */
+#ifdef CONFIG_SECURITY
+extern int cachefiles_get_security_ID(struct cachefiles_cache *cache);
+extern int cachefiles_determine_cache_secid(struct cachefiles_cache *cache,
+ struct dentry *root);
+static inline
+void cachefiles_set_fscreate_secid(struct cachefiles_cache *cache)
+{
+ security_set_fscreate_secid(cache->cache_secid);
+}
+#else
+#define cachefiles_get_security_ID(cache) (0)
+#define cachefiles_determine_cache_secid(cache, root) (0)
+#define cachefiles_set_fscreate_secid(cache) do {} while(0)
+#endif
+
+struct cachefiles_secctx {
+ uid_t fsuid; /* save for current->fsuid */
+ gid_t fsgid; /* save for current->fsgid */
+#ifdef CONFIG_SECURITY
+ u32 fscreate_secid; /* save for current fscreate security ID */
+#endif
+};
+
+static inline void cachefiles_begin_secure_nofs(struct cachefiles_cache *cache,
+ struct cachefiles_secctx *ctx)
+{
+#ifdef CONFIG_SECURITY
+ security_act_as_secid(cache->access_secid);
+ ctx->fscreate_secid = security_get_fscreate_secid();
+#endif
+ ctx->fsuid = current->fsuid;
+ ctx->fsgid = current->fsgid;
+ current->fsuid = 0;
+ current->fsgid = 0;
+}
+
+static inline void cachefiles_begin_secure(struct cachefiles_cache *cache,
+ struct cachefiles_secctx *ctx)
+{
+#ifdef CONFIG_SECURITY
+ security_act_as_secid(cache->access_secid);
+ ctx->fscreate_secid = security_set_fscreate_secid(cache->cache_secid);
+#endif
+ ctx->fsuid = current->fsuid;
+ ctx->fsgid = current->fsgid;
+ current->fsuid = 0;
+ current->fsgid = 0;
+}
+
+static inline void cachefiles_end_secure(struct cachefiles_cache *cache,
+ const struct cachefiles_secctx *ctx)
+{
+ current->fsuid = ctx->fsuid;
+ current->fsgid = ctx->fsgid;
+#ifdef CONFIG_SECURITY
+ security_set_fscreate_secid(ctx->fscreate_secid);
+ security_act_as_self();
+#endif
+}
+
+/*
+ * cf-xattr.c
+ */
+extern int cachefiles_check_object_type(struct cachefiles_object *object);
+extern int cachefiles_set_object_xattr(struct cachefiles_object *object,
+ struct cachefiles_xattr *auxdata);
+extern int cachefiles_update_object_xattr(struct cachefiles_object *object,
+ struct cachefiles_xattr *auxdata);
+extern int cachefiles_check_object_xattr(struct cachefiles_object *object,
+ struct cachefiles_xattr *auxdata);
+extern int cachefiles_remove_object_xattr(struct cachefiles_cache *cache,
+ struct dentry *dentry);
+
+
+/*
+ * error handling
+ */
+#define kerror(FMT,...) printk(KERN_ERR "CacheFiles: "FMT"\n" ,##__VA_ARGS__);
+
+#define cachefiles_io_error(___cache, FMT, ...) \
+do { \
+ kerror("I/O Error: " FMT ,##__VA_ARGS__); \
+ fscache_io_error(&(___cache)->cache); \
+ set_bit(CACHEFILES_DEAD, &(___cache)->flags); \
+} while(0)
+
+#define cachefiles_io_error_obj(object, FMT, ...) \
+do { \
+ struct cachefiles_cache *___cache; \
+ \
+ ___cache = container_of((object)->fscache.cache, \
+ struct cachefiles_cache, cache); \
+ cachefiles_io_error(___cache, FMT ,##__VA_ARGS__); \
+} while(0)
+
+
+/*
+ * debug tracing
+ */
+#define dbgprintk(FMT,...) \
+ printk("[%-6.6s] "FMT"\n",current->comm ,##__VA_ARGS__)
+
+/* make sure we maintain the format strings, even when debugging is disabled */
+static inline void _dbprintk(const char *fmt, ...)
+ __attribute__((format(printf,1,2)));
+static inline void _dbprintk(const char *fmt, ...)
+{
+}
+
+#define kenter(FMT,...) dbgprintk("==> %s("FMT")",__FUNCTION__ ,##__VA_ARGS__)
+#define kleave(FMT,...) dbgprintk("<== %s()"FMT"",__FUNCTION__ ,##__VA_ARGS__)
+#define kdebug(FMT,...) dbgprintk(FMT ,##__VA_ARGS__)
+
+
+#if defined(__KDEBUG)
+#define _enter(FMT,...) kenter(FMT,##__VA_ARGS__)
+#define _leave(FMT,...) kleave(FMT,##__VA_ARGS__)
+#define _debug(FMT,...) kdebug(FMT,##__VA_ARGS__)
+
+#elif defined(CONFIG_CACHEFILES_DEBUG)
+#define _enter(FMT,...) \
+do { \
+ if (cachefiles_debug & CACHEFILES_DEBUG_KENTER) \
+ kenter(FMT,##__VA_ARGS__); \
+} while (0)
+
+#define _leave(FMT,...) \
+do { \
+ if (cachefiles_debug & CACHEFILES_DEBUG_KLEAVE) \
+ kleave(FMT,##__VA_ARGS__); \
+} while (0)
+
+#define _debug(FMT,...) \
+do { \
+ if (cachefiles_debug & CACHEFILES_DEBUG_KDEBUG) \
+ kdebug(FMT,##__VA_ARGS__); \
+} while (0)
+
+#else
+#define _enter(FMT,...) _dbprintk("==> %s("FMT")",__FUNCTION__ ,##__VA_ARGS__)
+#define _leave(FMT,...) _dbprintk("<== %s()"FMT"",__FUNCTION__ ,##__VA_ARGS__)
+#define _debug(FMT,...) _dbprintk(FMT ,##__VA_ARGS__)
+#endif
+
+#if 1 // defined(__KDEBUGALL)
+
+#define ASSERT(X) \
+do { \
+ if (unlikely(!(X))) { \
+ printk(KERN_ERR "\n"); \
+ printk(KERN_ERR "CacheFiles: Assertion failed\n"); \
+ BUG(); \
+ } \
+} while(0)
+
+#define ASSERTCMP(X, OP, Y) \
+do { \
+ if (unlikely(!((X) OP (Y)))) { \
+ printk(KERN_ERR "\n"); \
+ printk(KERN_ERR "CacheFiles: Assertion failed\n"); \
+ printk(KERN_ERR "%lx " #OP " %lx is false\n", \
+ (unsigned long)(X), (unsigned long)(Y)); \
+ BUG(); \
+ } \
+} while(0)
+
+#define ASSERTIF(C, X) \
+do { \
+ if (unlikely((C) && !(X))) { \
+ printk(KERN_ERR "\n"); \
+ printk(KERN_ERR "CacheFiles: Assertion failed\n"); \
+ BUG(); \
+ } \
+} while(0)
+
+#define ASSERTIFCMP(C, X, OP, Y) \
+do { \
+ if (unlikely((C) && !((X) OP (Y)))) { \
+ printk(KERN_ERR "\n"); \
+ printk(KERN_ERR "CacheFiles: Assertion failed\n"); \
+ printk(KERN_ERR "%lx " #OP " %lx is false\n", \
+ (unsigned long)(X), (unsigned long)(Y)); \
+ BUG(); \
+ } \
+} while(0)
+
+#else
+
+#define ASSERT(X) \
+do { \
+} while(0)
+
+#define ASSERTCMP(X, OP, Y) \
+do { \
+} while(0)
+
+#define ASSERTIF(C, X) \
+do { \
+} while(0)
+
+#define ASSERTIFCMP(C, X, OP, Y) \
+do { \
+} while(0)
+
+#endif
diff --git a/fs/cachefiles/cf-key.c b/fs/cachefiles/cf-key.c
new file mode 100644
index 0000000..6956eec
--- /dev/null
+++ b/fs/cachefiles/cf-key.c
@@ -0,0 +1,159 @@
+/* Key to pathname encoder
+ *
+ * Copyright (C) 2007 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells ([email protected])
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public Licence
+ * as published by the Free Software Foundation; either version
+ * 2 of the Licence, or (at your option) any later version.
+ */
+
+#include <linux/slab.h>
+#include "cf-internal.h"
+
+static const char cachefiles_charmap[64] =
+ "0123456789" /* 0 - 9 */
+ "abcdefghijklmnopqrstuvwxyz" /* 10 - 35 */
+ "ABCDEFGHIJKLMNOPQRSTUVWXYZ" /* 36 - 61 */
+ "_-" /* 62 - 63 */
+ ;
+
+static const char cachefiles_filecharmap[256] = {
+ /* we skip space and tab and control chars */
+ [ 33 ... 46 ] = 1, /* '!' -> '.' */
+ /* we skip '/' as it's significant to pathwalk */
+ [ 48 ... 127 ] = 1, /* '0' -> '~' */
+};
+
+/*
+ * turn the raw key into something cooked
+ * - the raw key should include the length in the two bytes at the front
+ * - the key may be up to 514 bytes in length (including the length word)
+ * - "base64" encode the strange keys, mapping 3 bytes of raw to four of
+ * cooked
+ * - need to cut the cooked key into 252 char lengths (189 raw bytes)
+ */
+char *cachefiles_cook_key(const u8 *raw, int keylen, uint8_t type)
+{
+ unsigned char csum, ch;
+ unsigned int acc;
+ char *key;
+ int loop, len, max, seg, mark, print;
+
+ _enter(",%d", keylen);
+
+ BUG_ON(keylen < 2 || keylen > 514);
+
+ csum = raw[0] + raw[1];
+ print = 1;
+ for (loop = 2; loop < keylen; loop++) {
+ ch = raw[loop];
+ csum += ch;
+ print &= cachefiles_filecharmap[ch];
+ }
+
+ if (print) {
+ /* if the path is usable ASCII, then we render it directly */
+ max = keylen - 2;
+ max += 2; /* two base64'd length chars on the front */
+ max += 5; /* @checksum/M */
+ max += 3 * 2; /* maximum number of segment dividers (".../M")
+ * is ((514 + 251) / 252) = 3
+ */
+ max += 1; /* NUL on end */
+ } else {
+ /* calculate the maximum length of the cooked key */
+ keylen = (keylen + 2) / 3;
+
+ max = keylen * 4;
+ max += 5; /* @checksum/M */
+ max += 3 * 2; /* maximum number of segment dividers (".../M")
+ * is ((514 + 188) / 189) = 3
+ */
+ max += 1; /* NUL on end */
+ }
+
+ max += 1; /* 2nd NUL on end */
+
+ _debug("max: %d", max);
+
+ key = kmalloc(max, GFP_KERNEL);
+ if (!key)
+ return NULL;
+
+ len = 0;
+
+ /* build the cooked key */
+ sprintf(key, "@%02x%c+", (unsigned) csum, 0);
+ len = 5;
+ mark = len - 1;
+
+ if (print) {
+ acc = *(uint16_t *) raw;
+ raw += 2;
+
+ key[len + 1] = cachefiles_charmap[acc & 63];
+ acc >>= 6;
+ key[len] = cachefiles_charmap[acc & 63];
+ len += 2;
+
+ seg = 250;
+ for (loop = keylen; loop > 0; loop--) {
+ if (seg <= 0) {
+ key[len++] = '\0';
+ mark = len;
+ key[len++] = '+';
+ seg = 252;
+ }
+
+ key[len++] = *raw++;
+ ASSERT(len < max);
+ }
+
+ switch (type) {
+ case FSCACHE_COOKIE_TYPE_INDEX: type = 'I'; break;
+ case FSCACHE_COOKIE_TYPE_DATAFILE: type = 'D'; break;
+ default: type = 'S'; break;
+ }
+ } else {
+ seg = 252;
+ for (loop = keylen; loop > 0; loop--) {
+ if (seg <= 0) {
+ key[len++] = '\0';
+ mark = len;
+ key[len++] = '+';
+ seg = 252;
+ }
+
+ acc = *raw++;
+ acc |= *raw++ << 8;
+ acc |= *raw++ << 16;
+
+ _debug("acc: %06x", acc);
+
+ key[len++] = cachefiles_charmap[acc & 63];
+ acc >>= 6;
+ key[len++] = cachefiles_charmap[acc & 63];
+ acc >>= 6;
+ key[len++] = cachefiles_charmap[acc & 63];
+ acc >>= 6;
+ key[len++] = cachefiles_charmap[acc & 63];
+
+ ASSERT(len < max);
+ }
+
+ switch (type) {
+ case FSCACHE_COOKIE_TYPE_INDEX: type = 'J'; break;
+ case FSCACHE_COOKIE_TYPE_DATAFILE: type = 'E'; break;
+ default: type = 'T'; break;
+ }
+ }
+
+ key[mark] = type;
+ key[len++] = 0;
+ key[len] = 0;
+
+ _leave(" = %p %d", key, len);
+ return key;
+}
diff --git a/fs/cachefiles/cf-main.c b/fs/cachefiles/cf-main.c
new file mode 100644
index 0000000..67bfdb3
--- /dev/null
+++ b/fs/cachefiles/cf-main.c
@@ -0,0 +1,109 @@
+/* Network filesystem caching backend to use cache files on a premounted
+ * filesystem
+ *
+ * Copyright (C) 2007 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells ([email protected])
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public Licence
+ * as published by the Free Software Foundation; either version
+ * 2 of the Licence, or (at your option) any later version.
+ */
+
+#include <linux/module.h>
+#include <linux/init.h>
+#include <linux/sched.h>
+#include <linux/completion.h>
+#include <linux/slab.h>
+#include <linux/fs.h>
+#include <linux/file.h>
+#include <linux/namei.h>
+#include <linux/mount.h>
+#include <linux/statfs.h>
+#include <linux/sysctl.h>
+#include <linux/miscdevice.h>
+#include "cf-internal.h"
+
+unsigned cachefiles_debug;
+module_param_named(debug, cachefiles_debug, uint, S_IWUSR | S_IRUGO);
+MODULE_PARM_DESC(cachefiles_debug, "CacheFiles debugging mask");
+
+MODULE_DESCRIPTION("Mounted-filesystem based cache");
+MODULE_AUTHOR("Red Hat, Inc.");
+MODULE_LICENSE("GPL");
+
+struct kmem_cache *cachefiles_object_jar;
+
+static struct miscdevice cachefiles_dev = {
+ .minor = MISC_DYNAMIC_MINOR,
+ .name = "cachefiles",
+ .fops = &cachefiles_daemon_fops,
+};
+
+static void cachefiles_object_init_once(void *_object,
+ struct kmem_cache *cachep,
+ unsigned long flags)
+{
+ struct cachefiles_object *object = _object;
+
+ memset(object, 0, sizeof(*object));
+ fscache_object_init(&object->fscache);
+ spin_lock_init(&object->work_lock);
+}
+
+/*
+ * initialise the fs caching module
+ */
+static int __init cachefiles_init(void)
+{
+ int ret;
+
+ ret = misc_register(&cachefiles_dev);
+ if (ret < 0)
+ goto error_dev;
+
+ /* create an object jar */
+ ret = -ENOMEM;
+ cachefiles_object_jar =
+ kmem_cache_create("cachefiles_object_jar",
+ sizeof(struct cachefiles_object),
+ 0,
+ SLAB_HWCACHE_ALIGN,
+ cachefiles_object_init_once);
+ if (!cachefiles_object_jar) {
+ printk(KERN_NOTICE
+ "CacheFiles: Failed to allocate an object jar\n");
+ goto error_object_jar;
+ }
+
+ ret = cachefiles_proc_init();
+ if (ret < 0)
+ goto error_proc;
+
+ printk(KERN_INFO "CacheFiles: Loaded\n");
+ return 0;
+
+error_proc:
+ kmem_cache_destroy(cachefiles_object_jar);
+error_object_jar:
+ misc_deregister(&cachefiles_dev);
+error_dev:
+ kerror("failed to register: %d", ret);
+ return ret;
+}
+
+fs_initcall(cachefiles_init);
+
+/*
+ * clean up on module removal
+ */
+static void __exit cachefiles_exit(void)
+{
+ printk(KERN_INFO "CacheFiles: Unloading\n");
+
+ cachefiles_proc_cleanup();
+ kmem_cache_destroy(cachefiles_object_jar);
+ misc_deregister(&cachefiles_dev);
+}
+
+module_exit(cachefiles_exit);
diff --git a/fs/cachefiles/cf-namei.c b/fs/cachefiles/cf-namei.c
new file mode 100644
index 0000000..809ddcb
--- /dev/null
+++ b/fs/cachefiles/cf-namei.c
@@ -0,0 +1,743 @@
+/* CacheFiles path walking and related routines
+ *
+ * Copyright (C) 2007 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells ([email protected])
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public Licence
+ * as published by the Free Software Foundation; either version
+ * 2 of the Licence, or (at your option) any later version.
+ */
+
+#include <linux/module.h>
+#include <linux/sched.h>
+#include <linux/file.h>
+#include <linux/fs.h>
+#include <linux/fsnotify.h>
+#include <linux/quotaops.h>
+#include <linux/xattr.h>
+#include <linux/mount.h>
+#include <linux/namei.h>
+#include <linux/security.h>
+#include "cf-internal.h"
+
+static int cachefiles_wait_bit(void *flags)
+{
+ schedule();
+ return 0;
+}
+
+/*
+ * record the fact that an object is now active
+ */
+static void cachefiles_mark_object_active(struct cachefiles_cache *cache,
+ struct cachefiles_object *object)
+{
+ struct cachefiles_object *xobject;
+ struct rb_node **_p, *_parent = NULL;
+ struct dentry *dentry;
+
+ _enter(",%p", object);
+
+try_again:
+ write_lock(&cache->active_lock);
+
+ if (test_and_set_bit(CACHEFILES_OBJECT_ACTIVE, &object->flags))
+ BUG();
+
+ dentry = object->dentry;
+ _p = &cache->active_nodes.rb_node;
+ while (*_p) {
+ _parent = *_p;
+ xobject = rb_entry(_parent,
+ struct cachefiles_object, active_node);
+
+ if (xobject->dentry > dentry)
+ _p = &(*_p)->rb_left;
+ else if (xobject->dentry < dentry)
+ _p = &(*_p)->rb_right;
+ else
+ goto wait_for_old_object;
+ }
+
+ rb_link_node(&object->active_node, _parent, _p);
+ rb_insert_color(&object->active_node, &cache->active_nodes);
+
+ write_unlock(&cache->active_lock);
+ _leave("");
+ return;
+
+ /* an old object from a previous incarnation is hogging the slot - we
+ * need to wait for it to be destroyed */
+wait_for_old_object:
+ _debug("old OBJ%x", xobject->fscache.debug_id);
+ ASSERTCMP(xobject->fscache.state, >=, FSCACHE_OBJECT_DYING);
+ atomic_inc(&xobject->usage);
+ //clear_bit(CACHEFILES_OBJECT_ACTIVE, &object->flags);
+ write_unlock(&cache->active_lock);
+
+ _debug(">>> wait");
+ wait_on_bit(&xobject->flags, CACHEFILES_OBJECT_ACTIVE,
+ cachefiles_wait_bit, TASK_UNINTERRUPTIBLE);
+ _debug("<<< waited");
+
+ cache->cache.ops->put_object(&xobject->fscache);
+ goto try_again;
+}
+
+/*
+ * delete an object representation from the cache
+ * - file backed objects are unlinked
+ * - directory backed objects are stuffed into the graveyard for userspace to
+ * delete
+ * - unlocks the directory mutex
+ */
+static int cachefiles_bury_object(struct cachefiles_cache *cache,
+ struct dentry *dir,
+ struct dentry *rep)
+{
+ struct dentry *grave, *trap;
+ char nbuffer[8 + 8 + 1];
+ int ret;
+
+ _enter(",'%*.*s','%*.*s'",
+ dir->d_name.len, dir->d_name.len, dir->d_name.name,
+ rep->d_name.len, rep->d_name.len, rep->d_name.name);
+
+ /* non-directories can just be unlinked */
+ if (!S_ISDIR(rep->d_inode->i_mode)) {
+ _debug("unlink stale object");
+ ret = vfs_unlink(dir->d_inode, rep);
+
+ mutex_unlock(&dir->d_inode->i_mutex);
+
+ if (ret == -EIO)
+ cachefiles_io_error(cache, "Unlink failed");
+
+ _leave(" = %d", ret);
+ return ret;
+ }
+
+ /* directories have to be moved to the graveyard */
+ _debug("move stale object to graveyard");
+ mutex_unlock(&dir->d_inode->i_mutex);
+
+try_again:
+ /* first step is to make up a grave dentry in the graveyard */
+ sprintf(nbuffer, "%08x%08x",
+ (uint32_t) xtime.tv_sec,
+ (uint32_t) atomic_inc_return(&cache->gravecounter));
+
+ /* do the multiway lock magic */
+ trap = lock_rename(cache->graveyard, dir);
+
+ /* do some checks before getting the grave dentry */
+ if (rep->d_parent != dir) {
+ /* the entry was probably culled when we dropped the parent dir
+ * lock */
+ unlock_rename(cache->graveyard, dir);
+ _leave(" = 0 [culled?]");
+ return 0;
+ }
+
+ if (!S_ISDIR(cache->graveyard->d_inode->i_mode)) {
+ unlock_rename(cache->graveyard, dir);
+ cachefiles_io_error(cache, "Graveyard no longer a directory");
+ return -EIO;
+ }
+
+ if (trap == rep) {
+ unlock_rename(cache->graveyard, dir);
+ cachefiles_io_error(cache, "May not make directory loop");
+ return -EIO;
+ }
+
+ if (d_mountpoint(rep)) {
+ unlock_rename(cache->graveyard, dir);
+ cachefiles_io_error(cache, "Mountpoint in cache");
+ return -EIO;
+ }
+
+ grave = lookup_one_len(nbuffer, cache->graveyard, strlen(nbuffer));
+ if (IS_ERR(grave)) {
+ unlock_rename(cache->graveyard, dir);
+
+ if (PTR_ERR(grave) == -ENOMEM) {
+ _leave(" = -ENOMEM");
+ return -ENOMEM;
+ }
+
+ cachefiles_io_error(cache, "Lookup error %ld",
+ PTR_ERR(grave));
+ return -EIO;
+ }
+
+ if (grave->d_inode) {
+ unlock_rename(cache->graveyard, dir);
+ dput(grave);
+ grave = NULL;
+ cond_resched();
+ goto try_again;
+ }
+
+ if (d_mountpoint(grave)) {
+ unlock_rename(cache->graveyard, dir);
+ dput(grave);
+ cachefiles_io_error(cache, "Mountpoint in graveyard");
+ return -EIO;
+ }
+
+ /* target should not be an ancestor of source */
+ if (trap == grave) {
+ unlock_rename(cache->graveyard, dir);
+ dput(grave);
+ cachefiles_io_error(cache, "May not make directory loop");
+ return -EIO;
+ }
+
+ /* attempt the rename */
+ ret = vfs_rename(dir->d_inode, rep, cache->graveyard->d_inode, grave);
+ if (ret != 0 && ret != -ENOMEM)
+ cachefiles_io_error(cache, "Rename failed with error %d", ret);
+
+ unlock_rename(cache->graveyard, dir);
+ dput(grave);
+ _leave(" = 0");
+ return 0;
+}
+
+/*
+ * delete an object representation from the cache
+ */
+int cachefiles_delete_object(struct cachefiles_cache *cache,
+ struct cachefiles_object *object)
+{
+ struct dentry *dir;
+ int ret;
+
+ _enter(",{%p}", object->dentry);
+
+ ASSERT(object->dentry);
+ ASSERT(object->dentry->d_inode);
+ ASSERT(object->dentry->d_parent);
+
+ dir = dget_parent(object->dentry);
+
+ mutex_lock(&dir->d_inode->i_mutex);
+ ret = cachefiles_bury_object(cache, dir, object->dentry);
+
+ dput(dir);
+ _leave(" = %d", ret);
+ return ret;
+}
+
+/*
+ * walk from the parent object to the child object through the backing
+ * filesystem, creating directories as we go
+ */
+int cachefiles_walk_to_object(struct cachefiles_object *parent,
+ struct cachefiles_object *object,
+ const char *key,
+ struct cachefiles_xattr *auxdata)
+{
+ struct cachefiles_cache *cache;
+ struct dentry *dir, *next = NULL;
+ unsigned long start;
+ const char *name;
+ int ret, nlen;
+
+ _enter("{%p},,%s,", parent->dentry, key);
+
+ cache = container_of(parent->fscache.cache,
+ struct cachefiles_cache, cache);
+
+ ASSERT(parent->dentry);
+ ASSERT(parent->dentry->d_inode);
+
+ if (!(S_ISDIR(parent->dentry->d_inode->i_mode))) {
+ // TODO: convert file to dir
+ _leave("looking up in none directory");
+ return -ENOBUFS;
+ }
+
+ dir = dget(parent->dentry);
+
+advance:
+ /* attempt to transit the first directory component */
+ name = key;
+ nlen = strlen(key);
+
+ /* key ends in a double NUL */
+ key = key + nlen + 1;
+ if (!*key)
+ key = NULL;
+
+lookup_again:
+ /* search the current directory for the element name */
+ _debug("lookup '%s'", name);
+
+ mutex_lock(&dir->d_inode->i_mutex);
+
+ start = jiffies;
+ next = lookup_one_len(name, dir, nlen);
+ cachefiles_hist(cachefiles_lookup_histogram, start);
+ if (IS_ERR(next))
+ goto lookup_error;
+
+ _debug("next -> %p %s", next, next->d_inode ? "positive" : "negative");
+
+ if (!key)
+ object->new = !next->d_inode;
+
+ /* if this element of the path doesn't exist, then the lookup phase
+ * failed, and we can release any readers in the certain knowledge that
+ * there's nothing for them to actually read */
+ if (!next->d_inode)
+ fscache_object_lookup_negative(&object->fscache);
+
+ /* we need to create the object if it's negative */
+ if (key || object->type == FSCACHE_COOKIE_TYPE_INDEX) {
+ /* index objects and intervening tree levels must be subdirs */
+ if (!next->d_inode) {
+ // TODO advance object state
+ ret = cachefiles_has_space(cache, 1, 0);
+ if (ret < 0)
+ goto create_error;
+
+ start = jiffies;
+ ret = vfs_mkdir(dir->d_inode, next, 0);
+ cachefiles_hist(cachefiles_mkdir_histogram, start);
+ if (ret < 0)
+ goto create_error;
+
+ ASSERT(next->d_inode);
+
+ _debug("mkdir -> %p{%p{ino=%lu}}",
+ next, next->d_inode, next->d_inode->i_ino);
+
+ } else if (!S_ISDIR(next->d_inode->i_mode)) {
+ kerror("inode %lu is not a directory",
+ next->d_inode->i_ino);
+ ret = -ENOBUFS;
+ goto error;
+ }
+
+ } else {
+ /* non-index objects start out life as files */
+ if (!next->d_inode) {
+ // TODO advance object state
+ ret = cachefiles_has_space(cache, 1, 0);
+ if (ret < 0)
+ goto create_error;
+
+ start = jiffies;
+ ret = vfs_create(dir->d_inode, next, S_IFREG, NULL);
+ cachefiles_hist(cachefiles_create_histogram, start);
+ if (ret < 0)
+ goto create_error;
+
+ ASSERT(next->d_inode);
+
+ _debug("create -> %p{%p{ino=%lu}}",
+ next, next->d_inode, next->d_inode->i_ino);
+
+ } else if (!S_ISDIR(next->d_inode->i_mode) &&
+ !S_ISREG(next->d_inode->i_mode)
+ ) {
+ kerror("inode %lu is not a file or directory",
+ next->d_inode->i_ino);
+ ret = -ENOBUFS;
+ goto error;
+ }
+ }
+
+ /* process the next component */
+ if (key) {
+ _debug("advance");
+ mutex_unlock(&dir->d_inode->i_mutex);
+ dput(dir);
+ dir = next;
+ next = NULL;
+ goto advance;
+ }
+
+ /* we've found the object we were looking for */
+ object->dentry = next;
+
+ /* if we've found that the terminal object exists, then we need to
+ * check its attributes and delete it if it's out of date */
+ if (!object->new) {
+ _debug("validate '%*.*s'",
+ next->d_name.len, next->d_name.len, next->d_name.name);
+
+ ret = cachefiles_check_object_xattr(object, auxdata);
+ if (ret == -ESTALE) {
+ /* delete the object (the deleter drops the directory
+ * mutex) */
+ object->dentry = NULL;
+
+ ret = cachefiles_bury_object(cache, dir, next);
+ dput(next);
+ next = NULL;
+
+ if (ret < 0)
+ goto delete_error;
+
+ _debug("redo lookup");
+ goto lookup_again;
+ }
+ }
+
+ /* note that we're now using this object */
+ cachefiles_mark_object_active(cache, object);
+
+ mutex_unlock(&dir->d_inode->i_mutex);
+ dput(dir);
+ dir = NULL;
+
+ _debug("=== OBTAINED_OBJECT ===");
+
+ if (object->new) {
+ /* attach data to a newly constructed terminal object */
+ ret = cachefiles_set_object_xattr(object, auxdata);
+ if (ret < 0)
+ goto check_error;
+ } else {
+ /* always update the atime on an object we've just looked up
+ * (this is used to keep track of culling, and atimes are only
+ * updated by read, write and readdir but not lookup or
+ * open) */
+ touch_atime(cache->mnt, next);
+ }
+
+ /* open a file interface onto a data file */
+ if (object->type != FSCACHE_COOKIE_TYPE_INDEX) {
+ if (S_ISREG(object->dentry->d_inode->i_mode)) {
+ const struct address_space_operations *aops;
+
+ ret = -EPERM;
+ aops = object->dentry->d_inode->i_mapping->a_ops;
+ if (!aops->bmap ||
+ !aops->prepare_write ||
+ !aops->commit_write ||
+ !aops->write_one_page)
+ goto check_error;
+
+ object->backer = object->dentry;
+ } else {
+ BUG(); // TODO: open file in data-class subdir
+ }
+ }
+
+ object->new = 0;
+ fscache_obtained_object(&object->fscache);
+
+ _leave(" = 0 [%lu]", object->dentry->d_inode->i_ino);
+ return 0;
+
+create_error:
+ _debug("create error %d", ret);
+ if (ret == -EIO)
+ cachefiles_io_error(cache, "Create/mkdir failed");
+ goto error;
+
+check_error:
+ _debug("check error %d", ret);
+ write_lock(&cache->active_lock);
+ rb_erase(&object->active_node, &cache->active_nodes);
+ write_unlock(&cache->active_lock);
+
+ dput(object->dentry);
+ object->dentry = NULL;
+ goto error_out;
+
+delete_error:
+ _debug("delete error %d", ret);
+ goto error_out2;
+
+lookup_error:
+ _debug("lookup error %ld", PTR_ERR(next));
+ ret = PTR_ERR(next);
+ if (ret == -EIO)
+ cachefiles_io_error(cache, "Lookup failed");
+ next = NULL;
+error:
+ mutex_unlock(&dir->d_inode->i_mutex);
+ dput(next);
+error_out2:
+ dput(dir);
+error_out:
+ if (ret == -ENOSPC)
+ ret = -ENOBUFS;
+
+ _leave(" = error %d", -ret);
+ return ret;
+}
+
+/*
+ * get a subdirectory
+ */
+struct dentry *cachefiles_get_directory(struct cachefiles_cache *cache,
+ struct dentry *dir,
+ const char *dirname)
+{
+ struct dentry *subdir;
+ unsigned long start;
+ int ret;
+
+ _enter(",,%s", dirname);
+
+ /* search the current directory for the element name */
+ mutex_lock(&dir->d_inode->i_mutex);
+
+ start = jiffies;
+ subdir = lookup_one_len(dirname, dir, strlen(dirname));
+ cachefiles_hist(cachefiles_lookup_histogram, start);
+ if (IS_ERR(subdir)) {
+ if (PTR_ERR(subdir) == -ENOMEM)
+ goto nomem_d_alloc;
+ goto lookup_error;
+ }
+
+ _debug("subdir -> %p %s",
+ subdir, subdir->d_inode ? "positive" : "negative");
+
+ /* we need to create the subdir if it doesn't exist yet */
+ if (!subdir->d_inode) {
+ ret = cachefiles_has_space(cache, 1, 0);
+ if (ret < 0)
+ goto mkdir_error;
+
+ _debug("attempt mkdir");
+
+ ret = vfs_mkdir(dir->d_inode, subdir, 0700);
+ if (ret < 0)
+ goto mkdir_error;
+
+ ASSERT(subdir->d_inode);
+
+ _debug("mkdir -> %p{%p{ino=%lu}}",
+ subdir,
+ subdir->d_inode,
+ subdir->d_inode->i_ino);
+ }
+
+ mutex_unlock(&dir->d_inode->i_mutex);
+
+ /* we need to make sure the subdir is a directory */
+ ASSERT(subdir->d_inode);
+
+ if (!S_ISDIR(subdir->d_inode->i_mode)) {
+ kerror("%s is not a directory", dirname);
+ ret = -EIO;
+ goto check_error;
+ }
+
+ ret = -EPERM;
+ if (!subdir->d_inode->i_op ||
+ !subdir->d_inode->i_op->setxattr ||
+ !subdir->d_inode->i_op->getxattr ||
+ !subdir->d_inode->i_op->lookup ||
+ !subdir->d_inode->i_op->mkdir ||
+ !subdir->d_inode->i_op->create ||
+ !subdir->d_inode->i_op->rename ||
+ !subdir->d_inode->i_op->rmdir ||
+ !subdir->d_inode->i_op->unlink)
+ goto check_error;
+
+ _leave(" = [%lu]", subdir->d_inode->i_ino);
+ return subdir;
+
+check_error:
+ dput(subdir);
+ _leave(" = %d [check]", ret);
+ return ERR_PTR(ret);
+
+mkdir_error:
+ mutex_unlock(&dir->d_inode->i_mutex);
+ dput(subdir);
+ kerror("mkdir %s failed with error %d", dirname, ret);
+ return ERR_PTR(ret);
+
+lookup_error:
+ mutex_unlock(&dir->d_inode->i_mutex);
+ ret = PTR_ERR(subdir);
+ kerror("Lookup %s failed with error %d", dirname, ret);
+ return ERR_PTR(ret);
+
+nomem_d_alloc:
+ mutex_unlock(&dir->d_inode->i_mutex);
+ _leave(" = -ENOMEM");
+ return ERR_PTR(-ENOMEM);
+}
+
+/*
+ * find out if an object is in use or not
+ * - if finds object and it's not in use:
+ * - returns a pointer to the object and a reference on it
+ * - returns with the directory locked
+ */
+static struct dentry *cachefiles_check_active(struct cachefiles_cache *cache,
+ struct dentry *dir,
+ char *filename)
+{
+ struct cachefiles_object *object;
+ struct rb_node *_n;
+ struct dentry *victim;
+ unsigned long start;
+ int ret;
+
+ _enter(",%*.*s/,%s",
+ dir->d_name.len, dir->d_name.len, dir->d_name.name, filename);
+
+ /* look up the victim */
+ mutex_lock_nested(&dir->d_inode->i_mutex, 1);
+
+ start = jiffies;
+ victim = lookup_one_len(filename, dir, strlen(filename));
+ cachefiles_hist(cachefiles_lookup_histogram, start);
+ if (IS_ERR(victim))
+ goto lookup_error;
+
+ _debug("victim -> %p %s",
+ victim, victim->d_inode ? "positive" : "negative");
+
+ /* if the object is no longer there then we probably retired the object
+ * at the netfs's request whilst the cull was in progress
+ */
+ if (!victim->d_inode) {
+ mutex_unlock(&dir->d_inode->i_mutex);
+ dput(victim);
+ _leave(" = -ENOENT [absent]");
+ return ERR_PTR(-ENOENT);
+ }
+
+ /* check to see if we're using this object */
+ read_lock(&cache->active_lock);
+
+ _n = cache->active_nodes.rb_node;
+
+ while (_n) {
+ object = rb_entry(_n, struct cachefiles_object, active_node);
+
+ if (object->dentry > victim)
+ _n = _n->rb_left;
+ else if (object->dentry < victim)
+ _n = _n->rb_right;
+ else
+ goto object_in_use;
+ }
+
+ read_unlock(&cache->active_lock);
+
+ _leave(" = %p", victim);
+ return victim;
+
+object_in_use:
+ read_unlock(&cache->active_lock);
+ mutex_unlock(&dir->d_inode->i_mutex);
+ dput(victim);
+ _leave(" = -EBUSY [in use]");
+ return ERR_PTR(-EBUSY);
+
+lookup_error:
+ mutex_unlock(&dir->d_inode->i_mutex);
+ ret = PTR_ERR(victim);
+ if (ret == -ENOENT) {
+ /* file or dir now absent - probably retired by netfs */
+ _leave(" = -ESTALE [absent]");
+ return ERR_PTR(-ESTALE);
+ }
+
+ if (ret == -EIO) {
+ cachefiles_io_error(cache, "Lookup failed");
+ } else if (ret != -ENOMEM) {
+ kerror("Internal error: %d", ret);
+ ret = -EIO;
+ }
+
+ _leave(" = %d", ret);
+ return ERR_PTR(ret);
+}
+
+/*
+ * cull an object if it's not in use
+ * - called only by cache manager daemon
+ */
+int cachefiles_cull(struct cachefiles_cache *cache, struct dentry *dir,
+ char *filename)
+{
+ struct dentry *victim;
+ int ret;
+
+ _enter(",%*.*s/,%s",
+ dir->d_name.len, dir->d_name.len, dir->d_name.name, filename);
+
+ victim = cachefiles_check_active(cache, dir, filename);
+ if (IS_ERR(victim))
+ return PTR_ERR(victim);
+
+ _debug("victim -> %p %s",
+ victim, victim->d_inode ? "positive" : "negative");
+
+ /* okay... the victim is not being used so we can cull it
+ * - start by marking it as stale
+ */
+ _debug("victim is cullable");
+
+ ret = cachefiles_remove_object_xattr(cache, victim);
+ if (ret < 0)
+ goto error_unlock;
+
+ /* actually remove the victim (drops the dir mutex) */
+ _debug("bury");
+
+ ret = cachefiles_bury_object(cache, dir, victim);
+ if (ret < 0)
+ goto error;
+
+ dput(victim);
+ _leave(" = 0");
+ return 0;
+
+error_unlock:
+ mutex_unlock(&dir->d_inode->i_mutex);
+error:
+ dput(victim);
+ if (ret == -ENOENT) {
+ /* file or dir now absent - probably retired by netfs */
+ _leave(" = -ESTALE [absent]");
+ return -ESTALE;
+ }
+
+ if (ret != -ENOMEM) {
+ kerror("Internal error: %d", ret);
+ ret = -EIO;
+ }
+
+ _leave(" = %d", ret);
+ return ret;
+}
+
+/*
+ * find out if an object is in use or not
+ * - called only by cache manager daemon
+ * - returns -EBUSY or 0 to indicate whether an object is in use or not
+ */
+int cachefiles_check_in_use(struct cachefiles_cache *cache, struct dentry *dir,
+ char *filename)
+{
+ struct dentry *victim;
+
+ _enter(",%*.*s/,%s",
+ dir->d_name.len, dir->d_name.len, dir->d_name.name, filename);
+
+ victim = cachefiles_check_active(cache, dir, filename);
+ if (IS_ERR(victim))
+ return PTR_ERR(victim);
+
+ mutex_unlock(&dir->d_inode->i_mutex);
+ dput(victim);
+ _leave(" = 0");
+ return 0;
+}
diff --git a/fs/cachefiles/cf-proc.c b/fs/cachefiles/cf-proc.c
new file mode 100644
index 0000000..c0d5444
--- /dev/null
+++ b/fs/cachefiles/cf-proc.c
@@ -0,0 +1,166 @@
+/* CacheFiles statistics
+ *
+ * Copyright (C) 2007 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells ([email protected])
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public Licence
+ * as published by the Free Software Foundation; either version
+ * 2 of the Licence, or (at your option) any later version.
+ */
+
+#include <linux/module.h>
+#include <linux/proc_fs.h>
+#include <linux/seq_file.h>
+#include "cf-internal.h"
+
+struct cachefiles_proc {
+ unsigned nlines;
+ const struct seq_operations *ops;
+};
+
+atomic_t cachefiles_lookup_histogram[HZ];
+atomic_t cachefiles_mkdir_histogram[HZ];
+atomic_t cachefiles_create_histogram[HZ];
+
+static struct proc_dir_entry *proc_cachefiles;
+
+static int cachefiles_proc_open(struct inode *inode, struct file *file);
+static void *cachefiles_proc_start(struct seq_file *m, loff_t *pos);
+static void cachefiles_proc_stop(struct seq_file *m, void *v);
+static void *cachefiles_proc_next(struct seq_file *m, void *v, loff_t *pos);
+static int cachefiles_histogram_show(struct seq_file *m, void *v);
+
+static const struct file_operations cachefiles_proc_fops = {
+ .open = cachefiles_proc_open,
+ .read = seq_read,
+ .llseek = seq_lseek,
+ .release = seq_release,
+};
+
+static const struct seq_operations cachefiles_histogram_ops = {
+ .start = cachefiles_proc_start,
+ .stop = cachefiles_proc_stop,
+ .next = cachefiles_proc_next,
+ .show = cachefiles_histogram_show,
+};
+
+static const struct cachefiles_proc cachefiles_histogram = {
+ .nlines = HZ + 1,
+ .ops = &cachefiles_histogram_ops,
+};
+
+/*
+ * initialise the /proc/fs/fscache/cachefiles/ directory
+ */
+int __init cachefiles_proc_init(void)
+{
+ struct proc_dir_entry *p;
+
+ _enter("");
+
+ proc_cachefiles = proc_mkdir("cachefiles", proc_fscache);
+ if (!proc_cachefiles)
+ goto error_dir;
+ proc_cachefiles->owner = THIS_MODULE;
+
+ p = create_proc_entry("histogram", 0, proc_cachefiles);
+ if (!p)
+ goto error_histogram;
+ p->proc_fops = &cachefiles_proc_fops;
+ p->owner = THIS_MODULE;
+ p->data = (void *) &cachefiles_histogram;
+
+ _leave(" = 0");
+ return 0;
+
+error_histogram:
+ remove_proc_entry("fs/cachefiles", NULL);
+error_dir:
+ _leave(" = -ENOMEM");
+ return -ENOMEM;
+}
+
+/*
+ * clean up the /proc/fs/fscache/cachefiles/ directory
+ */
+void cachefiles_proc_cleanup(void)
+{
+ remove_proc_entry("histogram", proc_cachefiles);
+ remove_proc_entry("cachefiles", proc_fscache);
+}
+
+/*
+ * open "/proc/fs/fscache/cachefiles/XXX" which provide statistics summaries
+ */
+static int cachefiles_proc_open(struct inode *inode, struct file *file)
+{
+ const struct cachefiles_proc *proc = PDE(inode)->data;
+ struct seq_file *m;
+ int ret;
+
+ ret = seq_open(file, proc->ops);
+ if (ret == 0) {
+ m = file->private_data;
+ m->private = (void *) proc;
+ }
+ return ret;
+}
+
+/*
+ * set up the iterator to start reading from the first line
+ */
+static void *cachefiles_proc_start(struct seq_file *m, loff_t *_pos)
+{
+ if (*_pos == 0)
+ *_pos = 1;
+ return (void *)(unsigned long) *_pos;
+}
+
+/*
+ * move to the next line
+ */
+static void *cachefiles_proc_next(struct seq_file *m, void *v, loff_t *pos)
+{
+ const struct cachefiles_proc *proc = m->private;
+
+ (*pos)++;
+ return *pos > proc->nlines ? NULL : (void *)(unsigned long) *pos;
+}
+
+/*
+ * clean up after reading
+ */
+static void cachefiles_proc_stop(struct seq_file *m, void *v)
+{
+}
+
+/*
+ * display the time-taken histogram
+ */
+static int cachefiles_histogram_show(struct seq_file *m, void *v)
+{
+ unsigned long index;
+ unsigned x, y, z, t;
+
+ switch ((unsigned long) v) {
+ case 1:
+ seq_puts(m, "JIFS SECS LOOKUPS MKDIRS CREATES\n");
+ return 0;
+ case 2:
+ seq_puts(m, "===== ===== ========= ========= =========\n");
+ return 0;
+ default:
+ index = (unsigned long) v - 3;
+ x = atomic_read(&cachefiles_lookup_histogram[index]);
+ y = atomic_read(&cachefiles_mkdir_histogram[index]);
+ z = atomic_read(&cachefiles_create_histogram[index]);
+ if (x == 0 && y == 0 && z == 0)
+ return 0;
+
+ t = (index * 1000) / HZ;
+
+ seq_printf(m, "%4lu 0.%03u %9u %9u %9u\n", index, t, x, y, z);
+ return 0;
+ }
+}
diff --git a/fs/cachefiles/cf-security.c b/fs/cachefiles/cf-security.c
new file mode 100644
index 0000000..863cbc1
--- /dev/null
+++ b/fs/cachefiles/cf-security.c
@@ -0,0 +1,107 @@
+/* CacheFiles security management
+ *
+ * Copyright (C) 2007 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells ([email protected])
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public Licence
+ * as published by the Free Software Foundation; either version
+ * 2 of the Licence, or (at your option) any later version.
+ */
+
+#include <linux/fs.h>
+#include "cf-internal.h"
+
+/*
+ * determine the security context within which we access the cache from within
+ * the kernel
+ */
+int cachefiles_get_security_ID(struct cachefiles_cache *cache)
+{
+ char *seclabel;
+ u32 seclen, daemon_secid;
+ int ret;
+
+ _enter("");
+
+ cache->access_secid = 0;
+
+ /* ask the security policy to tell us what security ID we should be
+ * using to access the cache, given the security ID that our daemon is
+ * using */
+ security_task_getsecid(current, &daemon_secid);
+
+ ret = security_secid_to_secctx(daemon_secid, &seclabel, &seclen);
+ if (ret < 0)
+ goto error;
+ _debug("Cache Daemon SecID: %x '%s'", daemon_secid, seclabel);
+ kfree(seclabel);
+
+ ret = security_cachefiles_get_secid(daemon_secid, &cache->access_secid);
+ if (ret < 0) {
+ printk(KERN_ERR "CacheFiles:"
+ " Security can't provide module SecID: error %d",
+ ret);
+ goto error;
+ }
+
+ ret = security_secid_to_secctx(cache->access_secid, &seclabel, &seclen);
+ if (ret < 0)
+ goto error;
+ _debug("Cache Module SecID: %x '%s'", cache->access_secid, seclabel);
+ kfree(seclabel);
+
+error:
+ if (ret == -EOPNOTSUPP)
+ ret = 0;
+ _leave(" = %d", ret);
+ return ret;
+}
+
+/*
+ * check the security details of the on-disk cache
+ * - must be called with security imposed
+ */
+int cachefiles_determine_cache_secid(struct cachefiles_cache *cache,
+ struct dentry *root)
+{
+ char *seclabel;
+ u32 seclen;
+ int ret;
+
+ _enter("");
+
+ /* use the cache root dir's security ID as the SECID with which to create
+ * files */
+ cache->cache_secid = security_inode_get_secid(root->d_inode);
+
+ ret = security_secid_to_secctx(cache->cache_secid, &seclabel, &seclen);
+ if (ret < 0)
+ goto error;
+ _debug("Cache SecID: %x '%s'", cache->cache_secid, seclabel);
+ kfree(seclabel);
+
+ /* check that we have permission to create files and directories with
+ * the security ID we've been given */
+ ret = security_inode_mkdir(root->d_inode, root, 0);
+ if (ret < 0) {
+ printk(KERN_ERR "CacheFiles:"
+ " Security denies permission to make dirs: error %d",
+ ret);
+ goto error;
+ }
+
+ ret = security_inode_create(root->d_inode, root, 0);
+ if (ret < 0) {
+ printk(KERN_ERR "CacheFiles:"
+ " Security denies permission to create files: error %d",
+ ret);
+ goto error;
+ }
+
+error:
+ if (ret == -EOPNOTSUPP)
+ ret = 0;
+ _leave(" = %d", ret);
+ return ret;
+}
diff --git a/fs/cachefiles/cf-xattr.c b/fs/cachefiles/cf-xattr.c
new file mode 100644
index 0000000..9ae9240
--- /dev/null
+++ b/fs/cachefiles/cf-xattr.c
@@ -0,0 +1,292 @@
+/* CacheFiles extended attribute management
+ *
+ * Copyright (C) 2007 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells ([email protected])
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public Licence
+ * as published by the Free Software Foundation; either version
+ * 2 of the Licence, or (at your option) any later version.
+ */
+
+#include <linux/module.h>
+#include <linux/sched.h>
+#include <linux/file.h>
+#include <linux/fs.h>
+#include <linux/fsnotify.h>
+#include <linux/quotaops.h>
+#include <linux/xattr.h>
+#include "cf-internal.h"
+
+static const char cachefiles_xattr_cache[] =
+ XATTR_USER_PREFIX "CacheFiles.cache";
+
+/*
+ * check the type label on an object
+ * - done using xattrs
+ */
+int cachefiles_check_object_type(struct cachefiles_object *object)
+{
+ struct dentry *dentry = object->dentry;
+ char type[3], xtype[3];
+ int ret;
+
+ ASSERT(dentry);
+ ASSERT(dentry->d_inode);
+
+ if (!object->fscache.cookie)
+ strcpy(type, "C3");
+ else
+ snprintf(type, 3, "%02x", object->fscache.cookie->def->type);
+
+ _enter("%p{%s}", object, type);
+
+ /* attempt to install a type label directly */
+ ret = vfs_setxattr(dentry, cachefiles_xattr_cache, type, 2,
+ XATTR_CREATE);
+ if (ret == 0) {
+ _debug("SET"); /* we succeeded */
+ goto error;
+ }
+
+ if (ret != -EEXIST) {
+ kerror("Can't set xattr on %*.*s [%lu] (err %d)",
+ dentry->d_name.len, dentry->d_name.len,
+ dentry->d_name.name, dentry->d_inode->i_ino,
+ -ret);
+ goto error;
+ }
+
+ /* read the current type label */
+ ret = vfs_getxattr(dentry, cachefiles_xattr_cache, xtype, 3);
+ if (ret < 0) {
+ if (ret == -ERANGE)
+ goto bad_type_length;
+
+ kerror("Can't read xattr on %*.*s [%lu] (err %d)",
+ dentry->d_name.len, dentry->d_name.len,
+ dentry->d_name.name, dentry->d_inode->i_ino,
+ -ret);
+ goto error;
+ }
+
+ /* check the type is what we're expecting */
+ if (ret != 2)
+ goto bad_type_length;
+
+ if (xtype[0] != type[0] || xtype[1] != type[1])
+ goto bad_type;
+
+ ret = 0;
+
+error:
+ _leave(" = %d", ret);
+ return ret;
+
+bad_type_length:
+ kerror("Cache object %lu type xattr length incorrect",
+ dentry->d_inode->i_ino);
+ ret = -EIO;
+ goto error;
+
+bad_type:
+ xtype[2] = 0;
+ kerror("Cache object %*.*s [%lu] type %s not %s",
+ dentry->d_name.len, dentry->d_name.len,
+ dentry->d_name.name, dentry->d_inode->i_ino,
+ xtype, type);
+ ret = -EIO;
+ goto error;
+}
+
+/*
+ * set the state xattr on a cache file
+ */
+int cachefiles_set_object_xattr(struct cachefiles_object *object,
+ struct cachefiles_xattr *auxdata)
+{
+ struct dentry *dentry = object->dentry;
+ int ret;
+
+ ASSERT(object->fscache.cookie);
+ ASSERT(dentry);
+
+ _enter("%p,#%d", object, auxdata->len);
+
+ /* attempt to install the cache metadata directly */
+ _debug("SET %s #%u", object->fscache.cookie->def->name, auxdata->len);
+
+ ret = vfs_setxattr(dentry, cachefiles_xattr_cache,
+ &auxdata->type, auxdata->len,
+ XATTR_CREATE);
+ if (ret < 0 && ret != -ENOMEM)
+ cachefiles_io_error_obj(
+ object,
+ "Failed to set xattr with error %d", ret);
+
+ _leave(" = %d", ret);
+ return ret;
+}
+
+/*
+ * update the state xattr on a cache file
+ */
+int cachefiles_update_object_xattr(struct cachefiles_object *object,
+ struct cachefiles_xattr *auxdata)
+{
+ struct dentry *dentry = object->dentry;
+ int ret;
+
+ ASSERT(object->fscache.cookie);
+ ASSERT(dentry);
+
+ _enter("%p,#%d", object, auxdata->len);
+
+ /* attempt to install the cache metadata directly */
+ _debug("SET %s #%u", object->fscache.cookie->def->name, auxdata->len);
+
+ ret = vfs_setxattr(dentry, cachefiles_xattr_cache,
+ &auxdata->type, auxdata->len,
+ XATTR_REPLACE);
+ if (ret < 0 && ret != -ENOMEM)
+ cachefiles_io_error_obj(
+ object,
+ "Failed to update xattr with error %d", ret);
+
+ _leave(" = %d", ret);
+ return ret;
+}
+
+/*
+ * check the state xattr on a cache file
+ * - return -ESTALE if the object should be deleted
+ */
+int cachefiles_check_object_xattr(struct cachefiles_object *object,
+ struct cachefiles_xattr *auxdata)
+{
+ struct cachefiles_xattr *auxbuf;
+ struct dentry *dentry = object->dentry;
+ int ret;
+
+ _enter("%p,#%d", object, auxdata->len);
+
+ ASSERT(dentry);
+ ASSERT(dentry->d_inode);
+
+ auxbuf = kmalloc(sizeof(struct cachefiles_xattr) + 512, GFP_KERNEL);
+ if (!auxbuf) {
+ _leave(" = -ENOMEM");
+ return -ENOMEM;
+ }
+
+ /* read the current type label */
+ ret = vfs_getxattr(dentry, cachefiles_xattr_cache,
+ &auxbuf->type, 512 + 1);
+ if (ret < 0) {
+ if (ret == -ENODATA)
+ goto stale; /* no attribute - power went off
+ * mid-cull? */
+
+ if (ret == -ERANGE)
+ goto bad_type_length;
+
+ cachefiles_io_error_obj(object,
+ "Can't read xattr on %lu (err %d)",
+ dentry->d_inode->i_ino, -ret);
+ goto error;
+ }
+
+ /* check the on-disk object */
+ if (ret < 1)
+ goto bad_type_length;
+
+ if (auxbuf->type != auxdata->type)
+ goto stale;
+
+ auxbuf->len = ret;
+
+ /* consult the netfs */
+ if (object->fscache.cookie->def->check_aux) {
+ fscache_checkaux_t result;
+ unsigned int dlen;
+
+ dlen = auxbuf->len - 1;
+
+ _debug("checkaux %s #%u",
+ object->fscache.cookie->def->name, dlen);
+
+ result = object->fscache.cookie->def->check_aux(
+ object->fscache.cookie->netfs_data,
+ &auxbuf->data, dlen);
+
+ switch (result) {
+ /* entry okay as is */
+ case FSCACHE_CHECKAUX_OKAY:
+ goto okay;
+
+ /* entry requires update */
+ case FSCACHE_CHECKAUX_NEEDS_UPDATE:
+ break;
+
+ /* entry requires deletion */
+ case FSCACHE_CHECKAUX_OBSOLETE:
+ goto stale;
+
+ default:
+ BUG();
+ }
+
+ /* update the current label */
+ ret = vfs_setxattr(dentry, cachefiles_xattr_cache,
+ &auxdata->type, auxdata->len,
+ XATTR_REPLACE);
+ if (ret < 0) {
+ cachefiles_io_error_obj(object,
+ "Can't update xattr on %lu"
+ " (error %d)",
+ dentry->d_inode->i_ino, -ret);
+ goto error;
+ }
+ }
+
+okay:
+ ret = 0;
+
+error:
+ kfree(auxbuf);
+ _leave(" = %d", ret);
+ return ret;
+
+bad_type_length:
+ kerror("Cache object %lu xattr length incorrect",
+ dentry->d_inode->i_ino);
+ ret = -EIO;
+ goto error;
+
+stale:
+ ret = -ESTALE;
+ goto error;
+}
+
+/*
+ * remove the object's xattr to mark it stale
+ */
+int cachefiles_remove_object_xattr(struct cachefiles_cache *cache,
+ struct dentry *dentry)
+{
+ int ret;
+
+ ret = vfs_removexattr(dentry, cachefiles_xattr_cache);
+ if (ret < 0) {
+ if (ret == -ENOENT || ret == -ENODATA)
+ ret = 0;
+ else if (ret != -ENOMEM)
+ cachefiles_io_error(cache,
+ "Can't remove xattr from %lu"
+ " (error %d)",
+ dentry->d_inode->i_ino, -ret);
+ }
+
+ _leave(" = %d", ret);
+ return ret;
+}

2007-08-09 16:43:59

by David Howells

[permalink] [raw]
Subject: [PATCH 04/14] FS-Cache: Generic filesystem caching facility [try #2]

The attached patch adds a generic intermediary (FS-Cache) by which filesystems
may call on local caching capabilities, and by which local caching backends may
make caches available:

+---------+
| | +--------------+
| NFS |--+ | |
| | | +-->| CacheFS |
+---------+ | +----------+ | | /dev/hda5 |
| | | | +--------------+
+---------+ +-->| | |
| | | |--+
| AFS |----->| FS-Cache |
| | | |--+
+---------+ +-->| | |
| | | | +--------------+
+---------+ | +----------+ | | |
| | | +-->| CacheFiles |
| ISOFS |--+ | /var/cache |
| | +--------------+
+---------+

The patch also documents the netfs interface and the cache backend
interface provided by the facility.


There are a number of reasons why I'm not using i_mapping to do this.
These have been discussed a lot on the LKML and CacheFS mailing lists,
but to summarise the basics:

(1) Most filesystems don't do hole reportage. Holes in files are treated as
blocks of zeros and can't be distinguished otherwise, making it difficult
to distinguish blocks that have been read from the network and cached from
those that haven't.

(2) The backing inode must be fully populated before being exposed to
userspace through the main inode because the VM/VFS goes directly to the
backing inode and does not interrogate the front inode on VM ops.

Therefore:

(a) The backing inode must fit entirely within the cache.

(b) All backed files currently open must fit entirely within the cache at
the same time.

(c) A working set of files in total larger than the cache may not be
cached.

(d) A file may not grow larger than the available space in the cache.

(e) A file that's open and cached, and remotely grows larger than the
cache is potentially stuffed.

(3) Writes go to the backing filesystem, and can only be transferred to the
network when the file is closed.

(4) There's no record of what changes have been made, so the whole file must
be written back.

(5) The pages belong to the backing filesystem, and all metadata associated
with that page are relevant only to the backing filesystem, and not
anything stacked atop it.


The attached patch adds a generic core to which both networking filesystems and
caches may bind. It transfers requests from networking filesystems to
appropriate caches if possible, or else gracefully denies them.

If this facility is disabled in the kernel configuration, then all its
operations will be trivially reducible to nothing by the compiler.

FS-Cache provides the following facilities:

(1) Caches can be added / removed at any time, even whilst in use.

(2) Adds a facility by which tags can be used to refer to caches, even if
they're not mounted yet.

(3) More than one cache can be used at once. Caches can be selected
explicitly by use of tags.

(4) The netfs is provided with an interface that allows either party to
withdraw caching facilities from a file (required for (1)).

(5) A netfs may annotate cache objects that belongs to it.

(6) Cache objects can be pinned and reservations made.

(7) The interface to the netfs returns as few errors as possible, preferring
rather to let the netfs remain oblivious.

(8) Cookies are used to represent indices, files and other objects to the
netfs. The simplest cookie is just a NULL pointer - indicating nothing
cached there.

(9) The netfs is allowed to propose - dynamically - any index hierarchy it
desires, though it must be aware that the index search function is
recursive, stack space is limited, and indices can only be children of
indices.

(10) Indices can be used to group files together to reduce key size and to make
group invalidation easier. The use of indices may make lookup quicker,
but that's cache dependent.

(11) Data I/O is effectively done directly to and from the netfs's pages. The
netfs indicates that page A is at index B of the data-file represented by
cookie C, and that it should be read or written. The cache backend may or
may not start I/O on that page, but if it does, a netfs callback will be
invoked to indicate completion. The I/O may be either synchronous or
asynchronous.

(12) Cookies can be "retired" upon release. At this point FS-Cache will mark
them as obsolete and the index hierarchy rooted at that point will get
recycled.

(13) The netfs provides a "match" function for index searches. In addition to
saying whether a match was made or not, this can also specify that an
entry should be updated or deleted.


FS-Cache maintains a virtual indexing tree in which all indices, files, objects
and pages are kept. Bits of this tree may actually reside in one or more
caches.

FSDEF
|
+------------------------------------+
| |
NFS AFS
| |
+--------------------------+ +-----------+
| | | |
homedir mirror afs.org redhat.com
| | |
+------------+ +---------------+ +----------+
| | | | | |
00001 00002 00007 00125 vol00001 vol00002
| | | | |
+---+---+ +-----+ +---+ +------+------+ +-----+----+
| | | | | | | | | | | | |
PG0 PG1 PG2 PG0 XATTR PG0 PG1 DIRENT DIRENT DIRENT R/W R/O Bak
| |
PG0 +-------+
| |
00001 00003
|
+---+---+
| | |
PG0 PG1 PG2

In the example above, you can see two netfs's being backed: NFS and AFS. These
have different index hierarchies:

(*) The NFS primary index will probably contain per-server indices. Each
server index is indexed by NFS file handles to get data file objects.
Each data file objects can have an array of pages, but may also have
further child objects, such as extended attributes and directory entries.
Extended attribute objects themselves have page-array contents.

(*) The AFS primary index contains per-cell indices. Each cell index contains
per-logical-volume indices. Each of volume index contains up to three
indices for the read-write, read-only and backup mirrors of those volumes.
Each of these contains vnode data file objects, each of which contains an
array of pages.

The very top index is the FS-Cache master index in which individual netfs's
have entries.

Any index object may reside in more than one cache, provided it only has index
children. Any index with non-index object children will be assumed to only
reside in one cache.


The FS-Cache overview can be found in:

Documentation/filesystems/caching/fscache.txt

The netfs API to FS-Cache can be found in:

Documentation/filesystems/caching/netfs-api.txt

The cache backend API to FS-Cache can be found in:

Documentation/filesystems/caching/backend-api.txt

Signed-Off-By: David Howells <[email protected]>
---

Documentation/filesystems/caching/backend-api.txt | 625 +++++++++++++++
Documentation/filesystems/caching/fscache.txt | 295 +++++++
Documentation/filesystems/caching/netfs-api.txt | 741 ++++++++++++++++++
fs/Kconfig | 6
fs/Makefile | 1
fs/fscache/Kconfig | 49 +
fs/fscache/Makefile | 19
fs/fscache/fsc-cache.c | 512 ++++++++++++
fs/fscache/fsc-cookie.c | 494 ++++++++++++
fs/fscache/fsc-fsdef.c | 111 +++
fs/fscache/fsc-internal.h | 376 +++++++++
fs/fscache/fsc-main.c | 127 +++
fs/fscache/fsc-manage.c | 260 ++++++
fs/fscache/fsc-object.c | 586 ++++++++++++++
fs/fscache/fsc-page.c | 863 +++++++++++++++++++++
fs/fscache/fsc-proc.c | 399 ++++++++++
fs/fscache/fsc-stats.c | 103 +++
fs/fscache/fsc-threads.c | 590 ++++++++++++++
include/linux/fscache-cache.h | 433 +++++++++++
include/linux/fscache.h | 593 ++++++++++++++
20 files changed, 7183 insertions(+), 0 deletions(-)

diff --git a/Documentation/filesystems/caching/backend-api.txt b/Documentation/filesystems/caching/backend-api.txt
new file mode 100644
index 0000000..a7e58eb
--- /dev/null
+++ b/Documentation/filesystems/caching/backend-api.txt
@@ -0,0 +1,625 @@
+ ==========================
+ FS-CACHE CACHE BACKEND API
+ ==========================
+
+The FS-Cache system provides an API by which actual caches can be supplied to
+FS-Cache for it to then serve out to network filesystems and other interested
+parties.
+
+This API is declared in <linux/fscache-cache.h>.
+
+
+====================================
+INITIALISING AND REGISTERING A CACHE
+====================================
+
+To start off, a cache definition must be initialised and registered for each
+cache the backend wants to make available. For instance, CacheFS does this in
+the fill_super() operation on mounting.
+
+The cache definition (struct fscache_cache) should be initialised by calling:
+
+ void fscache_init_cache(struct fscache_cache *cache,
+ struct fscache_cache_ops *ops,
+ const char *idfmt,
+ ...);
+
+Where:
+
+ (*) "cache" is a pointer to the cache definition;
+
+ (*) "ops" is a pointer to the table of operations that the backend supports on
+ this cache; and
+
+ (*) "idfmt" is a format and printf-style arguments for constructing a label
+ for the cache.
+
+
+The cache should then be registered with FS-Cache by passing a pointer to the
+previously initialised cache definition to:
+
+ int fscache_add_cache(struct fscache_cache *cache,
+ struct fscache_object *fsdef,
+ const char *tagname);
+
+Two extra arguments should also be supplied:
+
+ (*) "fsdef" which should point to the object representation for the FS-Cache
+ master index in this cache. Netfs primary index entries will be created
+ here.
+
+ (*) "tagname" which, if given, should be a text string naming this cache. If
+ this is NULL, the identifier will be used instead. For CacheFS, the
+ identifier is set to name the underlying block device and the tag can be
+ supplied by mount.
+
+This function may return -ENOMEM if it ran out of memory or -EEXIST if the tag
+is already in use. 0 will be returned on success.
+
+
+=====================
+UNREGISTERING A CACHE
+=====================
+
+A cache can be withdrawn from the system by calling this function with a
+pointer to the cache definition:
+
+ void fscache_withdraw_cache(struct fscache_cache *cache);
+
+In CacheFS's case, this is called by put_super().
+
+
+========
+SECURITY
+========
+
+The cache methods are executed one of two contexts:
+
+ (1) that of the userspace process that issued the netfs operation that caused
+ the cache method to be invoked, or
+
+ (2) that of one of the processes in the FS-Cache thread pool.
+
+In either case, this may not be an appropriate context in which to access the
+cache.
+
+The calling process's fsuid, fsgid and SELinux security identities may need to
+be masqueraded for the duration of the cache driver's access to the cache.
+This is left to the cache to handle; FS-Cache makes no effort in this regard.
+
+
+===================================
+CONTROL AND STATISTICS PRESENTATION
+===================================
+
+The cache may present data to the outside world through FS-Cache's interfaces
+in sysfs and procfs - the former for control and the latter for statistics.
+
+A sysfs directory called /sys/fs/fscache/<cachetag>/ is created if CONFIG_SYSFS
+is enabled. This is accessible through the kobject struct fscache_cache::kobj
+and is for use by the cache as it sees fit.
+
+The cache driver may create itself a directory named for the cache type in the
+/proc/fs/fscache/ directory. This is available if CONFIG_FSCACHE_PROC is
+enabled and is accessible through:
+
+ struct proc_dir_entry *proc_fscache;
+
+
+========================
+RELEVANT DATA STRUCTURES
+========================
+
+ (*) Index/Data file FS-Cache representation cookie:
+
+ struct fscache_cookie {
+ struct fscache_object_def *def;
+ struct fscache_netfs *netfs;
+ void *netfs_data;
+ ...
+ };
+
+ The fields that might be of use to the backend describe the object
+ definition, the netfs definition and the netfs's data for this cookie.
+ The object definition contain functions supplied by the netfs for loading
+ and matching index entries; these are required to provide some of the
+ cache operations.
+
+
+ (*) In-cache object representation:
+
+ struct fscache_object {
+ int debug_id;
+ enum {
+ FSCACHE_OBJECT_RECYCLING,
+ ...
+ } state;
+ spinlock_t lock
+ struct fscache_cache *cache;
+ struct fscache_cookie *cookie;
+ ...
+ };
+
+ Structures of this type should be allocated by the cache backend and
+ passed to FS-Cache when requested by the appropriate cache operation. In
+ the case of CacheFS, they're embedded in CacheFS's internal object
+ structures.
+
+ The debug_id is a simple integer that can be used in debugging messages
+ that refer to a particular object. In such a case it should be printed
+ using "OBJ%x" to be consistent with FS-Cache.
+
+ Each object contains a pointer to the cookie that represents the object it
+ is backing. An object should retired when put_object() is called if it is
+ in state FSCACHE_OBJECT_RECYCLING. The fscache_object struct should be
+ initialised by calling fscache_object_init(object).
+
+
+ (*) FS-Cache operation record:
+
+ struct fscache_operation {
+ atomic_t usage;
+ struct fscache_object *object;
+ unsigned long flags;
+ #define FSCACHE_OP_EXCLUSIVE
+ void (*processor)(struct fscache_operation *op);
+ void (*release)(struct fscache_operation *op);
+ ...
+ };
+
+ FS-Cache has a pool of threads that it uses to give CPU time to the
+ various asynchronous operations that need to be done as part of driving
+ the cache. These are represented by the above structure. The processor
+ method is called to give the op CPU time, and the release method to get
+ rid of it when its usage count reaches 0.
+
+ An operation can be made exclusive upon an object by setting the
+ appropriate flag before enqueuing it with fscache_enqueue_operation(). If
+ an operation needs more processing time, it should be enqueued again.
+
+
+ (*) FS-Cache retrieval operation record:
+
+ struct fscache_retrieval {
+ struct fscache_operation op;
+ struct address_space *mapping;
+ struct list_head *to_do;
+ ...
+ };
+
+ A structure of this type is allocated by FS-Cache to record retrieval and
+ allocation requests made by the netfs. This struct is then passed to the
+ backend to do the operation. The backend may get extra refs to it by
+ calling fscache_get_retrieval() and refs may be discarded by calling
+ fscache_put_retrieval().
+
+ A retrieval operation can be used by the backend to do retrieval work. To
+ do this, the retrieval->op.processor method pointer should be set
+ appropriately by the backend and fscache_enqueue_retrieval() called to
+ submit it to the thread pool. CacheFiles, for example, uses this to queue
+ page examination when it detects PG_lock being cleared.
+
+ The to_do field is an empty list available for the cache backend to use as
+ it sees fit.
+
+
+ (*) FS-Cache storage operation record:
+
+ struct fscache_storage {
+ struct fscache_operation op;
+ pgoff_t store_limit;
+ ...
+ };
+
+ A structure of this type is allocated by FS-Cache to record outstanding
+ writes to be made. FS-Cache itself enqueues this operation and invokes
+ the write_page() method on the object at appropriate times to effect
+ storage.
+
+
+================
+CACHE OPERATIONS
+================
+
+The cache backend provides FS-Cache with a table of operations that can be
+performed on the denizens of the cache. These are held in a structure of type:
+
+ struct fscache_cache_ops
+
+ (*) Name of cache provider [mandatory]:
+
+ const char *name
+
+ This isn't strictly an operation, but should be pointed at a string naming
+ the backend.
+
+
+ (*) Allocate a new object [mandatory]:
+
+ struct fscache_object *(*alloc_object)(struct fscache_cache *cache,
+ struct fscache_cookie *cookie)
+
+ This method is used to allocate a cache object representation to back a
+ cookie in a particular cache. fscache_object_init() should be called on
+ the object to initialise it prior to returning.
+
+ This function may also be used to parse the index key to be used for
+ multiple lookup calls to turn it into a more convenient form. FS-Cache
+ will call the lookup_complete() method to allow the cache to release the
+ form once lookup is complete or aborted.
+
+
+ (*) Look up and create object [mandatory]:
+
+ void (*lookup_object)(struct fscache_object *object)
+
+ This method is used to look up an object, given that the object is already
+ allocated and attached to the cookie. This should instantiate that object
+ in the cache if it can.
+
+ The method should call fscache_object_lookup_negative() as soon as
+ possible if it determines the object doesn't exist in the cache. If the
+ object is found to exist and the netfs indicates that it is valid then
+ fscache_obtained_object() should be called once the object is in a
+ position to have data stored in it. Similarly, fscache_obtained_object()
+ should also be called once a non-present object has been created.
+
+ If a lookup error occurs, fscache_object_lookup_error() should be called
+ to abort the lookup of that object.
+
+
+ (*) Release lookup data [mandatory]:
+
+ void (*lookup_complete)(struct fscache_object *object)
+
+ This method is called to ask the cache to release any resources it was
+ using to perform a lookup.
+
+
+ (*) Increment object refcount [mandatory]:
+
+ struct fscache_object *(*grab_object)(struct fscache_object *object)
+
+ This method is called to increment the reference count on an object. It
+ may fail (for instance if the cache is being withdrawn) by returning NULL.
+ It should return the object pointer if successful.
+
+
+ (*) Lock/Unlock object [mandatory]:
+
+ void (*lock_object)(struct fscache_object *object)
+ void (*unlock_object)(struct fscache_object *object)
+
+ These methods are used to exclusively lock an object. It must be possible
+ to schedule with the lock held, so a spinlock isn't sufficient.
+
+
+ (*) Pin/Unpin object [optional]:
+
+ int (*pin_object)(struct fscache_object *object)
+ void (*unpin_object)(struct fscache_object *object)
+
+ These methods are used to pin an object into the cache. Once pinned an
+ object cannot be reclaimed to make space. Return -ENOSPC if there's not
+ enough space in the cache to permit this.
+
+
+ (*) Update object [mandatory]:
+
+ int (*update_object)(struct fscache_object *object)
+
+ This is called to update the index entry for the specified object. The
+ new information should be in object->cookie->netfs_data. This can be
+ obtained by calling object->cookie->def->get_aux()/get_attr().
+
+
+ (*) Discard object [mandatory]:
+
+ void (*drop_object)(struct fscache_object *object)
+
+ This method is called to indicate that an object has been unbound from its
+ cookie, and that the cache should release the object's resources and
+ retire it if it's in state FSCACHE_OBJECT_RECYCLING.
+
+ This method should not attempt to release any references held by the
+ caller. The caller will invoke the put_object() method as appropriate.
+
+
+ (*) Release object reference [mandatory]:
+
+ void (*put_object)(struct fscache_object *object)
+
+ This method is used to discard a reference to an object. The object may
+ be freed when all the references to it are released.
+
+
+ (*) Synchronise a cache [mandatory]:
+
+ void (*sync)(struct fscache_cache *cache)
+
+ This is called to ask the backend to synchronise a cache with its backing
+ device.
+
+
+ (*) Dissociate a cache [mandatory]:
+
+ void (*dissociate_pages)(struct fscache_cache *cache)
+
+ This is called to ask a cache to perform any page dissociations as part of
+ cache withdrawal.
+
+
+ (*) Notification that the attributes on a netfs file changed [mandatory]:
+
+ int (*attr_changed)(struct fscache_object *object);
+
+ This is called to indicate to the cache that certain attributes on a netfs
+ file have changed (for example the maximum size a file may reach). The
+ cache can read these from the netfs by calling the cookie's get_attr()
+ method.
+
+ The cache may use the file size information to reserve space on the cache.
+ It should also call fscache_set_store_limit() to indicate to FS-Cache the
+ highest byte it's willing to store for an object.
+
+ This method may return -ve if an error occurred or the cache object cannot
+ be expanded. In such a case, the object will be withdrawn from service.
+
+ This operation is run asynchronously from FS-Cache's thread pool, and
+ storage and retrieval operations from the netfs are excluded during the
+ execution of this operation.
+
+
+ (*) Reserve cache space for an object's data [optional]:
+
+ int (*reserve_space)(struct fscache_object *object, loff_t size);
+
+ This is called to request that cache space be reserved to hold the data
+ for an object and the metadata used to track it. Zero size should be
+ taken as request to cancel a reservation.
+
+ This should return 0 if successful, -ENOSPC if there isn't enough space
+ available, or -ENOMEM or -EIO on other errors.
+
+ The reservation may exceed the current size of the object, thus permitting
+ future expansion. If the amount of space consumed by an object would
+ exceed the reservation, it's permitted to refuse requests to allocate
+ pages, but not required. An object may be pruned down to its reservation
+ size if larger than that already.
+
+
+ (*) Request page be read from cache [mandatory]:
+
+ int (*read_or_alloc_page)(struct fscache_retrieval *op,
+ struct page *page,
+ gfp_t gfp)
+
+ This is called to attempt to read a netfs page from the cache, or to
+ reserve a backing block if not. FS-Cache will have done as much checking
+ as it can before calling, but most of the work belongs to the backend.
+
+ If there's no page in the cache, then -ENODATA should be returned if the
+ backend managed to reserve a backing block; -ENOBUFS or -ENOMEM if it
+ didn't.
+
+ If there is suitable data in the cache, then a read operation should be
+ queued and 0 returned. When the read finishes, fscache_end_io() should be
+ called.
+
+ The fscache_mark_pages_cached() should be called for the page if any cache
+ metadata is retained. This will indicate to the netfs that the page needs
+ explicit uncaching. This operation takes a pagevec, thus allowing several
+ pages to be marked at once.
+
+ The retrieval record pointed to by op should be retained for each page
+ queued and released when I/O on the page has been formally ended.
+ fscache_get/put_retrieval() are available for this purpose.
+
+ The retrieval record may be used to get CPU time via the FS-Cache thread
+ pool. If this is desired, the op->op.processor should be set to point to
+ the appropriate processing routine, and fscache_enqueue_retrieval() should
+ be called at an appropriate point to request CPU time. For instance, the
+ retrieval routine could be enqueued upon the completion of a disk read.
+ The to_do field in the retrieval record is provided to aid in this.
+
+ If an I/O error occurs, fscache_io_error() should be called and -ENOBUFS
+ returned if possible or fscache_end_io() called with a suitable error
+ code..
+
+
+ (*) Request pages be read from cache [mandatory]:
+
+ int (*read_or_alloc_pages)(struct fscache_retrieval *op,
+ struct list_head *pages,
+ unsigned *nr_pages,
+ gfp_t gfp)
+
+ This is like the read_or_alloc_page() method, except it is handed a list
+ of pages instead of one page. Any pages on which a read operation is
+ started must be added to the page cache for the specified mapping and also
+ to the LRU. Such pages must also be removed from the pages list and
+ *nr_pages decremented per page.
+
+ If there was an error such as -ENOMEM, then that should be returned; else
+ if one or more pages couldn't be read or allocated, then -ENOBUFS should
+ be returned; else if one or more pages couldn't be read, then -ENODATA
+ should be returned. If all the pages are dispatched then 0 should be
+ returned.
+
+
+ (*) Request page be allocated in the cache [mandatory]:
+
+ int (*allocate_page)(struct fscache_retrieval *op,
+ struct page *page,
+ gfp_t gfp)
+
+ This is like the read_or_alloc_page() method, except that it shouldn't
+ read from the cache, even if there's data there that could be retrieved.
+ It should, however, set up any internal metadata required such that
+ the write_page() method can write to the cache.
+
+ If there's no backing block available, then -ENOBUFS should be returned
+ (or -ENOMEM if there were other problems). If a block is successfully
+ allocated, then the netfs page should be marked and 0 returned.
+
+
+ (*) Request pages be allocated in the cache [mandatory]:
+
+ int (*allocate_pages)(struct fscache_retrieval *op,
+ struct list_head *pages,
+ unsigned *nr_pages,
+ gfp_t gfp)
+
+ This is an multiple page version of the allocate_page() method. pages and
+ nr_pages should be treated as for the read_or_alloc_pages() method.
+
+
+ (*) Request page be written to cache [mandatory]:
+
+ int (*write_page)(struct fscache_storage *op,
+ struct page *page);
+
+ This is called to write from a page on which there was a previously
+ successful read_or_alloc_page() call or similar. FS-Cache filters out
+ pages that don't have mappings.
+
+ This method is called asynchronously from the FS-Cache thread pool. It is
+ not required to actually store anything, provided -ENODATA is then
+ returned to the next read of this page.
+
+ If an error occurred, then a negative error code should be returned,
+ otherwise zero should be returned. FS-Cache will take appropriate action
+ in response to an error, such as withdrawing this object.
+
+ If this method returns success then FS-Cache will inform the netfs
+ appropriately.
+
+
+ (*) Discard retained per-page metadata [mandatory]:
+
+ void (*uncache_page)(struct fscache_object *object, struct page *page)
+
+ This is called when a netfs page is being evicted from the pagecache. The
+ cache backend should tear down any internal representation or tracking it
+ maintains for this page.
+
+
+==================
+FS-CACHE UTILITIES
+==================
+
+FS-Cache provides some utilities that a cache backend may make use of:
+
+ (*) Note occurrence of an I/O error in a cache:
+
+ void fscache_io_error(struct fscache_cache *cache)
+
+ This tells FS-Cache that an I/O error occurred in the cache. After this
+ has been called, only resource dissociation operations (object and page
+ release) will be passed from the netfs to the cache backend for the
+ specified cache.
+
+ This does not actually withdraw the cache. That must be done separately.
+
+
+ (*) Invoke the retrieval I/O completion function:
+
+ void fscache_end_io(struct fscache_retrieval *op, struct page *page,
+ int error);
+
+ This is called to note the end of an attempt to retrieve a page. The
+ error value should be 0 if successful and an error otherwise.
+
+
+ (*) Set highest store limit:
+
+ void fscache_set_store_limit(struct fscache_object *object,
+ loff_t i_size);
+
+ This sets the limit FS-Cache imposes on the highest byte it's willing to
+ try and store for a netfs. Any page over this limit is automatically
+ rejected by fscache_read_alloc_page() and co with -ENOBUFS.
+
+
+ (*) Mark pages as being cached:
+
+ void fscache_mark_pages_cached(struct fscache_retrieval *op,
+ struct pagevec *pagevec);
+
+ This marks a set of pages as being cached. After this has been called,
+ the netfs must call fscache_uncache_page() to unmark the pages.
+
+
+ (*) Initialise a freshly allocated object:
+
+ void fscache_object_init(struct fscache_object *object);
+
+ This initialises all the fields in an object representation.
+
+
+ (*) Indicate negative lookup on an object:
+
+ void fscache_object_lookup_negative(struct fscache_object *object);
+
+ This is called to indicate to FS-Cache that a lookup process for an object
+ found a negative result.
+
+ This changes the state of an object to permit reads pending on lookup
+ completion to go off and start fetching data from the netfs server as it's
+ known at this point that there can't be any data in the cache.
+
+ This may be called multiple times on an object. Only the first call is
+ significant - all subsequent calls are ignored.
+
+
+ (*) Indicate an object has been obtained:
+
+ void fscache_obtained_object(struct fscache_object *object);
+
+ This is called to indicate to FS-Cache that a lookup process for an object
+ produced a positive result, or that an object was created. This should
+ only be called once for any particular object.
+
+ This changes the state of an object to indicate:
+
+ (1) if no call to fscache_object_lookup_negative() has been made on
+ this object, that there may be data available, and that reads can
+ now go and look for it; and
+
+ (2) that writes may now proceed against this object.
+
+
+ (*) Indicate that object lookup failed:
+
+ void fscache_object_lookup_error(struct fscache_object *object);
+
+ This marks an object as having encountered a fatal error (usually EIO)
+ and causes it to move into a state whereby it will be withdrawn as soon
+ as possible.
+
+
+ (*) Get and release references on a retrieval record:
+
+ void fscache_get_retrieval(struct fscache_retrieval *op);
+ void fscache_put_retrieval(struct fscache_retrieval *op);
+
+ These two functions are used to retain a retrieval record whilst doing
+ asynchronous data retrieval and block allocation.
+
+
+ (*) Enqueue a retrieval record for processing.
+
+ void fscache_enqueue_retrieval(struct fscache_retrieval *op);
+
+ This enqueues a retrieval record for processing by the FS-Cache thread
+ pool. One of the threads in the pool will invoke the retrieval record's
+ op->op.processor callback function. This function may be called from
+ within the callback function.
+
+
+ (*) List of object state names:
+
+ const char *fscache_object_states[];
+
+ For debugging purposes, this may be used to turn the state that an object
+ is in into a text string for display purposes.
diff --git a/Documentation/filesystems/caching/fscache.txt b/Documentation/filesystems/caching/fscache.txt
new file mode 100644
index 0000000..b28f2ca
--- /dev/null
+++ b/Documentation/filesystems/caching/fscache.txt
@@ -0,0 +1,295 @@
+ ==========================
+ General Filesystem Caching
+ ==========================
+
+========
+OVERVIEW
+========
+
+This facility is a general purpose cache for network filesystems, though it
+could be used for caching other things such as ISO9660 filesystems too.
+
+FS-Cache mediates between cache backends (such as CacheFS) and network
+filesystems:
+
+ +---------+
+ | | +--------------+
+ | NFS |--+ | |
+ | | | +-->| CacheFS |
+ +---------+ | +----------+ | | /dev/hda5 |
+ | | | | +--------------+
+ +---------+ +-->| | |
+ | | | |--+
+ | AFS |----->| FS-Cache |
+ | | | |--+
+ +---------+ +-->| | |
+ | | | | +--------------+
+ +---------+ | +----------+ | | |
+ | | | +-->| CacheFiles |
+ | ISOFS |--+ | /var/cache |
+ | | +--------------+
+ +---------+
+
+
+FS-Cache does not follow the idea of completely loading every netfs file
+opened in its entirety into a cache before permitting it to be accessed and
+then serving the pages out of that cache rather than the netfs inode because:
+
+ (1) It must be practical to operate without a cache.
+
+ (2) The size of any accessible file must not be limited to the size of the
+ cache.
+
+ (3) The combined size of all opened files (this includes mapped libraries)
+ must not be limited to the size of the cache.
+
+ (4) The user should not be forced to download an entire file just to do a
+ one-off access of a small portion of it (such as might be done with the
+ "file" program).
+
+It instead serves the cache out in PAGE_SIZE chunks as and when requested by
+the netfs('s) using it.
+
+
+FS-Cache provides the following facilities:
+
+ (1) More than one cache can be used at once. Caches can be selected
+ explicitly by use of tags.
+
+ (2) Caches can be added / removed at any time.
+
+ (3) The netfs is provided with an interface that allows either party to
+ withdraw caching facilities from a file (required for (2)).
+
+ (4) The interface to the netfs returns as few errors as possible, preferring
+ rather to let the netfs remain oblivious.
+
+ (5) Cookies are used to represent indices, files and other objects to the
+ netfs. The simplest cookie is just a NULL pointer - indicating nothing
+ cached there.
+
+ (6) The netfs is allowed to propose - dynamically - any index hierarchy it
+ desires, though it must be aware that the index search function is
+ recursive, stack space is limited, and indices can only be children of
+ indices.
+
+ (7) Data I/O is done direct to and from the netfs's pages. The netfs
+ indicates that page A is at index B of the data-file represented by cookie
+ C, and that it should be read or written. The cache backend may or may
+ not start I/O on that page, but if it does, a netfs callback will be
+ invoked to indicate completion. The I/O may be either synchronous or
+ asynchronous.
+
+ (8) Cookies can be "retired" upon release. At this point FS-Cache will mark
+ them as obsolete and the index hierarchy rooted at that point will get
+ recycled.
+
+ (9) The netfs provides a "match" function for index searches. In addition to
+ saying whether a match was made or not, this can also specify that an
+ entry should be updated or deleted.
+
+(10) As much as possible is done asynchronously.
+
+
+FS-Cache maintains a virtual indexing tree in which all indices, files, objects
+and pages are kept. Bits of this tree may actually reside in one or more
+caches.
+
+ FSDEF
+ |
+ +------------------------------------+
+ | |
+ NFS AFS
+ | |
+ +--------------------------+ +-----------+
+ | | | |
+ homedir mirror afs.org redhat.com
+ | | |
+ +------------+ +---------------+ +----------+
+ | | | | | |
+ 00001 00002 00007 00125 vol00001 vol00002
+ | | | | |
+ +---+---+ +-----+ +---+ +------+------+ +-----+----+
+ | | | | | | | | | | | | |
+PG0 PG1 PG2 PG0 XATTR PG0 PG1 DIRENT DIRENT DIRENT R/W R/O Bak
+ | |
+ PG0 +-------+
+ | |
+ 00001 00003
+ |
+ +---+---+
+ | | |
+ PG0 PG1 PG2
+
+In the example above, you can see two netfs's being backed: NFS and AFS. These
+have different index hierarchies:
+
+ (*) The NFS primary index contains per-server indices. Each server index is
+ indexed by NFS file handles to get data file objects. Each data file
+ objects can have an array of pages, but may also have further child
+ objects, such as extended attributes and directory entries. Extended
+ attribute objects themselves have page-array contents.
+
+ (*) The AFS primary index contains per-cell indices. Each cell index contains
+ per-logical-volume indices. Each of volume index contains up to three
+ indices for the read-write, read-only and backup mirrors of those volumes.
+ Each of these contains vnode data file objects, each of which contains an
+ array of pages.
+
+The very top index is the FS-Cache master index in which individual netfs's
+have entries.
+
+Any index object may reside in more than one cache, provided it only has index
+children. Any index with non-index object children will be assumed to only
+reside in one cache.
+
+
+The netfs API to FS-Cache can be found in:
+
+ Documentation/filesystems/caching/netfs-api.txt
+
+The cache backend API to FS-Cache can be found in:
+
+ Documentation/filesystems/caching/backend-api.txt
+
+
+=======================
+STATISTICAL INFORMATION
+=======================
+
+If FS-Cache is compiled with the following options enabled:
+
+ CONFIG_FSCACHE_PROC=y (implied by the following two)
+ CONFIG_FSCACHE_STATS=y
+ CONFIG_FSCACHE_HISTOGRAM=y
+
+then it will gather certain statistics and display them through a number of
+proc files.
+
+ (*) /proc/fs/fscache/stats
+
+ This shows counts of a number of events that can happen in FS-Cache:
+
+ CLASS EVENT MEANING
+ ======= ======= =======================================================
+ Cookies idx=N Number of index cookies allocated
+ dat=N Number of data storage cookies allocated
+ spc=N Number of special cookies allocated
+ Objects alc=N Number of objects allocated
+ nal=N Number of object allocation failures
+ avl=N Number of objects that reached the available state
+ Pages mrk=N Number of pages marked as being cached
+ unc=N Number of uncache page requests seen
+ Acquire n=N Number of acquire cookie requests seen
+ nul=N Number of acq reqs given a NULL parent
+ noc=N Number of acq reqs rejected due to no cache available
+ ok=N Number of acq reqs succeeded
+ nbf=N Number of acq reqs rejected due to error
+ oom=N Number of acq reqs failed on ENOMEM
+ Lookups n=N Number of lookup calls made on cache backends
+ neg=N Number of negative lookups made
+ pos=N Number of positive lookups made
+ crt=N Number of objects created by lookup
+ bst=N Number of objects with boosted lookup priority
+ Updates n=N Number of update cookie requests seen
+ nul=N Number of upd reqs given a NULL parent
+ run=N Number of upd reqs granted CPU time
+ Relinqs n=N Number of relinquish cookie requests seen
+ nul=N Number of rlq reqs given a NULL parent
+ wcr=N Number of rlq reqs waited on completion of creation
+ AttrChg n=N Number of attribute changed requests seen
+ ok=N Number of attr changed requests queued
+ nbf=N Number of attr changed rejected -ENOBUFS
+ oom=N Number of attr changed failed -ENOMEM
+ run=N Number of attr changed ops given CPU time
+ Allocs n=N Number of allocation requests seen
+ ok=N Number of successful alloc reqs
+ wt=N Number of alloc reqs that waited on lookup completion
+ nbf=N Number of alloc reqs rejected -ENOBUFS
+ ops=N Number of alloc reqs submitted
+ owt=N Number of alloc reqs waited for CPU time
+ Retrvls n=N Number of retrieval (read) requests seen
+ ok=N Number of successful retr reqs
+ wt=N Number of retr reqs that waited on lookup completion
+ nod=N Number of retr reqs returned -ENODATA
+ nbf=N Number of retr reqs rejected -ENOBUFS
+ int=N Number of retr reqs aborted -ERESTARTSYS
+ oom=N Number of retr reqs failed -ENOMEM
+ ops=N Number of retr reqs submitted
+ owt=N Number of retr reqs waited for CPU time
+ Stores n=N Number of storage (write) requests seen
+ ok=N Number of successful store reqs
+ agn=N Number of store reqs on a page already pending storage
+ nbf=N Number of store reqs rejected -ENOBUFS
+ oom=N Number of store reqs failed -ENOMEM
+ ops=N Number of store reqs submitted
+ run=N Number of store reqs granted CPU time
+ Ops pend=N Number of times async ops added to pending queues
+ run=N Number of times async ops given CPU time
+ enq=N Number of times async ops queued for processing
+ req=N Number of times async ops requeued for processing
+ rel=N Number of times async ops released
+
+
+ (*) /proc/fs/fscache/pool
+
+ This shows the number of objects and operations each thread in the thread
+ pool has given CPU time to.
+
+
+ (*) /proc/fs/fscache/histogram
+
+ cat /proc/fs/fscache/histogram
+ +HZ +TIME OBJ INST OP RUNS OBJ RUNS RETRV DLY RETRIEVLS
+ ===== ===== ========= ========= ========= ========= =========
+
+ This shows the breakdown of the number of times each amount of time
+ between 0 jiffies and HZ-1 jiffies a variety of tasks took to run. The
+ columns are as follows:
+
+ COLUMN TIME MEASUREMENT
+ ======= =======================================================
+ OBJ INST Length of time to instantiate an object
+ OP RUNS Length of time a call to process an operation took
+ OBJ RUNS Length of time a call to process an object event took
+ RETRV DLY Time between an requesting a read and lookup completing
+ RETRIEVLS Time between beginning and end of a retrieval
+
+ Each row shows the number of events that took a particular range of times.
+ Each step is 1 jiffy in size. The +HZ column indicates the particular
+ jiffy range covered, and the +TIME field the equivalent number of seconds.
+
+
+=========
+DEBUGGING
+=========
+
+The FS-Cache facility can have runtime debugging enabled by adjusting the value
+in:
+
+ /sys/module/fscache/parameters/debug
+
+This is a bitmask of debugging streams to enable:
+
+ BIT VALUE STREAM POINT
+ ======= ======= =============================== =======================
+ 0 1 Cache management Function entry trace
+ 1 2 Function exit trace
+ 2 4 General
+ 3 8 Cookie management Function entry trace
+ 4 16 Function exit trace
+ 5 32 General
+ 6 64 Page handling Function entry trace
+ 7 128 Function exit trace
+ 8 256 General
+ 9 512 Thread pool management Function entry trace
+ 10 1024 Function exit trace
+ 11 2048 General
+
+The appropriate set of values should be OR'd together and the result written to
+the control file. For example:
+
+ echo $((1|8|64)) >/sys/module/fscache/parameters/debug
+
+will turn on all function entry debugging.
+
diff --git a/Documentation/filesystems/caching/netfs-api.txt b/Documentation/filesystems/caching/netfs-api.txt
new file mode 100644
index 0000000..e8a1496
--- /dev/null
+++ b/Documentation/filesystems/caching/netfs-api.txt
@@ -0,0 +1,741 @@
+ ===============================
+ FS-CACHE NETWORK FILESYSTEM API
+ ===============================
+
+There's an API by which a network filesystem can make use of the FS-Cache
+facilities. This is based around a number of principles:
+
+ (1) Caches can store a number of different object types. There are two main
+ object types: indices and files. The first is a special type used by
+ FS-Cache to make finding objects faster and to make retiring of groups of
+ objects easier.
+
+ (2) Every index, file or other object is represented by a cookie. This cookie
+ may or may not have anything associated with it, but the netfs doesn't
+ need to care.
+
+ (3) Barring the top-level index (one entry per cached netfs), the index
+ hierarchy for each netfs is structured according the whim of the netfs.
+
+This API is declared in <linux/fscache.h>.
+
+This document contains the following sections:
+
+ (1) Network filesystem definition
+ (2) Index definition
+ (3) Object definition
+ (4) Network filesystem (un)registration
+ (5) Cache tag lookup
+ (6) Index registration
+ (7) Data file registration
+ (8) Miscellaneous object registration
+ (9) Setting the data file size
+ (10) Page alloc/read/write
+ (11) Page uncaching
+ (12) Index and data file update
+ (13) Miscellaneous cookie operations
+ (14) Cookie unregistration
+ (15) Index and data file invalidation
+
+
+=============================
+NETWORK FILESYSTEM DEFINITION
+=============================
+
+FS-Cache needs a description of the network filesystem. This is specified
+using a record of the following structure:
+
+ struct fscache_netfs {
+ uint32_t version;
+ const char *name;
+ struct fscache_netfs_operations *ops;
+ struct fscache_cookie *primary_index;
+ ...
+ };
+
+This first three fields should be filled in before registration, and the fourth
+will be filled in by the registration function; any other fields should just be
+ignored and are for internal use only.
+
+The fields are:
+
+ (1) The name of the netfs (used as the key in the toplevel index).
+
+ (2) The version of the netfs (if the name matches but the version doesn't, the
+ entire in-cache hierarchy for this netfs will be scrapped and begun
+ afresh).
+
+ (3) The operations table is defined as follows:
+
+ struct fscache_netfs_operations {
+ };
+
+ Currently there aren't any functions here.
+
+ (4) The cookie representing the primary index will be allocated according to
+ another parameter passed into the registration function.
+
+For example, kAFS (linux/fs/afs/) uses the following definitions to describe
+itself:
+
+ static struct fscache_netfs_operations afs_cache_ops = {
+ };
+
+ struct fscache_netfs afs_cache_netfs = {
+ .version = 0,
+ .name = "afs",
+ .ops = &afs_cache_ops,
+ };
+
+
+================
+INDEX DEFINITION
+================
+
+Indices are used for two purposes:
+
+ (1) To aid the finding of a file based on a series of keys (such as AFS's
+ "cell", "volume ID", "vnode ID").
+
+ (2) To make it easier to discard a subset of all the files cached based around
+ a particular key - for instance to mirror the removal of an AFS volume.
+
+However, since it's unlikely that any two netfs's are going to want to define
+their index hierarchies in quite the same way, FS-Cache tries to impose as few
+restraints as possible on how an index is structured and where it is placed in
+the tree. The netfs can even mix indices and data files at the same level, but
+it's not recommended.
+
+Each index entry consists of a key of indeterminate length plus some auxilliary
+data, also of indeterminate length.
+
+There are some limits on indices:
+
+ (1) Any index containing non-index objects should be restricted to a single
+ cache. Any such objects created within an index will be created in the
+ first cache only. The cache in which an index is created can be
+ controlled by cache tags (see below).
+
+ (2) The entry data must be atomically journallable, so it is limited to about
+ 400 bytes at present. At least 400 bytes will be available.
+
+ (3) The depth of the index tree should be judged with care as the search
+ function is recursive. Too many layers will run the kernel out of stack.
+
+
+=================
+OBJECT DEFINITION
+=================
+
+To define an object, a structure of the following type should be filled out:
+
+ struct fscache_cookie_def
+ {
+ uint8_t name[16];
+ uint8_t type;
+
+ struct fscache_cache_tag *(*select_cache)(
+ const void *parent_netfs_data,
+ const void *cookie_netfs_data);
+
+ uint16_t (*get_key)(const void *cookie_netfs_data,
+ void *buffer,
+ uint16_t bufmax);
+
+ void (*get_attr)(const void *cookie_netfs_data,
+ uint64_t *size);
+
+ uint16_t (*get_aux)(const void *cookie_netfs_data,
+ void *buffer,
+ uint16_t bufmax);
+
+ fscache_checkaux_t (*check_aux)(void *cookie_netfs_data,
+ const void *data,
+ uint16_t datalen);
+
+ void (*get_context)(void *cookie_netfs_data, void *context);
+
+ void (*put_context)(void *cookie_netfs_data, void *context);
+
+ void (*mark_pages_cached)(void *cookie_netfs_data,
+ struct address_space *mapping,
+ struct pagevec *cached_pvec);
+
+ void (*now_uncached)(void *cookie_netfs_data);
+ };
+
+This has the following fields:
+
+ (1) The type of the object [mandatory].
+
+ This is one of the following values:
+
+ (*) FSCACHE_COOKIE_TYPE_INDEX
+
+ This defines an index, which is a special FS-Cache type.
+
+ (*) FSCACHE_COOKIE_TYPE_DATAFILE
+
+ This defines an ordinary data file.
+
+ (*) Any other value between 2 and 255
+
+ This defines an extraordinary object such as an XATTR.
+
+ (2) The name of the object type (NUL terminated unless all 16 chars are used)
+ [optional].
+
+ (3) A function to select the cache in which to store an index [optional].
+
+ This function is invoked when an index needs to be instantiated in a cache
+ during the instantiation of a non-index object. Only the immediate index
+ parent for the non-index object will be queried. Any indices above that
+ in the hierarchy may be stored in multiple caches. This function does not
+ need to be supplied for any non-index object or any index that will only
+ have index children.
+
+ If this function is not supplied or if it returns NULL then the first
+ cache in the parent's list will be chosed, or failing that, the first
+ cache in the master list.
+
+ (4) A function to retrieve an object's key from the netfs [mandatory].
+
+ This function will be called with the netfs data that was passed to the
+ cookie acquisition function and the maximum length of key data that it may
+ provide. It should write the required key data into the given buffer and
+ return the quantity it wrote.
+
+ (5) A function to retrieve attribute data from the netfs [optional].
+
+ This function will be called with the netfs data that was passed to the
+ cookie acquisition function. It should return the size of the file if
+ this is a data file. The size may be used to govern how much cache must
+ be reserved for this file in the cache.
+
+ If the function is absent, a file size of 0 is assumed.
+
+ (6) A function to retrieve auxilliary data from the netfs [optional].
+
+ This function will be called with the netfs data that was passed to the
+ cookie acquisition function and the maximum length of auxilliary data that
+ it may provide. It should write the auxilliary data into the given buffer
+ and return the quantity it wrote.
+
+ If this function is absent, the auxilliary data length will be set to 0.
+
+ The length of the auxilliary data buffer may be dependent on the key
+ length. A netfs mustn't rely on being able to provide more than 400 bytes
+ for both.
+
+ (7) A function to check the auxilliary data [optional].
+
+ This function will be called to check that a match found in the cache for
+ this object is valid. For instance with AFS it could check the auxilliary
+ data against the data version number returned by the server to determine
+ whether the index entry in a cache is still valid.
+
+ If this function is absent, it will be assumed that matching objects in a
+ cache are always valid.
+
+ If present, the function should return one of the following values:
+
+ (*) FSCACHE_CHECKAUX_OKAY - the entry is okay as is
+ (*) FSCACHE_CHECKAUX_NEEDS_UPDATE - the entry requires update
+ (*) FSCACHE_CHECKAUX_OBSOLETE - the entry should be deleted
+
+ This function can also be used to extract data from the auxilliary data in
+ the cache and copy it into the netfs's structures.
+
+ (8) A pair of functions to manage contexts for the completion callback
+ [optional].
+
+ The cache read/write functions are passed a context which is then passed
+ to the I/O completion callback function. To ensure this context remains
+ valid until after the I/O completion is called, two functions may be
+ provided: one to get an extra reference on the context, and one to drop a
+ reference to it.
+
+ If the context is not used or is a type of object that won't go out of
+ scope, then these functions are not required. These functions are not
+ required for indices as indices may not contain data. These functions may
+ be called in interrupt context and so may not sleep.
+
+ (9) A function to mark a page as retaining cache metadata [optional].
+
+ This is called by the cache to indicate that it is retaining in-memory
+ information for this page and that the netfs should uncache the page when
+ it has finished. This does not indicate whether there's data on the disk
+ or not. Note that several pages at once may be presented for marking.
+
+ The PG_fscache bit is set on the pages before this function would be
+ called, so the function need not be provided if this is sufficient.
+
+ This function is not required for indices as they're not permitted data.
+
+(10) A function to unmark all the pages retaining cache metadata [mandatory].
+
+ This is called by FS-Cache to indicate that a backing store is being
+ unbound from a cookie and that all the marks on the pages should be
+ cleared to prevent confusion. Note that the cache will have torn down all
+ its tracking information so that the pages don't need to be explicitly
+ uncached.
+
+ This function is not required for indices as they're not permitted data.
+
+
+===================================
+NETWORK FILESYSTEM (UN)REGISTRATION
+===================================
+
+The first step is to declare the network filesystem to the cache. This also
+involves specifying the layout of the primary index (for AFS, this would be the
+"cell" level).
+
+The registration function is:
+
+ int fscache_register_netfs(struct fscache_netfs *netfs);
+
+It just takes a pointer to the netfs definition. It returns 0 or an error as
+appropriate.
+
+For kAFS, registration is done as follows:
+
+ ret = fscache_register_netfs(&afs_cache_netfs);
+
+The last step is, of course, unregistration:
+
+ void fscache_unregister_netfs(struct fscache_netfs *netfs);
+
+
+================
+CACHE TAG LOOKUP
+================
+
+FS-Cache permits the use of more than one cache. To permit particular index
+subtrees to be bound to particular caches, the second step is to look up cache
+representation tags. This step is optional; it can be left entirely up to
+FS-Cache as to which cache should be used. The problem with doing that is that
+FS-Cache will always pick the first cache that was registered.
+
+To get the representation for a named tag:
+
+ struct fscache_cache_tag *fscache_lookup_cache_tag(const char *name);
+
+This takes a text string as the name and returns a representation of a tag. It
+will never return an error. It may return a dummy tag, however, if it runs out
+of memory; this will inhibit caching with this tag.
+
+Any representation so obtained must be released by passing it to this function:
+
+ void fscache_release_cache_tag(struct fscache_cache_tag *tag);
+
+The tag will be retrieved by FS-Cache when it calls the object definition
+operation select_cache().
+
+
+==================
+INDEX REGISTRATION
+==================
+
+The third step is to inform FS-Cache about part of an index hierarchy that can
+be used to locate files. This is done by requesting a cookie for each index in
+the path to the file:
+
+ struct fscache_cookie *
+ fscache_acquire_cookie(struct fscache_cookie *parent,
+ const struct fscache_object_def *def,
+ void *netfs_data);
+
+This function creates an index entry in the index represented by parent,
+filling in the index entry by calling the operations pointed to by def.
+
+Note that this function never returns an error - all errors are handled
+internally. It may, however, return NULL to indicate no cookie. It is quite
+acceptable to pass this token back to this function as the parent to another
+acquisition (or even to the relinquish cookie, read page and write page
+functions - see below).
+
+Note also that no indices are actually created in a cache until a non-index
+object needs to be created somewhere down the hierarchy. Furthermore, an index
+may be created in several different caches independently at different times.
+This is all handled transparently, and the netfs doesn't see any of it.
+
+For example, with AFS, a cell would be added to the primary index. This index
+entry would have a dependent inode containing a volume location index for the
+volume mappings within this cell:
+
+ cell->cache =
+ fscache_acquire_cookie(afs_cache_netfs.primary_index,
+ &afs_cell_cache_index_def,
+ cell);
+
+Then when a volume location was accessed, it would be entered into the cell's
+index and an inode would be allocated that acts as a volume type and hash chain
+combination:
+
+ vlocation->cache =
+ fscache_acquire_cookie(cell->cache,
+ &afs_vlocation_cache_index_def,
+ vlocation);
+
+And then a particular flavour of volume (R/O for example) could be added to
+that index, creating another index for vnodes (AFS inode equivalents):
+
+ volume->cache =
+ fscache_acquire_cookie(vlocation->cache,
+ &afs_volume_cache_index_def,
+ volume);
+
+
+======================
+DATA FILE REGISTRATION
+======================
+
+The fourth step is to request a data file be created in the cache. This is
+identical to index cookie acquisition. The only difference is that the type in
+the object definition should be something other than index type.
+
+ vnode->cache =
+ fscache_acquire_cookie(volume->cache,
+ &afs_vnode_cache_object_def,
+ vnode);
+
+
+=================================
+MISCELLANEOUS OBJECT REGISTRATION
+=================================
+
+An optional step is to request an object of miscellaneous type be created in
+the cache. This is almost identical to index cookie acquisition. The only
+difference is that the type in the object definition should be something other
+than index type. Whilst the parent object could be an index, it's more likely
+it would be some other type of object such as a data file.
+
+ xattr->cache =
+ fscache_acquire_cookie(vnode->cache,
+ &afs_xattr_cache_object_def,
+ xattr);
+
+Miscellaneous objects might be used to store extended attributes or directory
+entries for example.
+
+
+==========================
+SETTING THE DATA FILE SIZE
+==========================
+
+The fifth step is to set the physical attributes of the file, such as its size.
+This doesn't automatically reserve any space in the cache, but permits the
+cache to adjust its metadata for data tracking appropriately:
+
+ int fscache_attr_changed(struct fscache_cookie *cookie);
+
+The cache will return -ENOBUFS if there is no backing cache or if there is no
+space to allocate any extra metadata required in the cache. The attributes
+will be accessed with the get_attr() cookie definition operation.
+
+Note that attempts to read or write data pages in the cache over this size may
+be rebuffed with -ENOBUFS.
+
+This operation schedules an attribute adjustment to happen asynchronously at
+some point in the future, and as such, it may happen after the function returns
+to the caller. The attribute adjustment excludes read and write operations.
+
+
+=====================
+PAGE READ/ALLOC/WRITE
+=====================
+
+And the sixth step is to store and retrieve pages in the cache. There are
+three functions that are used to do this.
+
+Note:
+
+ (1) A page should not be re-read or re-allocated without uncaching it first.
+
+ (2) A read or allocated page must be uncached when the netfs page is released
+ from the pagecache.
+
+ (3) A page should only be written to the cache if previous read or allocated.
+
+This permits the cache to maintain its page tracking in proper order.
+
+
+PAGE READ
+---------
+
+Firstly, the netfs should ask FS-Cache to examine the caches and read the
+contents cached for a particular page of a particular file if present, or else
+allocate space to store the contents if not:
+
+ typedef
+ void (*fscache_rw_complete_t)(struct page *page,
+ void *context,
+ int error);
+
+ int fscache_read_or_alloc_page(struct fscache_cookie *cookie,
+ struct page *page,
+ fscache_rw_complete_t end_io_func,
+ void *context,
+ gfp_t gfp);
+
+The cookie argument must specify a cookie for an object that isn't an index,
+the page specified will have the data loaded into it (and is also used to
+specify the page number), and the gfp argument is used to control how any
+memory allocations made are satisfied.
+
+If the cookie indicates the inode is not cached:
+
+ (1) The function will return -ENOBUFS.
+
+Else if there's a copy of the page resident in the cache:
+
+ (1) The mark_pages_cached() cookie operation will be called on that page.
+
+ (2) The function will submit a request to read the data from the cache's
+ backing device directly into the page specified.
+
+ (3) The function will return 0.
+
+ (4) When the read is complete, end_io_func() will be invoked with:
+
+ (*) The netfs data supplied when the cookie was created.
+
+ (*) The page descriptor.
+
+ (*) The context argument passed to the above function. This will be
+ maintained with the get_context/put_context functions mentioned above.
+
+ (*) An argument that's 0 on success or negative for an error code.
+
+ If an error occurs, it should be assumed that the page contains no usable
+ data.
+
+ end_io_func() will be called in process context if the read is results in
+ an error, but it might be called in interrupt context if the read is
+ successful.
+
+Otherwise, if there's not a copy available in cache, but the cache may be able
+to store the page:
+
+ (1) The mark_pages_cached() cookie operation will be called on that page.
+
+ (2) A block may be reserved in the cache and attached to the object at the
+ appropriate place.
+
+ (3) The function will return -ENODATA.
+
+This function may also return -ENOMEM or -EINTR, in which case it won't have
+read any data from the cache.
+
+
+PAGE ALLOCATE
+-------------
+
+Alternatively, if there's not expected to be any data in the cache for a page
+because the file has been extended, a block can simply be allocated instead:
+
+ int fscache_alloc_page(struct fscache_cookie *cookie,
+ struct page *page,
+ gfp_t gfp);
+
+This is similar to the fscache_read_or_alloc_page() function, except that it
+never reads from the cache. It will return 0 if a block has been allocated,
+rather than -ENODATA as the other would. One or the other must be performed
+before writing to the cache.
+
+The mark_pages_cached() cookie operation will be called on the page if
+successful.
+
+
+PAGE WRITE
+----------
+
+Secondly, if the netfs changes the contents of the page (either due to an
+initial download or if a user performs a write), then the page should be
+written back to the cache:
+
+ int fscache_write_page(struct fscache_cookie *cookie,
+ struct page *page,
+ gfp_t gfp);
+
+The cookie argument must specify a data file cookie, the page specified should
+contain the data to be written (and is also used to specify the page number),
+and the gfp argument is used to control how any memory allocations made are
+satisfied.
+
+The page must have first been read or allocated successfully and must not have
+been uncached before writing is performed.
+
+If the cookie indicates the inode is not cached then:
+
+ (1) The function will return -ENOBUFS.
+
+Else if space can be allocated in the cache to hold this page:
+
+ (1) PG_fscache_write will be set on the page.
+
+ (2) The function will submit a request to write the data to cache's backing
+ device directly from the page specified.
+
+ (3) The function will return 0.
+
+ (4) When the write is complete PG_fscache_write is cleared on the page and
+ anyone waiting for that bit will be woken up.
+
+Else if there's no space available in the cache, -ENOBUFS will be returned. It
+is also possible for the PG_fscache_write bit to be cleared when no write took
+place if unforeseen circumstances arose (such as a disk error).
+
+Writing takes place asynchronously.
+
+
+MULTIPLE PAGE READ
+------------------
+
+A facility is provided to read several pages at once, as requested by the
+readpages() address space operation:
+
+ int fscache_read_or_alloc_pages(struct fscache_cookie *cookie,
+ struct address_space *mapping,
+ struct list_head *pages,
+ int *nr_pages,
+ fscache_rw_complete_t end_io_func,
+ void *context,
+ gfp_t gfp);
+
+This works in a similar way to fscache_read_or_alloc_page(), except:
+
+ (1) Any page it can retrieve data for is removed from pages and nr_pages and
+ dispatched for reading to the disk. Reads of adjacent pages on disk may
+ be merged for greater efficiency.
+
+ (2) The mark_pages_cached() cookie operation will be called on several pages
+ at once if they're being read or allocated.
+
+ (3) If there was an general error, then that error will be returned.
+
+ Else if some pages couldn't be allocated or read, then -ENOBUFS will be
+ returned.
+
+ Else if some pages couldn't be read but were allocated, then -ENODATA will
+ be returned.
+
+ Otherwise, if all pages had reads dispatched, then 0 will be returned, the
+ list will be empty and *nr_pages will be 0.
+
+ (4) end_io_func will be called once for each page being read as the reads
+ complete. It will be called in process context if error != 0, but it may
+ be called in interrupt context if there is no error.
+
+Note that a return of -ENODATA, -ENOBUFS or any other error does not preclude
+some of the pages being read and some being allocated. Those pages will have
+been marked appropriately and will need uncaching.
+
+
+==============
+PAGE UNCACHING
+==============
+
+To uncache a page, this function should be called:
+
+ void fscache_uncache_page(struct fscache_cookie *cookie,
+ struct page *page);
+
+This function permits the cache to release any in-memory representation it
+might be holding for this netfs page. This function must be called once for
+each page on which the read or write page functions above have been called to
+make sure the cache's in-memory tracking information gets torn down.
+
+Note that pages can't be explicitly deleted from the a data file. The whole
+data file must be retired (see the relinquish cookie function below).
+
+Furthermore, note that this does not cancel the asynchronous read or write
+operation started by the read/alloc and write functions.
+
+
+==========================
+INDEX AND DATA FILE UPDATE
+==========================
+
+To request an update of the index data for an index or other object, the
+following function should be called:
+
+ void fscache_update_cookie(struct fscache_cookie *cookie);
+
+This function will refer back to the netfs_data pointer stored in the cookie by
+the acquisition function to obtain the data to write into each revised index
+entry. The update method in the parent index definition will be called to
+transfer the data.
+
+Note that partial updates may happen automatically at other times, such as when
+data blocks are added to a data file object.
+
+
+===============================
+MISCELLANEOUS COOKIE OPERATIONS
+===============================
+
+There are a number of operations that can be used to control cookies:
+
+ (*) Cookie pinning:
+
+ int fscache_pin_cookie(struct fscache_cookie *cookie);
+ void fscache_unpin_cookie(struct fscache_cookie *cookie);
+
+ These operations permit data cookies to be pinned into the cache and to
+ have the pinning removed. They are not permitted on index cookies.
+
+ The pinning function will return 0 if successful, -ENOBUFS in the cookie
+ isn't backed by a cache, -EOPNOTSUPP if the cache doesn't support pinning,
+ -ENOSPC if there isn't enough space to honour the operation, -ENOMEM or
+ -EIO if there's any other problem.
+
+ (*) Data space reservation:
+
+ int fscache_reserve_space(struct fscache_cookie *cookie, loff_t size);
+
+ This permits a netfs to request cache space be reserved to store up to the
+ given amount of a file. It is permitted to ask for more than the current
+ size of the file to allow for future file expansion.
+
+ If size is given as zero then the reservation will be cancelled.
+
+ The function will return 0 if successful, -ENOBUFS in the cookie isn't
+ backed by a cache, -EOPNOTSUPP if the cache doesn't support reservations,
+ -ENOSPC if there isn't enough space to honour the operation, -ENOMEM or
+ -EIO if there's any other problem.
+
+ Note that this doesn't pin an object in a cache; it can still be culled to
+ make space if it's not in use.
+
+
+=====================
+COOKIE UNREGISTRATION
+=====================
+
+To get rid of a cookie, this function should be called.
+
+ void fscache_relinquish_cookie(struct fscache_cookie *cookie,
+ int retire);
+
+If retire is non-zero, then the object will be marked for recycling, and all
+copies of it will be removed from all active caches in which it is present.
+Not only that but all child objects will also be retired.
+
+If retire is zero, then the object may be available again when next the
+acquisition function is called. Retirement here will overrule the pinning on a
+cookie.
+
+One very important note - relinquish must NOT be called for a cookie unless all
+the cookies for "child" indices, objects and pages have been relinquished
+first.
+
+
+================================
+INDEX AND DATA FILE INVALIDATION
+================================
+
+There is no direct way to invalidate an index subtree or a data file. To do
+this, the caller should relinquish and retire the cookie they have, and then
+acquire a new one.
diff --git a/fs/Kconfig b/fs/Kconfig
index 58a0650..3e1c276 100644
--- a/fs/Kconfig
+++ b/fs/Kconfig
@@ -631,6 +631,12 @@ config GENERIC_ACL
bool
select FS_POSIX_ACL

+menu "Caches"
+
+source "fs/fscache/Kconfig"
+
+endmenu
+
if BLOCK
menu "CD-ROM/DVD Filesystems"

diff --git a/fs/Makefile b/fs/Makefile
index 720c29d..4eecc9a 100644
--- a/fs/Makefile
+++ b/fs/Makefile
@@ -65,6 +65,7 @@ obj-$(CONFIG_PROFILING) += dcookies.o
obj-$(CONFIG_DLM) += dlm/

# Do not add any filesystems before this line
+obj-$(CONFIG_FSCACHE) += fscache/
obj-$(CONFIG_REISERFS_FS) += reiserfs/
obj-$(CONFIG_EXT3_FS) += ext3/ # Before ext2 so root fs can be ext3
obj-$(CONFIG_EXT4DEV_FS) += ext4/ # Before ext2 so root fs can be ext4dev
diff --git a/fs/fscache/Kconfig b/fs/fscache/Kconfig
new file mode 100644
index 0000000..e68c945
--- /dev/null
+++ b/fs/fscache/Kconfig
@@ -0,0 +1,49 @@
+
+config FSCACHE
+ tristate "General filesystem local caching manager"
+ depends on EXPERIMENTAL
+ help
+ This option enables a generic filesystem caching manager that can be
+ used by various network and other filesystems to cache data locally.
+ Different sorts of caches can be plugged in, depending on the
+ resources available.
+
+ See Documentation/filesystems/caching/fscache.txt for more information.
+
+config FSCACHE_PROC
+ bool "Provide /proc interface for local caching statistics"
+ depends on FSCACHE && PROC_FS
+
+config FSCACHE_STATS
+ bool "Gather statistical information on local caching"
+ depends on FSCACHE_PROC
+ help
+ This option causes statistical information to be gathered on local
+ caching and exported through files:
+
+ /proc/fs/fscache/stats
+ /proc/fs/fscache/pool
+
+ See Documentation/filesystems/caching/fscache.txt for more information.
+
+config FSCACHE_HISTOGRAM
+ bool "Gather latency information on local caching"
+ depends on FSCACHE_PROC
+ help
+
+ This option causes latency information to be gathered on local
+ caching and exported through file:
+
+ /proc/fs/fscache/histogram
+
+ See Documentation/filesystems/caching/fscache.txt for more information.
+
+config FSCACHE_DEBUG
+ bool "Debug FS-Cache"
+ depends on FSCACHE
+ help
+ This permits debugging to be dynamically enabled in the local caching
+ management module. If this is set, the debugging output may be
+ enabled by setting bits in /sys/modules/fscache/parameter/debug.
+
+ See Documentation/filesystems/caching/fscache.txt for more information.
diff --git a/fs/fscache/Makefile b/fs/fscache/Makefile
new file mode 100644
index 0000000..e60dad3
--- /dev/null
+++ b/fs/fscache/Makefile
@@ -0,0 +1,19 @@
+#
+# Makefile for general filesystem caching code
+#
+
+fscache-y := \
+ fsc-cache.o \
+ fsc-cookie.o \
+ fsc-fsdef.o \
+ fsc-main.o \
+ fsc-manage.o \
+ fsc-object.o \
+ fsc-page.o \
+ fsc-threads.o
+
+fscache-$(CONFIG_FSCACHE_PROC) += \
+ fsc-proc.o \
+ fsc-stats.o
+
+obj-$(CONFIG_FSCACHE) := fscache.o
diff --git a/fs/fscache/fsc-cache.c b/fs/fscache/fsc-cache.c
new file mode 100644
index 0000000..1d40058
--- /dev/null
+++ b/fs/fscache/fsc-cache.c
@@ -0,0 +1,512 @@
+/* FS-Cache cache handling
+ *
+ * Copyright (C) 2007 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells ([email protected])
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#define FSCACHE_DEBUG_LEVEL CACHE
+#include <linux/module.h>
+#include <linux/slab.h>
+#include "fsc-internal.h"
+
+LIST_HEAD(fscache_cache_list);
+DECLARE_RWSEM(fscache_addremove_sem);
+DECLARE_WAIT_QUEUE_HEAD(fscache_clearance_wq);
+static LIST_HEAD(fscache_cache_tag_list);
+static LIST_HEAD(fscache_netfs_list);
+
+static struct sysfs_ops fscache_cache_sysfs_ops = {
+};
+
+static struct kobj_type fscache_cache_ktype = {
+ .sysfs_ops = &fscache_cache_sysfs_ops,
+};
+
+/*
+ * look up a cache tag
+ */
+struct fscache_cache_tag *__fscache_lookup_cache_tag(const char *name)
+{
+ struct fscache_cache_tag *tag, *xtag;
+
+ /* firstly check for the existence of the tag under read lock */
+ down_read(&fscache_addremove_sem);
+
+ list_for_each_entry(tag, &fscache_cache_tag_list, link) {
+ if (strcmp(tag->name, name) == 0) {
+ atomic_inc(&tag->usage);
+ up_read(&fscache_addremove_sem);
+ return tag;
+ }
+ }
+
+ up_read(&fscache_addremove_sem);
+
+ /* the tag does not exist - create a candidate */
+ xtag = kzalloc(sizeof(*xtag) + strlen(name) + 1, GFP_KERNEL);
+ if (!xtag)
+ /* return a dummy tag if out of memory */
+ return ERR_PTR(-ENOMEM);
+
+ atomic_set(&xtag->usage, 1);
+ strcpy(xtag->name, name);
+
+ /* write lock, search again and add if still not present */
+ down_write(&fscache_addremove_sem);
+
+ list_for_each_entry(tag, &fscache_cache_tag_list, link) {
+ if (strcmp(tag->name, name) == 0) {
+ atomic_inc(&tag->usage);
+ up_write(&fscache_addremove_sem);
+ kfree(xtag);
+ return tag;
+ }
+ }
+
+ list_add_tail(&xtag->link, &fscache_cache_tag_list);
+ up_write(&fscache_addremove_sem);
+ return xtag;
+}
+
+/*
+ * release a reference to a cache tag
+ */
+void __fscache_release_cache_tag(struct fscache_cache_tag *tag)
+{
+ if (tag != ERR_PTR(-ENOMEM)) {
+ down_write(&fscache_addremove_sem);
+
+ if (atomic_dec_and_test(&tag->usage))
+ list_del_init(&tag->link);
+ else
+ tag = NULL;
+
+ up_write(&fscache_addremove_sem);
+
+ kfree(tag);
+ }
+}
+
+/*
+ * register a network filesystem for caching
+ */
+int __fscache_register_netfs(struct fscache_netfs *netfs)
+{
+ struct fscache_netfs *ptr;
+ int ret;
+
+ _enter("{%s}", netfs->name);
+
+ INIT_LIST_HEAD(&netfs->link);
+
+ /* allocate a cookie for the primary index */
+ netfs->primary_index =
+ kmem_cache_zalloc(fscache_cookie_jar, GFP_KERNEL);
+
+ if (!netfs->primary_index) {
+ _leave(" = -ENOMEM");
+ return -ENOMEM;
+ }
+
+ /* initialise the primary index cookie */
+ atomic_set(&netfs->primary_index->usage, 1);
+ atomic_set(&netfs->primary_index->n_children, 0);
+
+ netfs->primary_index->def = &fscache_fsdef_netfs_def;
+ netfs->primary_index->parent = &fscache_fsdef_index;
+ netfs->primary_index->netfs_data = netfs;
+
+ atomic_inc(&netfs->primary_index->parent->usage);
+ atomic_inc(&netfs->primary_index->parent->n_children);
+
+ spin_lock_init(&netfs->primary_index->lock);
+ INIT_HLIST_HEAD(&netfs->primary_index->backing_objects);
+
+ /* check the netfs type is not already present */
+ down_write(&fscache_addremove_sem);
+
+ ret = -EEXIST;
+ list_for_each_entry(ptr, &fscache_netfs_list, link) {
+ if (strcmp(ptr->name, netfs->name) == 0)
+ goto already_registered;
+ }
+
+ list_add(&netfs->link, &fscache_netfs_list);
+ ret = 0;
+
+ printk(KERN_NOTICE "FS-Cache: Netfs '%s' registered for caching\n",
+ netfs->name);
+
+already_registered:
+ up_write(&fscache_addremove_sem);
+
+ if (ret < 0) {
+ netfs->primary_index->parent = NULL;
+ __fscache_cookie_put(netfs->primary_index);
+ netfs->primary_index = NULL;
+ }
+
+ _leave(" = %d", ret);
+ return ret;
+}
+
+EXPORT_SYMBOL(__fscache_register_netfs);
+
+/*
+ * unregister a network filesystem from the cache
+ * - all cookies must have been released first
+ */
+void __fscache_unregister_netfs(struct fscache_netfs *netfs)
+{
+ _enter("{%s.%u}", netfs->name, netfs->version);
+
+ down_write(&fscache_addremove_sem);
+
+ list_del(&netfs->link);
+ fscache_relinquish_cookie(netfs->primary_index, 0);
+
+ up_write(&fscache_addremove_sem);
+
+ printk(KERN_NOTICE "FS-Cache: Netfs '%s' unregistered from caching\n",
+ netfs->name);
+
+ _leave("");
+}
+
+EXPORT_SYMBOL(__fscache_unregister_netfs);
+
+/**
+ * fscache_init_cache - Initialise a cache record
+ * @cache: The cache record to be initialised
+ * @ops: The cache operations to be installed in that record
+ * @idfmt: Format string to define identifier
+ * @...: sprintf-style arguments
+ *
+ * Initialise a record of a cache and fill in the name.
+ *
+ * See Documentation/filesystems/caching/backend-api.txt for a complete
+ * description.
+ */
+void fscache_init_cache(struct fscache_cache *cache,
+ const struct fscache_cache_ops *ops,
+ const char *idfmt,
+ ...)
+{
+ va_list va;
+
+ memset(cache, 0, sizeof(*cache));
+
+ cache->ops = ops;
+ cache->kobj.kset = &fscache_kset;
+ cache->kobj.ktype = &fscache_cache_ktype;
+
+ va_start(va, idfmt);
+ vsnprintf(cache->identifier, sizeof(cache->identifier), idfmt, va);
+ va_end(va);
+
+ INIT_LIST_HEAD(&cache->link);
+ INIT_LIST_HEAD(&cache->object_list);
+ spin_lock_init(&cache->object_list_lock);
+}
+
+EXPORT_SYMBOL(fscache_init_cache);
+
+/**
+ * fscache_add_cache - Declare a cache as being open for business
+ * @cache: The record describing the cache
+ * @ifsdef: The record of the cache object describing the top-level index
+ * @tagname: The tag describing this cache
+ *
+ * Add a cache to the system, making it available for netfs's to use.
+ *
+ * See Documentation/filesystems/caching/backend-api.txt for a complete
+ * description.
+ */
+int fscache_add_cache(struct fscache_cache *cache,
+ struct fscache_object *ifsdef,
+ const char *tagname)
+{
+ struct fscache_cache_tag *tag;
+ int ret;
+
+ BUG_ON(!cache->ops);
+ BUG_ON(!ifsdef);
+
+ /* make sure the worker threads are present */
+ down_write(&fscache_addremove_sem);
+ fscache_init_threads();
+ up_write(&fscache_addremove_sem);
+
+ cache->flags = 0;
+ ifsdef->event_mask = ULONG_MAX & ~(1 << FSCACHE_OBJECT_EV_CLEARED);
+ ifsdef->state = FSCACHE_OBJECT_ACTIVE;
+
+ if (!tagname)
+ tagname = cache->identifier;
+
+ BUG_ON(!tagname[0]);
+
+ _enter("{%s.%s},,%s", cache->ops->name, cache->identifier, tagname);
+
+ /* we use the cache tag to uniquely identify caches */
+ tag = __fscache_lookup_cache_tag(tagname);
+ if (IS_ERR(tag))
+ goto nomem;
+
+ if (test_and_set_bit(FSCACHE_TAG_RESERVED, &tag->flags))
+ goto tag_in_use;
+
+ ret = kobject_set_name(&cache->kobj, "%s", tagname);
+ if (ret < 0)
+ goto error;
+
+ ret = kobject_register(&cache->kobj);
+ if (ret < 0)
+ goto error;
+
+ if (!cache->ops->grab_object(ifsdef))
+ BUG();
+
+ ifsdef->cookie = &fscache_fsdef_index;
+ ifsdef->cache = cache;
+ cache->fsdef = ifsdef;
+
+ down_write(&fscache_addremove_sem);
+
+ tag->cache = cache;
+ cache->tag = tag;
+
+ /* add the cache to the list */
+ list_add(&cache->link, &fscache_cache_list);
+
+ /* add the cache's netfs definition index object to the cache's
+ * list */
+ spin_lock(&cache->object_list_lock);
+ list_add_tail(&ifsdef->cache_link, &cache->object_list);
+ spin_unlock(&cache->object_list_lock);
+
+ /* add the cache's netfs definition index object to the top level index
+ * cookie as a known backing object */
+ spin_lock(&fscache_fsdef_index.lock);
+
+ hlist_add_head(&ifsdef->cookie_link,
+ &fscache_fsdef_index.backing_objects);
+
+ atomic_inc(&fscache_fsdef_index.usage);
+
+ /* done */
+ spin_unlock(&fscache_fsdef_index.lock);
+ up_write(&fscache_addremove_sem);
+
+ printk(KERN_NOTICE "FS-Cache: Cache \"%s\" added (type %s)\n",
+ cache->tag->name, cache->ops->name);
+
+ _leave(" = 0 [%s]", cache->identifier);
+ return 0;
+
+tag_in_use:
+ printk(KERN_ERR "FS-Cache: Cache tag '%s' already in use\n", tagname);
+ __fscache_release_cache_tag(tag);
+ _leave(" = -EXIST");
+ return -EEXIST;
+
+error:
+ __fscache_release_cache_tag(tag);
+ _leave(" = %d", ret);
+ return ret;
+
+nomem:
+ _leave(" = -ENOMEM");
+ return -ENOMEM;
+}
+
+EXPORT_SYMBOL(fscache_add_cache);
+
+/*
+ * select a cache in which to store an object
+ * - the cache addremove semaphore must be at least read-locked by the caller
+ * - the object will never be an index
+ */
+struct fscache_cache *fscache_select_cache_for_object(
+ struct fscache_cookie *cookie)
+{
+ struct fscache_cache_tag *tag;
+ struct fscache_object *object;
+ struct fscache_cache *cache;
+
+ _enter("");
+
+ if (list_empty(&fscache_cache_list)) {
+ _leave(" = NULL [no cache]");
+ return NULL;
+ }
+
+ /* we check the parent to determine the cache to use */
+ spin_lock(&cookie->lock);
+
+ /* the first in the parent's backing list should be the preferred
+ * cache */
+ if (!hlist_empty(&cookie->backing_objects)) {
+ object = hlist_entry(cookie->backing_objects.first,
+ struct fscache_object, cookie_link);
+
+ cache = object->cache;
+ if (object->state >= FSCACHE_OBJECT_DYING ||
+ test_bit(FSCACHE_IOERROR, &cache->flags))
+ cache = NULL;
+
+ spin_unlock(&cookie->lock);
+ _leave(" = %p [parent]", cache);
+ return cache;
+ }
+
+ /* the parent is unbacked */
+ if (cookie->def->type != FSCACHE_COOKIE_TYPE_INDEX) {
+ /* cookie not an index and is unbacked */
+ spin_unlock(&cookie->lock);
+ _leave(" = NULL [cookie ub,ni]");
+ return NULL;
+ }
+
+ spin_unlock(&cookie->lock);
+
+ if (!cookie->def->select_cache)
+ goto no_preference;
+
+ /* ask the netfs for its preference */
+ tag = cookie->def->select_cache(cookie->parent->netfs_data,
+ cookie->netfs_data);
+ if (!tag)
+ goto no_preference;
+
+ if (tag == ERR_PTR(-ENOMEM)) {
+ _leave(" = NULL [nomem tag]");
+ return NULL;
+ }
+
+ if (!tag->cache) {
+ _leave(" = NULL [unbacked tag]");
+ return NULL;
+ }
+
+ if (test_bit(FSCACHE_IOERROR, &tag->cache->flags))
+ return NULL;
+
+ _leave(" = %p [specific]", tag->cache);
+ return tag->cache;
+
+no_preference:
+ /* netfs has no preference - just select first cache */
+ cache = list_entry(fscache_cache_list.next,
+ struct fscache_cache, link);
+ _leave(" = %p [first]", cache);
+ return cache;
+}
+
+/**
+ * fscache_io_error - Note a cache I/O error
+ * @cache: The record describing the cache
+ *
+ * Note that an I/O error occurred in a cache and that it should no longer be
+ * used for anything. This also reports the error into the kernel log.
+ *
+ * See Documentation/filesystems/caching/backend-api.txt for a complete
+ * description.
+ */
+void fscache_io_error(struct fscache_cache *cache)
+{
+ set_bit(FSCACHE_IOERROR, &cache->flags);
+
+ printk(KERN_ERR "FS-Cache: Cache %s stopped due to I/O error\n",
+ cache->ops->name);
+}
+
+EXPORT_SYMBOL(fscache_io_error);
+
+/**
+ * fscache_withdraw_cache - Withdraw a cache from the active service
+ * @cache: The record describing the cache
+ *
+ * Withdraw a cache from service, unbinding all its cache objects from the
+ * netfs cookies they're currently representing.
+ *
+ * See Documentation/filesystems/caching/backend-api.txt for a complete
+ * description.
+ */
+void fscache_withdraw_cache(struct fscache_cache *cache)
+{
+ struct fscache_object *object;
+ LIST_HEAD(object_list);
+
+ _enter("");
+
+ printk(KERN_NOTICE "FS-Cache: Withdrawing cache \"%s\"\n",
+ cache->tag->name);
+
+ /* make the cache unavailable for cookie acquisition */
+ if (test_and_set_bit(FSCACHE_CACHE_WITHDRAWN, &cache->flags))
+ BUG();
+
+ down_write(&fscache_addremove_sem);
+ list_del_init(&cache->link);
+ cache->tag->cache = NULL;
+ up_write(&fscache_addremove_sem);
+
+ /* make sure all pages pinned by operations on behalf of the netfs are
+ * written to disk */
+ cache->ops->sync_cache(cache);
+
+ /* dissociate all the netfs pages backed by this cache from the block
+ * mappings in the cache */
+ cache->ops->dissociate_pages(cache);
+
+ /* we now have to destroy all the active objects pertaining to this
+ * cache - which we do by passing them off to thread pool to dispose
+ * of */
+ _debug("destroy");
+
+ spin_lock(&cache->object_list_lock);
+
+ while (!list_empty(&cache->object_list)) {
+ object = list_entry(cache->object_list.next,
+ struct fscache_object, cache_link);
+ list_move_tail(&object->cache_link, &object_list);
+
+ _debug("withdraw %p", object->cookie);
+
+ spin_lock(&object->lock);
+ spin_unlock(&cache->object_list_lock);
+ fscache_raise_event(object, FSCACHE_OBJECT_EV_WITHDRAW);
+ spin_unlock(&object->lock);
+
+ cond_resched();
+ spin_lock(&cache->object_list_lock);
+ }
+
+ spin_unlock(&cache->object_list_lock);
+
+ /* wait for all extant objects to finish their outstanding operations
+ * and go away */
+ _debug("wait for finish");
+ wait_event(fscache_clearance_wq,
+ atomic_read(&cache->thread_usage) == 0);
+ _debug("wait for clearance");
+ wait_event(fscache_clearance_wq,
+ list_empty(&cache->object_list));
+ _debug("cleared");
+
+ kobject_unregister(&cache->kobj);
+
+ clear_bit(FSCACHE_TAG_RESERVED, &cache->tag->flags);
+ fscache_release_cache_tag(cache->tag);
+ cache->tag = NULL;
+
+ _leave("");
+}
+
+EXPORT_SYMBOL(fscache_withdraw_cache);
diff --git a/fs/fscache/fsc-cookie.c b/fs/fscache/fsc-cookie.c
new file mode 100644
index 0000000..92bfc73
--- /dev/null
+++ b/fs/fscache/fsc-cookie.c
@@ -0,0 +1,494 @@
+/* netfs cookie management
+ *
+ * Copyright (C) 2004-2007 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells ([email protected])
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#define FSCACHE_DEBUG_LEVEL COOKIE
+#include <linux/module.h>
+#include <linux/slab.h>
+#include "fsc-internal.h"
+
+struct kmem_cache *fscache_cookie_jar;
+
+static atomic_t fscache_object_debug_id = ATOMIC_INIT(0);
+
+static int fscache_acquire_non_index_cookie(struct fscache_cookie *cookie);
+static int fscache_alloc_object(struct fscache_cache *cache,
+ struct fscache_cookie *cookie);
+static int fscache_attach_object(struct fscache_cookie *cookie,
+ struct fscache_object *object);
+
+/*
+ * initialise an cookie jar slab element prior to any use
+ */
+void fscache_cookie_init_once(void *_cookie, struct kmem_cache *cachep,
+ unsigned long flags)
+{
+ struct fscache_cookie *cookie = _cookie;
+
+ memset(cookie, 0, sizeof(*cookie));
+ spin_lock_init(&cookie->lock);
+ INIT_HLIST_HEAD(&cookie->backing_objects);
+}
+
+/*
+ * request a cookie to represent an object (index, datafile, xattr, etc)
+ * - parent specifies the parent object
+ * - the top level index cookie for each netfs is stored in the fscache_netfs
+ * struct upon registration
+ * - def points to the definition
+ * - the netfs_data will be passed to the functions pointed to in *def
+ * - all attached caches will be searched to see if they contain this object
+ * - index objects aren't stored on disk until there's a dependent file that
+ * needs storing
+ * - other objects are stored in a selected cache immediately, and all the
+ * indices forming the path to it are instantiated if necessary
+ * - we never let on to the netfs about errors
+ * - we may set a negative cookie pointer, but that's okay
+ */
+struct fscache_cookie *__fscache_acquire_cookie(struct fscache_cookie *parent,
+ const struct fscache_cookie_def *def,
+ void *netfs_data)
+{
+ struct fscache_cookie *cookie;
+
+ BUG_ON(!def);
+
+ _enter("{%s},{%s},%p",
+ parent ? (char *) parent->def->name : "<no-parent>",
+ def->name, netfs_data);
+
+ fscache_stat(&fscache_n_acquires);
+
+ /* if there's no parent cookie, then we don't create one here either */
+ if (!parent) {
+ fscache_stat(&fscache_n_acquires_null);
+ _leave(" [no parent]");
+ return NULL;
+ }
+
+ /* validate the definition */
+ BUG_ON(!def->get_key);
+ BUG_ON(!def->name[0]);
+
+ BUG_ON(def->type == FSCACHE_COOKIE_TYPE_INDEX &&
+ parent->def->type != FSCACHE_COOKIE_TYPE_INDEX);
+
+ /* allocate and initialise a cookie */
+ cookie = kmem_cache_alloc(fscache_cookie_jar, GFP_KERNEL);
+ if (!cookie) {
+ fscache_stat(&fscache_n_acquires_oom);
+ _leave(" [ENOMEM]");
+ return NULL;
+ }
+
+ atomic_set(&cookie->usage, 1);
+ atomic_set(&cookie->n_children, 0);
+
+ atomic_inc(&parent->usage);
+ atomic_inc(&parent->n_children);
+
+ cookie->def = def;
+ cookie->parent = parent;
+ cookie->netfs_data = netfs_data;
+ cookie->flags = 0;
+
+ switch (cookie->def->type) {
+ case FSCACHE_COOKIE_TYPE_INDEX:
+ fscache_stat(&fscache_n_cookie_index);
+ break;
+ case FSCACHE_COOKIE_TYPE_DATAFILE:
+ fscache_stat(&fscache_n_cookie_data);
+ break;
+ default:
+ fscache_stat(&fscache_n_cookie_special);
+ break;
+ }
+
+ /* if the object is an index then we need do nothing more here - we
+ * create indices on disk when we need them as an index may exist in
+ * multiple caches */
+ if (cookie->def->type != FSCACHE_COOKIE_TYPE_INDEX) {
+ if (fscache_acquire_non_index_cookie(cookie) < 0) {
+ atomic_dec(&parent->n_children);
+ __fscache_cookie_put(cookie);
+ fscache_stat(&fscache_n_acquires_nobufs);
+ _leave(" = NULL");
+ return NULL;
+ }
+ }
+
+ fscache_stat(&fscache_n_acquires_ok);
+ _leave(" = %p", cookie);
+ return cookie;
+}
+
+EXPORT_SYMBOL(__fscache_acquire_cookie);
+
+/*
+ * acquire a non-index cookie
+ * - this must make sure the index chain is instantiated and instantiate the
+ * object representation too
+ */
+static int fscache_acquire_non_index_cookie(struct fscache_cookie *cookie)
+{
+ struct fscache_object *object;
+ struct fscache_cache *cache;
+ loff_t i_size;
+ int ret;
+
+ _enter("");
+
+ cookie->flags = 1 << FSCACHE_COOKIE_UNAVAILABLE;
+
+ /* now we need to see whether the backing objects for this cookie yet
+ * exist, if not there'll be nothing to search */
+ down_read(&fscache_addremove_sem);
+
+ if (list_empty(&fscache_cache_list)) {
+ up_read(&fscache_addremove_sem);
+ _leave(" = 0 [no caches]");
+ return 0;
+ }
+
+ /* select a cache in which to store the object */
+ cache = fscache_select_cache_for_object(cookie->parent);
+ if (!cache) {
+ up_read(&fscache_addremove_sem);
+ fscache_stat(&fscache_n_acquires_no_cache);
+ _leave(" = -ENOMEDIUM [no cache]");
+ return -ENOMEDIUM;
+ }
+
+ _debug("cache %s", cache->tag->name);
+
+ cookie->flags =
+ (1 << FSCACHE_COOKIE_LOOKING_UP) |
+ (1 << FSCACHE_COOKIE_CREATING) |
+ (1 << FSCACHE_COOKIE_NO_DATA_YET);
+
+ /* ask the cache to allocate objects for this cookie and its parent
+ * chain */
+ ret = fscache_alloc_object(cache, cookie);
+ if (ret < 0) {
+ up_read(&fscache_addremove_sem);
+ _leave(" = %d", ret);
+ return ret;
+ }
+
+ /* initiate the process of looking up all the objects in the chain */
+ cookie->def->get_attr(cookie->netfs_data, &i_size);
+
+ spin_lock(&cookie->lock);
+ if (hlist_empty(&cookie->backing_objects)) {
+ spin_unlock(&cookie->lock);
+ goto unavailable;
+ }
+
+ object = hlist_entry(cookie->backing_objects.first,
+ struct fscache_object, cookie_link);
+
+ fscache_set_store_limit(object, i_size);
+ fscache_enqueue_object(object);
+ spin_unlock(&cookie->lock);
+
+ /* we may be required to wait for lookup to complete at this point */
+ if (!fscache_defer_lookup) {
+ _debug("non-deferred lookup %p", &cookie->flags);
+ wait_on_bit(&cookie->flags, FSCACHE_COOKIE_LOOKING_UP,
+ fscache_wait_bit, TASK_UNINTERRUPTIBLE);
+ _debug("complete");
+ if (test_bit(FSCACHE_COOKIE_UNAVAILABLE, &cookie->flags))
+ goto unavailable;
+ }
+
+ up_read(&fscache_addremove_sem);
+ _leave(" = 0 [deferred]");
+ return 0;
+
+unavailable:
+ up_read(&fscache_addremove_sem);
+ _leave(" = -ENOBUFS");
+ return -ENOBUFS;
+}
+
+/*
+ * recursively allocate cache object records for a cookie/cache combination
+ * - caller must be holding the addremove sem
+ */
+static int fscache_alloc_object(struct fscache_cache *cache,
+ struct fscache_cookie *cookie)
+{
+ struct fscache_object *object;
+ struct hlist_node *_n;
+ int ret;
+
+ _enter("%p,%p{%s}", cache, cookie, cookie->def->name);
+
+ spin_lock(&cookie->lock);
+ hlist_for_each_entry(object, _n, &cookie->backing_objects,
+ cookie_link) {
+ if (object->cache == cache)
+ goto object_already_extant;
+ }
+ spin_unlock(&cookie->lock);
+
+ /* ask the cache to allocate an object (we may end up with duplicate
+ * objects at this stage, but we sort that out later) */
+ object = cache->ops->alloc_object(cache, cookie);
+ if (IS_ERR(object)) {
+ fscache_stat(&fscache_n_object_no_alloc);
+ ret = PTR_ERR(object);
+ goto error;
+ }
+
+ fscache_stat(&fscache_n_object_alloc);
+
+ object->debug_id = atomic_inc_return(&fscache_object_debug_id);
+
+ _debug("ALLOC OBJ%x: %s {%lx}",
+ object->debug_id, cookie->def->name, object->events);
+
+ ret = fscache_alloc_object(cache, cookie->parent);
+ if (ret < 0)
+ goto error_put;
+
+ /* only attach if we managed to allocate all we needed, otherwise
+ * discard the object we just allocated and instead use the one
+ * attached to the cookie */
+ if (fscache_attach_object(cookie, object) < 0)
+ cache->ops->put_object(object);
+
+ _leave(" = 0");
+ return 0;
+
+object_already_extant:
+ ret = -ENOBUFS;
+ if (object->state >= FSCACHE_OBJECT_DYING) {
+ spin_unlock(&cookie->lock);
+ goto error;
+ }
+ spin_unlock(&cookie->lock);
+ _leave(" = 0 [found]");
+ return 0;
+
+error_put:
+ cache->ops->put_object(object);
+error:
+ _leave(" = %d", ret);
+ return ret;
+}
+
+/*
+ * attach a cache object to a cookie
+ */
+static int fscache_attach_object(struct fscache_cookie *cookie,
+ struct fscache_object *object)
+{
+ struct fscache_object *p;
+ struct fscache_cache *cache = object->cache;
+ struct hlist_node *_n;
+ int ret;
+
+ _enter("{%s},{OBJ%x}", cookie->def->name, object->debug_id);
+
+ spin_lock(&cookie->lock);
+
+ /* there may be multiple initial creations of this object, but we only
+ * want one */
+ ret = -EEXIST;
+ hlist_for_each_entry(p, _n, &cookie->backing_objects, cookie_link) {
+ if (p->cache == object->cache) {
+ if (p->state >= FSCACHE_OBJECT_DYING)
+ ret = -ENOBUFS;
+ goto cant_attach_object;
+ }
+ }
+
+ /* pin the parent object */
+ spin_lock_nested(&cookie->parent->lock, 1);
+ hlist_for_each_entry(p, _n, &cookie->parent->backing_objects,
+ cookie_link) {
+ if (p->cache == object->cache) {
+ if (p->state >= FSCACHE_OBJECT_DYING) {
+ ret = -ENOBUFS;
+ spin_unlock(&cookie->parent->lock);
+ goto cant_attach_object;
+ }
+ object->parent = p;
+ spin_lock(&p->lock);
+ p->n_children++;
+ spin_unlock(&p->lock);
+ break;
+ }
+ }
+ spin_unlock(&cookie->parent->lock);
+
+ /* attach to the cache's object list */
+ if (list_empty(&object->cache_link)) {
+ spin_lock(&cache->object_list_lock);
+ list_add(&object->cache_link, &cache->object_list);
+ spin_unlock(&cache->object_list_lock);
+ }
+
+ /* attach to the cookie */
+ object->cookie = cookie;
+ atomic_inc(&cookie->usage);
+ hlist_add_head(&object->cookie_link, &cookie->backing_objects);
+ ret = 0;
+
+cant_attach_object:
+ spin_unlock(&cookie->lock);
+ _leave(" = %d", ret);
+ return ret;
+}
+
+/*
+ * update the index entries backing a cookie
+ */
+void __fscache_update_cookie(struct fscache_cookie *cookie)
+{
+ struct fscache_object *object;
+ struct hlist_node *_p;
+
+ fscache_stat(&fscache_n_updates);
+
+ if (!cookie) {
+ fscache_stat(&fscache_n_updates_null);
+ _leave(" [no cookie]");
+ return;
+ }
+
+ _enter("{%s}", cookie->def->name);
+
+ BUG_ON(!cookie->def->get_aux);
+
+ spin_lock(&cookie->lock);
+
+ /* update the index entry on disk in each cache backing this cookie */
+ hlist_for_each_entry(object, _p,
+ &cookie->backing_objects, cookie_link) {
+ fscache_raise_event(object, FSCACHE_OBJECT_EV_UPDATE);
+ }
+
+ spin_unlock(&cookie->lock);
+ _leave("");
+}
+
+EXPORT_SYMBOL(__fscache_update_cookie);
+
+/*
+ * release a cookie back to the cache
+ * - the object will be marked as recyclable on disk if retire is true
+ * - all dependents of this cookie must have already been unregistered
+ * (indices/files/pages)
+ */
+void __fscache_relinquish_cookie(struct fscache_cookie *cookie, int retire)
+{
+ struct fscache_cache *cache;
+ struct fscache_object *object;
+ unsigned long event;
+
+ fscache_stat(&fscache_n_relinquishes);
+
+ if (!cookie) {
+ fscache_stat(&fscache_n_relinquishes_null);
+ _leave(" [no cookie]");
+ return;
+ }
+
+ _enter("%p{%s,%p},%d",
+ cookie, cookie->def->name, cookie->netfs_data, retire);
+
+ if (atomic_read(&cookie->n_children) != 0) {
+ printk(KERN_ERR "FS-Cache: Cookie '%s' still has children\n",
+ cookie->def->name);
+ BUG();
+ }
+
+ /* wait for the cookie to finish being instantiated (or to fail) */
+ if (test_bit(FSCACHE_COOKIE_CREATING, &cookie->flags)) {
+ fscache_stat(&fscache_n_relinquishes_waitcrt);
+ wait_on_bit(&cookie->flags, FSCACHE_COOKIE_CREATING,
+ fscache_wait_bit, TASK_UNINTERRUPTIBLE);
+ }
+
+ event = retire ? FSCACHE_OBJECT_EV_RETIRE : FSCACHE_OBJECT_EV_RELEASE;
+
+ /* detach pointers back to the netfs */
+ spin_lock(&cookie->lock);
+
+ cookie->netfs_data = NULL;
+ cookie->def = NULL;
+
+ /* break links with all the active objects */
+ while (!hlist_empty(&cookie->backing_objects)) {
+ object = hlist_entry(cookie->backing_objects.first,
+ struct fscache_object,
+ cookie_link);
+
+ _debug("RELEASE OBJ%x", object->debug_id);
+
+ /* detach each cache object from the object cookie */
+ spin_lock(&object->lock);
+ hlist_del_init(&object->cookie_link);
+
+ cache = object->cache;
+ object->cookie = NULL;
+ fscache_raise_event(object, event);
+ spin_unlock(&object->lock);
+
+ if (atomic_dec_and_test(&cookie->usage))
+ /* the cookie refcount shouldn't be reduced to 0 yet */
+ BUG();
+ }
+
+ spin_unlock(&cookie->lock);
+
+ if (cookie->parent) {
+ ASSERTCMP(atomic_read(&cookie->parent->usage), >, 0);
+ ASSERTCMP(atomic_read(&cookie->parent->n_children), >, 0);
+ atomic_dec(&cookie->parent->n_children);
+ }
+
+ /* finally dispose of the cookie */
+ ASSERTCMP(atomic_read(&cookie->usage), >, 0);
+ fscache_cookie_put(cookie);
+
+ _leave("");
+}
+
+EXPORT_SYMBOL(__fscache_relinquish_cookie);
+
+/*
+ * destroy a cookie
+ */
+void __fscache_cookie_put(struct fscache_cookie *cookie)
+{
+ struct fscache_cookie *parent;
+
+ _enter("%p", cookie);
+
+ for (;;) {
+ _debug("FREE COOKIE %p", cookie);
+ parent = cookie->parent;
+ BUG_ON(!hlist_empty(&cookie->backing_objects));
+ kmem_cache_free(fscache_cookie_jar, cookie);
+
+ if (!parent)
+ break;
+
+ cookie = parent;
+ BUG_ON(atomic_read(&cookie->usage) <= 0);
+ if (!atomic_dec_and_test(&cookie->usage))
+ break;
+ }
+
+ _leave("");
+}
diff --git a/fs/fscache/fsc-fsdef.c b/fs/fscache/fsc-fsdef.c
new file mode 100644
index 0000000..7a0d1c0
--- /dev/null
+++ b/fs/fscache/fsc-fsdef.c
@@ -0,0 +1,111 @@
+/* Filesystem index definition
+ *
+ * Copyright (C) 2004-2007 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells ([email protected])
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#define FSCACHE_DEBUG_LEVEL CACHE
+#include <linux/module.h>
+#include "fsc-internal.h"
+
+static uint16_t fscache_fsdef_netfs_get_key(const void *cookie_netfs_data,
+ void *buffer, uint16_t bufmax);
+
+static uint16_t fscache_fsdef_netfs_get_aux(const void *cookie_netfs_data,
+ void *buffer, uint16_t bufmax);
+
+static fscache_checkaux_t fscache_fsdef_netfs_check_aux(void *cookie_netfs_data,
+ const void *data,
+ uint16_t datalen);
+
+struct fscache_cookie_def fscache_fsdef_netfs_def = {
+ .name = "FSDEF.netfs",
+ .type = FSCACHE_COOKIE_TYPE_INDEX,
+ .get_key = fscache_fsdef_netfs_get_key,
+ .get_aux = fscache_fsdef_netfs_get_aux,
+ .check_aux = fscache_fsdef_netfs_check_aux,
+};
+
+static struct fscache_cookie_def fscache_fsdef_index_def = {
+ .name = ".FS-Cache",
+ .type = FSCACHE_COOKIE_TYPE_INDEX,
+};
+
+struct fscache_cookie fscache_fsdef_index = {
+ .usage = ATOMIC_INIT(1),
+ .lock = SPIN_LOCK_UNLOCKED,
+ .backing_objects = HLIST_HEAD_INIT,
+ .def = &fscache_fsdef_index_def,
+};
+
+EXPORT_SYMBOL(fscache_fsdef_index);
+
+/*
+ * get the key data for an FSDEF index record
+ */
+static uint16_t fscache_fsdef_netfs_get_key(const void *cookie_netfs_data,
+ void *buffer, uint16_t bufmax)
+{
+ const struct fscache_netfs *netfs = cookie_netfs_data;
+ unsigned klen;
+
+ _enter("{%s.%u},", netfs->name, netfs->version);
+
+ klen = strlen(netfs->name);
+ if (klen > bufmax)
+ return 0;
+
+ memcpy(buffer, netfs->name, klen);
+ return klen;
+}
+
+/*
+ * get the auxilliary data for an FSDEF index record
+ */
+static uint16_t fscache_fsdef_netfs_get_aux(const void *cookie_netfs_data,
+ void *buffer, uint16_t bufmax)
+{
+ const struct fscache_netfs *netfs = cookie_netfs_data;
+ unsigned dlen;
+
+ _enter("{%s.%u},", netfs->name, netfs->version);
+
+ dlen = sizeof(uint32_t);
+ if (dlen > bufmax)
+ return 0;
+
+ memcpy(buffer, &netfs->version, dlen);
+ return dlen;
+}
+
+/*
+ * check that the version stored in the auxilliary data is correct
+ */
+static fscache_checkaux_t fscache_fsdef_netfs_check_aux(void *cookie_netfs_data,
+ const void *data,
+ uint16_t datalen)
+{
+ struct fscache_netfs *netfs = cookie_netfs_data;
+ uint32_t version;
+
+ _enter("{%s},,%hu", netfs->name, datalen);
+
+ if (datalen != sizeof(version)) {
+ _leave(" = OBSOLETE [dl=%d v=%zu]", datalen, sizeof(version));
+ return FSCACHE_CHECKAUX_OBSOLETE;
+ }
+
+ memcpy(&version, data, sizeof(version));
+ if (version != netfs->version) {
+ _leave(" = OBSOLETE [ver=%x net=%x]", version, netfs->version);
+ return FSCACHE_CHECKAUX_OBSOLETE;
+ }
+
+ _leave(" = OKAY");
+ return FSCACHE_CHECKAUX_OKAY;
+}
diff --git a/fs/fscache/fsc-internal.h b/fs/fscache/fsc-internal.h
new file mode 100644
index 0000000..875b8a9
--- /dev/null
+++ b/fs/fscache/fsc-internal.h
@@ -0,0 +1,376 @@
+/* Internal definitions for FS-Cache
+ *
+ * Copyright (C) 2004-2007 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells ([email protected])
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+/*
+ * Lock order, in the order in which multiple locks should be obtained:
+ * - fscache_addremove_sem
+ * - cookie->lock
+ * - cookie->parent->lock
+ * - cache->object_list_lock
+ * - object->lock
+ * - object->parent->lock
+ * - fscache_thread_lock
+ *
+ */
+
+#include <linux/fscache-cache.h>
+#include <linux/sched.h>
+
+#define FSCACHE_MIN_THREADS 4
+#define FSCACHE_MAX_THREADS 32
+
+/*
+ * fsc-cache.c
+ */
+extern struct list_head fscache_cache_list;
+extern struct rw_semaphore fscache_addremove_sem;
+extern wait_queue_head_t fscache_clearance_wq;
+
+extern struct fscache_cache *fscache_select_cache_for_object(
+ struct fscache_cookie *);
+
+/*
+ * fsc-cookie.c
+ */
+extern struct kmem_cache *fscache_cookie_jar;
+
+extern void fscache_cookie_init_once(void *, struct kmem_cache *,
+ unsigned long);
+extern void __fscache_cookie_put(struct fscache_cookie *);
+
+/*
+ * fsc-fsdef.c
+ */
+extern struct fscache_cookie fscache_fsdef_index;
+extern struct fscache_cookie_def fscache_fsdef_netfs_def;
+
+/*
+ * fsc-main.c
+ */
+extern unsigned fscache_defer_lookup;
+extern unsigned fscache_defer_create;
+extern unsigned fscache_debug;
+extern struct kset fscache_kset;
+
+extern int fscache_wait_bit(void *);
+extern int fscache_wait_bit_interruptible(void *);
+
+/*
+ * fsc-object.c
+ */
+extern void fscache_object_state_machine(struct fscache_object *);
+extern void fscache_withdrawing_object(struct fscache_cache *,
+ struct fscache_object *);
+
+/*
+ * fsc-stats.c
+ */
+#ifdef CONFIG_FSCACHE_STATS
+extern atomic_t fscache_n_ops_processed[FSCACHE_MAX_THREADS];
+extern atomic_t fscache_n_objs_processed[FSCACHE_MAX_THREADS];
+
+extern atomic_t fscache_n_op_pend;
+extern atomic_t fscache_n_op_run;
+extern atomic_t fscache_n_op_enqueue;
+extern atomic_t fscache_n_op_requeue;
+extern atomic_t fscache_n_op_release;
+
+extern atomic_t fscache_n_attr_changed;
+extern atomic_t fscache_n_attr_changed_ok;
+extern atomic_t fscache_n_attr_changed_nobufs;
+extern atomic_t fscache_n_attr_changed_nomem;
+extern atomic_t fscache_n_attr_changed_calls;
+
+extern atomic_t fscache_n_allocs;
+extern atomic_t fscache_n_allocs_ok;
+extern atomic_t fscache_n_allocs_wait;
+extern atomic_t fscache_n_allocs_nobufs;
+extern atomic_t fscache_n_alloc_ops;
+extern atomic_t fscache_n_alloc_op_waits;
+
+extern atomic_t fscache_n_retrievals;
+extern atomic_t fscache_n_retrievals_ok;
+extern atomic_t fscache_n_retrievals_wait;
+extern atomic_t fscache_n_retrievals_nodata;
+extern atomic_t fscache_n_retrievals_nobufs;
+extern atomic_t fscache_n_retrievals_intr;
+extern atomic_t fscache_n_retrievals_nomem;
+extern atomic_t fscache_n_retrieval_ops;
+extern atomic_t fscache_n_retrieval_op_waits;
+
+extern atomic_t fscache_n_stores;
+extern atomic_t fscache_n_stores_ok;
+extern atomic_t fscache_n_stores_again;
+extern atomic_t fscache_n_stores_nobufs;
+extern atomic_t fscache_n_stores_oom;
+extern atomic_t fscache_n_store_ops;
+extern atomic_t fscache_n_store_calls;
+
+extern atomic_t fscache_n_marks;
+extern atomic_t fscache_n_uncaches;
+
+extern atomic_t fscache_n_acquires;
+extern atomic_t fscache_n_acquires_null;
+extern atomic_t fscache_n_acquires_no_cache;
+extern atomic_t fscache_n_acquires_ok;
+extern atomic_t fscache_n_acquires_nobufs;
+extern atomic_t fscache_n_acquires_oom;
+
+extern atomic_t fscache_n_updates;
+extern atomic_t fscache_n_updates_null;
+extern atomic_t fscache_n_updates_run;
+
+extern atomic_t fscache_n_relinquishes;
+extern atomic_t fscache_n_relinquishes_null;
+extern atomic_t fscache_n_relinquishes_waitcrt;
+
+extern atomic_t fscache_n_cookie_index;
+extern atomic_t fscache_n_cookie_data;
+extern atomic_t fscache_n_cookie_special;
+
+extern atomic_t fscache_n_object_alloc;
+extern atomic_t fscache_n_object_no_alloc;
+extern atomic_t fscache_n_object_lookups;
+extern atomic_t fscache_n_object_lookups_negative;
+extern atomic_t fscache_n_object_lookups_positive;
+extern atomic_t fscache_n_object_created;
+extern atomic_t fscache_n_object_avail;
+extern atomic_t fscache_n_object_boosted;
+
+static inline void fscache_stat(atomic_t *stat)
+{
+ atomic_inc(stat);
+}
+#else
+
+#define fscache_stat(stat) do {} while(0)
+#endif
+
+#ifdef CONFIG_FSCACHE_HISTOGRAM
+extern atomic_t fscache_obj_instantiate_histogram[HZ];
+extern atomic_t fscache_objs_histogram[HZ];
+extern atomic_t fscache_ops_histogram[HZ];
+extern atomic_t fscache_retrieval_delay_histogram[HZ];
+extern atomic_t fscache_retrieval_histogram[HZ];
+
+static inline void fscache_hist(atomic_t histogram[], unsigned long start_jif)
+{
+ unsigned long jif = jiffies - start_jif;
+ if (jif >= HZ)
+ jif = HZ - 1;
+ atomic_inc(&histogram[jif]);
+}
+
+#else
+#define fscache_hist(hist, start_jif) do {} while(0)
+#endif
+
+#ifdef CONFIG_FSCACHE_PROC
+extern int __init fscache_proc_init(void);
+extern void fscache_proc_cleanup(void);
+#else
+#define fscache_proc_init() (0)
+#define fscache_proc_cleanup() do {} while(0)
+#endif
+
+/*
+ * fsc-threads.c
+ */
+extern void fscache_start_operations(struct fscache_object *);
+extern void fscache_enqueue_object(struct fscache_object *);
+extern void fscache_enqueue_dependents(struct fscache_object *);
+extern void fscache_dequeue_object(struct fscache_object *);
+extern void fscache_boost_object(struct fscache_object *);
+extern int fscache_init_threads(void);
+extern void fscache_kill_threads(void);
+
+/*
+ * raise an event on an object
+ * - if the event is not masked for that object, then the object is
+ * queued for attention by the thread pool.
+ */
+static inline void fscache_raise_event(struct fscache_object *object,
+ unsigned event)
+{
+ if (!test_and_set_bit(event, &object->events) &&
+ test_bit(event, &object->event_mask))
+ fscache_enqueue_object(object);
+}
+
+/*
+ * drop a reference to a cookie
+ */
+static inline void fscache_cookie_put(struct fscache_cookie *cookie)
+{
+ BUG_ON(atomic_read(&cookie->usage) <= 0);
+ if (atomic_dec_and_test(&cookie->usage))
+ __fscache_cookie_put(cookie);
+}
+
+/*
+ * get an extra reference to a netfs retrieval context
+ */
+static inline
+void *fscache_get_context(struct fscache_cookie *cookie, void *context)
+{
+ if (cookie->def->get_context)
+ cookie->def->get_context(cookie->netfs_data, context);
+ return context;
+}
+
+/*
+ * release a reference to a netfs retrieval context
+ */
+static inline
+void fscache_put_context(struct fscache_cookie *cookie, void *context)
+{
+ if (cookie->def->put_context)
+ cookie->def->put_context(cookie->netfs_data, context);
+}
+
+/*****************************************************************************/
+/*
+ * debug tracing
+ */
+#define dbgprintk(FMT,...) \
+ printk("[%-6.6s] "FMT"\n", current->comm ,##__VA_ARGS__)
+
+/* make sure we maintain the format strings, even when debugging is disabled */
+static inline __attribute__((format(printf, 1, 2)))
+void _dbprintk(const char *fmt, ...)
+{
+}
+
+#define kenter(FMT, ...) dbgprintk("==> %s("FMT")", __FUNCTION__ ,##__VA_ARGS__)
+#define kleave(FMT, ...) dbgprintk("<== %s()"FMT"", __FUNCTION__ ,##__VA_ARGS__)
+#define kdebug(FMT, ...) dbgprintk(FMT ,##__VA_ARGS__)
+
+#define kjournal(FMT, ...) _dbprintk(FMT ,##__VA_ARGS__)
+
+#ifdef __KDEBUG
+#define _enter(FMT, ...) kenter(FMT ,##__VA_ARGS__)
+#define _leave(FMT, ...) kleave(FMT ,##__VA_ARGS__)
+#define _debug(FMT, ...) kdebug(FMT ,##__VA_ARGS__)
+
+#elif defined(CONFIG_FSCACHE_DEBUG)
+#define _enter(FMT, ...) \
+do { \
+ if (__do_kdebug(ENTER)) \
+ kenter(FMT ,##__VA_ARGS__); \
+} while(0)
+
+#define _leave(FMT, ...) \
+do { \
+ if (__do_kdebug(LEAVE)) \
+ kleave(FMT ,##__VA_ARGS__); \
+} while(0)
+
+#define _debug(FMT, ...) \
+do { \
+ if (__do_kdebug(DEBUG)) \
+ kdebug(FMT ,##__VA_ARGS__); \
+} while(0)
+
+#else
+#define _enter(FMT,...) _dbprintk("==> %s("FMT")",__FUNCTION__ ,##__VA_ARGS__)
+#define _leave(FMT,...) _dbprintk("<== %s()"FMT"",__FUNCTION__ ,##__VA_ARGS__)
+#define _debug(FMT,...) _dbprintk(FMT ,##__VA_ARGS__)
+#endif
+
+/*
+ * determine whether a particular optional debugging point should be logged
+ * - we need to go through three steps to persuade cpp to correctly join the
+ * shorthand in FSCACHE_DEBUG_LEVEL with its prefix
+ */
+#define ____do_kdebug(LEVEL, POINT) \
+ unlikely((fscache_debug & (FSCACHE_POINT_##POINT << (FSCACHE_DEBUG_ ## LEVEL * 3))))
+#define ___do_kdebug(LEVEL, POINT) \
+ ____do_kdebug(LEVEL, POINT)
+#define __do_kdebug(POINT) \
+ ___do_kdebug(FSCACHE_DEBUG_LEVEL, POINT)
+
+#define FSCACHE_DEBUG_CACHE 0
+#define FSCACHE_DEBUG_COOKIE 1
+#define FSCACHE_DEBUG_PAGE 2
+#define FSCACHE_DEBUG_THREAD 3
+
+#define FSCACHE_POINT_ENTER 1
+#define FSCACHE_POINT_LEAVE 2
+#define FSCACHE_POINT_DEBUG 4
+
+#ifndef FSCACHE_DEBUG_LEVEL
+#define FSCACHE_DEBUG_LEVEL CACHE
+#endif
+
+/*
+ * assertions
+ */
+#if 1 // defined(__KDEBUGALL)
+
+#define ASSERT(X) \
+do { \
+ if (unlikely(!(X))) { \
+ printk(KERN_ERR "\n"); \
+ printk(KERN_ERR "FS-Cache: Assertion failed\n"); \
+ BUG(); \
+ } \
+} while(0)
+
+#define ASSERTCMP(X, OP, Y) \
+do { \
+ if (unlikely(!((X) OP (Y)))) { \
+ printk(KERN_ERR "\n"); \
+ printk(KERN_ERR "FS-Cache: Assertion failed\n"); \
+ printk(KERN_ERR "%lx " #OP " %lx is false\n", \
+ (unsigned long)(X), (unsigned long)(Y)); \
+ BUG(); \
+ } \
+} while(0)
+
+#define ASSERTIF(C, X) \
+do { \
+ if (unlikely((C) && !(X))) { \
+ printk(KERN_ERR "\n"); \
+ printk(KERN_ERR "FS-Cache: Assertion failed\n"); \
+ BUG(); \
+ } \
+} while(0)
+
+#define ASSERTIFCMP(C, X, OP, Y) \
+do { \
+ if (unlikely((C) && !((X) OP (Y)))) { \
+ printk(KERN_ERR "\n"); \
+ printk(KERN_ERR "FS-Cache: Assertion failed\n"); \
+ printk(KERN_ERR "%lx " #OP " %lx is false\n", \
+ (unsigned long)(X), (unsigned long)(Y)); \
+ BUG(); \
+ } \
+} while(0)
+
+#else
+
+#define ASSERT(X) \
+do { \
+} while(0)
+
+#define ASSERTCMP(X, OP, Y) \
+do { \
+} while(0)
+
+#define ASSERTIF(C, X) \
+do { \
+} while(0)
+
+#define ASSERTIFCMP(C, X, OP, Y) \
+do { \
+} while(0)
+
+#endif /* assert or not */
diff --git a/fs/fscache/fsc-main.c b/fs/fscache/fsc-main.c
new file mode 100644
index 0000000..fc19117
--- /dev/null
+++ b/fs/fscache/fsc-main.c
@@ -0,0 +1,127 @@
+/* General filesystem local caching manager
+ *
+ * Copyright (C) 2004-2007 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells ([email protected])
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#define FSCACHE_DEBUG_LEVEL CACHE
+#include <linux/module.h>
+#include <linux/init.h>
+#include <linux/sched.h>
+#include <linux/completion.h>
+#include <linux/slab.h>
+#include "fsc-internal.h"
+
+MODULE_DESCRIPTION("FS Cache Manager");
+MODULE_AUTHOR("Red Hat, Inc.");
+MODULE_LICENSE("GPL");
+
+unsigned fscache_defer_lookup = 1;
+module_param_named(defer_lookup, fscache_defer_lookup, uint, S_IWUSR | S_IRUGO);
+MODULE_PARM_DESC(fscache_defer_lookup, "Defer cookie lookup to background thread");
+
+unsigned fscache_defer_create = 1;
+module_param_named(defer_create, fscache_defer_create, uint, S_IWUSR | S_IRUGO);
+MODULE_PARM_DESC(fscache_defer_create, "Defer cookie creation to background thread");
+
+unsigned fscache_debug;
+module_param_named(debug, fscache_debug, uint, S_IWUSR | S_IRUGO);
+MODULE_PARM_DESC(fscache_debug, "FS-Cache debugging mask");
+
+static struct sysfs_ops fscache_sysfs_ops = {
+};
+
+static struct kobj_type fscache_ktype = {
+ .sysfs_ops = &fscache_sysfs_ops,
+};
+
+struct kset fscache_kset = {
+ .kobj.name = "fscache",
+ .kobj.kset = &fs_subsys,
+ .ktype = &fscache_ktype,
+};
+
+/*
+ * initialise the fs caching module
+ */
+static int __init fscache_init(void)
+{
+ int ret;
+
+ ret = fscache_proc_init();
+ if (ret < 0)
+ goto error_proc;
+
+ ret = fscache_init_threads();
+ if (ret < 0)
+ goto error_init_threads;
+
+ fscache_cookie_jar = kmem_cache_create("fscache_cookie_jar",
+ sizeof(struct fscache_cookie),
+ 0,
+ 0,
+ fscache_cookie_init_once);
+ if (!fscache_cookie_jar) {
+ printk(KERN_NOTICE
+ "FS-Cache: Failed to allocate a cookie jar\n");
+ ret = -ENOMEM;
+ goto error_cookie_jar;
+ }
+
+ ret = kset_register(&fscache_kset);
+ if (ret < 0)
+ goto error_kset;
+
+ printk(KERN_NOTICE "FS-Cache: Loaded\n");
+ return 0;
+
+error_kset:
+ kmem_cache_destroy(fscache_cookie_jar);
+error_cookie_jar:
+ fscache_kill_threads();
+error_init_threads:
+ fscache_proc_cleanup();
+error_proc:
+ return ret;
+}
+
+fs_initcall(fscache_init);
+
+/*
+ * clean up on module removal
+ */
+static void __exit fscache_exit(void)
+{
+ _enter("");
+
+ kset_unregister(&fscache_kset);
+ kmem_cache_destroy(fscache_cookie_jar);
+ fscache_kill_threads();
+ fscache_proc_cleanup();
+ printk(KERN_NOTICE "FS-Cache: Unloaded\n");
+}
+
+module_exit(fscache_exit);
+
+/*
+ * wait_on_bit() sleep function for uninterruptible waiting
+ */
+int fscache_wait_bit(void *flags)
+{
+ schedule();
+ return 0;
+}
+
+/*
+ * wait_on_bit() sleep function for interruptible waiting
+ */
+int fscache_wait_bit_interruptible(void *flags)
+{
+ schedule();
+ return signal_pending(current);
+}
diff --git a/fs/fscache/fsc-manage.c b/fs/fscache/fsc-manage.c
new file mode 100644
index 0000000..42fef97
--- /dev/null
+++ b/fs/fscache/fsc-manage.c
@@ -0,0 +1,260 @@
+/* Manage cache objects
+ *
+ * Copyright (C) 2007 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells ([email protected])
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#define FSCACHE_DEBUG_LEVEL PAGE
+#include <linux/module.h>
+#include <linux/fscache-cache.h>
+#include <linux/buffer_head.h>
+#include <linux/pagevec.h>
+#include "fsc-internal.h"
+
+/*
+ * wait for a slow operation to float to the front of the op queue, then
+ * effectively suspend processing on this object till the calling process
+ * context has completed the operation
+ * - this prevents kfscached being hogged by a really slow op
+ */
+#if 0
+static int fscache_slow_op(struct fscache_object *object,
+ struct fscache_operation *op)
+{
+ int ret;
+
+ kenter("");
+
+ if (test_and_clear_bit(FSCACHE_OP_WAITING, &op->flags))
+ wake_up_bit(&op->flags, FSCACHE_OP_WAITING);
+
+ if (test_bit(FSCACHE_OP_IN_PROGRESS, &op->flags))
+ ret = -EINPROGRESS;
+ else
+ ret = 0;
+
+ kleave(" = %d", ret);
+ return ret;
+}
+#endif
+
+/*
+ * reserve space for an object
+ */
+int __fscache_reserve_space(struct fscache_cookie *cookie, loff_t size)
+{
+#if 0
+ struct fscache_operation *op;
+ struct fscache_object *object;
+ int ret;
+
+ _enter("%p,%llu,", cookie, size);
+
+ ASSERTCMP(cookie->def->type, !=, FSCACHE_COOKIE_TYPE_INDEX);
+
+ op = kzalloc(sizeof(*op), GFP_KERNEL);
+ if (!op) {
+ _leave(" = -ENOMEM");
+ return -ENOMEM;
+ }
+
+ op->processor = fscache_slow_op;
+ __set_bit(FSCACHE_OP_WAITING, &op->flags);
+ __set_bit(FSCACHE_OP_IN_PROGRESS, &op->flags);
+
+ spin_lock(&cookie->lock);
+
+ ret = -ENOBUFS;
+ if (hlist_empty(&cookie->backing_objects))
+ goto error;
+ object = hlist_entry(cookie->backing_objects.first,
+ struct fscache_object, cookie_link);
+ if (test_bit(FSCACHE_IOERROR, &object->cache->flags))
+ goto error;
+
+ ret = -EOPNOTSUPP;
+ if (!object->cache->ops->reserve_space)
+ goto error;
+
+ /* prevent the file from being uncached whilst we access it and exclude
+ * write attempts on pages
+ */
+ ret = -ENOBUFS;
+ if (!object->cache->ops->grab_object(object))
+ goto error;
+ list_add_tail(&op->link, &object->operations);
+
+ if (!test_and_set_bit(FSCACHE_OBJECT_BUSY, &object->flags))
+ queue_work(fscache_workqueue, &object->work);
+
+ spin_unlock(&cookie->lock);
+
+ /* wait for the operation queue to be whittled down */
+ if (wait_on_bit(&op->flags, FSCACHE_OP_WAITING,
+ fscache_wait_bit_interruptible, TASK_INTERRUPTIBLE
+ ) < 0) {
+ /* we were interrupted */
+ spin_lock(&cookie->lock);
+ if (!test_and_clear_bit(FSCACHE_OP_WAITING, &op->flags))
+ queue_work(fscache_workqueue, &object->work);
+ if (!test_and_clear_bit(FSCACHE_OP_IN_PROGRESS, &op->flags))
+ BUG();
+ ret = -ERESTARTSYS;
+ goto out_locked;
+ }
+
+ /* okay - we're now in a position to make a reservation */
+ ret = -ENOBUFS;
+ if (test_bit(FSCACHE_OBJECT_ABORTED, &object->flags) ||
+ test_bit(FSCACHE_IOERROR, &object->cache->flags))
+ goto out;
+
+ ret = -ERESTARTSYS;
+ if (signal_pending(current))
+ goto out;
+
+ /* ask the cache to honour the operation */
+ ret = object->cache->ops->reserve_space(object, size);
+
+out:
+ spin_lock(&cookie->lock);
+out_locked:
+ if (!test_and_clear_bit(FSCACHE_OP_IN_PROGRESS, &op->flags))
+ BUG();
+ queue_work(fscache_workqueue, &object->work);
+ spin_unlock(&cookie->lock);
+
+ kleave(" = %d", ret);
+ return ret;
+
+error:
+ spin_unlock(&cookie->lock);
+ kfree(op);
+ kleave(" = %d", ret);
+ return ret;
+#endif
+ kleave(" = -ENOBUFS");
+ return -ENOBUFS;
+}
+
+EXPORT_SYMBOL(__fscache_reserve_space);
+
+#if 0
+/*
+ * pin an object into the cache
+ */
+int __fscache_pin_cookie(struct fscache_cookie *cookie)
+{
+ struct fscache_object *object;
+ int ret;
+
+ _enter("%p", cookie);
+
+ if (hlist_empty(&cookie->backing_objects)) {
+ _leave(" = -ENOBUFS");
+ return -ENOBUFS;
+ }
+
+ /* not supposed to use this for indexes */
+ BUG_ON(cookie->def->type == FSCACHE_COOKIE_TYPE_INDEX);
+
+ /* prevent the file from being uncached whilst we access it and exclude
+ * read and write attempts on pages
+ */
+ down_write(&cookie->sem);
+
+ ret = -ENOBUFS;
+ if (!hlist_empty(&cookie->backing_objects)) {
+ /* get and pin the backing object */
+ object = hlist_entry(cookie->backing_objects.first,
+ struct fscache_object, cookie_link);
+
+ if (test_bit(FSCACHE_IOERROR, &object->cache->flags))
+ goto out;
+
+ if (!object->cache->ops->pin_object) {
+ ret = -EOPNOTSUPP;
+ goto out;
+ }
+
+ /* prevent the cache from being withdrawn */
+ if (fscache_operation_lock(object)) {
+ if (object->cache->ops->grab_object(object)) {
+ /* ask the cache to honour the operation */
+ ret = object->cache->ops->pin_object(object);
+
+ object->cache->ops->put_object(object);
+ }
+
+ fscache_operation_unlock(object);
+ }
+ }
+
+out:
+ up_write(&cookie->sem);
+ _leave(" = %d", ret);
+ return ret;
+}
+
+EXPORT_SYMBOL(__fscache_pin_cookie);
+
+/*
+ * unpin an object into the cache
+ */
+void __fscache_unpin_cookie(struct fscache_cookie *cookie)
+{
+ struct fscache_object *object;
+ int ret;
+
+ _enter("%p", cookie);
+
+ if (hlist_empty(&cookie->backing_objects)) {
+ _leave(" [no obj]");
+ return;
+ }
+
+ /* not supposed to use this for indexes */
+ BUG_ON(cookie->def->type == FSCACHE_COOKIE_TYPE_INDEX);
+
+ /* prevent the file from being uncached whilst we access it and exclude
+ * read and write attempts on pages
+ */
+ down_write(&cookie->sem);
+
+ ret = -ENOBUFS;
+ if (!hlist_empty(&cookie->backing_objects)) {
+ /* get and unpin the backing object */
+ object = hlist_entry(cookie->backing_objects.first,
+ struct fscache_object, cookie_link);
+
+ if (test_bit(FSCACHE_IOERROR, &object->cache->flags))
+ goto out;
+
+ if (!object->cache->ops->unpin_object)
+ goto out;
+
+ /* prevent the cache from being withdrawn */
+ if (fscache_operation_lock(object)) {
+ if (object->cache->ops->grab_object(object)) {
+ /* ask the cache to honour the operation */
+ object->cache->ops->unpin_object(object);
+
+ object->cache->ops->put_object(object);
+ }
+
+ fscache_operation_unlock(object);
+ }
+ }
+
+out:
+ up_write(&cookie->sem);
+ _leave("");
+}
+
+EXPORT_SYMBOL(__fscache_unpin_cookie);
+#endif
diff --git a/fs/fscache/fsc-object.c b/fs/fscache/fsc-object.c
new file mode 100644
index 0000000..e25d377
--- /dev/null
+++ b/fs/fscache/fsc-object.c
@@ -0,0 +1,586 @@
+/* FS-Cache object state machine handler
+ *
+ * Copyright (C) 2007 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells ([email protected])
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#define FSCACHE_DEBUG_LEVEL COOKIE
+#include <linux/module.h>
+#include "fsc-internal.h"
+
+const char *fscache_object_states[] = {
+ [FSCACHE_OBJECT_INIT] = "OBJECT_INIT",
+ [FSCACHE_OBJECT_LOOKING_UP] = "OBJECT_LOOKING_UP",
+ [FSCACHE_OBJECT_CREATING] = "OBJECT_CREATING",
+ [FSCACHE_OBJECT_AVAILABLE] = "OBJECT_AVAILABLE",
+ [FSCACHE_OBJECT_ACTIVE] = "OBJECT_ACTIVE",
+ [FSCACHE_OBJECT_UPDATING] = "OBJECT_UPDATING",
+ [FSCACHE_OBJECT_DYING] = "OBJECT_DYING",
+ [FSCACHE_OBJECT_LC_DYING] = "OBJECT_LC_DYING",
+ [FSCACHE_OBJECT_ABORT_INIT] = "OBJECT_ABORT_INIT",
+ [FSCACHE_OBJECT_RELEASING] = "OBJECT_RELEASING",
+ [FSCACHE_OBJECT_RECYCLING] = "OBJECT_RECYCLING",
+ [FSCACHE_OBJECT_WITHDRAWING] = "OBJECT_WITHDRAWING",
+ [FSCACHE_OBJECT_DEAD] = "OBJECT_DEAD",
+};
+
+EXPORT_SYMBOL(fscache_object_states);
+
+static void fscache_check_object_parent(struct fscache_object *);
+static void fscache_lookup_object(struct fscache_object *);
+static void fscache_object_available(struct fscache_object *);
+static void fscache_release_object(struct fscache_object *);
+static void fscache_withdraw_object(struct fscache_object *);
+
+/*
+ * object state machine processor
+ * - initiates parent lookup
+ * - does object lookup
+ * - does object creation
+ * - does object recycling and retirement
+ * - does object withdrawal
+ */
+void fscache_object_state_machine(struct fscache_object *object)
+{
+ ASSERT(object != NULL);
+
+ _enter("{OBJ%x,%s,%lx}",
+ object->debug_id, fscache_object_states[object->state],
+ object->events);
+
+ switch (object->state) {
+ case FSCACHE_OBJECT_INIT:
+ object->event_mask =
+ ULONG_MAX & ~(1 << FSCACHE_OBJECT_EV_CLEARED);
+ fscache_check_object_parent(object);
+ goto done;
+
+ case FSCACHE_OBJECT_LOOKING_UP:
+ fscache_lookup_object(object);
+ goto lookup_transit;
+
+ case FSCACHE_OBJECT_CREATING:
+ fscache_lookup_object(object);
+ goto lookup_transit;
+
+ case FSCACHE_OBJECT_AVAILABLE:
+ fscache_object_available(object);
+ goto active_transit;
+
+ case FSCACHE_OBJECT_ACTIVE:
+ goto active_transit;
+
+ case FSCACHE_OBJECT_UPDATING:
+ clear_bit(FSCACHE_OBJECT_EV_UPDATE, &object->events);
+ fscache_stat(&fscache_n_updates_run);
+ object->cache->ops->update_object(object);
+ goto active_transit;
+
+ /* object started dying during lookup */
+ case FSCACHE_OBJECT_LC_DYING:
+ object->event_mask &= ~(1 << FSCACHE_OBJECT_EV_UPDATE);
+ object->cache->ops->lookup_complete(object);
+
+ object->state = FSCACHE_OBJECT_DYING;
+ spin_lock(&object->lock);
+ if (test_and_clear_bit(FSCACHE_COOKIE_CREATING,
+ &object->cookie->flags))
+ wake_up_bit(&object->cookie->flags,
+ FSCACHE_COOKIE_CREATING);
+ spin_unlock(&object->lock);
+
+ /* wait for completion of active accessors */
+ case FSCACHE_OBJECT_DYING:
+ dying:
+ clear_bit(FSCACHE_OBJECT_EV_CLEARED, &object->events);
+ spin_lock(&object->lock);
+ _debug("dying OBJ%x {%d,%d}",
+ object->debug_id, object->n_ops, object->n_children);
+ if (object->n_ops == 0 && object->n_children == 0) {
+ object->event_mask &=
+ ~(1 << FSCACHE_OBJECT_EV_CLEARED);
+ object->event_mask |=
+ (1 << FSCACHE_OBJECT_EV_WITHDRAW) |
+ (1 << FSCACHE_OBJECT_EV_RETIRE) |
+ (1 << FSCACHE_OBJECT_EV_RELEASE) |
+ (1 << FSCACHE_OBJECT_EV_ERROR);
+ } else {
+ object->event_mask &=
+ ~((1 << FSCACHE_OBJECT_EV_WITHDRAW) |
+ (1 << FSCACHE_OBJECT_EV_RETIRE) |
+ (1 << FSCACHE_OBJECT_EV_RELEASE) |
+ (1 << FSCACHE_OBJECT_EV_ERROR));
+ object->event_mask |=
+ 1 << FSCACHE_OBJECT_EV_CLEARED;
+ }
+ spin_unlock(&object->lock);
+ fscache_enqueue_dependents(object);
+ goto terminal_transit;
+
+ /* handle an abort during the init state */
+ case FSCACHE_OBJECT_ABORT_INIT:
+ _debug("handle abort init %lx", object->events);
+ object->event_mask &= ~(1 << FSCACHE_OBJECT_EV_UPDATE);
+ fscache_dequeue_object(object);
+
+ object->state = FSCACHE_OBJECT_DYING;
+ spin_lock(&object->lock);
+ if (test_and_clear_bit(FSCACHE_COOKIE_CREATING,
+ &object->cookie->flags))
+ wake_up_bit(&object->cookie->flags,
+ FSCACHE_COOKIE_CREATING);
+ spin_unlock(&object->lock);
+ goto dying;
+
+ case FSCACHE_OBJECT_RELEASING:
+ case FSCACHE_OBJECT_RECYCLING:
+ object->event_mask &=
+ ~((1 << FSCACHE_OBJECT_EV_WITHDRAW) |
+ (1 << FSCACHE_OBJECT_EV_RETIRE) |
+ (1 << FSCACHE_OBJECT_EV_RELEASE) |
+ (1 << FSCACHE_OBJECT_EV_ERROR));
+ fscache_release_object(object);
+ object->state = FSCACHE_OBJECT_DEAD;
+ goto terminal_transit;
+
+ case FSCACHE_OBJECT_WITHDRAWING:
+ object->event_mask &=
+ ~((1 << FSCACHE_OBJECT_EV_WITHDRAW) |
+ (1 << FSCACHE_OBJECT_EV_RETIRE) |
+ (1 << FSCACHE_OBJECT_EV_RELEASE) |
+ (1 << FSCACHE_OBJECT_EV_ERROR));
+ fscache_withdraw_object(object);
+ object->state = FSCACHE_OBJECT_DEAD;
+ goto terminal_transit;
+
+ case FSCACHE_OBJECT_DEAD:
+ printk(KERN_ERR "FS-Cache:"
+ " Unexpected event in dead state %lx\n",
+ object->events & object->event_mask);
+ BUG();
+
+ default:
+ printk(KERN_ERR "FS-Cache: Unknown object state %u\n",
+ object->state);
+ BUG();
+ }
+
+ /* determine the transition from a lookup state */
+lookup_transit:
+ switch (fls(object->events & object->event_mask) - 1) {
+ case FSCACHE_OBJECT_EV_WITHDRAW:
+ case FSCACHE_OBJECT_EV_RETIRE:
+ case FSCACHE_OBJECT_EV_RELEASE:
+ case FSCACHE_OBJECT_EV_ERROR:
+ object->state = FSCACHE_OBJECT_LC_DYING;
+ break;
+ case FSCACHE_OBJECT_EV_REQUEUE:
+ break;
+ case -1:
+ break; /* sleep until event */
+ default:
+ goto unsupported_event;
+ }
+ goto done;
+
+ /* determine the transition from an active state */
+active_transit:
+ switch (fls(object->events & object->event_mask) - 1) {
+ case FSCACHE_OBJECT_EV_WITHDRAW:
+ case FSCACHE_OBJECT_EV_RETIRE:
+ case FSCACHE_OBJECT_EV_RELEASE:
+ case FSCACHE_OBJECT_EV_ERROR:
+ object->state = FSCACHE_OBJECT_DYING;
+ break;
+ case FSCACHE_OBJECT_EV_UPDATE:
+ object->state = FSCACHE_OBJECT_UPDATING;
+ break;
+ case -1:
+ object->state = FSCACHE_OBJECT_ACTIVE;
+ break; /* sleep until event */
+ default:
+ goto unsupported_event;
+ }
+ goto done;
+
+ /* determine the transition from a terminal state */
+terminal_transit:
+ switch (fls(object->events & object->event_mask) - 1) {
+ case FSCACHE_OBJECT_EV_WITHDRAW:
+ object->state = FSCACHE_OBJECT_WITHDRAWING;
+ break;
+ case FSCACHE_OBJECT_EV_RETIRE:
+ object->state = FSCACHE_OBJECT_RECYCLING;
+ break;
+ case FSCACHE_OBJECT_EV_RELEASE:
+ object->state = FSCACHE_OBJECT_RELEASING;
+ break;
+ case FSCACHE_OBJECT_EV_ERROR:
+ object->state = FSCACHE_OBJECT_WITHDRAWING;
+ break;
+ case FSCACHE_OBJECT_EV_CLEARED:
+ object->state = FSCACHE_OBJECT_DYING;
+ break;
+ case -1:
+ break; /* sleep until event */
+ default:
+ goto unsupported_event;
+ }
+
+done:
+ _leave(" [->%s]", fscache_object_states[object->state]);
+ return;
+
+unsupported_event:
+ printk(KERN_ERR "FS-Cache:"
+ " Unsupported event %lx [mask %lx] in state %s\n",
+ object->events, object->event_mask,
+ fscache_object_states[object->state]);
+ BUG();
+}
+
+/*
+ * check the specified object's parent to see if we can make use of it
+ * immediately to do a creation
+ * - we may need to start the process of creating a parent and we need to wait
+ * for the parent's lookup and creation to complete if it's not there yet
+ * - an object's cookie is pinned until we clear FSCACHE_COOKIE_CREATING on the
+ * leaf-most cookies of the object and all its children
+ */
+static void fscache_check_object_parent(struct fscache_object *object)
+{
+ struct fscache_object *parent;
+
+ _enter("");
+ ASSERT(object->cookie != NULL);
+ ASSERT(object->cookie->parent != NULL);
+ ASSERT(list_empty(&object->work_link));
+
+ if (object->events & ((1 << FSCACHE_OBJECT_EV_ERROR) |
+ (1 << FSCACHE_OBJECT_EV_RELEASE) |
+ (1 << FSCACHE_OBJECT_EV_RETIRE) |
+ (1 << FSCACHE_OBJECT_EV_WITHDRAW))) {
+ _debug("abort init %lx", object->events);
+ object->state = FSCACHE_OBJECT_ABORT_INIT;
+ return;
+ }
+
+ spin_lock(&object->cookie->lock);
+ spin_lock_nested(&object->cookie->parent->lock, 1);
+
+ parent = object->parent;
+ if (!parent) {
+ _debug("no parent");
+ set_bit(FSCACHE_OBJECT_EV_WITHDRAW, &object->events);
+ } else {
+ spin_lock_nested(&parent->lock, 1);
+ _debug("parent %s", fscache_object_states[parent->state]);
+
+ if (parent->state >= FSCACHE_OBJECT_DYING) {
+ _debug("bad parent");
+ set_bit(FSCACHE_OBJECT_EV_WITHDRAW, &object->events);
+ } else if (parent->state < FSCACHE_OBJECT_AVAILABLE) {
+ _debug("wait");
+ object->cache->ops->grab_object(object);
+ set_bit(FSCACHE_OBJECT_WAITING, &object->flags);
+ list_add(&object->work_link, &parent->dependents);
+ atomic_inc(&object->cache->thread_usage);
+ if (parent->state == FSCACHE_OBJECT_INIT)
+ fscache_enqueue_object(parent);
+ } else {
+ _debug("go");
+ parent->n_ops++;
+ object->lookup_jif = jiffies;
+ object->state = FSCACHE_OBJECT_LOOKING_UP;
+ set_bit(FSCACHE_OBJECT_EV_REQUEUE, &object->events);
+ }
+
+ spin_unlock(&parent->lock);
+ }
+
+ spin_unlock(&object->cookie->parent->lock);
+ spin_unlock(&object->cookie->lock);
+ _leave("");
+}
+
+/*
+ * look up an object in its cache
+ * - we hold an "access lock" on the parent object, so the parent object cannot
+ * be withdrawn by either party till we've finished
+ * - an object's cookie is pinned until we clear FSCACHE_COOKIE_CREATING on the
+ * leaf-most cookies of the object and all its children
+ */
+static void fscache_lookup_object(struct fscache_object *object)
+{
+ struct fscache_cookie *cookie = object->cookie;
+ struct fscache_object *parent;
+
+ _enter("");
+
+ parent = object->parent;
+ ASSERT(parent != NULL);
+ ASSERTCMP(parent->n_ops, >, 0);
+
+ /* make sure the parent is still available */
+ ASSERTCMP(parent->state, >=, FSCACHE_OBJECT_AVAILABLE);
+
+ if (parent->state >= FSCACHE_OBJECT_DYING ||
+ test_bit(FSCACHE_IOERROR, &object->cache->flags)) {
+ _debug("unavailable");
+ set_bit(FSCACHE_OBJECT_EV_WITHDRAW, &object->events);
+ _leave("");
+ return;
+ }
+
+ _debug("LOOKUP \"%s/%s\" in \"%s\"",
+ parent->cookie->def->name, cookie->def->name,
+ object->cache->tag->name);
+
+ fscache_stat(&fscache_n_object_lookups);
+ object->cache->ops->lookup_object(object);
+
+ if (test_bit(FSCACHE_OBJECT_EV_ERROR, &object->events))
+ set_bit(FSCACHE_COOKIE_UNAVAILABLE, &cookie->flags);
+
+ _leave("");
+}
+
+/**
+ * fscache_object_lookup_negative - Note negative cookie lookup
+ * @object: Object pointing to cookie to mark
+ *
+ * Note negative lookup, permitting those waiting to read data from an already
+ * existing backing object to continue as there's no data for them to read.
+ */
+void fscache_object_lookup_negative(struct fscache_object *object)
+{
+ struct fscache_cookie *cookie = object->cookie;
+
+ _enter("{OBJ%x,%s}",
+ object->debug_id, fscache_object_states[object->state]);
+
+ if (object->state == FSCACHE_OBJECT_LOOKING_UP) {
+ fscache_stat(&fscache_n_object_lookups_negative);
+
+ /* transit here to allow write requests to begin stacking up
+ * and read requests to begin returning ENODATA */
+ object->state = FSCACHE_OBJECT_CREATING;
+
+ set_bit(FSCACHE_COOKIE_PENDING_FILL, &cookie->flags);
+ set_bit(FSCACHE_COOKIE_NO_DATA_YET, &cookie->flags);
+
+ _debug("wake up lookup %p", &cookie->flags);
+ smp_mb__before_clear_bit();
+ clear_bit(FSCACHE_COOKIE_LOOKING_UP, &cookie->flags);
+ smp_mb__after_clear_bit();
+ wake_up_bit(&cookie->flags, FSCACHE_COOKIE_LOOKING_UP);
+ clear_bit(FSCACHE_OBJECT_SYNC, &object->flags);
+ set_bit(FSCACHE_OBJECT_EV_REQUEUE, &object->events);
+ } else {
+ ASSERTCMP(object->state, ==, FSCACHE_OBJECT_CREATING);
+ }
+
+ _leave("");
+}
+
+EXPORT_SYMBOL(fscache_object_lookup_negative);
+
+/**
+ * fscache_obtained_object - Note successful object lookup or creation
+ * @object: Object pointing to cookie to mark
+ *
+ * Note successful lookup and/or creation, permitting those waiting to write
+ * data to a backing object to continue.
+ *
+ * Note that after calling this, an object's cookie may be relinquished by the
+ * netfs, and so must be accessed with object lock held.
+ */
+void fscache_obtained_object(struct fscache_object *object)
+{
+ struct fscache_cookie *cookie = object->cookie;
+
+ _enter("{OBJ%x,%s}",
+ object->debug_id, fscache_object_states[object->state]);
+
+ /* if we were still looking up, then we must have a positive lookup
+ * result, in which case there may be data available */
+ if (object->state == FSCACHE_OBJECT_LOOKING_UP) {
+ fscache_stat(&fscache_n_object_lookups_positive);
+
+ clear_bit(FSCACHE_COOKIE_NO_DATA_YET, &cookie->flags);
+
+ object->state = FSCACHE_OBJECT_AVAILABLE;
+
+ smp_mb__before_clear_bit();
+ clear_bit(FSCACHE_COOKIE_LOOKING_UP, &cookie->flags);
+ smp_mb__after_clear_bit();
+ wake_up_bit(&cookie->flags, FSCACHE_COOKIE_LOOKING_UP);
+ clear_bit(FSCACHE_OBJECT_SYNC, &object->flags);
+ set_bit(FSCACHE_OBJECT_EV_REQUEUE, &object->events);
+ } else {
+ ASSERTCMP(object->state, ==, FSCACHE_OBJECT_CREATING);
+ fscache_stat(&fscache_n_object_created);
+
+ object->state = FSCACHE_OBJECT_AVAILABLE;
+ set_bit(FSCACHE_OBJECT_EV_REQUEUE, &object->events);
+ smp_wmb();
+ }
+
+ if (test_and_clear_bit(FSCACHE_COOKIE_CREATING, &cookie->flags))
+ wake_up_bit(&cookie->flags, FSCACHE_COOKIE_CREATING);
+
+ _leave("");
+}
+
+EXPORT_SYMBOL(fscache_obtained_object);
+
+/*
+ * handle an object that has just become available
+ */
+static void fscache_object_available(struct fscache_object *object)
+{
+ _enter("{OBJ%x}", object->debug_id);
+
+ spin_lock(&object->lock);
+ if (test_and_clear_bit(FSCACHE_COOKIE_CREATING, &object->cookie->flags))
+ wake_up_bit(&object->cookie->flags, FSCACHE_COOKIE_CREATING);
+
+ spin_lock_nested(&object->parent->lock, 1);
+ object->parent->n_ops--;
+ if (object->parent->n_ops == 0)
+ fscache_raise_event(object->parent, FSCACHE_OBJECT_EV_CLEARED);
+ spin_unlock(&object->parent->lock);
+
+ if (object->n_ops > 0)
+ fscache_start_operations(object);
+ spin_unlock(&object->lock);
+
+ object->cache->ops->lookup_complete(object);
+ fscache_enqueue_dependents(object);
+
+ if (test_bit(FSCACHE_OBJECT_BOOSTED, &object->flags))
+ fscache_hist(fscache_obj_instantiate_histogram,
+ object->lookup_jif);
+ fscache_stat(&fscache_n_object_avail);
+
+ _leave("");
+}
+
+/*
+ * drop an object's attachments
+ */
+static void fscache_drop_object(struct fscache_object *object)
+{
+ struct fscache_object *parent = object->parent;
+ struct fscache_cache *cache = object->cache;
+
+ _enter("{OBJ%x,%d}", object->debug_id, object->n_children);
+
+ spin_lock(&cache->object_list_lock);
+ list_del_init(&object->cache_link);
+ spin_unlock(&cache->object_list_lock);
+
+ cache->ops->drop_object(object);
+
+ if (parent) {
+ _debug("release parent OBJ%x {%d}",
+ parent->debug_id, parent->n_children);
+
+ spin_lock(&parent->lock);
+ parent->n_children--;
+ if (parent->n_children == 0)
+ fscache_raise_event(parent, FSCACHE_OBJECT_EV_CLEARED);
+ spin_unlock(&parent->lock);
+ object->parent = NULL;
+ }
+
+ /* this just shifts the object release to the fscache thread pool */
+ object->cache->ops->put_object(object);
+}
+
+/*
+ * release or recycle an object
+ */
+static void fscache_release_object(struct fscache_object *object)
+{
+ _enter("");
+
+ fscache_drop_object(object);
+
+ _leave("");
+}
+
+/*
+ * withdraw an object from active service
+ */
+static void fscache_withdraw_object(struct fscache_object *object)
+{
+ struct fscache_cookie *cookie;
+ bool detached;
+
+ _enter("");
+
+ spin_lock(&object->lock);
+ cookie = object->cookie;
+ if (cookie) {
+ /* need to get the cookie lock before the object lock, starting
+ * from the object pointer */
+ atomic_inc(&cookie->usage);
+ spin_unlock(&object->lock);
+
+ detached = false;
+ spin_lock(&cookie->lock);
+ spin_lock(&object->lock);
+
+ if (object->cookie == cookie) {
+ hlist_del_init(&object->cookie_link);
+ object->cookie = NULL;
+ detached = true;
+ }
+ spin_unlock(&cookie->lock);
+ fscache_cookie_put(cookie);
+ if (detached)
+ fscache_cookie_put(cookie);
+ }
+
+ spin_unlock(&object->lock);
+
+ fscache_drop_object(object);
+
+ _leave("");
+}
+
+/*
+ * withdraw an object from active service at the behest of the cache
+ * - need break the links to a cached object cookie
+ * - called under two situations:
+ * (1) recycler decides to reclaim an in-use object
+ * (2) a cache is unmounted
+ * - have to take care as the cookie can be being relinquished by the netfs
+ * simultaneously
+ * - the object is pinned by the caller holding a refcount on it
+ */
+void fscache_withdrawing_object(struct fscache_cache *cache,
+ struct fscache_object *object)
+{
+ bool enqueue = false;
+
+ _enter(",OBJ%x", object->debug_id);
+
+ spin_lock(&object->lock);
+ if (object->state < FSCACHE_OBJECT_WITHDRAWING) {
+ object->state = FSCACHE_OBJECT_WITHDRAWING;
+ enqueue = true;
+ }
+ spin_unlock(&object->lock);
+
+ if (enqueue)
+ fscache_enqueue_object(object);
+
+ _leave("");
+}
diff --git a/fs/fscache/fsc-page.c b/fs/fscache/fsc-page.c
new file mode 100644
index 0000000..481e55a
--- /dev/null
+++ b/fs/fscache/fsc-page.c
@@ -0,0 +1,863 @@
+/* Cache page management and data I/O routines
+ *
+ * Copyright (C) 2004-2007 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells ([email protected])
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#define FSCACHE_DEBUG_LEVEL PAGE
+#include <linux/module.h>
+#include <linux/fscache-cache.h>
+#include <linux/buffer_head.h>
+#include <linux/pagevec.h>
+#include "fsc-internal.h"
+
+/*
+ * start an op running
+ */
+static void fscache_run_op(struct fscache_object *object,
+ struct fscache_operation *op)
+{
+ object->n_in_progress++;
+ clear_bit(FSCACHE_OP_WAITING, &op->flags);
+ if (op->processor)
+ fscache_enqueue_operation(op);
+ fscache_stat(&fscache_n_op_run);
+}
+
+/*
+ * submit an exclusive operation for an object
+ * - other ops are excluded from running simultaneously with this one
+ */
+static int fscache_submit_exclusive_op(struct fscache_object *object,
+ struct fscache_operation *op)
+{
+ int ret;
+
+ spin_lock(&object->lock);
+ ASSERTCMP(object->n_ops, >=, object->n_in_progress);
+ ASSERTCMP(object->n_ops, >=, object->n_exclusive);
+
+ ret = -ENOBUFS;
+ if (fscache_object_is_active(object)) {
+ op->object = object;
+ object->n_ops++;
+ object->n_exclusive++; /* reads and writes must wait */
+
+ if (object->n_ops > 0) {
+ list_add_tail(&op->work_link, &object->pending_ops);
+ fscache_stat(&fscache_n_op_pend);
+ } else {
+ ASSERTCMP(object->n_in_progress, ==, 0);
+ fscache_run_op(object, op);
+ }
+
+ /* need to issue a new write op after this */
+ clear_bit(FSCACHE_OBJECT_PENDING_WRITE, &object->flags);
+ ret = 0;
+ } else if (object->state == FSCACHE_OBJECT_CREATING) {
+ op->object = object;
+ object->n_ops++;
+ object->n_exclusive++; /* reads and writes must wait */
+ list_add_tail(&op->work_link, &object->pending_ops);
+ fscache_stat(&fscache_n_op_pend);
+ ret = 0;
+ } else {
+ /* not allowed to submit ops in any other state */
+ BUG();
+ }
+
+ spin_unlock(&object->lock);
+ return ret;
+}
+
+/*
+ * submit an operation for an object
+ */
+static int fscache_submit_op(struct fscache_object *object,
+ struct fscache_operation *op)
+{
+ int ret;
+
+ spin_lock(&object->lock);
+ ASSERTCMP(object->n_ops, >=, object->n_in_progress);
+ ASSERTCMP(object->n_ops, >=, object->n_exclusive);
+
+ if (fscache_object_is_active(object)) {
+ op->object = object;
+ object->n_ops++;
+
+ if (object->n_exclusive > 0) {
+ ASSERTCMP(object->n_in_progress, >, 0);
+ list_add_tail(&op->work_link, &object->pending_ops);
+ fscache_stat(&fscache_n_op_pend);
+ } else {
+ ASSERT(list_empty(&object->pending_ops));
+ ASSERTCMP(object->n_exclusive, ==, 0);
+ fscache_run_op(object, op);
+ }
+ ret = 0;
+ } else if (object->state == FSCACHE_OBJECT_CREATING) {
+ op->object = object;
+ object->n_ops++;
+ list_add_tail(&op->work_link, &object->pending_ops);
+ fscache_stat(&fscache_n_op_pend);
+ ret = 0;
+ } else if (!test_bit(FSCACHE_IOERROR, &object->cache->flags)) {
+ static bool once_only;
+ if (!once_only) {
+ once_only = true;
+ kdebug("no submit [OBJ%x %s]",
+ object->debug_id,
+ fscache_object_states[object->state]);
+ dump_stack();
+ }
+ ret = -ENOBUFS;
+ } else {
+ ret = -ENOBUFS;
+ }
+
+ spin_unlock(&object->lock);
+ return ret;
+}
+
+/*
+ * queue an object for withdrawal on error, aborting all following asynchronous
+ * operations
+ */
+static void fscache_abort_object(struct fscache_object *object)
+{
+ fscache_raise_event(object, FSCACHE_OBJECT_EV_ERROR);
+}
+
+/*
+ * actually apply the changed attributes to a cache object
+ */
+static void fscache_attr_changed_op(struct fscache_operation *op)
+{
+ struct fscache_object *object = op->object;
+
+ _enter("{OBJ%x}", object->debug_id);
+
+ fscache_stat(&fscache_n_attr_changed_calls);
+
+ if (fscache_object_is_active(object) &&
+ object->cache->ops->attr_changed(object) < 0)
+ fscache_abort_object(object);
+
+ _leave("");
+}
+
+/*
+ * notification that the attributes on an object have changed
+ */
+int __fscache_attr_changed(struct fscache_cookie *cookie)
+{
+ struct fscache_operation *op;
+ struct fscache_object *object;
+
+ _enter("%p", cookie);
+
+ ASSERTCMP(cookie->def->type, !=, FSCACHE_COOKIE_TYPE_INDEX);
+
+ fscache_stat(&fscache_n_attr_changed);
+
+ op = kzalloc(sizeof(*op), GFP_KERNEL);
+ if (!op) {
+ fscache_stat(&fscache_n_attr_changed_nomem);
+ _leave(" = -ENOMEM");
+ return -ENOMEM;
+ }
+
+ op->flags |= 1 << FSCACHE_OP_EXCLUSIVE;
+ op->processor = fscache_attr_changed_op;
+
+ spin_lock(&cookie->lock);
+
+ if (hlist_empty(&cookie->backing_objects))
+ goto nobufs;
+ object = hlist_entry(cookie->backing_objects.first,
+ struct fscache_object, cookie_link);
+
+ if (fscache_submit_exclusive_op(object, op) < 0)
+ goto nobufs;
+ spin_unlock(&cookie->lock);
+ fscache_stat(&fscache_n_attr_changed_ok);
+ _leave(" = 0");
+ return 0;
+
+nobufs:
+ spin_unlock(&cookie->lock);
+ kfree(op);
+ fscache_stat(&fscache_n_attr_changed_nobufs);
+ _leave(" = %d", -ENOBUFS);
+ return -ENOBUFS;
+}
+
+EXPORT_SYMBOL(__fscache_attr_changed);
+
+/*
+ * release a retrieval op reference
+ */
+static void fscache_release_retrieval_op(struct fscache_operation *_op)
+{
+ struct fscache_retrieval *op =
+ container_of(_op, struct fscache_retrieval, op);
+
+ _enter("");
+
+ fscache_hist(fscache_retrieval_histogram, op->start_time);
+ if (op->context)
+ fscache_put_context(op->op.object->cookie, op->context);
+
+ _leave("");
+}
+
+/*
+ * allocate a retrieval op
+ */
+static struct fscache_retrieval *fscache_alloc_retrieval(
+ struct address_space *mapping,
+ fscache_rw_complete_t end_io_func,
+ void *context)
+{
+ struct fscache_retrieval *op;
+
+ /* allocate a retrieval operation and attempt to submit it */
+ op = kzalloc(sizeof(*op), GFP_NOIO);
+ if (!op) {
+ fscache_stat(&fscache_n_retrievals_nomem);
+ return NULL;
+ }
+
+ atomic_set(&op->op.usage, 1);
+ op->op.flags = (1 << FSCACHE_OP_WAITING) | (1 << FSCACHE_OP_SYNC);
+ op->op.release = fscache_release_retrieval_op;
+ op->mapping = mapping;
+ op->end_io_func = end_io_func;
+ op->context = context;
+ op->start_time = jiffies;
+ INIT_LIST_HEAD(&op->op.work_link);
+ INIT_LIST_HEAD(&op->to_do);
+ return op;
+}
+
+/*
+ * wait for a deferred lookup to complete
+ */
+static int fscache_wait_for_deferred_lookup(struct fscache_cookie *cookie)
+{
+ struct fscache_object *object;
+ unsigned long jif;
+
+ if (!test_bit(FSCACHE_COOKIE_LOOKING_UP, &cookie->flags))
+ return 0;
+
+ fscache_stat(&fscache_n_retrievals_wait);
+
+ /* tell the cookie dispatcher that this cookie is now being waited
+ * upon for lookup completion */
+ spin_lock(&cookie->lock);
+ if (!hlist_empty(&cookie->backing_objects)) {
+ object = hlist_entry(cookie->backing_objects.first,
+ struct fscache_object, cookie_link);
+ if (!test_and_set_bit(FSCACHE_OBJECT_SYNC, &object->flags))
+ fscache_boost_object(object);
+ }
+ spin_unlock(&cookie->lock);
+
+ jif = jiffies;
+ if (wait_on_bit(&cookie->flags, FSCACHE_COOKIE_LOOKING_UP,
+ fscache_wait_bit_interruptible,
+ TASK_INTERRUPTIBLE) != 0) {
+ fscache_stat(&fscache_n_retrievals_intr);
+ return -ERESTARTSYS;
+ }
+
+ ASSERT(!test_bit(FSCACHE_COOKIE_LOOKING_UP, &cookie->flags));
+
+ smp_rmb();
+ fscache_hist(fscache_retrieval_delay_histogram, jif);
+ return 0;
+}
+
+/*
+ * read a page from the cache or allocate a block in which to store it
+ * - we return:
+ * -ENOMEM - out of memory, nothing done
+ * -ERESTARTSYS - interrupted
+ * -ENOBUFS - no backing object available in which to cache the block
+ * -ENODATA - no data available in the backing object for this block
+ * 0 - dispatched a read - it'll call end_io_func() when finished
+ */
+int __fscache_read_or_alloc_page(struct fscache_cookie *cookie,
+ struct page *page,
+ fscache_rw_complete_t end_io_func,
+ void *context,
+ gfp_t gfp)
+{
+ struct fscache_retrieval *op;
+ struct fscache_object *object;
+ int ret;
+
+ _enter("%p,%p,,,", cookie, page);
+
+ fscache_stat(&fscache_n_retrievals);
+
+ if (hlist_empty(&cookie->backing_objects))
+ goto nobufs;
+
+ ASSERTCMP(cookie->def->type, !=, FSCACHE_COOKIE_TYPE_INDEX);
+ ASSERTCMP(page, !=, NULL);
+
+ if (fscache_wait_for_deferred_lookup(cookie) < 0)
+ return -ERESTARTSYS;
+
+ op = fscache_alloc_retrieval(page->mapping, end_io_func, context);
+ if (!op) {
+ _leave(" = -ENOMEM");
+ return -ENOMEM;
+ }
+
+ spin_lock(&cookie->lock);
+
+ if (hlist_empty(&cookie->backing_objects))
+ goto nobufs_unlock;
+ object = hlist_entry(cookie->backing_objects.first,
+ struct fscache_object, cookie_link);
+
+ ASSERTCMP(object->state, >, FSCACHE_COOKIE_LOOKING_UP);
+
+ if (fscache_submit_op(object, &op->op) < 0)
+ goto nobufs_unlock;
+ spin_unlock(&cookie->lock);
+
+ fscache_stat(&fscache_n_retrieval_ops);
+
+ /* pin the netfs read context in case we need to do the actual netfs
+ * read because we've encountered a cache read failure */
+ fscache_get_context(object->cookie, op->context);
+
+ if (test_bit(FSCACHE_OP_WAITING, &op->op.flags)) {
+ _debug(">>> WT");
+ fscache_stat(&fscache_n_retrieval_op_waits);
+ wait_on_bit(&op->op.flags, FSCACHE_OP_WAITING,
+ fscache_wait_bit, TASK_UNINTERRUPTIBLE);
+ _debug("<<< GO");
+ }
+
+ /* ask the cache to honour the operation */
+ if (test_bit(FSCACHE_COOKIE_NO_DATA_YET, &object->cookie->flags)) {
+ ret = object->cache->ops->allocate_page(op, page, gfp);
+ if (ret == 0)
+ ret = -ENODATA;
+ } else {
+ ret = object->cache->ops->read_or_alloc_page(op, page, gfp);
+ }
+
+ if (ret == -ENOMEM)
+ fscache_stat(&fscache_n_retrievals_nomem);
+ else if (ret == -ERESTARTSYS)
+ fscache_stat(&fscache_n_retrievals_intr);
+ else if (ret == -ENODATA)
+ fscache_stat(&fscache_n_retrievals_nodata);
+ else if (ret < 0)
+ fscache_stat(&fscache_n_retrievals_nobufs);
+ else
+ fscache_stat(&fscache_n_retrievals_ok);
+
+ fscache_put_retrieval(op);
+ _leave(" = %d", ret);
+ return ret;
+
+nobufs_unlock:
+ spin_unlock(&cookie->lock);
+ kfree(op);
+nobufs:
+ fscache_stat(&fscache_n_retrievals_nobufs);
+ _leave(" = -ENOBUFS");
+ return -ENOBUFS;
+}
+
+EXPORT_SYMBOL(__fscache_read_or_alloc_page);
+
+/*
+ * read a list of page from the cache or allocate a block in which to store
+ * them
+ * - we return:
+ * -ENOMEM - out of memory, some pages may be being read
+ * -ERESTARTSYS - interrupted, some pages may be being read
+ * -ENOBUFS - no backing object or space available in which to cache any
+ * pages not being read
+ * -ENODATA - no data available in the backing object for some or all of
+ * the pages
+ * 0 - dispatched a read on all pages
+ *
+ * end_io_func() will be called for each page read from the cache as it is
+ * finishes being read
+ *
+ * any pages for which a read is dispatched will be removed from pages and
+ * nr_pages
+ */
+int __fscache_read_or_alloc_pages(struct fscache_cookie *cookie,
+ struct address_space *mapping,
+ struct list_head *pages,
+ unsigned *nr_pages,
+ fscache_rw_complete_t end_io_func,
+ void *context,
+ gfp_t gfp)
+{
+ fscache_pages_retrieval_func_t func;
+ struct fscache_retrieval *op;
+ struct fscache_object *object;
+ int ret;
+
+ _enter("%p,,%d,,,", cookie, *nr_pages);
+
+ fscache_stat(&fscache_n_retrievals);
+
+ if (hlist_empty(&cookie->backing_objects))
+ goto nobufs;
+
+ ASSERTCMP(cookie->def->type, !=, FSCACHE_COOKIE_TYPE_INDEX);
+ ASSERTCMP(*nr_pages, >, 0);
+ ASSERT(!list_empty(pages));
+
+ if (fscache_wait_for_deferred_lookup(cookie) < 0)
+ return -ERESTARTSYS;
+
+ op = fscache_alloc_retrieval(mapping, end_io_func, context);
+ if (!op)
+ return -ENOMEM;
+
+ spin_lock(&cookie->lock);
+
+ if (hlist_empty(&cookie->backing_objects))
+ goto nobufs_unlock;
+ object = hlist_entry(cookie->backing_objects.first,
+ struct fscache_object, cookie_link);
+
+ if (fscache_submit_op(object, &op->op) < 0)
+ goto nobufs_unlock;
+ spin_unlock(&cookie->lock);
+
+ fscache_stat(&fscache_n_retrieval_ops);
+
+ /* pin the netfs read context in case we need to do the actual netfs
+ * read because we've encountered a cache read failure */
+ fscache_get_context(object->cookie, op->context);
+
+ if (test_bit(FSCACHE_OP_WAITING, &op->op.flags)) {
+ _debug(">>> WT");
+ fscache_stat(&fscache_n_retrieval_op_waits);
+ wait_on_bit(&op->op.flags, FSCACHE_OP_WAITING,
+ fscache_wait_bit, TASK_UNINTERRUPTIBLE);
+ _debug("<<< GO");
+ }
+
+ /* ask the cache to honour the operation */
+ if (test_bit(FSCACHE_COOKIE_NO_DATA_YET, &object->cookie->flags))
+ func = object->cache->ops->allocate_pages;
+ else
+ func = object->cache->ops->read_or_alloc_pages;
+ ret = func(op, pages, nr_pages, gfp);
+
+ if (ret == -ENOMEM)
+ fscache_stat(&fscache_n_retrievals_nomem);
+ else if (ret == -ERESTARTSYS)
+ fscache_stat(&fscache_n_retrievals_intr);
+ else if (ret == -ENODATA)
+ fscache_stat(&fscache_n_retrievals_nodata);
+ else if (ret < 0)
+ fscache_stat(&fscache_n_retrievals_nobufs);
+ else
+ fscache_stat(&fscache_n_retrievals_ok);
+
+ fscache_put_retrieval(op);
+ _leave(" = %d", ret);
+ return ret;
+
+nobufs_unlock:
+ spin_unlock(&cookie->lock);
+ kfree(op);
+nobufs:
+ fscache_stat(&fscache_n_retrievals_nobufs);
+ _leave(" = -ENOBUFS");
+ return -ENOBUFS;
+}
+
+EXPORT_SYMBOL(__fscache_read_or_alloc_pages);
+
+/*
+ * allocate a block in the cache on which to store a page
+ * - we return:
+ * -ENOMEM - out of memory, nothing done
+ * -ERESTARTSYS - interrupted
+ * -ENOBUFS - no backing object available in which to cache the block
+ * 0 - block allocated
+ */
+int __fscache_alloc_page(struct fscache_cookie *cookie,
+ struct page *page,
+ gfp_t gfp)
+{
+ struct fscache_retrieval *op;
+ struct fscache_object *object;
+ int ret;
+
+ _enter("%p,%p,,,", cookie, page);
+
+ fscache_stat(&fscache_n_allocs);
+
+ if (hlist_empty(&cookie->backing_objects))
+ goto nobufs;
+
+ ASSERTCMP(cookie->def->type, !=, FSCACHE_COOKIE_TYPE_INDEX);
+ ASSERTCMP(page, !=, NULL);
+
+ if (fscache_wait_for_deferred_lookup(cookie) < 0)
+ return -ERESTARTSYS;
+
+ op = fscache_alloc_retrieval(page->mapping, NULL, NULL);
+ if (!op)
+ return -ENOMEM;
+
+ spin_lock(&cookie->lock);
+
+ if (hlist_empty(&cookie->backing_objects))
+ goto nobufs_unlock;
+ object = hlist_entry(cookie->backing_objects.first,
+ struct fscache_object, cookie_link);
+
+ if (fscache_submit_op(object, &op->op) < 0)
+ goto nobufs_unlock;
+ spin_unlock(&cookie->lock);
+
+ fscache_stat(&fscache_n_alloc_ops);
+
+ if (test_bit(FSCACHE_OP_WAITING, &op->op.flags)) {
+ _debug(">>> WT");
+ fscache_stat(&fscache_n_alloc_op_waits);
+ wait_on_bit(&op->op.flags, FSCACHE_OP_WAITING,
+ fscache_wait_bit, TASK_UNINTERRUPTIBLE);
+ _debug("<<< GO");
+ }
+
+ /* ask the cache to honour the operation */
+ ret = object->cache->ops->allocate_page(op, page, gfp);
+
+ if (ret < 0)
+ fscache_stat(&fscache_n_allocs_nobufs);
+ else
+ fscache_stat(&fscache_n_allocs_ok);
+
+ fscache_put_retrieval(op);
+ _leave(" = %d", ret);
+ return ret;
+
+nobufs_unlock:
+ spin_unlock(&cookie->lock);
+ kfree(op);
+nobufs:
+ fscache_stat(&fscache_n_allocs_nobufs);
+ _leave(" = -ENOBUFS");
+ return -ENOBUFS;
+}
+
+EXPORT_SYMBOL(__fscache_alloc_page);
+
+/*
+ * store a page in the cache in the background
+ */
+static void fscache_write_op(struct fscache_operation *_op)
+{
+ struct fscache_storage *op =
+ container_of(_op, struct fscache_storage, op);
+ struct fscache_object *object = op->op.object;
+ struct page *page;
+ unsigned n;
+ void *results[1];
+ int ret;
+
+ _enter("");
+
+ if (!fscache_object_is_active(object)) {
+ fscache_put_operation(&op->op);
+ _leave("");
+ return;
+ }
+
+ fscache_stat(&fscache_n_store_calls);
+
+ /* find a page to store */
+ spin_lock(&object->lock);
+
+ page = NULL;
+ n = radix_tree_gang_lookup(&object->stores, results, 0, 1);
+ if (n == 1) {
+ page = results[0];
+ _debug("gang %d [%lx]", n, page->index);
+ if (page->index <= op->store_limit)
+ radix_tree_delete(&object->stores, page->index);
+ else
+ goto superseded;
+ } else {
+ goto superseded;
+ }
+
+ spin_unlock(&object->lock);
+
+ if (page) {
+ ret = object->cache->ops->write_page(op, page);
+ end_page_fscache_write(page);
+ page_cache_release(page);
+ if (ret < 0) {
+ fscache_abort_object(object);
+ fscache_put_operation(&op->op);
+ } else {
+ fscache_enqueue_operation(&op->op);
+ }
+ }
+
+ _leave("");
+ return;
+
+superseded:
+ /* this writer is going away and there aren't any more things to
+ * write */
+ _debug("cease");
+ clear_bit(FSCACHE_OBJECT_PENDING_WRITE, &object->flags);
+ spin_unlock(&object->lock);
+ _leave("");
+}
+
+/*
+ * request a page be stored in the cache
+ * - returns:
+ * -ENOMEM - out of memory, nothing done
+ * -ENOBUFS - no backing object available in which to cache the page
+ * 0 - dispatched a write - it'll call end_io_func() when finished
+ *
+ * if the cookie still has a backing object at this point, that object can be
+ * in one of a few states with respect to storage processing:
+ *
+ * (1) negative lookup, object not yet created (FSCACHE_COOKIE_CREATING is
+ * set)
+ *
+ * (a) no writes yet (set FSCACHE_COOKIE_PENDING_FILL and queue deferred
+ * fill op)
+ *
+ * (b) writes deferred till post-creation (mark page for writing and
+ * return immediately)
+ *
+ * (2) negative lookup, object created, initial fill being made from netfs
+ * (FSCACHE_COOKIE_INITIAL_FILL is set)
+ *
+ * (a) fill point not yet reached this page (mark page for writing and
+ * return)
+ *
+ * (b) fill point passed this page (queue op to store this page)
+ *
+ * (3) object extant (queue op to store this page)
+ *
+ * any other state is invalid
+ */
+int __fscache_write_page(struct fscache_cookie *cookie,
+ struct page *page,
+ gfp_t gfp)
+{
+ struct fscache_storage *op;
+ struct fscache_object *object;
+ int ret;
+
+ _enter("%p,%x,", cookie, (u32) page->flags);
+
+ ASSERTCMP(cookie->def->type, !=, FSCACHE_COOKIE_TYPE_INDEX);
+ ASSERT(PageFsCache(page));
+
+ fscache_stat(&fscache_n_stores);
+
+ op = kzalloc(sizeof(*op), GFP_NOIO);
+ if (!op) {
+ fscache_stat(&fscache_n_stores_oom);
+ _leave(" = -ENOMEM");
+ return -ENOMEM;
+ }
+
+ ret = radix_tree_preload(gfp & ~__GFP_HIGHMEM);
+ if (ret < 0) {
+ kfree(op);
+ fscache_stat(&fscache_n_stores_oom);
+ _leave(" = %d", ret);
+ return ret;
+ }
+
+ ret = -ENOBUFS;
+ spin_lock(&cookie->lock);
+
+ if (hlist_empty(&cookie->backing_objects))
+ goto nobufs;
+ object = hlist_entry(cookie->backing_objects.first,
+ struct fscache_object, cookie_link);
+ if (test_bit(FSCACHE_IOERROR, &object->cache->flags))
+ goto nobufs;
+
+ /* add the page to the pending-storage radix tree on the backing
+ * object */
+ spin_lock(&object->lock);
+
+ _debug("store limit %llx", (unsigned long long) object->store_limit);
+
+ ret = radix_tree_insert(&object->stores, page->index, page);
+ if (ret < 0) {
+ if (ret == -EEXIST)
+ goto already_queued;
+ _debug("insert failed %d", ret);
+ ret = -ENOBUFS;
+ goto nobufs_unlock;
+ }
+
+ page_cache_get(page);
+ if (TestSetPageFsCacheWrite(page))
+ BUG();
+
+ /* we only want one writer at a time, but we do need to queue new
+ * writers after exclusive ops */
+ if (test_and_set_bit(FSCACHE_OBJECT_PENDING_WRITE, &object->flags))
+ goto already_pending;
+
+ spin_unlock(&object->lock);
+
+ op->op.processor = fscache_write_op;
+ op->store_limit = object->store_limit;
+ INIT_LIST_HEAD(&op->op.work_link);
+
+ if (fscache_submit_op(object, &op->op) < 0) {
+ radix_tree_delete(&object->stores, page->index);
+ end_page_fscache_write(page);
+ page_cache_release(page);
+ ret = -ENOBUFS;
+ goto nobufs;
+ }
+
+ spin_unlock(&cookie->lock);
+ radix_tree_preload_end();
+ fscache_stat(&fscache_n_store_ops);
+ fscache_stat(&fscache_n_stores_ok);
+ _leave(" = 0");
+ return 0;
+
+already_queued:
+ fscache_stat(&fscache_n_stores_again);
+already_pending:
+ spin_unlock(&object->lock);
+ spin_unlock(&cookie->lock);
+ radix_tree_preload_end();
+ kfree(op);
+ fscache_stat(&fscache_n_stores_ok);
+ _leave(" = 0");
+ return 0;
+
+nobufs_unlock:
+ spin_unlock(&object->lock);
+nobufs:
+ spin_unlock(&cookie->lock);
+ radix_tree_preload_end();
+ kfree(op);
+ fscache_stat(&fscache_n_stores_nobufs);
+ _leave(" = -ENOBUFS");
+ return -ENOBUFS;
+}
+
+EXPORT_SYMBOL(__fscache_write_page);
+
+/*
+ * remove a page from the cache
+ */
+void __fscache_uncache_page(struct fscache_cookie *cookie, struct page *page)
+{
+ struct fscache_object *object;
+
+ _enter(",%p", page);
+
+ ASSERTCMP(cookie->def->type, !=, FSCACHE_COOKIE_TYPE_INDEX);
+ ASSERTCMP(page, !=, NULL);
+
+ fscache_stat(&fscache_n_uncaches);
+
+ /* cache withdrawal may beat us to it */
+ if (!PageFsCache(page))
+ goto done;
+
+ /* get the object */
+ spin_lock(&cookie->lock);
+
+ if (hlist_empty(&cookie->backing_objects))
+ goto done_unlock;
+ object = hlist_entry(cookie->backing_objects.first,
+ struct fscache_object, cookie_link);
+
+ /* there might now be stuff on disk we could read */
+ clear_bit(FSCACHE_COOKIE_NO_DATA_YET, &cookie->flags);
+
+ /* only invoke the cache backend if we managed to mark the page
+ * uncached here; this deals with synchronisation vs withdrawal */
+ if (TestClearPageFsCache(page) &&
+ object->cache->ops->uncache_page) {
+ /* the cache backend releases the cookie lock */
+ object->cache->ops->uncache_page(object, page);
+ goto done;
+ }
+
+done_unlock:
+ spin_unlock(&cookie->lock);
+done:
+ _leave("");
+}
+
+EXPORT_SYMBOL(__fscache_uncache_page);
+
+/**
+ * fscache_mark_pages_cached - Mark pages as being cached
+ * @op: The retrieval op pages are being marked for
+ * @pagevec: The pages to be marked
+ *
+ * Mark a bunch of netfs pages as being cached. After this is called,
+ * the netfs must call fscache_uncache_page() to remove the mark.
+ */
+void fscache_mark_pages_cached(struct fscache_retrieval *op,
+ struct pagevec *pagevec)
+{
+ struct fscache_cookie *cookie = op->op.object->cookie;
+ unsigned long loop;
+
+#ifdef CONFIG_FSCACHE_STATS
+ atomic_add(pagevec->nr, &fscache_n_marks);
+#endif
+
+ for (loop = 0; loop < pagevec->nr; loop++) {
+ struct page *page = pagevec->pages[loop];
+
+ _debug("- mark %p{%lx}", page, page->index);
+ if (TestSetPageFsCache(page)) {
+ static bool once_only = false;
+ if (!once_only) {
+ once_only = true;
+ printk(KERN_WARNING "FS-Cache:"
+ " Cookie type %s marked page %lx"
+ " multiple times\n",
+ cookie->def->name, page->index);
+ }
+ }
+ }
+
+ if (cookie->def->mark_pages_cached)
+ cookie->def->mark_pages_cached(cookie->netfs_data,
+ op->mapping, pagevec);
+ pagevec_reinit(pagevec);
+}
+
+EXPORT_SYMBOL(fscache_mark_pages_cached);
diff --git a/fs/fscache/fsc-proc.c b/fs/fscache/fsc-proc.c
new file mode 100644
index 0000000..1ac9d5d
--- /dev/null
+++ b/fs/fscache/fsc-proc.c
@@ -0,0 +1,399 @@
+/* FS-Cache statistics viewing interface
+ *
+ * Copyright (C) 2007 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells ([email protected])
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#define FSCACHE_DEBUG_LEVEL THREAD
+#include <linux/module.h>
+#include <linux/proc_fs.h>
+#include <linux/seq_file.h>
+#include "fsc-internal.h"
+
+struct fscache_proc {
+ unsigned nlines;
+ const struct seq_operations *ops;
+};
+
+struct proc_dir_entry *proc_fscache;
+
+EXPORT_SYMBOL(proc_fscache);
+
+static int fscache_proc_open(struct inode *inode, struct file *file);
+static void *fscache_proc_start(struct seq_file *m, loff_t *pos);
+static void fscache_proc_stop(struct seq_file *m, void *v);
+static void *fscache_proc_next(struct seq_file *m, void *v, loff_t *pos);
+
+static const struct file_operations fscache_proc_fops = {
+ .open = fscache_proc_open,
+ .read = seq_read,
+ .llseek = seq_lseek,
+ .release = seq_release,
+};
+
+#ifdef CONFIG_FSCACHE_STATS
+static int fscache_stats_show(struct seq_file *m, void *v);
+static int fscache_pool_show(struct seq_file *m, void *v);
+
+static const struct seq_operations fscache_stats_ops = {
+ .start = fscache_proc_start,
+ .stop = fscache_proc_stop,
+ .next = fscache_proc_next,
+ .show = fscache_stats_show,
+};
+
+static const struct fscache_proc fscache_stats = {
+ .nlines = 16,
+ .ops = &fscache_stats_ops,
+};
+
+static const struct seq_operations fscache_pool_ops = {
+ .start = fscache_proc_start,
+ .stop = fscache_proc_stop,
+ .next = fscache_proc_next,
+ .show = fscache_pool_show,
+};
+
+static const struct fscache_proc fscache_pool = {
+ .nlines = FSCACHE_MAX_THREADS + 2,
+ .ops = &fscache_pool_ops,
+};
+#endif
+
+#ifdef CONFIG_FSCACHE_HISTOGRAM
+static int fscache_histogram_show(struct seq_file *m, void *v);
+
+static const struct seq_operations fscache_histogram_ops = {
+ .start = fscache_proc_start,
+ .stop = fscache_proc_stop,
+ .next = fscache_proc_next,
+ .show = fscache_histogram_show,
+};
+
+static const struct fscache_proc fscache_histogram = {
+ .nlines = HZ + 1,
+ .ops = &fscache_histogram_ops,
+};
+#endif
+
+#define FSC_DESC(SELECT, N) ((void *) (unsigned long) (((SELECT) << 16) | (N)))
+
+/*
+ * initialise the /proc/fs/fscache/ directory
+ */
+int __init fscache_proc_init(void)
+{
+ struct proc_dir_entry *p;
+
+ _enter("");
+
+ proc_fscache = proc_mkdir("fs/fscache", NULL);
+ if (!proc_fscache)
+ goto error_dir;
+ proc_fscache->owner = THIS_MODULE;
+
+#ifdef CONFIG_FSCACHE_STATS
+ p = create_proc_entry("stats", 0, proc_fscache);
+ if (!p)
+ goto error_stats;
+ p->proc_fops = &fscache_proc_fops;
+ p->owner = THIS_MODULE;
+ p->data = (void *) &fscache_stats;
+
+ p = create_proc_entry("pool", 0, proc_fscache);
+ if (!p)
+ goto error_pool;
+ p->proc_fops = &fscache_proc_fops;
+ p->owner = THIS_MODULE;
+ p->data = (void *) &fscache_pool;
+#endif
+
+#ifdef CONFIG_FSCACHE_HISTOGRAM
+ p = create_proc_entry("histogram", 0, proc_fscache);
+ if (!p)
+ goto error_histogram;
+ p->proc_fops = &fscache_proc_fops;
+ p->owner = THIS_MODULE;
+ p->data = (void *) &fscache_histogram;
+#endif
+
+ _leave(" = 0");
+ return 0;
+
+#ifdef CONFIG_FSCACHE_HISTOGRAM
+error_histogram:
+#endif
+#ifdef CONFIG_FSCACHE_STATS
+ remove_proc_entry("pool", proc_fscache);
+error_pool:
+ remove_proc_entry("stats", proc_fscache);
+error_stats:
+#endif
+ remove_proc_entry("fs/fscache", NULL);
+error_dir:
+ _leave(" = -ENOMEM");
+ return -ENOMEM;
+}
+
+/*
+ * clean up the /proc/fs/fscache/ directory
+ */
+void fscache_proc_cleanup(void)
+{
+#ifdef CONFIG_FSCACHE_HISTOGRAM
+ remove_proc_entry("histogram", proc_fscache);
+#endif
+#ifdef CONFIG_FSCACHE_STATS
+ remove_proc_entry("pool", proc_fscache);
+ remove_proc_entry("stats", proc_fscache);
+#endif
+ remove_proc_entry("fs/fscache", NULL);
+}
+
+/*
+ * open "/proc/fs/fscache/XXX" which provide statistics summaries
+ */
+static int fscache_proc_open(struct inode *inode, struct file *file)
+{
+ const struct fscache_proc *proc = PDE(inode)->data;
+ struct seq_file *m;
+ int ret;
+
+ ret = seq_open(file, proc->ops);
+ if (ret == 0) {
+ m = file->private_data;
+ m->private = (void *) proc;
+ }
+ return ret;
+}
+
+/*
+ * set up the iterator to start reading from the first line
+ */
+static void *fscache_proc_start(struct seq_file *m, loff_t *_pos)
+{
+ if (*_pos == 0)
+ *_pos = 1;
+ return (void *)(unsigned long) *_pos;
+}
+
+/*
+ * move to the next line
+ */
+static void *fscache_proc_next(struct seq_file *m, void *v, loff_t *pos)
+{
+ const struct fscache_proc *proc = m->private;
+
+ (*pos)++;
+ return *pos > proc->nlines ? NULL : (void *)(unsigned long) *pos;
+}
+
+/*
+ * clean up after reading
+ */
+static void fscache_proc_stop(struct seq_file *m, void *v)
+{
+}
+
+#ifdef CONFIG_FSCACHE_STATS
+/*
+ * display the general statistics
+ */
+static int fscache_stats_show(struct seq_file *m, void *v)
+{
+ unsigned long line = (unsigned long) v;
+
+ switch (line) {
+ case 1:
+ seq_puts(m, "FS-Cache statistics\n");
+ break;
+
+ case 2:
+ seq_printf(m, "Cookies: idx=%u dat=%u spc=%u\n",
+ atomic_read(&fscache_n_cookie_index),
+ atomic_read(&fscache_n_cookie_data),
+ atomic_read(&fscache_n_cookie_special));
+ break;
+
+ case 3:
+ seq_printf(m, "Objects: alc=%u nal=%u avl=%u\n",
+ atomic_read(&fscache_n_object_alloc),
+ atomic_read(&fscache_n_object_no_alloc),
+ atomic_read(&fscache_n_object_avail));
+ break;
+
+ case 4:
+ seq_printf(m, "Pages : mrk=%u unc=%u\n",
+ atomic_read(&fscache_n_marks),
+ atomic_read(&fscache_n_uncaches));
+ break;
+
+ case 5:
+ seq_printf(m, "Acquire: n=%u nul=%u noc=%u ok=%u nbf=%u"
+ " oom=%u\n",
+ atomic_read(&fscache_n_acquires),
+ atomic_read(&fscache_n_acquires_null),
+ atomic_read(&fscache_n_acquires_no_cache),
+ atomic_read(&fscache_n_acquires_ok),
+ atomic_read(&fscache_n_acquires_nobufs),
+ atomic_read(&fscache_n_acquires_oom));
+ break;
+
+ case 6:
+ seq_printf(m, "Lookups: n=%u neg=%u pos=%u crt=%u bst=%u\n",
+ atomic_read(&fscache_n_object_lookups),
+ atomic_read(&fscache_n_object_lookups_negative),
+ atomic_read(&fscache_n_object_lookups_positive),
+ atomic_read(&fscache_n_object_created),
+ atomic_read(&fscache_n_object_boosted));
+ break;
+
+ case 7:
+ seq_printf(m, "Updates: n=%u nul=%u run=%u\n",
+ atomic_read(&fscache_n_updates),
+ atomic_read(&fscache_n_updates_null),
+ atomic_read(&fscache_n_updates_run));
+ break;
+
+ case 8:
+ seq_printf(m, "Relinqs: n=%u nul=%u wcr=%u\n",
+ atomic_read(&fscache_n_relinquishes),
+ atomic_read(&fscache_n_relinquishes_null),
+ atomic_read(&fscache_n_relinquishes_waitcrt));
+ break;
+
+ case 9:
+ seq_printf(m, "AttrChg: n=%u ok=%u nbf=%u oom=%u run=%u\n",
+ atomic_read(&fscache_n_attr_changed),
+ atomic_read(&fscache_n_attr_changed_ok),
+ atomic_read(&fscache_n_attr_changed_nobufs),
+ atomic_read(&fscache_n_attr_changed_nomem),
+ atomic_read(&fscache_n_attr_changed_calls));
+ break;
+
+ case 10:
+ seq_printf(m, "Allocs : n=%u ok=%u wt=%u nbf=%u\n",
+ atomic_read(&fscache_n_allocs),
+ atomic_read(&fscache_n_allocs_ok),
+ atomic_read(&fscache_n_allocs_wait),
+ atomic_read(&fscache_n_allocs_nobufs));
+ break;
+ case 11:
+ seq_printf(m, "Allocs : ops=%u owt=%u\n",
+ atomic_read(&fscache_n_alloc_ops),
+ atomic_read(&fscache_n_alloc_op_waits));
+ break;
+
+ case 12:
+ seq_printf(m, "Retrvls: n=%u ok=%u wt=%u nod=%u nbf=%u"
+ " int=%u oom=%u\n",
+ atomic_read(&fscache_n_retrievals),
+ atomic_read(&fscache_n_retrievals_ok),
+ atomic_read(&fscache_n_retrievals_wait),
+ atomic_read(&fscache_n_retrievals_nodata),
+ atomic_read(&fscache_n_retrievals_nobufs),
+ atomic_read(&fscache_n_retrievals_intr),
+ atomic_read(&fscache_n_retrievals_nomem));
+ break;
+ case 13:
+ seq_printf(m, "Retrvls: ops=%u owt=%u\n",
+ atomic_read(&fscache_n_retrieval_ops),
+ atomic_read(&fscache_n_retrieval_op_waits));
+ break;
+
+ case 14:
+ seq_printf(m, "Stores : n=%u ok=%u agn=%u nbf=%u oom=%u\n",
+ atomic_read(&fscache_n_stores),
+ atomic_read(&fscache_n_stores_ok),
+ atomic_read(&fscache_n_stores_again),
+ atomic_read(&fscache_n_stores_nobufs),
+ atomic_read(&fscache_n_stores_oom));
+ break;
+ case 15:
+ seq_printf(m, "Stores : ops=%u run=%u\n",
+ atomic_read(&fscache_n_store_ops),
+ atomic_read(&fscache_n_store_calls));
+ break;
+
+ case 16:
+ seq_printf(m, "Ops : pend=%u run=%u enq=%u req=%u rel=%u\n",
+ atomic_read(&fscache_n_op_pend),
+ atomic_read(&fscache_n_op_run),
+ atomic_read(&fscache_n_op_enqueue),
+ atomic_read(&fscache_n_op_requeue),
+ atomic_read(&fscache_n_op_release));
+ break;
+
+ default:
+ break;
+ }
+ return 0;
+}
+
+/*
+ * display the per-pool-thread statistics
+ */
+static int fscache_pool_show(struct seq_file *m, void *v)
+{
+ unsigned line = (unsigned long) v;
+ unsigned x, y;
+
+ switch (line) {
+ case 1:
+ seq_puts(m, "THREAD OPERS RUN OBJS RUN\n");
+ return 0;
+ case 2:
+ seq_puts(m, "======= ========= =========\n");
+ return 0;
+ case 3 ... FSCACHE_MAX_THREADS + 2:
+ x = atomic_read(&fscache_n_ops_processed[line - 3]);
+ y = atomic_read(&fscache_n_objs_processed[line - 3]);
+ if (x != 0 || y != 0)
+ seq_printf(m, "kfsc%02ud %9u %9u\n", line - 3, x, y);
+ default:
+ return 0;
+ }
+}
+#endif
+
+#ifdef CONFIG_FSCACHE_HISTOGRAM
+/*
+ * display the time-taken histogram
+ */
+static int fscache_histogram_show(struct seq_file *m, void *v)
+{
+ unsigned long index;
+ unsigned n[5], t;
+
+ switch ((unsigned long) v) {
+ case 1:
+ seq_puts(m, "JIFS SECS OBJ INST OP RUNS OBJ RUNS "
+ " RETRV DLY RETRIEVLS\n");
+ return 0;
+ case 2:
+ seq_puts(m, "===== ===== ========= ========= ========="
+ " ========= =========\n");
+ return 0;
+ default:
+ index = (unsigned long) v - 3;
+ n[0] = atomic_read(&fscache_obj_instantiate_histogram[index]);
+ n[1] = atomic_read(&fscache_ops_histogram[index]);
+ n[2] = atomic_read(&fscache_objs_histogram[index]);
+ n[3] = atomic_read(&fscache_retrieval_delay_histogram[index]);
+ n[4] = atomic_read(&fscache_retrieval_histogram[index]);
+ if (!(n[0] | n[1] | n[2] | n[3] | n[4]))
+ return 0;
+
+ t = (index * 1000) / HZ;
+
+ seq_printf(m, "%4lu 0.%03u %9u %9u %9u %9u %9u\n",
+ index, t, n[0], n[1], n[2], n[3], n[4]);
+ return 0;
+ }
+}
+#endif
diff --git a/fs/fscache/fsc-stats.c b/fs/fscache/fsc-stats.c
new file mode 100644
index 0000000..15adbda
--- /dev/null
+++ b/fs/fscache/fsc-stats.c
@@ -0,0 +1,103 @@
+/* FS-Cache statistics
+ *
+ * Copyright (C) 2007 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells ([email protected])
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#define FSCACHE_DEBUG_LEVEL THREAD
+#include <linux/module.h>
+#include <linux/proc_fs.h>
+#include <linux/seq_file.h>
+#include "fsc-internal.h"
+
+/*
+ * operation counters
+ */
+#ifdef CONFIG_FSCACHE_STATS
+atomic_t fscache_n_op_pend;
+atomic_t fscache_n_op_run;
+atomic_t fscache_n_op_enqueue;
+atomic_t fscache_n_op_requeue;
+atomic_t fscache_n_op_release;
+
+atomic_t fscache_n_attr_changed;
+atomic_t fscache_n_attr_changed_ok;
+atomic_t fscache_n_attr_changed_nobufs;
+atomic_t fscache_n_attr_changed_nomem;
+atomic_t fscache_n_attr_changed_calls;
+
+atomic_t fscache_n_allocs;
+atomic_t fscache_n_allocs_ok;
+atomic_t fscache_n_allocs_wait;
+atomic_t fscache_n_allocs_nobufs;
+atomic_t fscache_n_alloc_ops;
+atomic_t fscache_n_alloc_op_waits;
+
+atomic_t fscache_n_retrievals;
+atomic_t fscache_n_retrievals_ok;
+atomic_t fscache_n_retrievals_wait;
+atomic_t fscache_n_retrievals_nodata;
+atomic_t fscache_n_retrievals_nobufs;
+atomic_t fscache_n_retrievals_intr;
+atomic_t fscache_n_retrievals_nomem;
+atomic_t fscache_n_retrieval_ops;
+atomic_t fscache_n_retrieval_op_waits;
+
+atomic_t fscache_n_stores;
+atomic_t fscache_n_stores_ok;
+atomic_t fscache_n_stores_again;
+atomic_t fscache_n_stores_nobufs;
+atomic_t fscache_n_stores_oom;
+atomic_t fscache_n_store_ops;
+atomic_t fscache_n_store_calls;
+
+atomic_t fscache_n_marks;
+atomic_t fscache_n_uncaches;
+
+atomic_t fscache_n_acquires;
+atomic_t fscache_n_acquires_null;
+atomic_t fscache_n_acquires_no_cache;
+atomic_t fscache_n_acquires_ok;
+atomic_t fscache_n_acquires_nobufs;
+atomic_t fscache_n_acquires_oom;
+
+atomic_t fscache_n_updates;
+atomic_t fscache_n_updates_null;
+atomic_t fscache_n_updates_run;
+
+atomic_t fscache_n_relinquishes;
+atomic_t fscache_n_relinquishes_null;
+atomic_t fscache_n_relinquishes_waitcrt;
+
+atomic_t fscache_n_cookie_index;
+atomic_t fscache_n_cookie_data;
+atomic_t fscache_n_cookie_special;
+
+atomic_t fscache_n_object_alloc;
+atomic_t fscache_n_object_no_alloc;
+atomic_t fscache_n_object_lookups;
+atomic_t fscache_n_object_lookups_negative;
+atomic_t fscache_n_object_lookups_positive;
+atomic_t fscache_n_object_created;
+atomic_t fscache_n_object_avail;
+atomic_t fscache_n_object_boosted;
+
+/*
+ * the number of operations and objects processed by each thread in the pool
+ */
+atomic_t fscache_n_ops_processed[FSCACHE_MAX_THREADS];
+atomic_t fscache_n_objs_processed[FSCACHE_MAX_THREADS];
+#endif
+
+#ifdef CONFIG_FSCACHE_HISTOGRAM
+atomic_t fscache_obj_instantiate_histogram[HZ];
+atomic_t fscache_objs_histogram[HZ];
+atomic_t fscache_ops_histogram[HZ];
+atomic_t fscache_retrieval_delay_histogram[HZ];
+atomic_t fscache_retrieval_histogram[HZ];
+#endif
diff --git a/fs/fscache/fsc-threads.c b/fs/fscache/fsc-threads.c
new file mode 100644
index 0000000..ff20598
--- /dev/null
+++ b/fs/fscache/fsc-threads.c
@@ -0,0 +1,590 @@
+/* FS-Cache worker thread pool manager
+ *
+ * Copyright (C) 2007 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells ([email protected])
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#define FSCACHE_DEBUG_LEVEL THREAD
+#include <linux/module.h>
+#include <linux/kthread.h>
+#include "fsc-internal.h"
+
+static LIST_HEAD(fscache_object_fifo);
+static LIST_HEAD(fscache_sync_object_fifo);
+static LIST_HEAD(fscache_op_fifo);
+static LIST_HEAD(fscache_sync_op_fifo);
+
+static struct task_struct *fscache_threads[FSCACHE_MAX_THREADS];
+
+static unsigned fscache_n_threads = 21;
+module_param_named(n_threads, fscache_n_threads, uint, S_IWUSR | S_IRUGO);
+MODULE_PARM_DESC(fscache_n_threads, "FS-Cache thread pool size");
+
+static DEFINE_SPINLOCK(fscache_object_lock);
+static DEFINE_SPINLOCK(fscache_operation_lock);
+static DEFINE_MUTEX(fscache_thread_mutex);
+static DECLARE_WAIT_QUEUE_HEAD(fscache_fast_threads);
+static DECLARE_WAIT_QUEUE_HEAD(fscache_slow_threads);
+
+/**
+ * fscache_enqueue_operation - Enqueue an operation for processing
+ * @op: The operation to enqueue
+ *
+ * Enqueue an operation for processing by the FS-Cache thread pool.
+ */
+void fscache_enqueue_operation(struct fscache_operation *op)
+{
+ unsigned long flags;
+ bool added = false;
+
+ _enter("{OBJ%x}", op->object->debug_id);
+
+ ASSERT(op->processor != NULL);
+ ASSERTCMP(op->object->state, >=, FSCACHE_OBJECT_AVAILABLE);
+
+ spin_lock_irqsave(&fscache_operation_lock, flags);
+
+ if (list_empty(&op->work_link) &&
+ !test_bit(FSCACHE_OP_REQUEUE, &op->flags)) {
+ if (!test_bit(FSCACHE_OP_LOCK, &op->flags)) {
+ atomic_inc(&op->usage);
+ if (test_bit(FSCACHE_OP_SYNC, &op->flags))
+ list_add_tail(&op->work_link,
+ &fscache_sync_op_fifo);
+ else
+ list_add_tail(&op->work_link,
+ &fscache_op_fifo);
+ added = true;
+ fscache_stat(&fscache_n_op_enqueue);
+ } else {
+ set_bit(FSCACHE_OP_REQUEUE, &op->flags);
+ fscache_stat(&fscache_n_op_requeue);
+ }
+ }
+
+ spin_unlock_irqrestore(&fscache_operation_lock, flags);
+ if (added)
+ wake_up(&fscache_fast_threads);
+}
+
+EXPORT_SYMBOL(fscache_enqueue_operation);
+
+/*
+ * jump start the operation processing on an object
+ * - caller must hold object->lock
+ */
+void fscache_start_operations(struct fscache_object *object)
+{
+ struct fscache_operation *op;
+ bool stop = false;
+
+ while (!list_empty(&object->pending_ops) && !stop) {
+ op = list_entry(object->pending_ops.next,
+ struct fscache_operation, work_link);
+
+ if (test_bit(FSCACHE_OP_EXCLUSIVE, &op->flags)) {
+ if (object->n_in_progress > 0)
+ break;
+ stop = true;
+ }
+ list_del_init(&op->work_link);
+ object->n_in_progress++;
+
+ if (test_and_clear_bit(FSCACHE_OP_WAITING, &op->flags))
+ wake_up_bit(&op->flags, FSCACHE_OP_WAITING);
+ if (op->processor)
+ fscache_enqueue_operation(op);
+ }
+
+ ASSERTCMP(object->n_in_progress, <=, object->n_ops);
+
+ _debug("woke %d ops on OBJ%x",
+ object->n_in_progress, object->debug_id);
+}
+
+/*
+ * release an operation
+ * - queues pending ops if this is the last in-progress op
+ */
+void fscache_put_operation(struct fscache_operation *op)
+{
+ struct fscache_object *object;
+
+ _enter("{%d}", atomic_read(&op->usage));
+
+ ASSERTCMP(atomic_read(&op->usage), >, 0);
+
+ if (!atomic_dec_and_test(&op->usage))
+ return;
+
+ _debug("PUT OP");
+ fscache_stat(&fscache_n_op_release);
+
+ if (op->release)
+ op->release(op);
+
+ object = op->object;
+ spin_lock(&object->lock);
+ if (test_bit(FSCACHE_OP_EXCLUSIVE, &op->flags)) {
+ ASSERTCMP(object->n_exclusive, >, 0);
+ object->n_exclusive--;
+ }
+
+ ASSERTCMP(object->n_in_progress, >, 0);
+ object->n_in_progress--;
+ if (object->n_in_progress == 0)
+ fscache_start_operations(object);
+
+ ASSERTCMP(object->n_ops, >, 0);
+ object->n_ops--;
+ if (object->n_ops == 0)
+ fscache_raise_event(object, FSCACHE_OBJECT_EV_CLEARED);
+
+ spin_unlock(&object->lock);
+ _leave("");
+}
+
+EXPORT_SYMBOL(fscache_put_operation);
+
+/*
+ * add object to queue
+ * - caller must hold fscache_thread_lock
+ */
+static void __fscache_enqueue_object(struct fscache_object *object)
+{
+ if (test_bit(FSCACHE_OBJECT_SYNC, &object->flags))
+ list_add_tail(&object->work_link, &fscache_sync_object_fifo);
+ else
+ list_add_tail(&object->work_link, &fscache_object_fifo);
+}
+
+/*
+ * enqueue an object for metadata-type processing
+ */
+void fscache_enqueue_object(struct fscache_object *object)
+{
+ struct fscache_cache *cache;
+ bool added;
+
+ _enter("{OBJ%x}", object->debug_id);
+
+ if (list_empty(&object->work_link) &&
+ !test_bit(FSCACHE_OBJECT_EV_REQUEUE, &object->events)) {
+ added = false;
+ spin_lock_irq(&fscache_object_lock);
+ if (!test_bit(FSCACHE_OBJECT_LOCK, &object->flags)) {
+ if (list_empty(&object->work_link)) {
+ _debug("add");
+ cache = object->cache;
+ atomic_inc(&cache->thread_usage);
+ cache->ops->grab_object(object);
+ __fscache_enqueue_object(object);
+ added = true;
+ }
+ } else {
+ _debug("defer");
+ set_bit(FSCACHE_OBJECT_EV_REQUEUE, &object->events);
+ }
+ spin_unlock_irq(&fscache_object_lock);
+ if (added) {
+ _debug("wake");
+ wake_up(&fscache_slow_threads);
+ }
+ }
+}
+
+/*
+ * enqueue the dependents of an object for metadata-type processing
+ * - the caller must hold the object's lock
+ * - this may cause an already locked object to wind up being processed again
+ */
+void fscache_enqueue_dependents(struct fscache_object *object)
+{
+ struct fscache_object *dep;
+
+ _enter("{%p}", object);
+
+ if (list_empty(&object->dependents))
+ return;
+
+ spin_lock_irq(&fscache_object_lock);
+
+ while (!list_empty(&object->dependents)) {
+ dep = list_entry(object->dependents.next,
+ struct fscache_object, work_link);
+ list_del(&dep->work_link);
+
+ clear_bit(FSCACHE_OBJECT_WAITING, &object->flags);
+
+ /* sort onto appropriate lists */
+ __fscache_enqueue_object(dep);
+
+ if (!list_empty(&object->dependents) && need_resched()) {
+ spin_unlock_irq(&fscache_object_lock);
+ cond_resched();
+ spin_lock_irq(&fscache_object_lock);
+ }
+ }
+
+ spin_unlock_irq(&fscache_object_lock);
+ wake_up(&fscache_slow_threads);
+}
+
+/*
+ * remove an object from whatever queue it's waiting on
+ */
+void fscache_dequeue_object(struct fscache_object *object)
+{
+ _enter("{OBJ%x}", object->debug_id);
+
+ if (!list_empty(&object->dependents)) {
+ spin_lock_irq(&fscache_object_lock);
+ list_del_init(&object->work_link);
+ spin_unlock_irq(&fscache_object_lock);
+ }
+ _leave("");
+}
+
+/*
+ * boost an object that's being waited upon by moving it to the priority queue
+ */
+void fscache_boost_object(struct fscache_object *object)
+{
+ set_bit(FSCACHE_OBJECT_BOOSTED, &object->flags);
+ if (!test_bit(FSCACHE_OBJECT_WAITING, &object->flags)) {
+ fscache_stat(&fscache_n_object_boosted);
+ spin_lock_irq(&fscache_object_lock);
+ if (!list_empty(&object->work_link))
+ list_move_tail(&object->work_link,
+ &fscache_sync_object_fifo);
+ spin_unlock_irq(&fscache_object_lock);
+ }
+}
+
+/*
+ * object dispatcher
+ * - slow threads take from the object FIFO by preference
+ * - called with fscache_thread_lock locked, which it drops
+ */
+static void fscache_dispatch_object(unsigned thread)
+{
+ struct fscache_object *object;
+ struct fscache_cache *cache = NULL;
+ unsigned long start;
+ bool sync;
+
+ spin_lock_irq(&fscache_object_lock);
+
+ if (list_empty(&fscache_object_fifo) &&
+ list_empty(&fscache_sync_object_fifo)) {
+ spin_unlock_irq(&fscache_object_lock);
+ return;
+ }
+
+ fscache_stat(&fscache_n_objs_processed[thread]);
+
+ sync = !list_empty(&fscache_sync_object_fifo);
+ if (sync) {
+ object = list_entry(fscache_sync_object_fifo.next,
+ struct fscache_object, work_link);
+ set_user_nice(current, -1);
+ } else {
+ object = list_entry(fscache_object_fifo.next,
+ struct fscache_object, work_link);
+ set_user_nice(current, 1);
+ }
+
+ /* lock the object so that it's only processed by one thread at
+ * once */
+ _debug("LK OBJ%x", object->debug_id);
+ list_del_init(&object->work_link);
+ if (test_and_set_bit(FSCACHE_OBJECT_LOCK, &object->flags)) {
+ _debug("OBJ%x already locked {%s,%lx}\n",
+ object->debug_id,
+ fscache_object_states[object->state],
+ object->events & object->event_mask);
+ set_bit(FSCACHE_OBJECT_EV_REQUEUE, &object->events);
+ spin_unlock_irq(&fscache_object_lock);
+ return;
+ }
+ do {
+ spin_unlock_irq(&fscache_object_lock);
+
+ do {
+ clear_bit(FSCACHE_OBJECT_EV_REQUEUE, &object->events);
+ start = jiffies;
+ fscache_object_state_machine(object);
+ fscache_hist(fscache_objs_histogram, start);
+ } while (object->events & object->event_mask);
+
+ spin_lock_irq(&fscache_object_lock);
+ } while (object->events & object->event_mask);
+
+ /* unlock the object */
+ _debug("UN OBJ%x", object->debug_id);
+ if (!test_and_clear_bit(FSCACHE_OBJECT_LOCK, &object->flags))
+ BUG();
+
+ spin_unlock_irq(&fscache_object_lock);
+
+ if (object) {
+ /* must do the wake up outside the thread pool lock to avoid a
+ * circular lock dependency against a __wake_up() lock in
+ * CacheFiles */
+ wake_up_bit(&object->lock, FSCACHE_OBJECT_LOCK);
+ cache = object->cache;
+ cache->ops->put_object(object);
+ _debug("%d REMAIN", atomic_read(&cache->thread_usage));
+ if (atomic_dec_and_test(&cache->thread_usage)) {
+ _debug("ALL GONE");
+ wake_up_all(&fscache_clearance_wq);
+ }
+ }
+}
+
+/*
+ * operation dispatcher
+ * - all threads can take from the fast FIFO
+ * - called with fscache_thread_lock locked, which it drops
+ */
+static void fscache_dispatch_operation(unsigned thread, struct list_head *queue)
+{
+ struct fscache_operation *op;
+ unsigned long start;
+
+ spin_lock_irq(&fscache_operation_lock);
+
+ if (list_empty(queue)) {
+ spin_unlock_irq(&fscache_operation_lock);
+ return;
+ }
+
+ fscache_stat(&fscache_n_ops_processed[thread]);
+
+ op = list_entry(queue->next, struct fscache_operation, work_link);
+
+ /* lock the operation so that it's only processed by one thread
+ * at once */
+ _debug("LK OP OBJ%x", op->object->debug_id);
+ if (test_and_set_bit(FSCACHE_OP_LOCK, &op->flags)) {
+ printk(KERN_ERR "FS-Cache: OP on OBJ%x already locked\n",
+ op->object->debug_id);
+ BUG();
+ }
+ list_del_init(&op->work_link);
+ spin_unlock_irq(&fscache_operation_lock);
+
+ ASSERT(op->processor != NULL);
+ start = jiffies;
+ op->processor(op);
+ fscache_hist(fscache_ops_histogram, start);
+
+ /* unlock the op and requeue if requested */
+ spin_lock_irq(&fscache_operation_lock);
+ _debug("UN OP OBJ%x", op->object->debug_id);
+ clear_bit(FSCACHE_OP_LOCK, &op->flags);
+
+ if (test_and_clear_bit(FSCACHE_OP_REQUEUE, &op->flags)) {
+ list_add(&op->work_link, queue);
+ op = NULL;
+ }
+
+ spin_unlock_irq(&fscache_operation_lock);
+
+ if (op)
+ fscache_put_operation(op);
+}
+
+/*
+ * thread dispatcher
+ */
+static void fscache_dispatch(unsigned thread)
+{
+ _enter("");
+
+ switch (thread % 3) {
+ case 2:
+ /* threads 2, 5, 8, 11, ... do objects, async ops then sync
+ * ops */
+ if (!list_empty(&fscache_object_fifo) ||
+ !list_empty(&fscache_sync_object_fifo)) {
+ fscache_dispatch_object(thread);
+ break;
+ }
+ case 1:
+ /* threads 1, 4, 7, 10, ... do async ops then sync ops */
+ if (!list_empty(&fscache_op_fifo)) {
+ fscache_dispatch_operation(thread, &fscache_op_fifo);
+ break;
+ }
+ if (!list_empty(&fscache_sync_op_fifo)) {
+ fscache_dispatch_operation(thread,
+ &fscache_sync_op_fifo);
+ break;
+ }
+ break;
+ default:
+ /* threads 0, 3, 6, 9, ... do sync ops then async ops */
+ if (!list_empty(&fscache_sync_op_fifo)) {
+ fscache_dispatch_operation(thread,
+ &fscache_sync_op_fifo);
+ break;
+ }
+ break;
+ }
+
+ _leave("");
+}
+
+/*
+ * operation dispatcher thread dedicated to doing fast ops
+ */
+static int kfscached_fast(unsigned thread)
+{
+ DECLARE_WAITQUEUE(myself, current);
+
+ do {
+ while (!list_empty(&fscache_op_fifo) ||
+ !list_empty(&fscache_sync_op_fifo)) {
+ fscache_dispatch(thread);
+ cond_resched();
+ }
+
+ add_wait_queue(&fscache_fast_threads, &myself);
+ set_current_state(TASK_INTERRUPTIBLE);
+ if (list_empty(&fscache_op_fifo) &&
+ list_empty(&fscache_sync_op_fifo) &&
+ !kthread_should_stop())
+ schedule();
+ remove_wait_queue(&fscache_fast_threads, &myself);
+ __set_current_state(TASK_RUNNING);
+ } while (!kthread_should_stop());
+
+ return 0;
+}
+
+/*
+ * operation dispatcher thread for preferentially doing slow metadata ops (but
+ * can also do fast ops if bored)
+ */
+static int kfscached(void *_thread)
+{
+ unsigned thread = (unsigned long) _thread;
+
+ DECLARE_WAITQUEUE(myself_fast, current);
+ DECLARE_WAITQUEUE(myself_slow, current);
+
+ if (thread % 3 != 2) {
+ //if (thread % 3 == 1)
+ //set_user_nice(current, -4);
+ return kfscached_fast(thread);
+ }
+
+ set_user_nice(current, 1);
+
+ do {
+ while (!list_empty(&fscache_object_fifo) ||
+ !list_empty(&fscache_sync_object_fifo) ||
+ !list_empty(&fscache_op_fifo) ||
+ !list_empty(&fscache_sync_op_fifo)) {
+ fscache_dispatch(thread);
+ cond_resched();
+ }
+
+ add_wait_queue(&fscache_slow_threads, &myself_slow);
+ add_wait_queue_tail(&fscache_fast_threads, &myself_fast);
+ set_current_state(TASK_INTERRUPTIBLE);
+ if (list_empty(&fscache_object_fifo) &&
+ list_empty(&fscache_sync_object_fifo) &&
+ list_empty(&fscache_op_fifo) &&
+ list_empty(&fscache_sync_op_fifo) &&
+ !kthread_should_stop())
+ schedule();
+ remove_wait_queue(&fscache_fast_threads, &myself_fast);
+ remove_wait_queue(&fscache_slow_threads, &myself_slow);
+ __set_current_state(TASK_RUNNING);
+ } while (!kthread_should_stop());
+
+ return 0;
+}
+
+/*
+ * initialise the worker thread pool
+ */
+int fscache_init_threads(void)
+{
+ static bool inited = false;
+ struct task_struct *t;
+ unsigned long loop, max;
+ int ret;
+
+ _enter("");
+
+ mutex_lock(&fscache_thread_mutex);
+
+ if (!inited) {
+ max = fscache_n_threads;
+ if (max < FSCACHE_MIN_THREADS)
+ max = FSCACHE_MIN_THREADS;
+ else if (max > FSCACHE_MAX_THREADS)
+ max = FSCACHE_MAX_THREADS;
+
+ for (loop = 0; loop < max; loop++) {
+ t = kthread_create(kfscached, (void *) loop,
+ "kfsc%02ud", loop);
+ if (IS_ERR(t))
+ goto failed;
+ fscache_threads[loop] = t;
+ wake_up_process(t);
+ }
+
+ inited = true;
+ }
+ mutex_unlock(&fscache_thread_mutex);
+ _leave(" = 0");
+ return 0;
+
+failed:
+ ret = PTR_ERR(t);
+ printk(KERN_ERR "FS-Cache: Unable to create kfscached threads (%d)\n",
+ ret);
+
+ while (loop-- > 0) {
+ if (fscache_threads[loop]) {
+ kthread_stop(fscache_threads[loop]);
+ fscache_threads[loop] = NULL;
+ }
+ }
+
+ mutex_unlock(&fscache_thread_mutex);
+ _leave(" = %d", ret);
+ return ret;
+}
+
+/*
+ * kill the pool of running threads
+ */
+void fscache_kill_threads(void)
+{
+ unsigned int loop;
+
+ _enter("");
+ mutex_lock(&fscache_thread_mutex);
+
+ for (loop = 0; loop < FSCACHE_MAX_THREADS; loop++)
+ if (fscache_threads[loop])
+ kthread_stop(fscache_threads[loop]);
+
+ BUG_ON(!list_empty(&fscache_object_fifo));
+ BUG_ON(!list_empty(&fscache_sync_object_fifo));
+ BUG_ON(!list_empty(&fscache_op_fifo));
+ BUG_ON(!list_empty(&fscache_sync_op_fifo));
+
+ mutex_unlock(&fscache_thread_mutex);
+ _leave("");
+}
diff --git a/include/linux/fscache-cache.h b/include/linux/fscache-cache.h
new file mode 100644
index 0000000..176797f
--- /dev/null
+++ b/include/linux/fscache-cache.h
@@ -0,0 +1,433 @@
+/* General filesystem caching backing cache interface
+ *
+ * Copyright (C) 2004-2007 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells ([email protected])
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ *
+ * NOTE!!! See:
+ *
+ * Documentation/filesystems/caching/backend-api.txt
+ *
+ * for a description of the cache backend interface declared here.
+ */
+
+#ifndef _LINUX_FSCACHE_CACHE_H
+#define _LINUX_FSCACHE_CACHE_H
+
+#include <linux/fscache.h>
+
+#define NR_MAXCACHES BITS_PER_LONG
+
+struct fscache_cache;
+struct fscache_cache_ops;
+struct fscache_object;
+
+#ifdef CONFIG_FSCACHE_PROC
+extern struct proc_dir_entry *proc_fscache;
+#endif
+
+/*
+ * cache tag definition
+ */
+struct fscache_cache_tag {
+ struct list_head link;
+ struct fscache_cache *cache; /* cache referred to by this tag */
+ unsigned long flags;
+#define FSCACHE_TAG_RESERVED 0 /* T if tag is reserved for a cache */
+ atomic_t usage;
+ char name[0]; /* tag name */
+};
+
+/*
+ * cache definition
+ */
+struct fscache_cache {
+ const struct fscache_cache_ops *ops;
+ struct fscache_cache_tag *tag; /* tag representing this cache */
+ struct kobject kobj; /* system representation of this cache */
+ struct list_head link; /* link in list of caches */
+ size_t max_index_size; /* maximum size of index data */
+ char identifier[32]; /* cache label */
+
+ /* node management */
+ struct list_head object_list; /* list of data/index objects */
+ spinlock_t object_list_lock;
+ atomic_t thread_usage; /* no. of threads working this cache */
+ struct fscache_object *fsdef; /* object for the fsdef index */
+ unsigned long flags;
+#define FSCACHE_IOERROR 0 /* cache stopped on I/O error */
+#define FSCACHE_CACHE_WITHDRAWN 1 /* cache has been withdrawn */
+};
+
+/*
+ * asynchronous operation being applied to or waiting to be applied to a cache
+ * object
+ * - slow operations are done in the context of the process that issued them,
+ * not in the context of kfscached
+ */
+struct fscache_operation {
+ struct list_head work_link; /* link in worker thread FIFO or
+ * link in object->pending_ops */
+ struct fscache_object *object; /* object to be operated upon */
+
+ unsigned long flags;
+#define FSCACHE_OP_WAITING 0 /* cleared when op is woken */
+#define FSCACHE_OP_SYNC 1 /* synchronous operation */
+#define FSCACHE_OP_EXCLUSIVE 2 /* exclusive op, other ops must wait */
+#define FSCACHE_OP_LOCK 3 /* thread pool processing lock */
+#define FSCACHE_OP_REQUEUE 4 /* op needs more processing */
+
+ atomic_t usage;
+
+ /* operation processor callback
+ * - can be NULL if FSCACHE_OP_WAITING is going to be used to perform
+ * the op in a non-pool thread */
+ void (*processor)(struct fscache_operation *op);
+
+ /* operation releaser */
+ void (*release)(struct fscache_operation *op);
+};
+
+extern void fscache_enqueue_operation(struct fscache_operation *);
+extern void fscache_put_operation(struct fscache_operation *);
+
+/*
+ * data read operation
+ */
+struct fscache_retrieval {
+ struct fscache_operation op;
+ struct address_space *mapping; /* netfs pages */
+ fscache_rw_complete_t end_io_func; /* function to call on I/O completion */
+ void *context; /* netfs read context (pinned) */
+ struct list_head to_do; /* list of things to be done by the backend */
+ unsigned long start_time; /* time at which retrieval started */
+};
+
+typedef int (*fscache_page_retrieval_func_t)(struct fscache_retrieval *op,
+ struct page *page,
+ gfp_t gfp);
+
+typedef int (*fscache_pages_retrieval_func_t)(struct fscache_retrieval *op,
+ struct list_head *pages,
+ unsigned *nr_pages,
+ gfp_t gfp);
+
+/**
+ * fscache_get_retrieval - Get an extra reference on a retrieval operation
+ * @op: The retrieval operation to get a reference on
+ *
+ * Get an extra reference on a retrieval operation.
+ */
+static inline
+struct fscache_retrieval *fscache_get_retrieval(struct fscache_retrieval *op)
+{
+ atomic_inc(&op->op.usage);
+ return op;
+}
+
+/**
+ * fscache_enqueue_retrieval - Enqueue a retrieval operation for processing
+ * @op: The retrieval operation affected
+ *
+ * Enqueue a retrieval operation for processing by the FS-Cache thread pool.
+ */
+static inline void fscache_enqueue_retrieval(struct fscache_retrieval *op)
+{
+ fscache_enqueue_operation(&op->op);
+}
+
+/**
+ * fscache_put_retrieval - Drop a reference to a retrieval operation
+ * @op: The retrieval operation affected
+ *
+ * Drop a reference to a retrieval operation.
+ */
+static inline void fscache_put_retrieval(struct fscache_retrieval *op)
+{
+ fscache_put_operation(&op->op);
+}
+
+/*
+ * cached page storage work item
+ * - used to do three things:
+ * - batch writes to the cache
+ * - do cache writes asynchronously
+ * - defer writes until cache object lookup completion
+ */
+struct fscache_storage {
+ struct fscache_operation op;
+ pgoff_t store_limit; /* don't write more than this */
+};
+
+/*
+ * cache operations
+ */
+struct fscache_cache_ops {
+ /* name of cache provider */
+ const char *name;
+
+ /* allocate an object record for a cookie */
+ struct fscache_object *(*alloc_object)(struct fscache_cache *cache,
+ struct fscache_cookie *cookie);
+
+ /* look up the object for a cookie */
+ void (*lookup_object)(struct fscache_object *object);
+
+ /* finished looking up */
+ void (*lookup_complete)(struct fscache_object *object);
+
+ /* increment the usage count on this object (may fail if unmounting) */
+ struct fscache_object *(*grab_object)(struct fscache_object *object);
+
+ /* pin an object in the cache */
+ int (*pin_object)(struct fscache_object *object);
+
+ /* unpin an object in the cache */
+ void (*unpin_object)(struct fscache_object *object);
+
+ /* store the updated auxilliary data on an object */
+ void (*update_object)(struct fscache_object *object);
+
+ /* discard the resources pinned by an object and effect retirement if
+ * necessary */
+ void (*drop_object)(struct fscache_object *object);
+
+ /* dispose of a reference to an object */
+ void (*put_object)(struct fscache_object *object);
+
+ /* sync a cache */
+ void (*sync_cache)(struct fscache_cache *cache);
+
+ /* notification that the attributes of a non-index object (such as
+ * i_size) have changed */
+ int (*attr_changed)(struct fscache_object *object);
+
+ /* reserve space for an object's data and associated metadata */
+ int (*reserve_space)(struct fscache_object *object, loff_t i_size);
+
+ /* request a backing block for a page be read or allocated in the
+ * cache */
+ fscache_page_retrieval_func_t read_or_alloc_page;
+
+ /* request backing blocks for a list of pages be read or allocated in
+ * the cache */
+ fscache_pages_retrieval_func_t read_or_alloc_pages;
+
+ /* request a backing block for a page be allocated in the cache so that
+ * it can be written directly */
+ fscache_page_retrieval_func_t allocate_page;
+
+ /* request backing blocks for pages be allocated in the cache so that
+ * they can be written directly */
+ fscache_pages_retrieval_func_t allocate_pages;
+
+ /* write a page to its backing block in the cache */
+ int (*write_page)(struct fscache_storage *op, struct page *page);
+
+ /* detach backing block from a page (optional)
+ * - must release the cookie lock before returning
+ * - may sleep
+ */
+ void (*uncache_page)(struct fscache_object *object,
+ struct page *page);
+
+ /* dissociate a cache from all the pages it was backing */
+ void (*dissociate_pages)(struct fscache_cache *cache);
+};
+
+/*
+ * data file or index object cookie
+ * - a file will only appear in one cache
+ * - a request to cache a file may or may not be honoured, subject to
+ * constraints such as disk space
+ * - indices are created on disk just-in-time
+ */
+struct fscache_cookie {
+ atomic_t usage; /* number of users of this cookie */
+ atomic_t n_children; /* number of children of this cookie */
+ spinlock_t lock;
+ struct hlist_head backing_objects; /* object(s) backing this file/index */
+ const struct fscache_cookie_def *def; /* definition */
+ struct fscache_cookie *parent; /* parent of this entry */
+ void *netfs_data; /* back pointer to netfs */
+ unsigned long flags;
+#define FSCACHE_COOKIE_LOOKING_UP 0 /* T if non-index cookie being looked up still */
+#define FSCACHE_COOKIE_CREATING 1 /* T if non-index object being created still */
+#define FSCACHE_COOKIE_NO_DATA_YET 2 /* T if new object with no cached data yet */
+#define FSCACHE_COOKIE_PENDING_FILL 3 /* T if pending initial fill on object */
+#define FSCACHE_COOKIE_FILLING 4 /* T if filling object incrementally */
+#define FSCACHE_COOKIE_UNAVAILABLE 5 /* T if cookie is unavailable (error, etc) */
+};
+
+extern struct fscache_cookie fscache_fsdef_index;
+
+/*
+ * on-disk cache file or index handle
+ */
+struct fscache_object {
+ enum {
+ FSCACHE_OBJECT_INIT, /* object in initial unbound state */
+ FSCACHE_OBJECT_LOOKING_UP, /* looking up object */
+ FSCACHE_OBJECT_CREATING, /* creating object */
+
+ /* active states */
+ FSCACHE_OBJECT_AVAILABLE, /* cleaning up object after creation */
+ FSCACHE_OBJECT_ACTIVE, /* object is usable */
+ FSCACHE_OBJECT_UPDATING, /* object is updating */
+
+ /* terminal states */
+ FSCACHE_OBJECT_DYING, /* object waiting for accessors to finish */
+ FSCACHE_OBJECT_LC_DYING, /* object cleaning up after lookup/create */
+ FSCACHE_OBJECT_ABORT_INIT, /* abort the init state */
+ FSCACHE_OBJECT_RELEASING, /* releasing object */
+ FSCACHE_OBJECT_RECYCLING, /* retiring object */
+ FSCACHE_OBJECT_WITHDRAWING, /* withdrawing object */
+ FSCACHE_OBJECT_DEAD, /* object is now dead */
+ } state;
+
+ int debug_id; /* debugging ID */
+ int n_children; /* number of child objects */
+ int n_ops; /* number of ops outstanding on object */
+ int n_in_progress; /* number of ops in progress */
+ int n_exclusive; /* number of exclusive ops queued */
+ spinlock_t lock; /* state and operations lock */
+
+ unsigned long lookup_jif; /* time at which lookup started */
+ unsigned long event_mask; /* events this object is interested in */
+ unsigned long events; /* events to be processed by this object
+ * (order is important - using fls) */
+#define FSCACHE_OBJECT_EV_REQUEUE 0 /* T if object should be requeued */
+#define FSCACHE_OBJECT_EV_UPDATE 1 /* T if object should be updated */
+#define FSCACHE_OBJECT_EV_CLEARED 2 /* T if accessors all gone */
+#define FSCACHE_OBJECT_EV_ERROR 3 /* T if fatal error occurred during processing */
+#define FSCACHE_OBJECT_EV_RELEASE 4 /* T if netfs requested object release */
+#define FSCACHE_OBJECT_EV_RETIRE 5 /* T if netfs requested object retirement */
+#define FSCACHE_OBJECT_EV_WITHDRAW 6 /* T if cache requested object withdrawal */
+
+ unsigned long flags;
+#define FSCACHE_OBJECT_LOCK 0 /* T if object is busy being processed */
+#define FSCACHE_OBJECT_SYNC 1 /* T if object is has waiters */
+#define FSCACHE_OBJECT_PENDING_WRITE 2 /* T if object has pending write */
+#define FSCACHE_OBJECT_WAITING 3 /* T if object is waiting on its parent */
+#define FSCACHE_OBJECT_BOOSTED 4 /* T if object was boosted to priority queue */
+
+ struct list_head cache_link; /* link in cache->object_list */
+ struct hlist_node cookie_link; /* link in cookie->backing_objects */
+ struct fscache_cache *cache; /* cache that supplied this object */
+ struct fscache_cookie *cookie; /* netfs's file/index object */
+ struct fscache_object *parent; /* parent object */
+ struct list_head work_link; /* link in worker thread FIFO */
+ struct list_head dependents; /* FIFO of dependent objects */
+ struct list_head pending_ops; /* unstarted operations on this object */
+ struct radix_tree_root stores; /* data to be stored */
+ pgoff_t store_limit; /* current storage limit */
+};
+
+extern const char *fscache_object_states[];
+
+#define fscache_object_is_active(obj) \
+ (!test_bit(FSCACHE_IOERROR, &(obj)->cache->flags) && \
+ (obj)->state >= FSCACHE_OBJECT_AVAILABLE && \
+ (obj)->state < FSCACHE_OBJECT_DYING)
+
+/**
+ * fscache_object_init - Initialise a cache object description
+ * @object: Object description
+ *
+ * Initialise a cache object description to its basic values.
+ *
+ * See Documentation/filesystems/caching/backend-api.txt for a complete
+ * description.
+ */
+static inline
+void fscache_object_init(struct fscache_object *object)
+{
+ object->state = FSCACHE_OBJECT_INIT;
+ spin_lock_init(&object->lock);
+ INIT_LIST_HEAD(&object->cache_link);
+ INIT_HLIST_NODE(&object->cookie_link);
+ INIT_LIST_HEAD(&object->work_link);
+ INIT_LIST_HEAD(&object->dependents);
+ INIT_LIST_HEAD(&object->pending_ops);
+ INIT_RADIX_TREE(&object->stores, GFP_NOFS);
+ object->n_children = 0;
+ object->n_ops = object->n_in_progress = object->n_exclusive = 0;
+ object->events = object->event_mask = 0;
+ object->flags = 0;
+ object->store_limit = 0;
+ object->cache = NULL;
+ object->cookie = NULL;
+}
+
+extern void fscache_object_lookup_negative(struct fscache_object *object);
+extern void fscache_obtained_object(struct fscache_object *object);
+
+/**
+ * fscache_object_lookup_error - Note an object encountered an error
+ * @object: The object on which the error was encountered
+ *
+ * Note that an object encountered a fatal error (usually an I/O error) and
+ * that it should be withdrawn as soon as possible.
+ */
+static inline void fscache_object_lookup_error(struct fscache_object *object)
+{
+ set_bit(FSCACHE_OBJECT_EV_ERROR, &object->events);
+}
+
+/**
+ * fscache_set_store_limit - Set the maximum size to be stored in an object
+ * @object: The object to set the maximum on
+ * @i_size: The limit to set in bytes
+ *
+ * Set the maximum size an object is permitted to reach, implying the highest
+ * byte that may be written. Intended to be called by the attr_changed() op.
+ *
+ * See Documentation/filesystems/caching/backend-api.txt for a complete
+ * description.
+ */
+static inline
+void fscache_set_store_limit(struct fscache_object *object, loff_t i_size)
+{
+ object->store_limit = i_size >> PAGE_SHIFT;
+ if (i_size & ~PAGE_MASK)
+ object->store_limit++;
+}
+
+/**
+ * fscache_end_io - End a retrieval operation on a page
+ * @op: The FS-Cache operation covering the retrieval
+ * @page: The page that was to be fetched
+ * @error: The error code (0 if successful)
+ *
+ * Note the end of an operation to retrieve a page, as covered by a particular
+ * operation record.
+ */
+static inline void fscache_end_io(struct fscache_retrieval *op,
+ struct page *page, int error)
+{
+ op->end_io_func(page, op->context, error);
+}
+
+/*
+ * out-of-line cache backend functions
+ */
+extern void fscache_init_cache(struct fscache_cache *cache,
+ const struct fscache_cache_ops *ops,
+ const char *idfmt,
+ ...) __attribute__ ((format (printf, 3, 4)));
+
+extern int fscache_add_cache(struct fscache_cache *cache,
+ struct fscache_object *fsdef,
+ const char *tagname);
+extern void fscache_withdraw_cache(struct fscache_cache *cache);
+
+extern void fscache_io_error(struct fscache_cache *cache);
+
+extern void fscache_mark_pages_cached(struct fscache_retrieval *op,
+ struct pagevec *pagevec);
+
+#endif /* _LINUX_FSCACHE_CACHE_H */
diff --git a/include/linux/fscache.h b/include/linux/fscache.h
new file mode 100644
index 0000000..f9a074b
--- /dev/null
+++ b/include/linux/fscache.h
@@ -0,0 +1,593 @@
+/* General filesystem caching interface
+ *
+ * Copyright (C) 2004-2007 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells ([email protected])
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ *
+ * NOTE!!! See:
+ *
+ * Documentation/filesystems/caching/netfs-api.txt
+ *
+ * for a description of the network filesystem interface declared here.
+ */
+
+#ifndef _LINUX_FSCACHE_H
+#define _LINUX_FSCACHE_H
+
+#include <linux/fs.h>
+#include <linux/list.h>
+#include <linux/pagemap.h>
+#include <linux/pagevec.h>
+
+#if defined(CONFIG_FSCACHE) || defined(CONFIG_FSCACHE_MODULE)
+#define fscache_available() (1)
+#define fscache_cookie_valid(cookie) (cookie)
+#else
+#define fscache_available() (0)
+#define fscache_cookie_valid(cookie) (0)
+#endif
+
+
+/* pattern used to fill dead space in an index entry */
+#define FSCACHE_INDEX_DEADFILL_PATTERN 0x79
+
+struct pagevec;
+struct fscache_cache_tag;
+struct fscache_cookie;
+struct fscache_netfs;
+struct fscache_netfs_operations;
+
+typedef void (*fscache_rw_complete_t)(struct page *page,
+ void *context,
+ int error);
+
+/* result of index entry consultation */
+typedef enum {
+ FSCACHE_CHECKAUX_OKAY, /* entry okay as is */
+ FSCACHE_CHECKAUX_NEEDS_UPDATE, /* entry requires update */
+ FSCACHE_CHECKAUX_OBSOLETE, /* entry requires deletion */
+} fscache_checkaux_t;
+
+/*
+ * fscache cookie definition
+ */
+struct fscache_cookie_def
+{
+ /* name of cookie type */
+ char name[16];
+
+ /* cookie type */
+ uint8_t type;
+#define FSCACHE_COOKIE_TYPE_INDEX 0
+#define FSCACHE_COOKIE_TYPE_DATAFILE 1
+
+ /* select the cache into which to insert an entry in this index
+ * - optional
+ * - should return a cache identifier or NULL to cause the cache to be
+ * inherited from the parent if possible or the first cache picked
+ * for a non-index file if not
+ */
+ struct fscache_cache_tag *(*select_cache)(const void *parent_netfs_data,
+ const void *cookie_netfs_data);
+
+ /* get an index key
+ * - should store the key data in the buffer
+ * - should return the amount of amount stored
+ * - not permitted to return an error
+ * - the netfs data from the cookie being used as the source is
+ * presented
+ */
+ uint16_t (*get_key)(const void *cookie_netfs_data,
+ void *buffer,
+ uint16_t bufmax);
+
+ /* get certain file attributes from the netfs data
+ * - this function can be absent for an index
+ * - not permitted to return an error
+ * - the netfs data from the cookie being used as the source is
+ * presented
+ */
+ void (*get_attr)(const void *cookie_netfs_data, uint64_t *size);
+
+ /* get the auxilliary data from netfs data
+ * - this function can be absent if the index carries no state data
+ * - should store the auxilliary data in the buffer
+ * - should return the amount of amount stored
+ * - not permitted to return an error
+ * - the netfs data from the cookie being used as the source is
+ * presented
+ */
+ uint16_t (*get_aux)(const void *cookie_netfs_data,
+ void *buffer,
+ uint16_t bufmax);
+
+ /* consult the netfs about the state of an object
+ * - this function can be absent if the index carries no state data
+ * - the netfs data from the cookie being used as the target is
+ * presented, as is the auxilliary data
+ */
+ fscache_checkaux_t (*check_aux)(void *cookie_netfs_data,
+ const void *data,
+ uint16_t datalen);
+
+ /* get an extra reference on a read context
+ * - this function can be absent if the completion function doesn't
+ * require a context
+ */
+ void (*get_context)(void *cookie_netfs_data, void *context);
+
+ /* release an extra reference on a read context
+ * - this function can be absent if the completion function doesn't
+ * require a context
+ */
+ void (*put_context)(void *cookie_netfs_data, void *context);
+
+ /* indicate pages that now have cache metadata retained
+ * - this function should mark the specified pages as now being cached
+ * - the pages will have been marked with PG_fscache before this is
+ * called, so this is optional
+ */
+ void (*mark_pages_cached)(void *cookie_netfs_data,
+ struct address_space *mapping,
+ struct pagevec *cached_pvec);
+
+ /* indicate the cookie is no longer cached
+ * - this function is called when the backing store currently caching
+ * a cookie is removed
+ * - the netfs should use this to clean up any markers indicating
+ * cached pages
+ * - this is mandatory for any object that may have data
+ */
+ void (*now_uncached)(void *cookie_netfs_data);
+};
+
+/*
+ * netfs operations pointer (currently there aren't any ops)
+ */
+struct fscache_netfs_operations
+{
+};
+
+/*
+ * fscache cached network filesystem type
+ * - name, version and ops must be filled in before registration
+ * - all other fields will be set during registration
+ */
+struct fscache_netfs
+{
+ uint32_t version; /* indexing version */
+ const char *name; /* filesystem name */
+ struct fscache_cookie *primary_index;
+ const struct fscache_netfs_operations *ops;
+ struct list_head link; /* internal link */
+};
+
+/*
+ * slow-path functions for when there is actually caching available, and the
+ * netfs does actually have a valid token
+ * - these are not to be called directly
+ * - these are undefined symbols when FS-Cache is not configured and the
+ * optimiser takes care of not using them
+ */
+extern int __fscache_register_netfs(struct fscache_netfs *);
+extern void __fscache_unregister_netfs(struct fscache_netfs *);
+extern struct fscache_cache_tag *__fscache_lookup_cache_tag(const char *);
+extern void __fscache_release_cache_tag(struct fscache_cache_tag *);
+extern struct fscache_cookie *__fscache_acquire_cookie(struct fscache_cookie *,
+ const struct fscache_cookie_def *,
+ void *);
+extern void __fscache_relinquish_cookie(struct fscache_cookie *, int);
+extern void __fscache_update_cookie(struct fscache_cookie *);
+extern int __fscache_pin_cookie(struct fscache_cookie *);
+extern void __fscache_unpin_cookie(struct fscache_cookie *);
+extern int __fscache_attr_changed(struct fscache_cookie *);
+extern int __fscache_reserve_space(struct fscache_cookie *, loff_t);
+extern int __fscache_read_or_alloc_page(struct fscache_cookie *,
+ struct page *,
+ fscache_rw_complete_t,
+ void *,
+ gfp_t);
+extern int __fscache_read_or_alloc_pages(struct fscache_cookie *,
+ struct address_space *,
+ struct list_head *,
+ unsigned *,
+ fscache_rw_complete_t,
+ void *,
+ gfp_t);
+extern int __fscache_alloc_page(struct fscache_cookie *, struct page *, gfp_t);
+extern int __fscache_write_page(struct fscache_cookie *, struct page *, gfp_t);
+
+extern int __fscache_write_pages(struct fscache_cookie *,
+ struct pagevec *,
+ fscache_rw_complete_t,
+ void *,
+ gfp_t);
+extern void __fscache_uncache_page(struct fscache_cookie *, struct page *);
+extern void __fscache_uncache_pages(struct fscache_cookie *, struct pagevec *);
+
+/**
+ * fscache_register_netfs - Register a filesystem as desiring caching services
+ * @netfs: The description of the filesystem
+ *
+ * Register a filesystem as desiring caching services if they're available.
+ *
+ * See Documentation/filesystems/caching/netfs-api.txt for a complete
+ * description.
+ */
+static inline
+int fscache_register_netfs(struct fscache_netfs *netfs)
+{
+ if (fscache_available())
+ return __fscache_register_netfs(netfs);
+ else
+ return 0;
+}
+
+/**
+ * fscache_unregister_netfs - Indicate that a filesystem no longer desires
+ * caching services
+ * @netfs: The description of the filesystem
+ *
+ * Indicate that a filesystem no longer desires caching services for the
+ * moment.
+ *
+ * See Documentation/filesystems/caching/netfs-api.txt for a complete
+ * description.
+ */
+static inline
+void fscache_unregister_netfs(struct fscache_netfs *netfs)
+{
+ if (fscache_available())
+ __fscache_unregister_netfs(netfs);
+}
+
+/**
+ * fscache_lookup_cache_tag - Look up a cache tag
+ * @name: The name of the tag to search for
+ *
+ * Acquire a specific cache referral tag that can be used to select a specific
+ * cache in which to cache an index.
+ *
+ * See Documentation/filesystems/caching/netfs-api.txt for a complete
+ * description.
+ */
+static inline
+struct fscache_cache_tag *fscache_lookup_cache_tag(const char *name)
+{
+ if (fscache_available())
+ return __fscache_lookup_cache_tag(name);
+ else
+ return NULL;
+}
+
+/**
+ * fscache_release_cache_tag - Release a cache tag
+ * @tag: The tag to release
+ *
+ * Release a reference to a cache referral tag previously looked up.
+ *
+ * See Documentation/filesystems/caching/netfs-api.txt for a complete
+ * description.
+ */
+static inline
+void fscache_release_cache_tag(struct fscache_cache_tag *tag)
+{
+ if (fscache_available())
+ __fscache_release_cache_tag(tag);
+}
+
+/**
+ * fscache_acquire_cookie - Acquire a cookie to represent a cache object
+ * @parent: The cookie that's to be the parent of this one
+ * @def: A description of the cache object, including callback operations
+ * @netfs_data: An arbitrary piece of data to be kept in the cookie to
+ * represent the cache object to the netfs
+ *
+ * This function is used to inform FS-Cache about part of an index hierarchy
+ * that can be used to locate files. This is done by requesting a cookie for
+ * each index in the path to the file.
+ *
+ * See Documentation/filesystems/caching/netfs-api.txt for a complete
+ * description.
+ */
+static inline
+struct fscache_cookie *fscache_acquire_cookie(struct fscache_cookie *parent,
+ const struct fscache_cookie_def *def,
+ void *netfs_data)
+{
+ if (fscache_cookie_valid(parent))
+ return __fscache_acquire_cookie(parent, def, netfs_data);
+ else
+ return NULL;
+}
+
+/**
+ * fscache_relinquish_cookie - Return the cookie to the cache, maybe discarding
+ * it
+ * @cookie: The cookie being returned
+ * @retire: True if the cache object the cookie represents is to be discarded
+ *
+ * This function returns a cookie to the cache, forcibly discarding the
+ * associated cache object if retire is set to true.
+ *
+ * See Documentation/filesystems/caching/netfs-api.txt for a complete
+ * description.
+ */
+static inline
+void fscache_relinquish_cookie(struct fscache_cookie *cookie, int retire)
+{
+ if (fscache_cookie_valid(cookie))
+ __fscache_relinquish_cookie(cookie, retire);
+}
+
+/**
+ * fscache_update_cookie - Request that a cache object be updated
+ * @cookie: The cookie representing the cache object
+ *
+ * Request an update of the index data for the cache object associated with the
+ * cookie.
+ *
+ * See Documentation/filesystems/caching/netfs-api.txt for a complete
+ * description.
+ */
+static inline
+void fscache_update_cookie(struct fscache_cookie *cookie)
+{
+ if (fscache_cookie_valid(cookie))
+ __fscache_update_cookie(cookie);
+}
+
+/**
+ * fscache_pin_cookie - Pin a data-storage cache object in its cache
+ * @cookie: The cookie representing the cache object
+ *
+ * Permit data-storage cache objects to be pinned in the cache.
+ *
+ * See Documentation/filesystems/caching/netfs-api.txt for a complete
+ * description.
+ */
+static inline
+int fscache_pin_cookie(struct fscache_cookie *cookie)
+{
+ if (fscache_cookie_valid(cookie))
+ return __fscache_pin_cookie(cookie);
+ else
+ return -ENOBUFS;
+}
+
+/**
+ * fscache_pin_cookie - Unpin a data-storage cache object in its cache
+ * @cookie: The cookie representing the cache object
+ *
+ * Permit data-storage cache objects to be unpinned from the cache.
+ *
+ * See Documentation/filesystems/caching/netfs-api.txt for a complete
+ * description.
+ */
+static inline
+void fscache_unpin_cookie(struct fscache_cookie *cookie)
+{
+ if (fscache_cookie_valid(cookie))
+ __fscache_unpin_cookie(cookie);
+}
+
+/**
+ * fscache_attr_changed - Notify cache that an object's attributes changed
+ * @cookie: The cookie representing the cache object
+ *
+ * Send a notification to the cache indicating that an object's attributes have
+ * changed. This includes the data size. These attributes will be obtained
+ * through the get_attr() cookie definition op.
+ *
+ * See Documentation/filesystems/caching/netfs-api.txt for a complete
+ * description.
+ */
+static inline
+int fscache_attr_changed(struct fscache_cookie *cookie)
+{
+ if (fscache_cookie_valid(cookie))
+ return __fscache_attr_changed(cookie);
+ else
+ return -ENOBUFS;
+}
+
+/**
+ * fscache_reserve_space - Reserve data space for a cached object
+ * @cookie: The cookie representing the cache object
+ * @i_size: The amount of space to be reserved
+ *
+ * Reserve an amount of space in the cache for the cache object attached to a
+ * cookie so that a write to that object within the space can always be
+ * honoured.
+ *
+ * See Documentation/filesystems/caching/netfs-api.txt for a complete
+ * description.
+ */
+static inline
+int fscache_reserve_space(struct fscache_cookie *cookie, loff_t size)
+{
+ if (fscache_cookie_valid(cookie))
+ return __fscache_reserve_space(cookie, size);
+ else
+ return -ENOBUFS;
+}
+
+/**
+ * fscache_read_or_alloc_page - Read a page from the cache or allocate a block
+ * in which to store it
+ * @cookie: The cookie representing the cache object
+ * @page: The netfs page to fill if possible
+ * @end_io_func: The callback to invoke when and if the page is filled
+ * @context: An arbitrary piece of data to pass on to end_io_func()
+ * @gfp: The conditions under which memory allocation should be made
+ *
+ * Read a page from the cache, or if that's not possible make a potential
+ * one-block reservation in the cache into which the page may be stored once
+ * fetched from the server.
+ *
+ * If the page is not backed by the cache object, or if it there's some reason
+ * it can't be, -ENOBUFS will be returned and nothing more will be done for
+ * that page.
+ *
+ * Else, if that page is backed by the cache, a read will be initiated directly
+ * to the netfs's page and 0 will be returned by this function. The
+ * end_io_func() callback will be invoked when the operation terminates on a
+ * completion or failure. Note that the callback may be invoked before the
+ * return.
+ *
+ * Else, if the page is unbacked, -ENODATA is returned and a block may have
+ * been allocated in the cache.
+ *
+ * See Documentation/filesystems/caching/netfs-api.txt for a complete
+ * description.
+ */
+static inline
+int fscache_read_or_alloc_page(struct fscache_cookie *cookie,
+ struct page *page,
+ fscache_rw_complete_t end_io_func,
+ void *context,
+ gfp_t gfp)
+{
+ if (fscache_cookie_valid(cookie))
+ return __fscache_read_or_alloc_page(cookie, page, end_io_func,
+ context, gfp);
+ else
+ return -ENOBUFS;
+}
+
+/**
+ * fscache_read_or_alloc_pages - Read pages from the cache and/or allocate
+ * blocks in which to store them
+ * @cookie: The cookie representing the cache object
+ * @mapping: The netfs inode mapping to which the pages will be attached
+ * @pages: A list of potential netfs pages to be filled
+ * @end_io_func: The callback to invoke when and if each page is filled
+ * @context: An arbitrary piece of data to pass on to end_io_func()
+ * @gfp: The conditions under which memory allocation should be made
+ *
+ * Read a set of pages from the cache, or if that's not possible, attempt to
+ * make a potential one-block reservation for each page in the cache into which
+ * that page may be stored once fetched from the server.
+ *
+ * If some pages are not backed by the cache object, or if it there's some
+ * reason they can't be, -ENOBUFS will be returned and nothing more will be
+ * done for that pages.
+ *
+ * Else, if some of the pages are backed by the cache, a read will be initiated
+ * directly to the netfs's page and 0 will be returned by this function. The
+ * end_io_func() callback will be invoked when the operation terminates on a
+ * completion or failure. Note that the callback may be invoked before the
+ * return.
+ *
+ * Else, if a page is unbacked, -ENODATA is returned and a block may have
+ * been allocated in the cache.
+ *
+ * Because the function may want to return all of -ENOBUFS, -ENODATA and 0 in
+ * regard to different pages, the return values are prioritised in that order.
+ * Any pages submitted for reading are removed from the pages list.
+ *
+ * See Documentation/filesystems/caching/netfs-api.txt for a complete
+ * description.
+ */
+static inline
+int fscache_read_or_alloc_pages(struct fscache_cookie *cookie,
+ struct address_space *mapping,
+ struct list_head *pages,
+ unsigned *nr_pages,
+ fscache_rw_complete_t end_io_func,
+ void *context,
+ gfp_t gfp)
+{
+ if (fscache_cookie_valid(cookie))
+ return __fscache_read_or_alloc_pages(cookie, mapping, pages,
+ nr_pages, end_io_func,
+ context, gfp);
+ else
+ return -ENOBUFS;
+}
+
+/**
+ * fscache_alloc_page - Allocate a block in which to store a page
+ * @cookie: The cookie representing the cache object
+ * @page: The netfs page to allocate a page for
+ * @gfp: The conditions under which memory allocation should be made
+ *
+ * Request Allocation a block in the cache in which to store a netfs page
+ * without retrieving any contents from the cache.
+ *
+ * If the page is not backed by a file then -ENOBUFS will be returned and
+ * nothing more will be done, and no reservation will be made.
+ *
+ * Else, a block will be allocated if one wasn't already, and 0 will be
+ * returned
+ *
+ * See Documentation/filesystems/caching/netfs-api.txt for a complete
+ * description.
+ */
+static inline
+int fscache_alloc_page(struct fscache_cookie *cookie,
+ struct page *page,
+ gfp_t gfp)
+{
+ if (fscache_cookie_valid(cookie))
+ return __fscache_alloc_page(cookie, page, gfp);
+ else
+ return -ENOBUFS;
+}
+
+/**
+ * fscache_write_page - Request storage of a page in the cache
+ * @cookie: The cookie representing the cache object
+ * @page: The netfs page to store
+ * @gfp: The conditions under which memory allocation should be made
+ *
+ * Request the contents of the netfs page be written into the cache. This
+ * request may be ignored if no cache block is currently allocated, in which
+ * case it will return -ENOBUFS.
+ *
+ * If a cache block was already allocated, a write will be initiated and 0 will
+ * be returned. The PG_fscache_write page bit is set immediately and will then
+ * be cleared at the completion of the write to indicate the success or failure
+ * of the operation. Note that the completion may happen before the return.
+ *
+ * See Documentation/filesystems/caching/netfs-api.txt for a complete
+ * description.
+ */
+static inline
+int fscache_write_page(struct fscache_cookie *cookie,
+ struct page *page,
+ gfp_t gfp)
+{
+ if (fscache_cookie_valid(cookie))
+ return __fscache_write_page(cookie, page, gfp);
+ else
+ return -ENOBUFS;
+}
+
+/**
+ * fscache_uncache_page - Indicate that caching is no longer required on a page
+ * @cookie: The cookie representing the cache object
+ * @page: The netfs page that was being cached.
+ *
+ * Tell the cache that we no longer want a page to be cached and that it should
+ * remove any knowledge of the netfs page it may have.
+ *
+ * Note that this cannot cancel any outstanding I/O operations between this
+ * page and the cache.
+ *
+ * See Documentation/filesystems/caching/netfs-api.txt for a complete
+ * description.
+ */
+static inline
+void fscache_uncache_page(struct fscache_cookie *cookie,
+ struct page *page)
+{
+ if (fscache_cookie_valid(cookie))
+ __fscache_uncache_page(cookie, page);
+}
+
+#endif /* _LINUX_FSCACHE_H */

2007-08-09 17:05:19

by Casey Schaufler

[permalink] [raw]
Subject: Re: [PATCH 09/14] CacheFiles: Permit a process's create SID to be overridden [try #2]


--- David Howells <[email protected]> wrote:

> Make it possible for a process's file creation SID to be temporarily
> overridden
> by CacheFiles so that files created in the cache have the right label
> attached.
>
> Without this facility, files created in the cache will be given the current
> file creation SID of whatever process happens to have invoked CacheFiles
> indirectly by means of opening a netfs file at the time the cache file is
> created.

This is SELinux specific funtionality and should be done in the
SELinux code. You should not be adding interfaces that are SELinux
specific, in this case using secids instead of the LSM blob interfaces.


Casey Schaufler
[email protected]

2007-08-09 17:07:47

by Casey Schaufler

[permalink] [raw]
Subject: Re: [PATCH 11/14] CacheFiles: Permit an inode's security ID to be obtained [try #2]


--- David Howells <[email protected]> wrote:

> Permit an inode's security ID to be obtained by the CacheFiles module. This
> is
> then used as the SID with which files and directories will be created in the
> cache.

This is SELinux specific functionality. It should not be an LSM
interface.

Casey Schaufler
[email protected]

2007-08-09 17:25:27

by Stephen Smalley

[permalink] [raw]
Subject: Re: [PATCH 11/14] CacheFiles: Permit an inode's security ID to be obtained [try #2]

On Thu, 2007-08-09 at 10:07 -0700, Casey Schaufler wrote:
> --- David Howells <[email protected]> wrote:
>
> > Permit an inode's security ID to be obtained by the CacheFiles module. This
> > is
> > then used as the SID with which files and directories will be created in the
> > cache.
>
> This is SELinux specific functionality. It should not be an LSM
> interface.

Odd, you proposed exactly the same hook (aside from naming convention
and secid as argument vs. as retval) in recent postings on linux-audit
and selinux list for use by the audit system.

--
Stephen Smalley
National Security Agency

2007-08-09 17:59:38

by Casey Schaufler

[permalink] [raw]
Subject: Re: [PATCH 11/14] CacheFiles: Permit an inode's security ID to be obtained [try #2]


--- Stephen Smalley <[email protected]> wrote:

> On Thu, 2007-08-09 at 10:07 -0700, Casey Schaufler wrote:
> > --- David Howells <[email protected]> wrote:
> >
> > > Permit an inode's security ID to be obtained by the CacheFiles module.
> This
> > > is
> > > then used as the SID with which files and directories will be created in
> the
> > > cache.
> >
> > This is SELinux specific functionality. It should not be an LSM
> > interface.
>
> Odd, you proposed exactly the same hook (aside from naming convention
> and secid as argument vs. as retval) in recent postings on linux-audit
> and selinux list for use by the audit system.

And that's exposing SELinux specific functionality too. And I don't
like the fact that the audit system already requires a secid interface.
The audit system, however, does not use the secid for anything other
than a handle that gets passed around and eventually used to get the
data that goes into the audit record. It's annoying, but harmless and
does not affect any access control decisions. The change proposed here
would use the secid in access control decisions. The LSM interface
ought not to be exposing module specific internal data structures. My
work on pulling selinux code out of audit left the secid interface
in place. You're right in that audit should get fixed. I had been
hoping to make that a phase II activity.


Casey Schaufler
[email protected]

2007-08-09 18:08:03

by David Howells

[permalink] [raw]
Subject: Re: [PATCH 11/14] CacheFiles: Permit an inode's security ID to be obtained [try #2]

Casey Schaufler <[email protected]> wrote:

> This is SELinux specific functionality. It should not be an LSM
> interface.

This is what I worked out in conjunction with the denizens of the SELinux
mailing list. What would you have me do differently? Change things like:

u32 (*act_as_secid)(u32 secid);

to something like:

void (*act_as_secid)(const char *newsecdata, u32 newseclen,
char *oldsecdata, u32 *oldseclen);

David

2007-08-09 18:08:55

by David Howells

[permalink] [raw]
Subject: Re: [PATCH 09/14] CacheFiles: Permit a process's create SID to be overridden [try #2]

Casey Schaufler <[email protected]> wrote:

> This is SELinux specific funtionality and should be done in the
> SELinux code. You should not be adding interfaces that are SELinux
> specific, in this case using secids instead of the LSM blob interfaces.

Is using secids your only objection? Or are you objecting to the whole
'act-as' concept?

David

2007-08-09 18:17:55

by James Morris

[permalink] [raw]
Subject: Re: [PATCH 11/14] CacheFiles: Permit an inode's security ID to be obtained [try #2]

On Thu, 9 Aug 2007, David Howells wrote:

> + u32 (*inode_get_secid)(struct inode *inode);

To maintain API consistency, please return an int which only acts as an
error code, and returning the secid via a *u32 function parameter.


- James
--
James Morris
<[email protected]>

2007-08-09 18:22:23

by David Howells

[permalink] [raw]
Subject: Re: [PATCH 11/14] CacheFiles: Permit an inode's security ID to be obtained [try #2]

James Morris <[email protected]> wrote:

> > + u32 (*inode_get_secid)(struct inode *inode);
>
> To maintain API consistency, please return an int which only acts as an
> error code, and returning the secid via a *u32 function parameter.

Does that apply to *all* the functions, irrespective of whether or not they
return an error?

David

2007-08-09 18:25:09

by Trond Myklebust

[permalink] [raw]
Subject: Re: [PATCH 14/14] NFS: Use local caching [try #2]

Dang, that's a lot of inlines... AFAICS, approx half of fs/nfs/fscache.h
should really be moved into fscache.c.

Otherwise, this looks a lot less intrusive than previous patches.

See inlined comments.

On Thu, 2007-08-09 at 17:05 +0100, David Howells wrote:
> The attached patch makes it possible for the NFS filesystem to make use of the
> network filesystem local caching service (FS-Cache).
>
> To be able to use this, an updated mount program is required. This can be
> obtained from:
>
> http://people.redhat.com/steved/cachefs/util-linux/
>
> To mount an NFS filesystem to use caching, add an "fsc" option to the mount:
>
> mount warthog:/ /a -o fsc
>
> Signed-Off-By: David Howells <[email protected]>
> ---
>
> fs/Kconfig | 8 +
> fs/nfs/Makefile | 1
> fs/nfs/client.c | 11 +
> fs/nfs/file.c | 38 +++-
> fs/nfs/fscache.c | 303 +++++++++++++++++++++++++++++
> fs/nfs/fscache.h | 464 ++++++++++++++++++++++++++++++++++++++++++++
> fs/nfs/inode.c | 16 ++
> fs/nfs/internal.h | 8 +
> fs/nfs/read.c | 27 ++-
> fs/nfs/super.c | 1
> fs/nfs/sysctl.c | 43 ++++
> fs/nfs/write.c | 3
> include/linux/nfs4_mount.h | 3
> include/linux/nfs_fs.h | 6 +
> include/linux/nfs_fs_sb.h | 5
> include/linux/nfs_mount.h | 3
> 16 files changed, 931 insertions(+), 9 deletions(-)
>
> diff --git a/fs/Kconfig b/fs/Kconfig
> index 7feb4cb..76d5d16 100644
> --- a/fs/Kconfig
> +++ b/fs/Kconfig
> @@ -1600,6 +1600,14 @@ config NFS_V4
>
> If unsure, say N.
>
> +config NFS_FSCACHE
> + bool "Provide NFS client caching support (EXPERIMENTAL)"
> + depends on EXPERIMENTAL
> + depends on NFS_FS=m && FSCACHE || NFS_FS=y && FSCACHE=y
> + help
> + Say Y here if you want NFS data to be cached locally on disc through
> + the general filesystem cache manager
> +
> config NFS_DIRECTIO
> bool "Allow direct I/O on NFS files"
> depends on NFS_FS
> diff --git a/fs/nfs/Makefile b/fs/nfs/Makefile
> index b55cb23..c9e7c43 100644
> --- a/fs/nfs/Makefile
> +++ b/fs/nfs/Makefile
> @@ -16,4 +16,5 @@ nfs-$(CONFIG_NFS_V4) += nfs4proc.o nfs4xdr.o nfs4state.o nfs4renewd.o \
> nfs4namespace.o
> nfs-$(CONFIG_NFS_DIRECTIO) += direct.o
> nfs-$(CONFIG_SYSCTL) += sysctl.o
> +nfs-$(CONFIG_NFS_FSCACHE) += fscache.o
> nfs-objs := $(nfs-y)
> diff --git a/fs/nfs/client.c b/fs/nfs/client.c
> index a49f9fe..7be7807 100644
> --- a/fs/nfs/client.c
> +++ b/fs/nfs/client.c
> @@ -137,6 +137,8 @@ static struct nfs_client *nfs_alloc_client(const char *hostname,
> clp->cl_state = 1 << NFS4CLNT_LEASE_EXPIRED;
> #endif
>
> + nfs_fscache_get_client_cookie(clp);
> +
> return clp;
>
> error_3:
> @@ -168,6 +170,8 @@ static void nfs_free_client(struct nfs_client *clp)
>
> nfs4_shutdown_client(clp);
>
> + nfs_fscache_release_client_cookie(clp);
> +
> /* -EIO all pending I/O */
> if (!IS_ERR(clp->cl_rpcclient))
> rpc_shutdown_client(clp->cl_rpcclient);
> @@ -1308,7 +1312,7 @@ static int nfs_volume_list_show(struct seq_file *m, void *v)
>
> /* display header on line 1 */
> if (v == &nfs_volume_list) {
> - seq_puts(m, "NV SERVER PORT DEV FSID\n");
> + seq_puts(m, "NV SERVER PORT DEV FSID FSC\n");
> return 0;
> }
> /* display one transport per line on subsequent lines */
> @@ -1322,12 +1326,13 @@ static int nfs_volume_list_show(struct seq_file *m, void *v)
> (unsigned long long) server->fsid.major,
> (unsigned long long) server->fsid.minor);
>
> - seq_printf(m, "v%d %02x%02x%02x%02x %4hx %-7s %-17s\n",
> + seq_printf(m, "v%d %02x%02x%02x%02x %4hx %-7s %-17s %s\n",
> clp->cl_nfsversion,
> NIPQUAD(clp->cl_addr.sin_addr),
> ntohs(clp->cl_addr.sin_port),
> dev,
> - fsid);
> + fsid,
> + nfs_server_fscache_state(server));

Please send these changes as a separate patch: they change an existing
interface.

>
> return 0;
> }
> diff --git a/fs/nfs/file.c b/fs/nfs/file.c
> index c87dc71..dfd36e0 100644
> --- a/fs/nfs/file.c
> +++ b/fs/nfs/file.c
> @@ -34,6 +34,7 @@
>
> #include "delegation.h"
> #include "iostat.h"
> +#include "internal.h"
>
> #define NFSDBG_FACILITY NFSDBG_FILE
>
> @@ -259,6 +260,10 @@ nfs_file_mmap(struct file * file, struct vm_area_struct * vma)
> status = nfs_revalidate_mapping(inode, file->f_mapping);
> if (!status)
> status = generic_file_mmap(file, vma);
> +
> + if (status == 0)
> + nfs_fscache_install_vm_ops(inode, vma);
> +
> return status;

Please note that in 2.6.24 the NFS client will be gaining a
page_mkwrite() method in order to fix the remaining mmap() races.
AFAICS, it should be fairly trivial to merge with your
nfs_file_page_mkwrite.

> }
>
> @@ -311,22 +316,51 @@ static int nfs_commit_write(struct file *file, struct page *page, unsigned offse
> return status;
> }
>
> +/*
> + * partially or wholly invalidate a page
> + * - release the private state associated with a page if undergoing complete
> + * page invalidation
> + * - caller holds page lock
> + */
> static void nfs_invalidate_page(struct page *page, unsigned long offset)
> {
> if (offset != 0)
> return;
> /* Cancel any unstarted writes on this page */
> nfs_wb_page_priority(page->mapping->host, page, FLUSH_INVALIDATE);
> +
> + nfs_fscache_invalidate_page(page, page->mapping->host, offset);
> +
> + /* we can do this here as the bits are only set with the page lock
> + * held, and our caller is holding that */
> + if (!page->private)
> + ClearPagePrivate(page);

Why would PG_private be set at this point?

In any case, please send this and other PagePrivate changes as a
separate patch. Any changes to the PagePrivate semantics must be made
easy to debug.

> }
>
> +/*
> + * release the private state associated with a page, if the page isn't busy
> + * - caller holds page lock
> + * - return true (may release) or false (may not)
> + */
> static int nfs_release_page(struct page *page, gfp_t gfp)
> {
> - /* If PagePrivate() is set, then the page is not freeable */
> - return 0;
> + /*
> + * Avoid deadlock on nfs_wait_on_request().
> + */
> + if (!(gfp & __GFP_FS))
> + return 0;

We got rid of this. There should be no race with nfs_wait_on_request().
Either fix up the comment, or remove the test.

> +
> + if (!nfs_fscache_release_page(page, gfp))
> + return 0;

This looks _very_ dubious. Why shouldn't I be able to discard a page
just because fscache isn't done writing it out? I may have very good
reasons to do so.

> + BUG_ON(page->private != 0);
> + ClearPagePrivate(page);
> + return 1;
> }
>
> static int nfs_launder_page(struct page *page)
> {
> + nfs_fscache_invalidate_page(page, page->mapping->host, 0);

Why? The function of launder_page() is to clean the page, not to
truncate it.

> return nfs_wb_page(page->mapping->host, page);
> }
>
> diff --git a/fs/nfs/fscache.c b/fs/nfs/fscache.c
> new file mode 100644
> index 0000000..2b2785a
> --- /dev/null
> +++ b/fs/nfs/fscache.c
> @@ -0,0 +1,303 @@
> +/* fscache.c: NFS filesystem cache interface
> + *
> + * Copyright (C) 2006 Red Hat, Inc. All Rights Reserved.
> + * Written by David Howells ([email protected])
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License
> + * as published by the Free Software Foundation; either version
> + * 2 of the License, or (at your option) any later version.
> + */
> +
> +
> +#include <linux/init.h>
> +#include <linux/kernel.h>
> +#include <linux/sched.h>
> +#include <linux/mm.h>
> +#include <linux/nfs_fs.h>
> +#include <linux/nfs_fs_sb.h>
> +#include <linux/in6.h>
> +
> +#include "internal.h"
> +
> +/*
> + * Sysctl variables
> + */
> +atomic_t nfs_fscache_to_pages;
> +atomic_t nfs_fscache_from_pages;
> +atomic_t nfs_fscache_uncache_page;
> +int nfs_fscache_from_error;
> +int nfs_fscache_to_error;
> +
> +#define NFSDBG_FACILITY NFSDBG_FSCACHE
> +
> +/* the auxiliary data in the cache (used for coherency management) */
> +struct nfs_fh_auxdata {
> + struct timespec i_mtime;
> + struct timespec i_ctime;
> + loff_t i_size;
> +};
> +
> +static struct fscache_netfs_operations nfs_cache_ops = {
> +};
> +
> +struct fscache_netfs nfs_cache_netfs = {
> + .name = "nfs",
> + .version = 0,
> + .ops = &nfs_cache_ops,
> +};
> +
> +static const uint8_t nfs_cache_ipv6_wrapper_for_ipv4[12] = {
> + [0 ... 9] = 0x00,
> + [10 ... 11] = 0xff
> +};
> +
> +struct nfs_server_key {
> + uint16_t nfsversion;
> + uint16_t port;
> + union {
> + struct {
> + uint8_t ipv6wrapper[12];
> + struct in_addr addr;
> + } ipv4_addr;
> + struct in6_addr ipv6_addr;
> + };
> +};
> +
> +static uint16_t nfs_server_get_key(const void *cookie_netfs_data,
> + void *buffer, uint16_t bufmax)
> +{
> + const struct nfs_client *clp = cookie_netfs_data;
> + struct nfs_server_key *key = buffer;
> + uint16_t len = 0;
> +
> + key->nfsversion = clp->cl_nfsversion;
> +
> + switch (clp->cl_addr.sin_family) {
> + case AF_INET:
> + key->port = clp->cl_addr.sin_port;
> +
> + memcpy(&key->ipv4_addr.ipv6wrapper,
> + &nfs_cache_ipv6_wrapper_for_ipv4,
> + sizeof(key->ipv4_addr.ipv6wrapper));
> + memcpy(&key->ipv4_addr.addr,
> + &clp->cl_addr.sin_addr,
> + sizeof(key->ipv4_addr.addr));
> + len = sizeof(struct nfs_server_key);
> + break;
> +
> + case AF_INET6:
> + key->port = clp->cl_addr.sin_port;
> +
> + memcpy(&key->ipv6_addr,
> + &clp->cl_addr.sin_addr,
> + sizeof(key->ipv6_addr));
> + len = sizeof(struct nfs_server_key);
> + break;
> +
> + default:
> + len = 0;
> + printk(KERN_WARNING "NFS: Unknown network family '%d'\n",
> + clp->cl_addr.sin_family);
> + break;
> + }
> +
> + return len;
> +}
> +
> +/*
> + * the root index for the filesystem is defined by nfsd IP address and ports
> + */
> +struct fscache_cookie_def nfs_cache_server_index_def = {
> + .name = "NFS.servers",
> + .type = FSCACHE_COOKIE_TYPE_INDEX,
> + .get_key = nfs_server_get_key,
> +};
> +
> +static uint16_t nfs_fh_get_key(const void *cookie_netfs_data,
> + void *buffer, uint16_t bufmax)
> +{
> + const struct nfs_inode *nfsi = cookie_netfs_data;
> + uint16_t nsize;
> +
> + /* set the file handle */
> + nsize = nfsi->fh.size;
> + memcpy(buffer, nfsi->fh.data, nsize);
> + return nsize;
> +}
> +
> +/*
> + * get an extra reference on a read context
> + * - this function can be absent if the completion function doesn't
> + * require a context
> + */
> +static void nfs_fh_get_context(void *cookie_netfs_data, void *context)
> +{
> + get_nfs_open_context(context);
> +}
> +
> +/*
> + * release an extra reference on a read context
> + * - this function can be absent if the completion function doesn't
> + * require a context
> + */
> +static void nfs_fh_put_context(void *cookie_netfs_data, void *context)
> +{
> + if (context)
> + put_nfs_open_context(context);
> +}
> +
> +/*
> + * indication the cookie is no longer uncached
> + * - this function is called when the backing store currently caching a cookie
> + * is removed
> + * - the netfs should use this to clean up any markers indicating cached pages
> + * - this is mandatory for any object that may have data
> + */
> +static void nfs_fh_now_uncached(void *cookie_netfs_data)
> +{
> + struct nfs_inode *nfsi = cookie_netfs_data;
> + struct pagevec pvec;
> + pgoff_t first;
> + int loop, nr_pages;
> +
> + pagevec_init(&pvec, 0);
> + first = 0;
> +
> + dprintk("NFS: nfs_fh_now_uncached: nfs_inode 0x%p\n", nfsi);
> +
> + for (;;) {
> + /* grab a bunch of pages to clean */
> + nr_pages = pagevec_lookup(&pvec,
> + nfsi->vfs_inode.i_mapping,
> + first,
> + PAGEVEC_SIZE - pagevec_count(&pvec));
> + if (!nr_pages)
> + break;
> +
> + for (loop = 0; loop < nr_pages; loop++)
> + ClearPageFsCache(pvec.pages[loop]);
> +
> + first = pvec.pages[nr_pages - 1]->index + 1;
> +
> + pvec.nr = nr_pages;
> + pagevec_release(&pvec);
> + cond_resched();
> + }
> +}
> +
> +/*
> + * get certain file attributes from the netfs data
> + * - this function can be absent for an index
> + * - not permitted to return an error
> + * - the netfs data from the cookie being used as the source is
> + * presented
> + */
> +static void nfs_fh_get_attr(const void *cookie_netfs_data, uint64_t *size)
> +{
> + const struct nfs_inode *nfsi = cookie_netfs_data;
> +
> + *size = nfsi->vfs_inode.i_size;
> +}
> +
> +/*
> + * get the auxilliary data from netfs data
> + * - this function can be absent if the index carries no state data
> + * - should store the auxilliary data in the buffer
> + * - should return the amount of amount stored
> + * - not permitted to return an error
> + * - the netfs data from the cookie being used as the source is
> + * presented
> + */
> +static uint16_t nfs_fh_get_aux(const void *cookie_netfs_data,
> + void *buffer, uint16_t bufmax)
> +{
> + struct nfs_fh_auxdata auxdata;
> + const struct nfs_inode *nfsi = cookie_netfs_data;
> +
> + auxdata.i_size = nfsi->vfs_inode.i_size;
> + auxdata.i_mtime = nfsi->vfs_inode.i_mtime;
> + auxdata.i_ctime = nfsi->vfs_inode.i_ctime;
> +
> + if (bufmax > sizeof(auxdata))
> + bufmax = sizeof(auxdata);
> +
> + memcpy(buffer, &auxdata, bufmax);
> + return bufmax;
> +}
> +
> +/*
> + * consult the netfs about the state of an object
> + * - this function can be absent if the index carries no state data
> + * - the netfs data from the cookie being used as the target is
> + * presented, as is the auxilliary data
> + */
> +static fscache_checkaux_t nfs_fh_check_aux(void *cookie_netfs_data,
> + const void *data, uint16_t datalen)
> +{
> + struct nfs_fh_auxdata auxdata;
> + struct nfs_inode *nfsi = cookie_netfs_data;
> +
> + if (datalen > sizeof(auxdata))
> + return FSCACHE_CHECKAUX_OBSOLETE;
> +
> + auxdata.i_size = nfsi->vfs_inode.i_size;
> + auxdata.i_mtime = nfsi->vfs_inode.i_mtime;
> + auxdata.i_ctime = nfsi->vfs_inode.i_ctime;
> +
> + if (memcmp(data, &auxdata, datalen) != 0)
> + return FSCACHE_CHECKAUX_OBSOLETE;
> +
> + return FSCACHE_CHECKAUX_OKAY;
> +}
> +
> +/*
> + * the primary index for each server is simply made up of a series of NFS file
> + * handles
> + */
> +struct fscache_cookie_def nfs_cache_fh_index_def = {
> + .name = "NFS.fh",
> + .type = FSCACHE_COOKIE_TYPE_DATAFILE,
> + .get_key = nfs_fh_get_key,
> + .get_attr = nfs_fh_get_attr,
> + .get_aux = nfs_fh_get_aux,
> + .check_aux = nfs_fh_check_aux,
> + .get_context = nfs_fh_get_context,
> + .put_context = nfs_fh_put_context,
> + .now_uncached = nfs_fh_now_uncached,
> +};
> +
> +static int nfs_file_page_mkwrite(struct vm_area_struct *vma, struct page *page)
> +{
> + wait_on_page_fscache_write(page);
> + return 0;
> +}
> +
> +struct vm_operations_struct nfs_fs_vm_operations = {
> + .fault = filemap_fault,
> + .page_mkwrite = nfs_file_page_mkwrite,
> +};
> +
> +/*
> + * handle completion of a page being read from the cache
> + * - called in process (keventd) context
> + */
> +void nfs_readpage_from_fscache_complete(struct page *page,
> + void *context,
> + int error)
> +{
> + dfprintk(FSCACHE,
> + "NFS: readpage_from_fscache_complete (0x%p/0x%p/%d)\n",
> + page, context, error);
> +
> + /* if the read completes with an error, we just unlock the page and let
> + * the VM reissue the readpage */
> + if (!error) {
> + SetPageUptodate(page);
> + unlock_page(page);
> + } else {
> + error = nfs_readpage_async(context, page->mapping->host, page);
> + if (error)
> + unlock_page(page);
> + }
> +}
> diff --git a/fs/nfs/fscache.h b/fs/nfs/fscache.h
> new file mode 100644
> index 0000000..152309c
> --- /dev/null
> +++ b/fs/nfs/fscache.h
> @@ -0,0 +1,464 @@
> +/* fscache.h: NFS filesystem cache interface definitions
> + *
> + * Copyright (C) 2006 Red Hat, Inc. All Rights Reserved.
> + * Written by David Howells ([email protected])
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License
> + * as published by the Free Software Foundation; either version
> + * 2 of the License, or (at your option) any later version.
> + */
> +
> +#ifndef _NFS_FSCACHE_H
> +#define _NFS_FSCACHE_H
> +
> +#include <linux/nfs_fs.h>
> +#include <linux/nfs_mount.h>
> +#include <linux/nfs4_mount.h>
> +
> +#ifdef CONFIG_NFS_FSCACHE
> +#include <linux/fscache.h>
> +
> +extern struct fscache_netfs nfs_cache_netfs;
> +extern struct fscache_cookie_def nfs_cache_server_index_def;
> +extern struct fscache_cookie_def nfs_cache_fh_index_def;
> +extern struct vm_operations_struct nfs_fs_vm_operations;
> +
> +extern void nfs_invalidatepage(struct page *, unsigned long);
> +extern int nfs_releasepage(struct page *, gfp_t);
> +
> +extern atomic_t nfs_fscache_to_pages;
> +extern atomic_t nfs_fscache_from_pages;
> +extern atomic_t nfs_fscache_uncache_page;
> +extern int nfs_fscache_from_error;
> +extern int nfs_fscache_to_error;
> +
> +/*
> + * register NFS for caching
> + */
> +static inline int nfs_fscache_register(void)
> +{
> + return fscache_register_netfs(&nfs_cache_netfs);
> +}
> +
> +/*
> + * unregister NFS for caching
> + */
> +static inline void nfs_fscache_unregister(void)
> +{
> + fscache_unregister_netfs(&nfs_cache_netfs);
> +}
> +
> +/*
> + * get the per-client index cookie for an NFS client if the appropriate mount
> + * flag was set
> + * - we always try and get an index cookie for the client, but get filehandle
> + * cookies on a per-superblock basis, depending on the mount flags
> + */
> +static inline void nfs_fscache_get_client_cookie(struct nfs_client *clp)
> +{
> + /* create a cache index for looking up filehandles */
> + clp->fscache = fscache_acquire_cookie(nfs_cache_netfs.primary_index,
> + &nfs_cache_server_index_def,
> + clp);
> + dfprintk(FSCACHE,"NFS: get client cookie (0x%p/0x%p)\n",
> + clp, clp->fscache);
> +}
> +
> +/*
> + * dispose of a per-client cookie
> + */
> +static inline void nfs_fscache_release_client_cookie(struct nfs_client *clp)
> +{
> + dfprintk(FSCACHE,"NFS: releasing client cookie (0x%p/0x%p)\n",
> + clp, clp->fscache);
> +
> + fscache_relinquish_cookie(clp->fscache, 0);
> + clp->fscache = NULL;
> +}
> +
> +/*
> + * indicate the client caching state as readable text
> + */
> +static inline const char *nfs_server_fscache_state(struct nfs_server *server)
> +{
> + if (server->nfs_client->fscache && (server->flags & NFS_MOUNT_FSCACHE))
> + return "yes";
> + return "no ";
> +}
> +
> +/*
> + * get the per-filehandle cookie for an NFS inode
> + */
> +static inline void nfs_fscache_init_fh_cookie(struct inode *inode)
> +{
> + NFS_I(inode)->fscache = NULL;
> + if (S_ISREG(inode->i_mode))
> + set_bit(NFS_INO_FSCACHE, &NFS_I(inode)->flags);
> +}
> +
> +/*
> + * get the per-filehandle cookie for an NFS inode
> + */
> +static inline void nfs_fscache_enable_fh_cookie(struct inode *inode)
> +{
> + struct super_block *sb = inode->i_sb;
> + struct nfs_inode *nfsi = NFS_I(inode);
> +
> + if (nfsi->fscache || !NFS_FSCACHE(inode))
> + return;
> +
> + if ((NFS_SB(sb)->flags & NFS_MOUNT_FSCACHE)) {
> + nfsi->fscache = fscache_acquire_cookie(
> + NFS_SB(sb)->nfs_client->fscache,
> + &nfs_cache_fh_index_def,
> + nfsi);
> +
> + dfprintk(FSCACHE, "NFS: get FH cookie (0x%p/0x%p/0x%p)\n",
> + sb, nfsi, nfsi->fscache);
> + }
> +}
> +
> +/*
> + * change the filesize associated with a per-filehandle cookie
> + */
> +static inline void nfs_fscache_attr_changed(struct inode *inode)
> +{
> + fscache_attr_changed(NFS_I(inode)->fscache);
> +}
> +
> +/*
> + * replace a per-filehandle cookie due to revalidation detecting a file having
> + * changed on the server
> + */
> +static inline void nfs_fscache_renew_fh_cookie(struct inode *inode)
> +{
> + struct nfs_inode *nfsi = NFS_I(inode);
> + struct nfs_server *server = NFS_SERVER(inode);
> + struct fscache_cookie *old = nfsi->fscache;
> +
> + if (nfsi->fscache) {
> + /* retire the current fscache cache and get a new one */
> + fscache_relinquish_cookie(nfsi->fscache, 1);
> +
> + nfsi->fscache = fscache_acquire_cookie(
> + server->nfs_client->fscache,
> + &nfs_cache_fh_index_def,
> + nfsi);
> +
> + dfprintk(FSCACHE,
> + "NFS: revalidation new cookie (0x%p/0x%p/0x%p/0x%p)\n",
> + server, nfsi, old, nfsi->fscache);
> + }
> +}
> +
> +/*
> + * release a per-filehandle cookie
> + */
> +static inline void nfs_fscache_release_fh_cookie(struct inode *inode)
> +{
> + struct nfs_inode *nfsi = NFS_I(inode);
> +
> + dfprintk(FSCACHE, "NFS: clear cookie (0x%p/0x%p)\n",
> + nfsi, nfsi->fscache);
> +
> + fscache_relinquish_cookie(nfsi->fscache, 0);
> + nfsi->fscache = NULL;
> +}
> +
> +/*
> + * retire a per-filehandle cookie, destroying the data attached to it
> + */
> +static inline void nfs_fscache_zap_fh_cookie(struct inode *inode)
> +{
> + struct nfs_inode *nfsi = NFS_I(inode);
> +
> + dfprintk(FSCACHE,"NFS: zapping cookie (0x%p/0x%p)\n",
> + nfsi, nfsi->fscache);
> +
> + fscache_relinquish_cookie(nfsi->fscache, 1);
> + nfsi->fscache = NULL;
> +}
> +
> +/*
> + * turn off the cache with regard to a filehandle cookie if opened for writing,
> + * invalidating all the pages in the page cache relating to the associated
> + * inode to clear the per-page caching
> + */
> +static inline void nfs_fscache_disable_fh_cookie(struct inode *inode)
> +{
> + clear_bit(NFS_INO_FSCACHE, &NFS_I(inode)->flags);
> +
> + if (NFS_I(inode)->fscache) {
> + dfprintk(FSCACHE,
> + "NFS: nfsi 0x%p turning cache off\n", NFS_I(inode));
> +
> + /* Need to invalided any mapped pages that were read in before
> + * turning off the cache.
> + */
> + if (inode->i_mapping && inode->i_mapping->nrpages)
> + invalidate_inode_pages2(inode->i_mapping);
> +
> + nfs_fscache_zap_fh_cookie(inode);
> + }
> +}
> +
> +/*
> + * decide if we should enable or disable the FS cache for this inode
> + * - for now, only regular files that are open read-only will be able to use
> + * the cache
> + */
> +static inline void nfs_fscache_set_fh_cookie(struct inode *inode,
> + struct file *filp)
> +{
> + if (NFS_FSCACHE(inode)) {
> + if ((filp->f_flags & O_ACCMODE) != O_RDONLY)
> + nfs_fscache_disable_fh_cookie(inode);
> + else
> + nfs_fscache_enable_fh_cookie(inode);
> + }
> +}
> +
> +/*
> + * install the VM ops for mmap() of an NFS file so that we can hold up writes
> + * to pages on shared writable mappings until the store to the cache is
> + * complete
> + */
> +static inline void nfs_fscache_install_vm_ops(struct inode *inode,
> + struct vm_area_struct *vma)
> +{
> + if (NFS_I(inode)->fscache)
> + vma->vm_ops = &nfs_fs_vm_operations;
> +}
> +
> +/*
> + * release the caching state associated with a page, if the page isn't busy
> + * interacting with the cache
> + * - returns true (can release page) or false (page busy)
> + */
> +static inline int nfs_fscache_release_page(struct page *page, gfp_t gfp)
> +{
> + if (PageFsCacheWrite(page)) {
> + if (!(gfp & __GFP_WAIT))
> + return 0;
> + wait_on_page_fscache_write(page);
> + }
> +
> + if (PageFsCache(page)) {
> + struct nfs_inode *nfsi = NFS_I(page->mapping->host);
> +
> + BUG_ON(!nfsi->fscache);
> +
> + dfprintk(FSCACHE, "NFS: fscache releasepage (0x%p/0x%p/0x%p)\n",
> + nfsi->fscache, page, nfsi);
> +
> + fscache_uncache_page(nfsi->fscache, page);
> + atomic_inc(&nfs_fscache_uncache_page);
> + ClearPageFsCache(page);
> + }
> +
> + return 1;
> +}
> +
> +/*
> + * release the caching state associated with a page if undergoing complete page
> + * invalidation
> + */
> +static inline void nfs_fscache_invalidate_page(struct page *page,
> + struct inode *inode,
> + unsigned long offset)
> +{
> + struct nfs_inode *nfsi = NFS_I(page->mapping->host);
> +
> + if (PageFsCache(page)) {
> + BUG_ON(!nfsi->fscache);
> +
> + dfprintk(FSCACHE,
> + "NFS: fscache invalidatepage (0x%p/0x%p/0x%p)\n",
> + nfsi->fscache, page, nfsi);
> +
> + wait_on_page_fscache_write(page);
> +
> + if (offset == 0) {
> + BUG_ON(!PageLocked(page));
> + if (!PageWriteback(page)) {
> + fscache_uncache_page(nfsi->fscache, page);
> + atomic_inc(&nfs_fscache_uncache_page);
> + ClearPageFsCache(page);
> + }
> + }
> + }
> +}
> +
> +/*
> + * store a newly fetched page in fscache
> + */
> +static inline void nfs_readpage_to_fscache(struct inode *inode,
> + struct page *page,
> + int sync)
> +{
> + int ret;
> +
> + if (PageFsCache(page)) {
> + dfprintk(FSCACHE,
> + "NFS: "
> + "readpage_to_fscache(fsc:%p/p:%p(i:%lx f:%lx)/%d)\n",
> + NFS_I(inode)->fscache, page, page->index, page->flags,
> + sync);
> +
> + ret = fscache_write_page(NFS_I(inode)->fscache, page,
> + GFP_KERNEL);
> + dfprintk(FSCACHE,
> + "NFS: "
> + "readpage_to_fscache: p:%p(i:%lu f:%lx) ret %d\n",
> + page, page->index, page->flags, ret);
> +
> + if (ret != 0) {
> + fscache_uncache_page(NFS_I(inode)->fscache, page);
> + atomic_inc(&nfs_fscache_uncache_page);
> + ClearPageFsCache(page);
> + nfs_fscache_to_error = ret;
> + } else {
> + atomic_inc(&nfs_fscache_to_pages);
> + }
> + }
> +}
> +
> +/*
> + * retrieve a page from fscache
> + */
> +extern void nfs_readpage_from_fscache_complete(struct page *, void *, int);
> +
> +static inline
> +int nfs_readpage_from_fscache(struct nfs_open_context *ctx,
> + struct inode *inode,
> + struct page *page)
> +{
> + int ret;
> +
> + if (!NFS_I(inode)->fscache)
> + return 1;
> +
> + dfprintk(FSCACHE,
> + "NFS: readpage_from_fscache(fsc:%p/p:%p(i:%lx f:%lx)/0x%p)\n",
> + NFS_I(inode)->fscache, page, page->index, page->flags, inode);
> +
> + ret = fscache_read_or_alloc_page(NFS_I(inode)->fscache,
> + page,
> + nfs_readpage_from_fscache_complete,
> + ctx,
> + GFP_KERNEL);
> +
> + switch (ret) {
> + case 0: /* read BIO submitted (page in fscache) */
> + dfprintk(FSCACHE,
> + "NFS: readpage_from_fscache: BIO submitted\n");
> + atomic_inc(&nfs_fscache_from_pages);
> + return ret;
> +
> + case -ENOBUFS: /* inode not in cache */
> + case -ENODATA: /* page not in cache */
> + dfprintk(FSCACHE,
> + "NFS: readpage_from_fscache error %d\n", ret);
> + return 1;
> +
> + default:
> + dfprintk(FSCACHE, "NFS: readpage_from_fscache %d\n", ret);
> + nfs_fscache_from_error = ret;
> + }
> + return ret;
> +}
> +
> +/*
> + * retrieve a set of pages from fscache
> + */
> +static inline int nfs_readpages_from_fscache(struct nfs_open_context *ctx,
> + struct inode *inode,
> + struct address_space *mapping,
> + struct list_head *pages,
> + unsigned *nr_pages)
> +{
> + int ret, npages = *nr_pages;
> +
> + if (!NFS_I(inode)->fscache)
> + return 1;
> +
> + dfprintk(FSCACHE,
> + "NFS: nfs_getpages_from_fscache (0x%p/%u/0x%p)\n",
> + NFS_I(inode)->fscache, *nr_pages, inode);
> +
> + ret = fscache_read_or_alloc_pages(NFS_I(inode)->fscache,
> + mapping, pages, nr_pages,
> + nfs_readpage_from_fscache_complete,
> + ctx,
> + mapping_gfp_mask(mapping));
> +
> +
> + switch (ret) {
> + case 0: /* read BIO submitted (page in fscache) */
> + BUG_ON(!list_empty(pages));
> + BUG_ON(*nr_pages != 0);
> + dfprintk(FSCACHE,
> + "NFS: nfs_getpages_from_fscache: BIO submitted\n");
> +
> + atomic_add(npages, &nfs_fscache_from_pages);
> + return ret;
> +
> + case -ENOBUFS: /* inode not in cache */
> + case -ENODATA: /* page not in cache */
> + dfprintk(FSCACHE,
> + "NFS: nfs_getpages_from_fscache: no page: %d\n", ret);
> + return 1;
> +
> + default:
> + dfprintk(FSCACHE,
> + "NFS: nfs_getpages_from_fscache: ret %d\n", ret);
> + nfs_fscache_from_error = ret;
> + }
> +
> + return ret;
> +}
> +
> +#else /* CONFIG_NFS_FSCACHE */
> +static inline int nfs_fscache_register(void) { return 0; }
> +static inline void nfs_fscache_unregister(void) {}
> +static inline void nfs_fscache_get_client_cookie(struct nfs_client *clp) {}
> +static inline void nfs4_fscache_get_client_cookie(struct nfs_client *clp) {}
> +static inline void nfs_fscache_release_client_cookie(struct nfs_client *clp) {}
> +static inline const char *nfs_server_fscache_state(struct nfs_server *server) { return "no "; }
> +
> +static inline void nfs_fscache_init_fh_cookie(struct inode *inode) {}
> +static inline void nfs_fscache_enable_fh_cookie(struct inode *inode) {}
> +static inline void nfs_fscache_attr_changed(struct inode *inode) {}
> +static inline void nfs_fscache_release_fh_cookie(struct inode *inode) {}
> +static inline void nfs_fscache_zap_fh_cookie(struct inode *inode) {}
> +static inline void nfs_fscache_renew_fh_cookie(struct inode *inode) {}
> +static inline void nfs_fscache_disable_fh_cookie(struct inode *inode) {}
> +static inline void nfs_fscache_set_fh_cookie(struct inode *inode, struct file *filp) {}
> +static inline void nfs_fscache_install_vm_ops(struct inode *inode, struct vm_area_struct *vma) {}
> +static inline int nfs_fscache_release_page(struct page *page, gfp_t gfp)
> +{
> + return 1; /* True: may release page */
> +}
> +static inline void nfs_fscache_invalidate_page(struct page *page,
> + struct inode *inode,
> + unsigned long offset)
> +{
> +}
> +static inline void nfs_readpage_to_fscache(struct inode *inode, struct page *page, int sync) {}
> +static inline int nfs_readpage_from_fscache(struct nfs_open_context *ctx,
> + struct inode *inode, struct page *page)
> +{
> + return -ENOBUFS;
> +}
> +static inline int nfs_readpages_from_fscache(struct nfs_open_context *ctx,
> + struct inode *inode,
> + struct address_space *mapping,
> + struct list_head *pages,
> + unsigned *nr_pages)
> +{
> + return -ENOBUFS;
> +}
> +
> +#endif /* CONFIG_NFS_FSCACHE */
> +#endif /* _NFS_FSCACHE_H */
> diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
> index bca6cdc..8b7a39c 100644
> --- a/fs/nfs/inode.c
> +++ b/fs/nfs/inode.c
> @@ -88,6 +88,7 @@ void nfs_clear_inode(struct inode *inode)
> BUG_ON(atomic_read(&NFS_I(inode)->data_updates) != 0);
> nfs_zap_acl_cache(inode);
> nfs_access_zap_cache(inode);
> + nfs_fscache_release_fh_cookie(inode);
> }
>
> /**
> @@ -220,6 +221,7 @@ nfs_fhget(struct super_block *sb, struct nfs_fh *fh, struct nfs_fattr *fattr)
> };
> struct inode *inode = ERR_PTR(-ENOENT);
> unsigned long hash;
> + int maycache = 1;
>
> if ((fattr->valid & NFS_ATTR_FATTR) == 0)
> goto out_no_inode;
> @@ -269,6 +271,7 @@ nfs_fhget(struct super_block *sb, struct nfs_fh *fh, struct nfs_fattr *fattr)
> else
> inode->i_op = &nfs_mountpoint_inode_operations;
> inode->i_fop = NULL;
> + maycache = 0;
> }
> } else if (S_ISLNK(inode->i_mode))
> inode->i_op = &nfs_symlink_inode_operations;
> @@ -300,6 +303,8 @@ nfs_fhget(struct super_block *sb, struct nfs_fh *fh, struct nfs_fattr *fattr)
> memset(nfsi->cookieverf, 0, sizeof(nfsi->cookieverf));
> nfsi->access_cache = RB_ROOT;
>
> + nfs_fscache_init_fh_cookie(inode);
> +
> unlock_new_inode(inode);
> } else
> nfs_refresh_inode(inode, fattr);
> @@ -573,6 +578,7 @@ int nfs_open(struct inode *inode, struct file *filp)
> ctx->mode = filp->f_mode;
> nfs_file_set_open_context(filp, ctx);
> put_nfs_open_context(ctx);
> + nfs_fscache_set_fh_cookie(inode, filp);
> return 0;
> }
>
> @@ -698,6 +704,7 @@ static int nfs_invalidate_mapping_nolock(struct inode *inode, struct address_spa
> }
> spin_unlock(&inode->i_lock);
> nfs_inc_stats(inode, NFSIOS_DATAINVALIDATE);
> + nfs_fscache_renew_fh_cookie(inode);
> dfprintk(PAGECACHE, "NFS: (%s/%Ld) data cache invalidated\n",
> inode->i_sb->s_id, (long long)NFS_FILEID(inode));
> return 0;
> @@ -1000,11 +1007,13 @@ static int nfs_update_inode(struct inode *inode, struct nfs_fattr *fattr)
> if (data_stable) {
> inode->i_size = new_isize;
> invalid |= NFS_INO_INVALID_DATA;
> + nfs_fscache_attr_changed(inode);

Can't fscache_attr_changed() call kmalloc(GFP_KERNEL)? You are in a
spinlocked context here.

> }
> invalid |= NFS_INO_INVALID_ATTR;
> } else if (new_isize > cur_isize) {
> inode->i_size = new_isize;
> invalid |= NFS_INO_INVALID_ATTR|NFS_INO_INVALID_DATA;
> + nfs_fscache_attr_changed(inode);
> }
> nfsi->cache_change_attribute = now;
> dprintk("NFS: isize change on server for file %s/%ld\n",
> @@ -1191,6 +1200,10 @@ static int __init init_nfs_fs(void)
> {
> int err;
>
> + err = nfs_fscache_register();
> + if (err < 0)
> + goto out6;
> +
> err = nfs_fs_proc_init();
> if (err)
> goto out5;
> @@ -1237,6 +1250,8 @@ out3:
> out4:
> nfs_fs_proc_exit();
> out5:
> + nfs_fscache_unregister();
> +out6:
> return err;
> }
>
> @@ -1247,6 +1262,7 @@ static void __exit exit_nfs_fs(void)
> nfs_destroy_readpagecache();
> nfs_destroy_inodecache();
> nfs_destroy_nfspagecache();
> + nfs_fscache_unregister();
> #ifdef CONFIG_PROC_FS
> rpc_proc_unregister("nfs");
> #endif
> diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
> index 76cf55d..fc75251 100644
> --- a/fs/nfs/internal.h
> +++ b/fs/nfs/internal.h
> @@ -27,6 +27,11 @@ struct nfs_clone_mount {
> rpc_authflavor_t authflavor;
> };
>
> +/*
> + * include filesystem caching stuff here
> + */
> +#include "fscache.h"

Why? internal.h has no dependencies on anything from fscache.h.

> +
> /* client.c */
> extern struct rpc_program nfs_program;
>
> @@ -149,6 +154,9 @@ extern int nfs4_path_walk(struct nfs_server *server,
> const char *path);
> #endif
>
> +/* read.c */
> +extern int nfs_readpage_async(struct nfs_open_context *, struct inode *, struct page *);
> +

Please put this in nfs_fs.h together with the other NFS read exports.

> /*
> * Determine the device name as a string
> */
> diff --git a/fs/nfs/read.c b/fs/nfs/read.c
> index 19e0563..ad6f516 100644
> --- a/fs/nfs/read.c
> +++ b/fs/nfs/read.c
> @@ -18,6 +18,7 @@
> #include <linux/sunrpc/clnt.h>
> #include <linux/nfs_fs.h>
> #include <linux/nfs_page.h>
> +#include <linux/nfs_mount.h>
> #include <linux/smp_lock.h>
>
> #include <asm/system.h>
> @@ -114,8 +115,8 @@ static void nfs_readpage_truncate_uninitialised_page(struct nfs_read_data *data)
> }
> }
>
> -static int nfs_readpage_async(struct nfs_open_context *ctx, struct inode *inode,
> - struct page *page)
> +int nfs_readpage_async(struct nfs_open_context *ctx, struct inode *inode,
> + struct page *page)
> {
> LIST_HEAD(one_request);
> struct nfs_page *new;
> @@ -142,6 +143,11 @@ static int nfs_readpage_async(struct nfs_open_context *ctx, struct inode *inode,
>
> static void nfs_readpage_release(struct nfs_page *req)
> {
> + struct inode *d_inode = req->wb_context->path.dentry->d_inode;
> +
> + if (PageUptodate(req->wb_page))
> + nfs_readpage_to_fscache(d_inode, req->wb_page, 0);
> +
> unlock_page(req->wb_page);
>
> dprintk("NFS: read done (%s/%Ld %d@%Ld)\n",
> @@ -500,8 +506,15 @@ int nfs_readpage(struct file *file, struct page *page)
> ctx = get_nfs_open_context((struct nfs_open_context *)
> file->private_data);
>
> + if (!IS_SYNC(inode)) {
> + error = nfs_readpage_from_fscache(ctx, inode, page);
> + if (error == 0)
> + goto out;
> + }
> +
> error = nfs_readpage_async(ctx, inode, page);
>
> +out:
> put_nfs_open_context(ctx);
> return error;
> out_unlock:
> @@ -578,6 +591,15 @@ int nfs_readpages(struct file *filp, struct address_space *mapping,
> } else
> desc.ctx = get_nfs_open_context((struct nfs_open_context *)
> filp->private_data);
> +
> + /* attempt to read as many of the pages as possible from the cache
> + * - this returns -ENOBUFS immediately if the cookie is negative
> + */
> + ret = nfs_readpages_from_fscache(desc.ctx, inode, mapping,
> + pages, &nr_pages);
> + if (ret == 0)
> + goto read_complete; /* all pages were read */
> +
> if (rsize < PAGE_CACHE_SIZE)
> nfs_pageio_init(&pgio, inode, nfs_pagein_multi, rsize, 0);
> else
> @@ -588,6 +610,7 @@ int nfs_readpages(struct file *filp, struct address_space *mapping,
> nfs_pageio_complete(&pgio);
> npages = (pgio.pg_bytes_written + PAGE_CACHE_SIZE - 1) >> PAGE_CACHE_SHIFT;
> nfs_add_stats(inode, NFSIOS_READPAGES, npages);
> +read_complete:
> put_nfs_open_context(desc.ctx);
> out:
> return ret;
> diff --git a/fs/nfs/super.c b/fs/nfs/super.c
> index b2a851c..7a74677 100644
> --- a/fs/nfs/super.c
> +++ b/fs/nfs/super.c
> @@ -456,6 +456,7 @@ static void nfs_show_mount_options(struct seq_file *m, struct nfs_server *nfss,
> { NFS_MOUNT_NOACL, ",noacl", "" },
> { NFS_MOUNT_NORDIRPLUS, ",nordirplus", "" },
> { NFS_MOUNT_UNSHARED, ",nosharecache", ""},
> + { NFS_MOUNT_FSCACHE, ",fsc", "" },
> { 0, NULL, NULL }
> };
> const struct proc_nfs_info *nfs_infop;
> diff --git a/fs/nfs/sysctl.c b/fs/nfs/sysctl.c
> index b62481d..e522177 100644
> --- a/fs/nfs/sysctl.c
> +++ b/fs/nfs/sysctl.c
> @@ -14,6 +14,7 @@
> #include <linux/nfs_fs.h>
>
> #include "callback.h"
> +#include "internal.h"
>
> static const int nfs_set_port_min = 0;
> static const int nfs_set_port_max = 65535;
> @@ -58,6 +59,48 @@ static ctl_table nfs_cb_sysctls[] = {
> .mode = 0644,
> .proc_handler = &proc_dointvec,
> },
> +#ifdef CONFIG_NFS_FSCACHE
> + {
> + .ctl_name = CTL_UNNUMBERED,
> + .procname = "fscache_from_error",
> + .data = &nfs_fscache_from_error,
> + .maxlen = sizeof(int),
> + .mode = 0644,
> + .proc_handler = &proc_dointvec,
> + },
> + {
> + .ctl_name = CTL_UNNUMBERED,
> + .procname = "fscache_to_error",
> + .data = &nfs_fscache_to_error,
> + .maxlen = sizeof(int),
> + .mode = 0644,
> + .proc_handler = &proc_dointvec,
> + },
> + {
> + .ctl_name = CTL_UNNUMBERED,
> + .procname = "fscache_uncache_page",
> + .data = &nfs_fscache_uncache_page,
> + .maxlen = sizeof(int),
> + .mode = 0644,
> + .proc_handler = &proc_dointvec,
> + },
> + {
> + .ctl_name = CTL_UNNUMBERED,
> + .procname = "fscache_to_pages",
> + .data = &nfs_fscache_to_pages,
> + .maxlen = sizeof(int),
> + .mode = 0644,
> + .proc_handler = &proc_dointvec_minmax,
> + },
> + {
> + .ctl_name = CTL_UNNUMBERED,
> + .procname = "fscache_from_pages",
> + .data = &nfs_fscache_from_pages,
> + .maxlen = sizeof(int),
> + .mode = 0644,
> + .proc_handler = &proc_dointvec,
> + },
> +#endif
> { .ctl_name = 0 }
> };
>
> diff --git a/fs/nfs/write.c b/fs/nfs/write.c
> index ef97e0c..6576738 100644
> --- a/fs/nfs/write.c
> +++ b/fs/nfs/write.c
> @@ -147,6 +147,9 @@ static void nfs_grow_file(struct page *page, unsigned int offset, unsigned int c
> return;
> nfs_inc_stats(inode, NFSIOS_EXTENDWRITE);
> i_size_write(inode, end);
> +#ifdef FSCACHE_WRITE_SUPPORT
> + nfs_set_fscsize(NFS_SERVER(inode), NFS_I(inode), end);
> +#endif

I can't see how this belongs in this patch.

> }
>
> /* A writeback failed: mark the page as bad, and invalidate the page cache */
> diff --git a/include/linux/nfs4_mount.h b/include/linux/nfs4_mount.h
> index a0dcf66..40595d0 100644
> --- a/include/linux/nfs4_mount.h
> +++ b/include/linux/nfs4_mount.h
> @@ -66,6 +66,7 @@ struct nfs4_mount_data {
> #define NFS4_MOUNT_NOAC 0x0020 /* 1 */
> #define NFS4_MOUNT_STRICTLOCK 0x1000 /* 1 */
> #define NFS4_MOUNT_UNSHARED 0x8000 /* 1 */
> -#define NFS4_MOUNT_FLAGMASK 0x9033
> +#define NFS4_MOUNT_FSCACHE 0x10000 /* 1 */
> +#define NFS4_MOUNT_FLAGMASK 0x19033
>
> #endif
> diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h
> index 9ba4aec..d1d6192 100644
> --- a/include/linux/nfs_fs.h
> +++ b/include/linux/nfs_fs.h
> @@ -172,6 +172,9 @@ struct nfs_inode {
> int delegation_state;
> struct rw_semaphore rwsem;
> #endif /* CONFIG_NFS_V4*/
> +#ifdef CONFIG_NFS_FSCACHE
> + struct fscache_cookie *fscache;
> +#endif
> struct inode vfs_inode;
> };
>
> @@ -193,6 +196,7 @@ struct nfs_inode {
> #define NFS_INO_ADVISE_RDPLUS (1) /* advise readdirplus */
> #define NFS_INO_STALE (2) /* possible stale inode */
> #define NFS_INO_ACL_LRU_SET (3) /* Inode is on the LRU list */
> +#define NFS_INO_FSCACHE (4) /* inode can be cached by FS-Cache */
>
> static inline struct nfs_inode *NFS_I(struct inode *inode)
> {
> @@ -218,6 +222,7 @@ static inline struct nfs_inode *NFS_I(struct inode *inode)
>
> #define NFS_FLAGS(inode) (NFS_I(inode)->flags)
> #define NFS_STALE(inode) (test_bit(NFS_INO_STALE, &NFS_FLAGS(inode)))
> +#define NFS_FSCACHE(inode) (test_bit(NFS_INO_FSCACHE, &NFS_FLAGS(inode)))
>
> #define NFS_FILEID(inode) (NFS_I(inode)->fileid)
>
> @@ -553,6 +558,7 @@ extern void * nfs_root_data(void);
> #define NFSDBG_CALLBACK 0x0100
> #define NFSDBG_CLIENT 0x0200
> #define NFSDBG_MOUNT 0x0400
> +#define NFSDBG_FSCACHE 0x0800
> #define NFSDBG_ALL 0xFFFF
>
> #ifdef __KERNEL__
> diff --git a/include/linux/nfs_fs_sb.h b/include/linux/nfs_fs_sb.h
> index 0cac49b..abe1043 100644
> --- a/include/linux/nfs_fs_sb.h
> +++ b/include/linux/nfs_fs_sb.h
> @@ -3,6 +3,7 @@
>
> #include <linux/list.h>
> #include <linux/backing-dev.h>
> +#include <linux/fscache.h>
>
> struct nfs_iostats;
>
> @@ -65,6 +66,10 @@ struct nfs_client {
> char cl_ipaddr[16];
> unsigned char cl_id_uniquifier;
> #endif
> +
> +#ifdef CONFIG_NFS_FSCACHE
> + struct fscache_cookie *fscache; /* client index cache cookie */
> +#endif
> };
>
> /*
> diff --git a/include/linux/nfs_mount.h b/include/linux/nfs_mount.h
> index a3ade89..36971cd 100644
> --- a/include/linux/nfs_mount.h
> +++ b/include/linux/nfs_mount.h
> @@ -63,6 +63,7 @@ struct nfs_mount_data {
> #define NFS_MOUNT_SECFLAVOUR 0x2000 /* 5 */
> #define NFS_MOUNT_NORDIRPLUS 0x4000 /* 5 */
> #define NFS_MOUNT_UNSHARED 0x8000 /* 5 */
> -#define NFS_MOUNT_FLAGMASK 0xFFFF
> +#define NFS_MOUNT_FSCACHE 0x10000 /* 5 */
> +#define NFS_MOUNT_FLAGMASK 0x1FFFF
>
> #endif
>

I'm not too happy about the change to the binary mount interface. As far
as I'm concerned, the binary interface should really be considered
frozen, and should not be touched.
Instead, feel free to update the text-based mount interface (which can
be found in 2.6.23-rc1 and later). Please put any such mount interface
changes in a separate patch together with the Kconfig changes.

Cheers
Trond

2007-08-09 18:44:36

by James Morris

[permalink] [raw]
Subject: Re: [PATCH 11/14] CacheFiles: Permit an inode's security ID to be obtained [try #2]

On Thu, 9 Aug 2007, David Howells wrote:

> James Morris <[email protected]> wrote:
>
> > > + u32 (*inode_get_secid)(struct inode *inode);
> >
> > To maintain API consistency, please return an int which only acts as an
> > error code, and returning the secid via a *u32 function parameter.
>
> Does that apply to *all* the functions, irrespective of whether or not they
> return an error?

LSM is theoretically an API, so we generally don't know if a security
module will return an error or not.

If they were simply calls directly into SElinux, where we could always
know the semantics, then that would be a different story.



- James
--
James Morris
<[email protected]>

2007-08-09 18:52:43

by James Morris

[permalink] [raw]
Subject: Re: [PATCH 11/14] CacheFiles: Permit an inode's security ID to be obtained [try #2]

On Thu, 9 Aug 2007, Casey Schaufler wrote:

> This is SELinux specific functionality. It should not be an LSM
> interface.

As long as the security labels are themselves not being exported to the
kernel to be used e.g. for display or transport, then I agree, and we
should avoid passing them around outside the LSM entirely if possible.

Usually, they're attached to a significant kernel object, which you
typically pass around as part of the interface anyway.

David, I've looked at the code and can't see that you need to access the
label itself outside the LSM. Could you instead simply pass the inode
pointer around?

(I know it's not always possible, but much preferred).


- James
--
James Morris
<[email protected]>

2007-08-09 18:52:25

by Casey Schaufler

[permalink] [raw]
Subject: Re: [PATCH 09/14] CacheFiles: Permit a process's create SID to be overridden [try #2]


--- David Howells <[email protected]> wrote:

> Casey Schaufler <[email protected]> wrote:
>
> > This is SELinux specific funtionality and should be done in the
> > SELinux code. You should not be adding interfaces that are SELinux
> > specific, in this case using secids instead of the LSM blob interfaces.
>
> Is using secids your only objection? Or are you objecting to the whole
> 'act-as' concept?

My knee jerk reaction is that that is likely to be SELinux specific
behavior as well. I'm going to have to look at the patch more carefully
before I can say for sure. I will try to make a constructive proposal
once I've had the chance to think on it a little. Sorry about the terse
and unhelpful initial reaction.


Casey Schaufler
[email protected]

2007-08-09 18:53:39

by David Howells

[permalink] [raw]
Subject: Re: [PATCH 14/14] NFS: Use local caching [try #2]

Trond Myklebust <[email protected]> wrote:

> Dang, that's a lot of inlines... AFAICS, approx half of fs/nfs/fscache.h
> should really be moved into fscache.c.

If you wish. It seems a shame since a lot of them have only one caller.

> > + /* we can do this here as the bits are only set with the page lock
> > + * held, and our caller is holding that */
> > + if (!page->private)
> > + ClearPagePrivate(page);
>
> Why would PG_private be set at this point?

Looks like I've got a bit more cleaning up to do. PG_private isn't set by
FS-Cache, so this bit shouldn't be here.

> In any case, please send this and other PagePrivate changes as a
> separate patch. Any changes to the PagePrivate semantics must be made
> easy to debug.

There shouldn't be any.

Note that due to patch #2, PG_fscache causes releasepage() and
invalidatepage() to be called in addition to PG_private.

> > +
> > + if (!nfs_fscache_release_page(page, gfp))
> > + return 0;
>
> This looks _very_ dubious. Why shouldn't I be able to discard a page
> just because fscache isn't done writing it out? I may have very good
> reasons to do so.

Hmmm... Looking at the truncate routines, I suppose this ought to be okay,
provided the cache retains a reference on the page whilst it's writing it out
(put_page() won't can the page until we release it).

It also seems dubious, though, to release the page when the filesystem is
doing stuff to it, even if it's by proxy in the cache. I'll have to test
that, but I'm slightly concerned that the netfs could end up releasing its
cookie before the cache has finished with its pages. On the other hand, with
the new asynchronous stuff I've done, I'm not sure this'll be an actual
problem.

> > static int nfs_launder_page(struct page *page)
> > {
> > + nfs_fscache_invalidate_page(page, page->mapping->host, 0);
>
> Why? The function of launder_page() is to clean the page, not to
> truncate it.

Okay.

> > @@ -1000,11 +1007,13 @@ static int nfs_update_inode(struct inode *inode, struct nfs_fattr *fattr)
> > if (data_stable) {
> > inode->i_size = new_isize;
> > invalid |= NFS_INO_INVALID_DATA;
> > + nfs_fscache_attr_changed(inode);
>
> Can't fscache_attr_changed() call kmalloc(GFP_KERNEL)? You are in a
> spinlocked context here.

Hmmm... How about I move the call to fscache_attr_changed() to the callers of
nfs_update_inode(), to just after the spinlock is unlocked. The operation is
going to be asynchronous, so the delay shouldn't matter.

> I'm not too happy about the change to the binary mount interface. As far
> as I'm concerned, the binary interface should really be considered
> frozen, and should not be touched.

I hadn't come across that. I'll have a look.

> Instead, feel free to update the text-based mount interface (which can
> be found in 2.6.23-rc1 and later). Please put any such mount interface
> changes in a separate patch together with the Kconfig changes.

If you wish, but doesn't that violate the rules of patch division laid down by
Linus and Andrew?

David

2007-08-09 19:08:45

by David Howells

[permalink] [raw]
Subject: Re: [PATCH 11/14] CacheFiles: Permit an inode's security ID to be obtained [try #2]

James Morris <[email protected]> wrote:

> David, I've looked at the code and can't see that you need to access the
> label itself outside the LSM. Could you instead simply pass the inode
> pointer around?

It's not quite that simple. I need to impose *two* security labels in
cachefiles_begin_secure() when I'm about to act on behalf of a process that's
tried to access a netfs file:

(1) The security label to act as. This is the label attached to the
cachefilesd process when it starts the cache. This is obtained by
cachefiles_get_security_ID().

(2) The security label to create files as. This is the label attached to
root directory of the cache. This is obtained by
cachefiles_determine_cache_secid().

David

2007-08-09 19:18:29

by David Howells

[permalink] [raw]
Subject: Re: [PATCH 14/14] NFS: Use local caching [try #2]


> > Instead, feel free to update the text-based mount interface (which can
> > be found in 2.6.23-rc1 and later).

I presume you're referring to nfs_mount_option_tokens[] and friends. Is there
a mount program that can drive this?

David

2007-08-09 19:27:39

by Trond Myklebust

[permalink] [raw]
Subject: Re: [PATCH 14/14] NFS: Use local caching [try #2]

On Thu, 2007-08-09 at 19:52 +0100, David Howells wrote:
> Trond Myklebust <[email protected]> wrote:
>
> > Dang, that's a lot of inlines... AFAICS, approx half of fs/nfs/fscache.h
> > should really be moved into fscache.c.
>
> If you wish. It seems a shame since a lot of them have only one caller.

...however it also forces you to export a lot of stuff which is really
private to fscache.c (the atomics etc).

> Note that due to patch #2, PG_fscache causes releasepage() and
> invalidatepage() to be called in addition to PG_private.



> > > +
> > > + if (!nfs_fscache_release_page(page, gfp))
> > > + return 0;
> >
> > This looks _very_ dubious. Why shouldn't I be able to discard a page
> > just because fscache isn't done writing it out? I may have very good
> > reasons to do so.
>
> Hmmm... Looking at the truncate routines, I suppose this ought to be okay,
> provided the cache retains a reference on the page whilst it's writing it out
> (put_page() won't can the page until we release it).
>
> It also seems dubious, though, to release the page when the filesystem is
> doing stuff to it, even if it's by proxy in the cache. I'll have to test
> that, but I'm slightly concerned that the netfs could end up releasing its
> cookie before the cache has finished with its pages. On the other hand, with
> the new asynchronous stuff I've done, I'm not sure this'll be an actual
> problem.

Actually, as long as launder_page() and invalidate_page() are doing
their thing, then your current code might be OK. The important thing is
to ensure that invalidate_inode_pages2() and truncate_inode_pages() work
as expected.

> > > static int nfs_launder_page(struct page *page)
> > > {
> > > + nfs_fscache_invalidate_page(page, page->mapping->host, 0);
> >
> > Why? The function of launder_page() is to clean the page, not to
> > truncate it.
>
> Okay.

What you should be doing here is probably to make a call to
wait_on_page_fscache_write(page). That should suffice to ensure that the
page is clean w.r.t. fscache activity afaics.

> > > @@ -1000,11 +1007,13 @@ static int nfs_update_inode(struct inode *inode, struct nfs_fattr *fattr)
> > > if (data_stable) {
> > > inode->i_size = new_isize;
> > > invalid |= NFS_INO_INVALID_DATA;
> > > + nfs_fscache_attr_changed(inode);
> >
> > Can't fscache_attr_changed() call kmalloc(GFP_KERNEL)? You are in a
> > spinlocked context here.
>
> Hmmm... How about I move the call to fscache_attr_changed() to the callers of
> nfs_update_inode(), to just after the spinlock is unlocked. The operation is
> going to be asynchronous, so the delay shouldn't matter.

Right. You might also want to add a flag NFS_INO_INVALID_FSCACHE_ATTR or
something like that in order to trigger it.

> > I'm not too happy about the change to the binary mount interface. As far
> > as I'm concerned, the binary interface should really be considered
> > frozen, and should not be touched.
>
> I hadn't come across that. I'll have a look.
>
> > Instead, feel free to update the text-based mount interface (which can
> > be found in 2.6.23-rc1 and later). Please put any such mount interface
> > changes in a separate patch together with the Kconfig changes.
>
> If you wish, but doesn't that violate the rules of patch division laid down by
> Linus and Andrew?

Why? It should be possible to set this up so that the NFS+fscache code
compiles correctly without the mount patch.

Trond

2007-08-09 19:35:05

by James Morris

[permalink] [raw]
Subject: Re: [PATCH 11/14] CacheFiles: Permit an inode's security ID to be obtained [try #2]

On Thu, 9 Aug 2007, David Howells wrote:

> James Morris <[email protected]> wrote:
>
> > David, I've looked at the code and can't see that you need to access the
> > label itself outside the LSM. Could you instead simply pass the inode
> > pointer around?
>
> It's not quite that simple. I need to impose *two* security labels in
> cachefiles_begin_secure() when I'm about to act on behalf of a process that's
> tried to access a netfs file:

Ah ok, we had a similar problem with NFS mount options.

While I'm concerned about encoding SELinux-optimized secid labels into
general kernel structures, moving to more generalized pointers introduces
lifecycle maintenance issues and complexity which is not needed in the
mainline kernel. i.e. it'll be unused infrastructure maintained by
upstream, and used only by out-of-tree modules.

So, given that the kernel has no stable API, I suggest accepting the u32
secid as you propose, and if someone wants to merge a module which also
uses these hooks, but is entirely unable to use u32 labels, then they can
also justify making the interface more generalized and provide the code
for it.



- James
--
James Morris
<[email protected]>

2007-08-09 20:34:49

by Casey Schaufler

[permalink] [raw]
Subject: Re: [PATCH 11/14] CacheFiles: Permit an inode's security ID to be obtained [try #2]


--- James Morris <[email protected]> wrote:

> On Thu, 9 Aug 2007, David Howells wrote:
>
> > James Morris <[email protected]> wrote:
> >
> > > David, I've looked at the code and can't see that you need to access the
> > > label itself outside the LSM. Could you instead simply pass the inode
> > > pointer around?
> >
> > It's not quite that simple. I need to impose *two* security labels in
> > cachefiles_begin_secure() when I'm about to act on behalf of a process
> that's
> > tried to access a netfs file:
>
> Ah ok, we had a similar problem with NFS mount options.
>
> While I'm concerned about encoding SELinux-optimized secid labels into
> general kernel structures, moving to more generalized pointers introduces
> lifecycle maintenance issues and complexity which is not needed in the
> mainline kernel. i.e. it'll be unused infrastructure maintained by
> upstream, and used only by out-of-tree modules.
>
> So, given that the kernel has no stable API, I suggest accepting the u32
> secid as you propose, and if someone wants to merge a module which also
> uses these hooks, but is entirely unable to use u32 labels, then they can
> also justify making the interface more generalized and provide the code
> for it.

Grumble. Yet another thing to undo in the near future. I still
hope to suggest what I would consider a viable alternative "soon".


Casey Schaufler
[email protected]

2007-08-10 09:22:51

by David Howells

[permalink] [raw]
Subject: Re: [PATCH 11/14] CacheFiles: Permit an inode's security ID to be obtained [try #2]

Casey Schaufler <[email protected]> wrote:

> Grumble. Yet another thing to undo in the near future. I still
> hope to suggest what I would consider a viable alternative "soon".

Use a struct key with the overrides attached? The key can be generated by
SELinux or whatever module is there.

David

2007-08-10 14:05:22

by David Howells

[permalink] [raw]
Subject: Re: [PATCH 14/14] NFS: Use local caching [try #2]

Trond Myklebust <[email protected]> wrote:

> > > Dang, that's a lot of inlines... AFAICS, approx half of fs/nfs/fscache.h
> > > should really be moved into fscache.c.
> >
> > If you wish. It seems a shame since a lot of them have only one caller.
>
> ...however it also forces you to export a lot of stuff which is really
> private to fscache.c (the atomics etc).

The atomics is actually a bad example. These are referred to directly by part
of the table in fs/nfs/sysctl.c. Is there a better way of exporting
statistics than through /proc/sys/ files?

David

2007-08-10 16:17:41

by Trond Myklebust

[permalink] [raw]
Subject: Re: [PATCH 14/14] NFS: Use local caching [try #2]

On Fri, 2007-08-10 at 15:04 +0100, David Howells wrote:
> Trond Myklebust <[email protected]> wrote:
>
> > > > Dang, that's a lot of inlines... AFAICS, approx half of fs/nfs/fscache.h
> > > > should really be moved into fscache.c.
> > >
> > > If you wish. It seems a shame since a lot of them have only one caller.
> >
> > ...however it also forces you to export a lot of stuff which is really
> > private to fscache.c (the atomics etc).
>
> The atomics is actually a bad example. These are referred to directly by part
> of the table in fs/nfs/sysctl.c. Is there a better way of exporting
> statistics than through /proc/sys/ files?

/proc/self/mountstats

Trond