2005-11-14 21:58:11

by David Howells

Subject: [PATCH 0/12] FS-Cache: Generic filesystem caching facility

This series of patches does four things:

(1) Adds a generic intermediary (FS-Cache) by which filesystems may call on
local caching capabilities, and by which local caching backends may make
caches available:

    +---------+
    |         |                        +-----------+
    |   NFS   |--+                     |           |
    |         |  |                 +-->|  CacheFS  |
    +---------+  |   +----------+  |   | /dev/hda5 |
                 |   |          |  |   +-----------+
    +---------+  +-->|          |  |
    |         |      |          |--+   +-------------+
    |   AFS   |----->| FS-Cache |      |             |
    |         |      |          |----->| Cache Files |
    +---------+  +-->|          |      | /var/cache  |
                 |   |          |--+   +-------------+
    +---------+  |   +----------+  |
    |         |  |                 |   +-------------+
    |  ISOFS  |--+                 |   |             |
    |         |                    +-->| ReiserCache |
    +---------+                        | /           |
                                       +-------------+

(2) Adds a quasi-filesystem (CacheFS) that can turn a block device into a
local cache.

(3) Modifies the kAFS network filesystem to be able to read through this
cache.

(4) Documents the netfs interface and the cache backend interface.

Other backends may be added to the system, and other netfs's may be modified
to use caching.
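
As an illustration only (this is not code from the series), the intermediary
arrangement in the diagram can be sketched in user space as a table of backend
operations that a netfs read path consults before going to the network. Every
name here (demo_cache_ops, demo_netfs_read and so on) is hypothetical and is
not the FS-Cache API; the in-memory backend merely stands in for CacheFS or
Cache Files.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_PAGE_SIZE 16	/* tiny "page" so the example stays readable */
#define DEMO_NR_PAGES  8

/* Hypothetical backend interface; NOT the real FS-Cache backend API. */
struct demo_cache_ops {
	int (*read_page)(void *priv, unsigned index, char *buf);	/* 0 = hit, -1 = miss */
	int (*write_page)(void *priv, unsigned index, const char *buf);	/* 0 = stored */
};

/* What the intermediary holds for each attached cache. */
struct demo_cache {
	const struct demo_cache_ops *ops;
	void *priv;
};

/* A trivial in-memory backend standing in for CacheFS or Cache Files. */
struct demo_mem_backend {
	char data[DEMO_NR_PAGES][DEMO_PAGE_SIZE];
	unsigned char present[DEMO_NR_PAGES];
};

static int demo_mem_read(void *priv, unsigned index, char *buf)
{
	struct demo_mem_backend *be = priv;

	if (index >= DEMO_NR_PAGES || !be->present[index])
		return -1;
	memcpy(buf, be->data[index], DEMO_PAGE_SIZE);
	return 0;
}

static int demo_mem_write(void *priv, unsigned index, const char *buf)
{
	struct demo_mem_backend *be = priv;

	if (index >= DEMO_NR_PAGES)
		return -1;
	memcpy(be->data[index], buf, DEMO_PAGE_SIZE);
	be->present[index] = 1;
	return 0;
}

static const struct demo_cache_ops demo_mem_ops = {
	.read_page  = demo_mem_read,
	.write_page = demo_mem_write,
};

/* Stand-in for a network fetch; counts how often it is called. */
static unsigned demo_net_fetches;

static void demo_net_fetch(unsigned index, char *buf)
{
	memset(buf, 'A' + (int)index, DEMO_PAGE_SIZE);
	demo_net_fetches++;
}

/* Netfs-side read: try the cache first, otherwise fetch and populate it. */
static void demo_netfs_read(struct demo_cache *cache, unsigned index, char *buf)
{
	assert(cache && cache->ops);

	if (cache->ops->read_page(cache->priv, index, buf) == 0)
		return;				/* served from the cache */
	demo_net_fetch(index, buf);		/* miss: go to the network */
	cache->ops->write_page(cache->priv, index, buf);
}
```

Reading the same page twice through demo_netfs_read() should reach the
"network" only once; the netfs code never needs to know which backend sits
behind the ops table.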

There are a number of reasons why I'm not using i_mapping to do this. These
have been discussed a lot on the LKML and CacheFS mailing lists, but to
summarise the basics:

(1) Most filesystems don't report holes. Holes in files are treated as
blocks of zeros and can't otherwise be distinguished, making it difficult
to tell blocks that have been read from the network and cached apart from
those that haven't.

(2) The backing inode must be fully populated before being exposed
to userspace through the main inode because the VM/VFS goes directly to
the backing inode and does not interrogate the front inode on VM ops.

Therefore:

(a) The backing inode must fit entirely within the cache.

(b) All backed files currently open must fit entirely within the cache at
the same time.

(c) A working set of files in total larger than the cache may not be
cached.

(d) A file may not grow larger than the available space in the cache.

(e) A file that's open and cached, and remotely grows larger than the
cache is potentially stuffed.

(3) Writes go to the backing filesystem, and can only be transferred to the
network when the file is closed.

(4) There's no record of what changes have been made, so the whole file must
be written back.

(5) The pages belong to the backing filesystem, and all metadata associated
with those pages is relevant only to the backing filesystem, and not to
anything stacked atop it.
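
The ambiguity described in point (1) can be seen from user space with a small
sketch (illustrative only, not part of the series). It assumes a POSIX system
with a writable /tmp; the helper name hole_reads_like_zeros() is made up for
this example. One block is explicitly written with zeros, the next is left as
a hole by extending the file, and both are read back and compared.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define DEMO_BLK 4096

/*
 * Returns 1 if a never-written block (a hole) reads back identically to a
 * block that was explicitly written with zeros, 0 if they differ, and -1 if
 * the temporary file could not be set up.
 */
static int hole_reads_like_zeros(void)
{
	char path[] = "/tmp/holedemoXXXXXX";
	char zeros[DEMO_BLK], written[DEMO_BLK], hole[DEMO_BLK];
	int same = -1;
	int fd = mkstemp(path);

	if (fd < 0)
		return -1;
	unlink(path);			/* file disappears when fd is closed */
	memset(zeros, 0, sizeof(zeros));

	/* block 0: zeros really written; block 1: hole made by extending */
	if (pwrite(fd, zeros, DEMO_BLK, 0) == DEMO_BLK &&
	    ftruncate(fd, 2 * DEMO_BLK) == 0 &&
	    pread(fd, written, DEMO_BLK, 0) == DEMO_BLK &&
	    pread(fd, hole, DEMO_BLK, DEMO_BLK) == DEMO_BLK)
		same = (memcmp(written, hole, DEMO_BLK) == 0);

	close(fd);
	return same;
}
```

Through plain read calls the two blocks look the same, so a cache layered on
an ordinary file cannot tell "fetched and happens to be zero" from "never
fetched" without keeping its own record.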

David


2005-11-14 21:54:48

by David Howells

Subject: [PATCH 1/12] FS-Cache: Handle -Wsign-compare in i386 bitops

The attached patch makes i386's find_first_bit() use an unsigned integer as a
counter to avoid getting warnings when -Wsign-compare is given.

Signed-Off-By: David Howells <[email protected]>
---
warthog>diffstat -p1 asm-i386-bitops-2614mm2.diff
include/asm-i386/bitops.h | 4 ++--
1 files changed, 2 insertions(+), 2 deletions(-)

diff -uNrp linux-2.6.14-mm2/include/asm-i386/bitops.h linux-2.6.14-mm2-cachefs/include/asm-i386/bitops.h
--- linux-2.6.14-mm2/include/asm-i386/bitops.h 2005-08-30 13:56:33.000000000 +0100
+++ linux-2.6.14-mm2-cachefs/include/asm-i386/bitops.h 2005-11-14 16:23:38.000000000 +0000
@@ -332,9 +332,9 @@ static inline unsigned long __ffs(unsign
  * Returns the bit-number of the first set bit, not the number of the byte
  * containing a bit.
  */
-static inline int find_first_bit(const unsigned long *addr, unsigned size)
+static inline unsigned find_first_bit(const unsigned long *addr, unsigned size)
 {
-	int x = 0;
+	unsigned x = 0;

 	while (x < size) {
 		unsigned long val = *addr++;
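
For illustration, here is a portable user-space stand-in for the patched
function (a sketch, not the i386 implementation: the inner bit scan replaces
the kernel's __ffs(), and the demo_ names are made up). With a signed counter,
the comparison "x < size" mixes int and unsigned, which is what
-Wsign-compare reports; making the counter and return type unsigned keeps the
comparison between like types.

```c
#include <assert.h>
#include <limits.h>

#define DEMO_BITS_PER_LONG ((unsigned)(sizeof(unsigned long) * CHAR_BIT))

/*
 * Returns the number of the first set bit in the bitmap at addr, or a value
 * >= size if no bit is set. size is in bits and is assumed to be covered by
 * whole unsigned longs.
 */
static unsigned demo_find_first_bit(const unsigned long *addr, unsigned size)
{
	unsigned x = 0;		/* unsigned, so "x < size" compares like with like */

	assert(addr);

	while (x < size) {
		unsigned long val = *addr++;

		if (val) {
			unsigned bit = 0;

			/* portable replacement for __ffs(val) */
			while (!(val & 1UL)) {
				val >>= 1;
				bit++;
			}
			return x + bit;
		}
		x += DEMO_BITS_PER_LONG;
	}
	return x;
}
```

A caller would treat any result that is not below size as "no bit found".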

2005-11-14 21:55:20

by David Howells

Subject: [PATCH 3/12] FS-Cache: Add list_for_each_entry_safe_reverse()

The attached patch adds list_for_each_entry_safe_reverse() to linux/list.h.

Signed-Off-By: David Howells <[email protected]>
---
warthog>diffstat -p1 list-foreach-saferev-2614mm2.diff
include/linux/list.h | 14 ++++++++++++++
1 files changed, 14 insertions(+)

diff -uNrp linux-2.6.14-mm2/include/linux/list.h linux-2.6.14-mm2-cachefs/include/linux/list.h
--- linux-2.6.14-mm2/include/linux/list.h 2005-11-14 16:17:58.000000000 +0000
+++ linux-2.6.14-mm2-cachefs/include/linux/list.h 2005-11-14 16:23:38.000000000 +0000
@@ -450,6 +450,20 @@ static inline void list_splice_init(stru
 	     pos = n, n = list_entry(n->member.next, typeof(*n), member))

 /**
+ * list_for_each_entry_safe_reverse - iterate backwards over list of given type safe against
+ *				       removal of list entry
+ * @pos:	the type * to use as a loop counter.
+ * @n:		another type * to use as temporary storage
+ * @head:	the head for your list.
+ * @member:	the name of the list_struct within the struct.
+ */
+#define list_for_each_entry_safe_reverse(pos, n, head, member)		\
+	for (pos = list_entry((head)->prev, typeof(*pos), member),	\
+		n = list_entry(pos->member.prev, typeof(*pos), member);	\
+	     &pos->member != (head);					\
+	     pos = n, n = list_entry(n->member.prev, typeof(*n), member))
+
+/**
  * list_for_each_rcu	-	iterate over an rcu-protected list
  * @pos:	the &struct list_head to use as a loop counter.
  * @head:	the head for your list.
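
A user-space sketch of how the new macro might be used (illustrative only).
The list_head structure and list_entry() are minimal re-creations rather than
the kernel headers, the demo_ helpers are made up, and __typeof__ is used in
place of typeof so that the example also builds in strict ISO C modes with
GCC or Clang. The walk goes from tail to head and deletes entries on the way,
which is safe because the predecessor is saved in n before the loop body runs.

```c
#include <assert.h>
#include <stddef.h>

struct list_head { struct list_head *next, *prev; };

#define list_entry(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

#define list_for_each_entry_safe_reverse(pos, n, head, member)		\
	for (pos = list_entry((head)->prev, __typeof__(*pos), member),	\
	     n = list_entry(pos->member.prev, __typeof__(*pos), member);	\
	     &pos->member != (head);					\
	     pos = n, n = list_entry(n->member.prev, __typeof__(*n), member))

static void demo_list_init(struct list_head *head)
{
	head->next = head->prev = head;
}

static void demo_list_add_tail(struct list_head *entry, struct list_head *head)
{
	entry->prev = head->prev;
	entry->next = head;
	head->prev->next = entry;
	head->prev = entry;
}

static void demo_list_del(struct list_head *entry)
{
	entry->prev->next = entry->next;
	entry->next->prev = entry->prev;
	entry->next = entry->prev = NULL;	/* poison: must not be followed */
}

struct demo_item {
	int value;
	struct list_head link;
};

/*
 * Builds the list 1..5, walks it backwards deleting odd values, records the
 * visiting order in visited[] and the survivors (head to tail) in kept[].
 * Returns the number of survivors.
 */
static int demo_safe_reverse(int visited[5], int kept[5])
{
	struct demo_item items[5], *pos, *n;
	struct list_head head, *p;
	int i, nvisited = 0, nkept = 0;

	demo_list_init(&head);
	for (i = 0; i < 5; i++) {
		items[i].value = i + 1;
		demo_list_add_tail(&items[i].link, &head);
	}

	list_for_each_entry_safe_reverse(pos, n, &head, link) {
		visited[nvisited++] = pos->value;
		if (pos->value % 2)
			demo_list_del(&pos->link);	/* safe: n already saved */
	}
	assert(nvisited == 5);

	for (p = head.next; p != &head; p = p->next)
		kept[nkept++] = list_entry(p, struct demo_item, link)->value;
	return nkept;
}
```

The entries are visited in the order 5, 4, 3, 2, 1 and only 2 and 4 remain
afterwards.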

2005-11-14 21:55:39

by David Howells

Subject: [PATCH 7/12] FS-Cache: Export a couple of VM functions

The attached patch exports a couple of VM functions needed by CacheFS.

Signed-Off-By: David Howells <[email protected]>
---
warthog>diffstat -p1 exports-2614mm2.diff
mm/page-writeback.c | 1 +
mm/swap.c | 2 ++
2 files changed, 3 insertions(+)

diff -uNrp linux-2.6.14-mm2/mm/page-writeback.c linux-2.6.14-mm2-cachefs/mm/page-writeback.c
--- linux-2.6.14-mm2/mm/page-writeback.c 2005-11-14 16:18:00.000000000 +0000
+++ linux-2.6.14-mm2-cachefs/mm/page-writeback.c 2005-11-14 16:23:46.000000000 +0000
@@ -750,6 +750,7 @@ int clear_page_dirty_for_io(struct page
}
return TestClearPageDirty(page);
}
+EXPORT_SYMBOL_GPL(clear_page_dirty_for_io);

int test_clear_page_writeback(struct page *page)
{
diff -uNrp linux-2.6.14-mm2/mm/swap.c linux-2.6.14-mm2-cachefs/mm/swap.c
--- linux-2.6.14-mm2/mm/swap.c 2005-11-14 16:18:00.000000000 +0000
+++ linux-2.6.14-mm2-cachefs/mm/swap.c 2005-11-14 16:23:46.000000000 +0000
@@ -149,6 +149,8 @@ void fastcall lru_cache_add(struct page
put_cpu_var(lru_add_pvecs);
}

+EXPORT_SYMBOL_GPL(lru_cache_add);
+
void fastcall lru_cache_add_active(struct page *page)
{
struct pagevec *pvec = &get_cpu_var(lru_add_active_pvecs);

2005-11-14 21:56:45

by David Howells

Subject: [PATCH 10/12] FS-Cache: Make kAFS use FS-Cache

The attached patch makes the kAFS filesystem in fs/afs/ use FS-Cache, and
through it any attached caches.

Signed-Off-By: David Howells <[email protected]>
---
warthog>diffstat -p1 fscache-afs-2614mm2.diff
fs/Kconfig | 7 +
fs/afs/cache.h | 27 -----
fs/afs/cell.c | 109 +++++++++++++----------
fs/afs/cell.h | 16 ---
fs/afs/cmservice.c | 2
fs/afs/dir.c | 15 +--
fs/afs/file.c | 224 ++++++++++++++++++++++++++++++++---------------
fs/afs/fsclient.c | 4
fs/afs/inode.c | 43 ++++++---
fs/afs/internal.h | 24 +----
fs/afs/main.c | 24 ++---
fs/afs/mntpt.c | 12 +-
fs/afs/proc.c | 1
fs/afs/server.c | 3
fs/afs/vlocation.c | 185 ++++++++++++++++++++++++---------------
fs/afs/vnode.c | 249 +++++++++++++++++++++++++++++++++++++++++++----------
fs/afs/vnode.h | 10 +-
fs/afs/volume.c | 78 ++++++----------
fs/afs/volume.h | 28 +----
19 files changed, 655 insertions(+), 406 deletions(-)
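
As a reading aid for the fs/afs/cell.c changes in the diff below, here is a
user-space sketch of the two buffer-filling callbacks (illustrative only; it
is not the patch code). The demo_ names and struct demo_cell are made up,
uint32_t stands in for struct in_addr, and two details are deliberately
different from the patch: the key length is held in a size_t before it is
compared with the buffer size, and the auxiliary data is trimmed to whole
entries with a modulo rather than a power-of-two mask.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the parts of struct afs_cell used here. */
struct demo_cell {
	const char *name;
	uint32_t vl_addrs[15];		/* stand-in for struct in_addr */
	unsigned vl_naddrs;
};

/* Copy the index key (the cell name, no NUL) or return 0 if it won't fit. */
static uint16_t demo_get_key(const struct demo_cell *cell,
			     void *buffer, uint16_t bufmax)
{
	size_t klen = strlen(cell->name);

	if (klen > bufmax)
		return 0;

	memcpy(buffer, cell->name, klen);
	return (uint16_t)klen;
}

/* Copy as many whole address entries as fit and return the byte count. */
static uint16_t demo_get_aux(const struct demo_cell *cell,
			     void *buffer, uint16_t bufmax)
{
	size_t entry = sizeof(cell->vl_addrs[0]);
	size_t dlen = cell->vl_naddrs * entry;

	if (dlen > bufmax)
		dlen = bufmax;
	dlen -= dlen % entry;		/* never copy a partial entry */

	memcpy(buffer, cell->vl_addrs, dlen);
	return (uint16_t)dlen;
}
```

With a cell named "example.org" and three addresses, an 8-byte buffer yields
no key, while a 10-byte buffer yields 8 bytes of auxiliary data (two whole
entries).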

diff -uNrp linux-2.6.14-mm2/fs/Kconfig linux-2.6.14-mm2-cachefs/fs/Kconfig
--- linux-2.6.14-mm2/fs/Kconfig 2005-11-14 16:17:54.000000000 +0000
+++ linux-2.6.14-mm2-cachefs/fs/Kconfig 2005-11-14 16:23:38.000000000 +0000
@@ -1880,6 +1880,13 @@ config AFS_FS

If unsure, say N.

+config AFS_FSCACHE
+ bool "Provide AFS client caching support"
+ depends on AFS_FS && FSCACHE && EXPERIMENTAL
+ help
+ Say Y here if you want AFS data to be cached locally on disk through the
+ generic filesystem cache manager
+
config RXRPC
tristate

diff -uNrp linux-2.6.14-mm2/fs/afs/cache.h linux-2.6.14-mm2-cachefs/fs/afs/cache.h
--- linux-2.6.14-mm2/fs/afs/cache.h 2004-06-18 13:41:16.000000000 +0100
+++ linux-2.6.14-mm2-cachefs/fs/afs/cache.h 1970-01-01 01:00:00.000000000 +0100
@@ -1,27 +0,0 @@
-/* cache.h: AFS local cache management interface
- *
- * Copyright (C) 2002 Red Hat, Inc. All Rights Reserved.
- * Written by David Howells ([email protected])
- *
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of the GNU General Public License
- * as published by the Free Software Foundation; either version
- * 2 of the License, or (at your option) any later version.
- */
-
-#ifndef _LINUX_AFS_CACHE_H
-#define _LINUX_AFS_CACHE_H
-
-#undef AFS_CACHING_SUPPORT
-
-#include <linux/mm.h>
-#ifdef AFS_CACHING_SUPPORT
-#include <linux/cachefs.h>
-#endif
-#include "types.h"
-
-#ifdef __KERNEL__
-
-#endif /* __KERNEL__ */
-
-#endif /* _LINUX_AFS_CACHE_H */
diff -uNrp linux-2.6.14-mm2/fs/afs/cell.c linux-2.6.14-mm2-cachefs/fs/afs/cell.c
--- linux-2.6.14-mm2/fs/afs/cell.c 2005-03-02 12:08:35.000000000 +0000
+++ linux-2.6.14-mm2-cachefs/fs/afs/cell.c 2005-11-14 16:23:38.000000000 +0000
@@ -31,17 +31,21 @@ static DEFINE_RWLOCK(afs_cells_lock);
static DECLARE_RWSEM(afs_cells_sem); /* add/remove serialisation */
static struct afs_cell *afs_cell_root;

-#ifdef AFS_CACHING_SUPPORT
-static cachefs_match_val_t afs_cell_cache_match(void *target,
- const void *entry);
-static void afs_cell_cache_update(void *source, void *entry);
-
-struct cachefs_index_def afs_cache_cell_index_def = {
- .name = "cell_ix",
- .data_size = sizeof(struct afs_cache_cell),
- .keys[0] = { CACHEFS_INDEX_KEYS_ASCIIZ, 64 },
- .match = afs_cell_cache_match,
- .update = afs_cell_cache_update,
+#ifdef CONFIG_AFS_FSCACHE
+static uint16_t afs_cell_cache_get_key(const void *cookie_netfs_data,
+ void *buffer, uint16_t buflen);
+static uint16_t afs_cell_cache_get_aux(const void *cookie_netfs_data,
+ void *buffer, uint16_t buflen);
+static fscache_checkaux_t afs_cell_cache_check_aux(void *cookie_netfs_data,
+ const void *buffer,
+ uint16_t buflen);
+
+static struct fscache_cookie_def afs_cell_cache_index_def = {
+ .name = "AFS cell",
+ .type = FSCACHE_COOKIE_TYPE_INDEX,
+ .get_key = afs_cell_cache_get_key,
+ .get_aux = afs_cell_cache_get_aux,
+ .check_aux = afs_cell_cache_check_aux,
};
#endif

@@ -115,12 +119,11 @@ int afs_cell_create(const char *name, ch
if (ret < 0)
goto error;

-#ifdef AFS_CACHING_SUPPORT
- /* put it up for caching */
- cachefs_acquire_cookie(afs_cache_netfs.primary_index,
- &afs_vlocation_cache_index_def,
- cell,
- &cell->cache);
+#ifdef CONFIG_AFS_FSCACHE
+ /* put it up for caching (this never returns an error) */
+ cell->cache = fscache_acquire_cookie(afs_cache_netfs.primary_index,
+ &afs_cell_cache_index_def,
+ cell);
#endif

/* add to the cell lists */
@@ -345,8 +348,8 @@ static void afs_cell_destroy(struct afs_
list_del_init(&cell->proc_link);
up_write(&afs_proc_cells_sem);

-#ifdef AFS_CACHING_SUPPORT
- cachefs_relinquish_cookie(cell->cache, 0);
+#ifdef CONFIG_AFS_FSCACHE
+ fscache_relinquish_cookie(cell->cache, 0);
#endif

up_write(&afs_cells_sem);
@@ -526,44 +529,62 @@ void afs_cell_purge(void)

/*****************************************************************************/
/*
- * match a cell record obtained from the cache
+ * set the key for the index entry
*/
-#ifdef AFS_CACHING_SUPPORT
-static cachefs_match_val_t afs_cell_cache_match(void *target,
- const void *entry)
+#ifdef CONFIG_AFS_FSCACHE
+static uint16_t afs_cell_cache_get_key(const void *cookie_netfs_data,
+ void *buffer, uint16_t bufmax)
{
- const struct afs_cache_cell *ccell = entry;
- struct afs_cell *cell = target;
+ const struct afs_cell *cell = cookie_netfs_data;
+ uint16_t klen;

- _enter("{%s},{%s}", ccell->name, cell->name);
+ _enter("%p,%p,%u", cell, buffer, bufmax);

- if (strncmp(ccell->name, cell->name, sizeof(ccell->name)) == 0) {
- _leave(" = SUCCESS");
- return CACHEFS_MATCH_SUCCESS;
- }
+ klen = strlen(cell->name);
+ if (klen > bufmax)
+ return 0;
+
+ memcpy(buffer, cell->name, klen);
+ return klen;

- _leave(" = FAILED");
- return CACHEFS_MATCH_FAILED;
-} /* end afs_cell_cache_match() */
+} /* end afs_cell_cache_get_key() */
#endif

/*****************************************************************************/
/*
- * update a cell record in the cache
+ * provide new auxiliary cache data
*/
-#ifdef AFS_CACHING_SUPPORT
-static void afs_cell_cache_update(void *source, void *entry)
+#ifdef CONFIG_AFS_FSCACHE
+static uint16_t afs_cell_cache_get_aux(const void *cookie_netfs_data,
+ void *buffer, uint16_t bufmax)
{
- struct afs_cache_cell *ccell = entry;
- struct afs_cell *cell = source;
+ const struct afs_cell *cell = cookie_netfs_data;
+ uint16_t dlen;

- _enter("%p,%p", source, entry);
+ _enter("%p,%p,%u", cell, buffer, bufmax);

- strncpy(ccell->name, cell->name, sizeof(ccell->name));
+ dlen = cell->vl_naddrs * sizeof(cell->vl_addrs[0]);
+ dlen = min(dlen, bufmax);
+ dlen &= ~(sizeof(cell->vl_addrs[0]) - 1);

- memcpy(ccell->vl_servers,
- cell->vl_addrs,
- min(sizeof(ccell->vl_servers), sizeof(cell->vl_addrs)));
+ memcpy(buffer, cell->vl_addrs, dlen);
+
+ return dlen;
+
+} /* end afs_cell_cache_get_aux() */
+#endif
+
+/*****************************************************************************/
+/*
+ * check that the auxiliary data indicates that the entry is still valid
+ */
+#ifdef CONFIG_AFS_FSCACHE
+static fscache_checkaux_t afs_cell_cache_check_aux(void *cookie_netfs_data,
+ const void *buffer,
+ uint16_t buflen)
+{
+ _leave(" = OKAY");
+ return FSCACHE_CHECKAUX_OKAY;

-} /* end afs_cell_cache_update() */
+} /* end afs_cell_cache_check_aux() */
#endif
diff -uNrp linux-2.6.14-mm2/fs/afs/cell.h linux-2.6.14-mm2-cachefs/fs/afs/cell.h
--- linux-2.6.14-mm2/fs/afs/cell.h 2004-06-18 13:41:16.000000000 +0100
+++ linux-2.6.14-mm2-cachefs/fs/afs/cell.h 2005-11-14 16:23:38.000000000 +0000
@@ -13,7 +13,7 @@
#define _LINUX_AFS_CELL_H

#include "types.h"
-#include "cache.h"
+#include <linux/fscache.h>

#define AFS_CELL_MAX_ADDRS 15

@@ -21,16 +21,6 @@ extern volatile int afs_cells_being_purg

/*****************************************************************************/
/*
- * entry in the cached cell catalogue
- */
-struct afs_cache_cell
-{
- char name[64]; /* cell name (padded with NULs) */
- struct in_addr vl_servers[15]; /* cached cell VL servers */
-};
-
-/*****************************************************************************/
-/*
* AFS cell record
*/
struct afs_cell
@@ -39,8 +29,8 @@ struct afs_cell
struct list_head link; /* main cell list link */
struct list_head proc_link; /* /proc cell list link */
struct proc_dir_entry *proc_dir; /* /proc dir for this cell */
-#ifdef AFS_CACHING_SUPPORT
- struct cachefs_cookie *cache; /* caching cookie */
+#ifdef CONFIG_AFS_FSCACHE
+ struct fscache_cookie *cache; /* caching cookie */
#endif

/* server record management */
diff -uNrp linux-2.6.14-mm2/fs/afs/cmservice.c linux-2.6.14-mm2-cachefs/fs/afs/cmservice.c
--- linux-2.6.14-mm2/fs/afs/cmservice.c 2005-03-02 12:08:35.000000000 +0000
+++ linux-2.6.14-mm2-cachefs/fs/afs/cmservice.c 2005-11-14 16:23:38.000000000 +0000
@@ -24,7 +24,7 @@
#include "internal.h"

static unsigned afscm_usage; /* AFS cache manager usage count */
-static struct rw_semaphore afscm_sem; /* AFS cache manager start/stop semaphore */
+static DECLARE_RWSEM(afscm_sem); /* AFS cache manager start/stop semaphore */

static int afscm_new_call(struct rxrpc_call *call);
static void afscm_attention(struct rxrpc_call *call);
diff -uNrp linux-2.6.14-mm2/fs/afs/dir.c linux-2.6.14-mm2-cachefs/fs/afs/dir.c
--- linux-2.6.14-mm2/fs/afs/dir.c 2004-10-19 10:42:07.000000000 +0100
+++ linux-2.6.14-mm2-cachefs/fs/afs/dir.c 2005-11-14 16:23:38.000000000 +0000
@@ -145,7 +145,7 @@ static inline void afs_dir_check_page(st
qty /= sizeof(union afs_dir_block);

/* check them */
- dbuf = page_address(page);
+ dbuf = kmap_atomic(page, KM_USER0);
for (tmp = 0; tmp < qty; tmp++) {
if (dbuf->blocks[tmp].pagehdr.magic != AFS_DIR_MAGIC) {
printk("kAFS: %s(%lu): bad magic %d/%d is %04hx\n",
@@ -154,12 +154,12 @@ static inline void afs_dir_check_page(st
goto error;
}
}
+ kunmap_atomic(dbuf, KM_USER0);

- SetPageChecked(page);
return;

error:
- SetPageChecked(page);
+ kunmap_atomic(dbuf, KM_USER0);
SetPageError(page);

} /* end afs_dir_check_page() */
@@ -170,7 +170,6 @@ static inline void afs_dir_check_page(st
*/
static inline void afs_dir_put_page(struct page *page)
{
- kunmap(page);
page_cache_release(page);

} /* end afs_dir_put_page() */
@@ -190,11 +189,9 @@ static struct page *afs_dir_get_page(str
NULL);
if (!IS_ERR(page)) {
wait_on_page_locked(page);
- kmap(page);
if (!PageUptodate(page))
goto fail;
- if (!PageChecked(page))
- afs_dir_check_page(dir, page);
+ afs_dir_check_page(dir, page);
if (PageError(page))
goto fail;
}
@@ -359,7 +356,7 @@ static int afs_dir_iterate(struct inode

limit = blkoff & ~(PAGE_SIZE - 1);

- dbuf = page_address(page);
+ dbuf = kmap_atomic(page, KM_USER0);

/* deal with the individual blocks stashed on this page */
do {
@@ -368,6 +365,7 @@ static int afs_dir_iterate(struct inode
ret = afs_dir_iterate_block(fpos, dblock, blkoff,
cookie, filldir);
if (ret != 1) {
+ kunmap_atomic(dbuf, KM_USER0);
afs_dir_put_page(page);
goto out;
}
@@ -376,6 +374,7 @@ static int afs_dir_iterate(struct inode

} while (*fpos < dir->i_size && blkoff < limit);

+ kunmap_atomic(dbuf, KM_USER0);
afs_dir_put_page(page);
ret = 0;
}
diff -uNrp linux-2.6.14-mm2/fs/afs/file.c linux-2.6.14-mm2-cachefs/fs/afs/file.c
--- linux-2.6.14-mm2/fs/afs/file.c 2005-11-14 16:17:53.000000000 +0000
+++ linux-2.6.14-mm2-cachefs/fs/afs/file.c 2005-11-14 16:41:27.000000000 +0000
@@ -16,12 +16,15 @@
#include <linux/slab.h>
#include <linux/fs.h>
#include <linux/pagemap.h>
+#include <linux/pagevec.h>
#include <linux/buffer_head.h>
#include "volume.h"
#include "vnode.h"
#include <rxrpc/call.h>
#include "internal.h"

+#define list_to_page(head) (list_entry((head)->prev, struct page, lru))
+
#if 0
static int afs_file_open(struct inode *inode, struct file *file);
static int afs_file_release(struct inode *inode, struct file *file);
@@ -30,30 +33,68 @@ static int afs_file_release(struct inode
static int afs_file_readpage(struct file *file, struct page *page);
static int afs_file_invalidatepage(struct page *page, unsigned long offset);
static int afs_file_releasepage(struct page *page, gfp_t gfp_flags);
+static int afs_file_mmap(struct file * file, struct vm_area_struct * vma);
+
+#ifdef CONFIG_AFS_FSCACHE
+static int afs_file_readpages(struct file *filp, struct address_space *mapping,
+ struct list_head *pages, unsigned nr_pages);
+static int afs_file_page_mkwrite(struct vm_area_struct *vma, struct page *page);
+#endif

struct inode_operations afs_file_inode_operations = {
.getattr = afs_inode_getattr,
};

+struct file_operations afs_file_file_operations = {
+ .read = generic_file_read,
+ .mmap = afs_file_mmap,
+};
+
struct address_space_operations afs_fs_aops = {
.readpage = afs_file_readpage,
+#ifdef CONFIG_AFS_FSCACHE
+ .readpages = afs_file_readpages,
+#endif
.sync_page = block_sync_page,
.set_page_dirty = __set_page_dirty_nobuffers,
.releasepage = afs_file_releasepage,
.invalidatepage = afs_file_invalidatepage,
};

+static struct vm_operations_struct afs_fs_vm_operations = {
+ .nopage = filemap_nopage,
+ .populate = filemap_populate,
+#ifdef CONFIG_AFS_FSCACHE
+ .page_mkwrite = afs_file_page_mkwrite,
+#endif
+};
+
+/*****************************************************************************/
+/*
+ * set up a memory mapping on an AFS file
+ * - we set our own VMA ops so that we can catch the page becoming writable for
+ * userspace for shared-writable mmap
+ */
+static int afs_file_mmap(struct file *file, struct vm_area_struct *vma)
+{
+ _enter("");
+
+ file_accessed(file);
+ vma->vm_ops = &afs_fs_vm_operations;
+ return 0;
+
+} /* end afs_file_mmap() */
+
/*****************************************************************************/
/*
* deal with notification that a page was read from the cache
*/
-#ifdef AFS_CACHING_SUPPORT
-static void afs_file_readpage_read_complete(void *cookie_data,
- struct page *page,
+#ifdef CONFIG_AFS_FSCACHE
+static void afs_file_readpage_read_complete(struct page *page,
void *data,
int error)
{
- _enter("%p,%p,%p,%d", cookie_data, page, data, error);
+ _enter("%p,%p,%d", page, data, error);

if (error)
SetPageError(page);
@@ -68,15 +109,16 @@ static void afs_file_readpage_read_compl
/*
* deal with notification that a page was written to the cache
*/
-#ifdef AFS_CACHING_SUPPORT
-static void afs_file_readpage_write_complete(void *cookie_data,
- struct page *page,
+#ifdef CONFIG_AFS_FSCACHE
+static void afs_file_readpage_write_complete(struct page *page,
void *data,
int error)
{
- _enter("%p,%p,%p,%d", cookie_data, page, data, error);
+ _enter("%p,%p,%d", page, data, error);

- unlock_page(page);
+ /* note that the page has been written to the cache and can now be
+ * modified */
+ end_page_fs_misc(page);

} /* end afs_file_readpage_write_complete() */
#endif
@@ -88,16 +130,13 @@ static void afs_file_readpage_write_comp
static int afs_file_readpage(struct file *file, struct page *page)
{
struct afs_rxfs_fetch_descriptor desc;
-#ifdef AFS_CACHING_SUPPORT
- struct cachefs_page *pageio;
-#endif
struct afs_vnode *vnode;
struct inode *inode;
int ret;

inode = page->mapping->host;

- _enter("{%lu},{%lu}", inode->i_ino, page->index);
+ _enter("{%lu},%p{%lu}", inode->i_ino, page, page->index);

vnode = AFS_FS_I(inode);

@@ -107,13 +146,9 @@ static int afs_file_readpage(struct file
if (vnode->flags & AFS_VNODE_DELETED)
goto error;

-#ifdef AFS_CACHING_SUPPORT
- ret = cachefs_page_get_private(page, &pageio, GFP_NOIO);
- if (ret < 0)
- goto error;
-
+#ifdef CONFIG_AFS_FSCACHE
/* is it cached? */
- ret = cachefs_read_or_alloc_page(vnode->cache,
+ ret = fscache_read_or_alloc_page(vnode->cache,
page,
afs_file_readpage_read_complete,
NULL,
@@ -123,18 +158,20 @@ static int afs_file_readpage(struct file
#endif

switch (ret) {
- /* read BIO submitted and wb-journal entry found */
- case 1:
- BUG(); // TODO - handle wb-journal match
-
/* read BIO submitted (page in cache) */
case 0:
break;

- /* no page available in cache */
- case -ENOBUFS:
+ /* page not yet cached */
case -ENODATA:
+ _debug("cache said ENODATA");
+ goto go_on;
+
+ /* page will not be cached */
+ case -ENOBUFS:
+ _debug("cache said ENOBUFS");
default:
+ go_on:
desc.fid = vnode->fid;
desc.offset = page->index << PAGE_CACHE_SHIFT;
desc.size = min((size_t) (inode->i_size - desc.offset),
@@ -148,34 +185,40 @@ static int afs_file_readpage(struct file
ret = afs_vnode_fetch_data(vnode, &desc);
kunmap(page);
if (ret < 0) {
- if (ret==-ENOENT) {
- _debug("got NOENT from server"
+ if (ret == -ENOENT) {
+ kdebug("got NOENT from server"
" - marking file deleted and stale");
vnode->flags |= AFS_VNODE_DELETED;
ret = -ESTALE;
}

-#ifdef AFS_CACHING_SUPPORT
- cachefs_uncache_page(vnode->cache, page);
+#ifdef CONFIG_AFS_FSCACHE
+ fscache_uncache_page(vnode->cache, page);
+ ClearPagePrivate(page);
#endif
goto error;
}

SetPageUptodate(page);

-#ifdef AFS_CACHING_SUPPORT
- if (cachefs_write_page(vnode->cache,
- page,
- afs_file_readpage_write_complete,
- NULL,
- GFP_KERNEL) != 0
- ) {
- cachefs_uncache_page(vnode->cache, page);
- unlock_page(page);
+ /* send the page to the cache */
+#ifdef CONFIG_AFS_FSCACHE
+ if (PagePrivate(page)) {
+ if (TestSetPageFsMisc(page))
+ BUG();
+ if (fscache_write_page(vnode->cache,
+ page,
+ afs_file_readpage_write_complete,
+ NULL,
+ GFP_KERNEL) != 0
+ ) {
+ fscache_uncache_page(vnode->cache, page);
+ ClearPagePrivate(page);
+ end_page_fs_misc(page);
+ }
}
-#else
- unlock_page(page);
#endif
+ unlock_page(page);
}

_leave(" = 0");
@@ -192,20 +235,63 @@ static int afs_file_readpage(struct file

/*****************************************************************************/
/*
- * get a page cookie for the specified page
+ * read a set of pages
*/
-#ifdef AFS_CACHING_SUPPORT
-int afs_cache_get_page_cookie(struct page *page,
- struct cachefs_page **_page_cookie)
+#ifdef CONFIG_AFS_FSCACHE
+static int afs_file_readpages(struct file *filp, struct address_space *mapping,
+ struct list_head *pages, unsigned nr_pages)
{
- int ret;
+ struct afs_vnode *vnode;
+#if 0
+ struct pagevec lru_pvec;
+ unsigned page_idx;
+#endif
+ int ret = 0;

- _enter("");
- ret = cachefs_page_get_private(page,_page_cookie, GFP_NOIO);
+ _enter(",{%lu},,%d", mapping->host->i_ino, nr_pages);

- _leave(" = %d", ret);
+ vnode = AFS_FS_I(mapping->host);
+ if (vnode->flags & AFS_VNODE_DELETED) {
+ _leave(" = -ESTALE");
+ return -ESTALE;
+ }
+
+ /* attempt to read as many of the pages as possible */
+ ret = fscache_read_or_alloc_pages(vnode->cache,
+ mapping,
+ pages,
+ &nr_pages,
+ afs_file_readpage_read_complete,
+ NULL,
+ mapping_gfp_mask(mapping));
+
+ switch (ret) {
+ /* all pages are being read from the cache */
+ case 0:
+ BUG_ON(!list_empty(pages));
+ BUG_ON(nr_pages != 0);
+ _leave(" = 0 [reading all]");
+ return 0;
+
+ /* there were pages that couldn't be read from the cache */
+ case -ENODATA:
+ case -ENOBUFS:
+ break;
+
+ /* other error */
+ default:
+ _leave(" = %d", ret);
+ return ret;
+ }
+
+ /* load the missing pages from the network */
+ ret = read_cache_pages(mapping, pages,
+ (void *) afs_file_readpage, NULL);
+
+ _leave(" = %d [netting]", ret);
return ret;
-} /* end afs_cache_get_page_cookie() */
+
+} /* end afs_file_readpages() */
#endif

/*****************************************************************************/
@@ -221,19 +307,12 @@ static int afs_file_invalidatepage(struc
BUG_ON(!PageLocked(page));

if (PagePrivate(page)) {
-#ifdef AFS_CACHING_SUPPORT
- struct afs_vnode *vnode = AFS_FS_I(page->mapping->host);
- cachefs_uncache_page(vnode->cache,page);
-#endif
-
/* We release buffers only if the entire page is being
* invalidated.
* The get_block cached value has been unconditionally
* invalidated, so real IO is not possible anymore.
*/
if (offset == 0) {
- BUG_ON(!PageLocked(page));
-
ret = 0;
if (!PageWriteback(page))
ret = page->mapping->a_ops->releasepage(page,
@@ -243,6 +322,7 @@ static int afs_file_invalidatepage(struc

_leave(" = %d", ret);
return ret;
+
} /* end afs_file_invalidatepage() */

/*****************************************************************************/
@@ -251,23 +331,29 @@ static int afs_file_invalidatepage(struc
*/
static int afs_file_releasepage(struct page *page, gfp_t gfp_flags)
{
- struct cachefs_page *pageio;
-
_enter("{%lu},%x", page->index, gfp_flags);

- if (PagePrivate(page)) {
-#ifdef AFS_CACHING_SUPPORT
- struct afs_vnode *vnode = AFS_FS_I(page->mapping->host);
- cachefs_uncache_page(vnode->cache, page);
+#ifdef CONFIG_AFS_FSCACHE
+ wait_on_page_fs_misc(page);
+ fscache_uncache_page(AFS_FS_I(page->mapping->host)->cache, page);
+ ClearPagePrivate(page);
#endif

- pageio = (struct cachefs_page *) page_private(page);
- set_page_private(page, 0);
- ClearPagePrivate(page);
-
- kfree(pageio);
- }
-
_leave(" = 0");
return 0;
+
} /* end afs_file_releasepage() */
+
+/*****************************************************************************/
+/*
+ * wait for the disc cache to finish writing before permitting modification of
+ * our page in the page cache
+ */
+#ifdef CONFIG_AFS_FSCACHE
+static int afs_file_page_mkwrite(struct vm_area_struct *vma, struct page *page)
+{
+ wait_on_page_fs_misc(page);
+ return 0;
+
+} /* end afs_file_page_mkwrite() */
+#endif
diff -uNrp linux-2.6.14-mm2/fs/afs/fsclient.c linux-2.6.14-mm2-cachefs/fs/afs/fsclient.c
--- linux-2.6.14-mm2/fs/afs/fsclient.c 2004-10-19 10:42:07.000000000 +0100
+++ linux-2.6.14-mm2-cachefs/fs/afs/fsclient.c 2005-11-14 16:23:38.000000000 +0000
@@ -398,6 +398,8 @@ int afs_rxfs_fetch_file_status(struct af
bp++; /* spare6 */
}

+ _debug("Data Version %llx\n", vnode->status.version);
+
/* success */
ret = 0;

@@ -408,7 +410,7 @@ int afs_rxfs_fetch_file_status(struct af
out_put_conn:
afs_server_release_callslot(server, &callslot);
out:
- _leave("");
+ _leave(" = %d", ret);
return ret;

abort:
diff -uNrp linux-2.6.14-mm2/fs/afs/inode.c linux-2.6.14-mm2-cachefs/fs/afs/inode.c
--- linux-2.6.14-mm2/fs/afs/inode.c 2005-11-14 16:17:53.000000000 +0000
+++ linux-2.6.14-mm2-cachefs/fs/afs/inode.c 2005-11-14 16:23:38.000000000 +0000
@@ -65,6 +65,11 @@ static int afs_inode_map_status(struct a
return -EBADMSG;
}

+#ifdef CONFIG_AFS_FSCACHE
+ if (vnode->status.size != inode->i_size)
+ fscache_set_i_size(vnode->cache, vnode->status.size);
+#endif
+
inode->i_nlink = vnode->status.nlink;
inode->i_uid = vnode->status.owner;
inode->i_gid = 0;
@@ -101,13 +106,33 @@ static int afs_inode_fetch_status(struct
struct afs_vnode *vnode;
int ret;

+ _enter("");
+
vnode = AFS_FS_I(inode);

ret = afs_vnode_fetch_status(vnode);

- if (ret == 0)
+ if (ret == 0) {
+#ifdef CONFIG_AFS_FSCACHE
+ if (vnode->cache == FSCACHE_NEGATIVE_COOKIE) {
+ vnode->cache =
+ fscache_acquire_cookie(vnode->volume->cache,
+ &afs_vnode_cache_index_def,
+ vnode);
+ if (!vnode->cache)
+ printk("Negative\n");
+ }
+#endif
ret = afs_inode_map_status(vnode);
+#ifdef CONFIG_AFS_FSCACHE
+ if (ret < 0) {
+ fscache_relinquish_cookie(vnode->cache, 0);
+ vnode->cache = FSCACHE_NEGATIVE_COOKIE;
+ }
+#endif
+ }

+ _leave(" = %d", ret);
return ret;

} /* end afs_inode_fetch_status() */
@@ -122,6 +147,7 @@ static int afs_iget5_test(struct inode *

return inode->i_ino == data->fid.vnode &&
inode->i_version == data->fid.unique;
+
} /* end afs_iget5_test() */

/*****************************************************************************/
@@ -179,20 +205,11 @@ inline int afs_iget(struct super_block *
return ret;
}

-#ifdef AFS_CACHING_SUPPORT
- /* set up caching before reading the status, as fetch-status reads the
- * first page of symlinks to see if they're really mntpts */
- cachefs_acquire_cookie(vnode->volume->cache,
- NULL,
- vnode,
- &vnode->cache);
-#endif
-
/* okay... it's a new inode */
inode->i_flags |= S_NOATIME;
vnode->flags |= AFS_VNODE_CHANGED;
ret = afs_inode_fetch_status(inode);
- if (ret<0)
+ if (ret < 0)
goto bad_inode;

/* success */
@@ -278,8 +295,8 @@ void afs_clear_inode(struct inode *inode

afs_vnode_give_up_callback(vnode);

-#ifdef AFS_CACHING_SUPPORT
- cachefs_relinquish_cookie(vnode->cache, 0);
+#ifdef CONFIG_AFS_FSCACHE
+ fscache_relinquish_cookie(vnode->cache, 0);
vnode->cache = NULL;
#endif

diff -uNrp linux-2.6.14-mm2/fs/afs/internal.h linux-2.6.14-mm2-cachefs/fs/afs/internal.h
--- linux-2.6.14-mm2/fs/afs/internal.h 2005-11-14 16:17:53.000000000 +0000
+++ linux-2.6.14-mm2-cachefs/fs/afs/internal.h 2005-11-14 16:23:38.000000000 +0000
@@ -16,15 +16,17 @@
#include <linux/kernel.h>
#include <linux/fs.h>
#include <linux/pagemap.h>
+#include <linux/fscache.h>

/*
* debug tracing
*/
-#define kenter(FMT, a...) printk("==> %s("FMT")\n",__FUNCTION__ , ## a)
-#define kleave(FMT, a...) printk("<== %s()"FMT"\n",__FUNCTION__ , ## a)
-#define kdebug(FMT, a...) printk(FMT"\n" , ## a)
-#define kproto(FMT, a...) printk("### "FMT"\n" , ## a)
-#define knet(FMT, a...) printk(FMT"\n" , ## a)
+#define __kdbg(FMT, a...) printk("[%05d] "FMT"\n", current->pid , ## a)
+#define kenter(FMT, a...) __kdbg("==> %s("FMT")", __FUNCTION__ , ## a)
+#define kleave(FMT, a...) __kdbg("<== %s()"FMT, __FUNCTION__ , ## a)
+#define kdebug(FMT, a...) __kdbg(FMT , ## a)
+#define kproto(FMT, a...) __kdbg("### "FMT , ## a)
+#define knet(FMT, a...) __kdbg(FMT , ## a)

#ifdef __KDEBUG
#define _enter(FMT, a...) kenter(FMT , ## a)
@@ -56,9 +58,6 @@ static inline void afs_discard_my_signal
*/
extern struct rw_semaphore afs_proc_cells_sem;
extern struct list_head afs_proc_cells;
-#ifdef AFS_CACHING_SUPPORT
-extern struct cachefs_index_def afs_cache_cell_index_def;
-#endif

/*
* dir.c
@@ -72,11 +71,6 @@ extern struct file_operations afs_dir_fi
extern struct address_space_operations afs_fs_aops;
extern struct inode_operations afs_file_inode_operations;

-#ifdef AFS_CACHING_SUPPORT
-extern int afs_cache_get_page_cookie(struct page *page,
- struct cachefs_page **_page_cookie);
-#endif
-
/*
* inode.c
*/
@@ -97,8 +91,8 @@ extern void afs_key_unregister(void);
/*
* main.c
*/
-#ifdef AFS_CACHING_SUPPORT
-extern struct cachefs_netfs afs_cache_netfs;
+#ifdef CONFIG_AFS_FSCACHE
+extern struct fscache_netfs afs_cache_netfs;
#endif

/*
diff -uNrp linux-2.6.14-mm2/fs/afs/main.c linux-2.6.14-mm2-cachefs/fs/afs/main.c
--- linux-2.6.14-mm2/fs/afs/main.c 2005-06-22 13:52:09.000000000 +0100
+++ linux-2.6.14-mm2-cachefs/fs/afs/main.c 2005-11-14 16:23:38.000000000 +0000
@@ -1,6 +1,6 @@
/* main.c: AFS client file system
*
- * Copyright (C) 2002 Red Hat, Inc. All Rights Reserved.
+ * Copyright (C) 2002,5 Red Hat, Inc. All Rights Reserved.
* Written by David Howells ([email protected])
*
* This program is free software; you can redistribute it and/or
@@ -14,11 +14,11 @@
#include <linux/init.h>
#include <linux/sched.h>
#include <linux/completion.h>
+#include <linux/fscache.h>
#include <rxrpc/rxrpc.h>
#include <rxrpc/transport.h>
#include <rxrpc/call.h>
#include <rxrpc/peer.h>
-#include "cache.h"
#include "cell.h"
#include "server.h"
#include "fsclient.h"
@@ -51,12 +51,11 @@ static struct rxrpc_peer_ops afs_peer_op
struct list_head afs_cb_hash_tbl[AFS_CB_HASH_COUNT];
DEFINE_SPINLOCK(afs_cb_hash_lock);

-#ifdef AFS_CACHING_SUPPORT
-static struct cachefs_netfs_operations afs_cache_ops = {
- .get_page_cookie = afs_cache_get_page_cookie,
+#ifdef CONFIG_AFS_FSCACHE
+static struct fscache_netfs_operations afs_cache_ops = {
};

-struct cachefs_netfs afs_cache_netfs = {
+struct fscache_netfs afs_cache_netfs = {
.name = "afs",
.version = 0,
.ops = &afs_cache_ops,
@@ -83,10 +82,9 @@ static int __init afs_init(void)
if (ret < 0)
return ret;

-#ifdef AFS_CACHING_SUPPORT
+#ifdef CONFIG_AFS_FSCACHE
/* we want to be able to cache */
- ret = cachefs_register_netfs(&afs_cache_netfs,
- &afs_cache_cell_index_def);
+ ret = fscache_register_netfs(&afs_cache_netfs);
if (ret < 0)
goto error;
#endif
@@ -137,8 +135,8 @@ static int __init afs_init(void)
afs_key_unregister();
error_cache:
#endif
-#ifdef AFS_CACHING_SUPPORT
- cachefs_unregister_netfs(&afs_cache_netfs);
+#ifdef CONFIG_AFS_FSCACHE
+ fscache_unregister_netfs(&afs_cache_netfs);
error:
#endif
afs_cell_purge();
@@ -167,8 +165,8 @@ static void __exit afs_exit(void)
#ifdef CONFIG_KEYS_TURNED_OFF
afs_key_unregister();
#endif
-#ifdef AFS_CACHING_SUPPORT
- cachefs_unregister_netfs(&afs_cache_netfs);
+#ifdef CONFIG_AFS_FSCACHE
+ fscache_unregister_netfs(&afs_cache_netfs);
#endif
afs_proc_cleanup();

diff -uNrp linux-2.6.14-mm2/fs/afs/mntpt.c linux-2.6.14-mm2-cachefs/fs/afs/mntpt.c
--- linux-2.6.14-mm2/fs/afs/mntpt.c 2005-08-30 13:56:28.000000000 +0100
+++ linux-2.6.14-mm2-cachefs/fs/afs/mntpt.c 2005-11-14 16:23:38.000000000 +0000
@@ -82,7 +82,7 @@ int afs_mntpt_check_symlink(struct afs_v

ret = -EIO;
wait_on_page_locked(page);
- buf = kmap(page);
+ buf = kmap_atomic(page, KM_USER0);
if (!PageUptodate(page))
goto out_free;
if (PageError(page))
@@ -105,7 +105,7 @@ int afs_mntpt_check_symlink(struct afs_v
ret = 0;

out_free:
- kunmap(page);
+ kunmap_atomic(buf, KM_USER0);
page_cache_release(page);
out:
_leave(" = %d", ret);
@@ -195,9 +195,9 @@ static struct vfsmount *afs_mntpt_do_aut
if (!PageUptodate(page) || PageError(page))
goto error;

- buf = kmap(page);
+ buf = kmap_atomic(page, KM_USER0);
memcpy(devname, buf, size);
- kunmap(page);
+ kunmap_atomic(buf, KM_USER0);
page_cache_release(page);
page = NULL;

@@ -276,12 +276,12 @@ static void *afs_mntpt_follow_link(struc
*/
static void afs_mntpt_expiry_timed_out(struct afs_timer *timer)
{
- kenter("");
+// kenter("");

mark_mounts_for_expiry(&afs_vfsmounts);

afs_kafstimod_add_timer(&afs_mntpt_expiry_timer,
afs_mntpt_expiry_timeout * HZ);

- kleave("");
+// kleave("");
} /* end afs_mntpt_expiry_timed_out() */
diff -uNrp linux-2.6.14-mm2/fs/afs/proc.c linux-2.6.14-mm2-cachefs/fs/afs/proc.c
--- linux-2.6.14-mm2/fs/afs/proc.c 2004-06-18 13:43:59.000000000 +0100
+++ linux-2.6.14-mm2-cachefs/fs/afs/proc.c 2005-11-14 16:23:38.000000000 +0000
@@ -177,6 +177,7 @@ int afs_proc_init(void)
*/
void afs_proc_cleanup(void)
{
+ remove_proc_entry("rootcell", proc_afs);
remove_proc_entry("cells", proc_afs);

remove_proc_entry("fs/afs", NULL);
diff -uNrp linux-2.6.14-mm2/fs/afs/server.c linux-2.6.14-mm2-cachefs/fs/afs/server.c
--- linux-2.6.14-mm2/fs/afs/server.c 2005-03-02 12:08:35.000000000 +0000
+++ linux-2.6.14-mm2-cachefs/fs/afs/server.c 2005-11-14 16:23:38.000000000 +0000
@@ -377,7 +377,6 @@ int afs_server_request_callslot(struct a
else if (list_empty(&server->fs_callq)) {
/* no one waiting */
server->fs_conn_cnt[nconn]++;
- spin_unlock(&server->fs_lock);
}
else {
/* someone's waiting - dequeue them and wake them up */
@@ -395,9 +394,9 @@ int afs_server_request_callslot(struct a
}
pcallslot->ready = 1;
wake_up_process(pcallslot->task);
- spin_unlock(&server->fs_lock);
}

+ spin_unlock(&server->fs_lock);
rxrpc_put_connection(callslot->conn);
callslot->conn = NULL;

diff -uNrp linux-2.6.14-mm2/fs/afs/vlocation.c linux-2.6.14-mm2-cachefs/fs/afs/vlocation.c
--- linux-2.6.14-mm2/fs/afs/vlocation.c 2005-03-02 12:08:35.000000000 +0000
+++ linux-2.6.14-mm2-cachefs/fs/afs/vlocation.c 2005-11-14 16:23:38.000000000 +0000
@@ -59,17 +59,21 @@ static LIST_HEAD(afs_vlocation_update_pe
static struct afs_vlocation *afs_vlocation_update; /* VL currently being updated */
static DEFINE_SPINLOCK(afs_vlocation_update_lock); /* lock guarding update queue */

-#ifdef AFS_CACHING_SUPPORT
-static cachefs_match_val_t afs_vlocation_cache_match(void *target,
- const void *entry);
-static void afs_vlocation_cache_update(void *source, void *entry);
-
-struct cachefs_index_def afs_vlocation_cache_index_def = {
- .name = "vldb",
- .data_size = sizeof(struct afs_cache_vlocation),
- .keys[0] = { CACHEFS_INDEX_KEYS_ASCIIZ, 64 },
- .match = afs_vlocation_cache_match,
- .update = afs_vlocation_cache_update,
+#ifdef CONFIG_AFS_FSCACHE
+static uint16_t afs_vlocation_cache_get_key(const void *cookie_netfs_data,
+ void *buffer, uint16_t buflen);
+static uint16_t afs_vlocation_cache_get_aux(const void *cookie_netfs_data,
+ void *buffer, uint16_t buflen);
+static fscache_checkaux_t afs_vlocation_cache_check_aux(void *cookie_netfs_data,
+ const void *buffer,
+ uint16_t buflen);
+
+static struct fscache_cookie_def afs_vlocation_cache_index_def = {
+ .name = "AFS.vldb",
+ .type = FSCACHE_COOKIE_TYPE_INDEX,
+ .get_key = afs_vlocation_cache_get_key,
+ .get_aux = afs_vlocation_cache_get_aux,
+ .check_aux = afs_vlocation_cache_check_aux,
};
#endif

@@ -300,13 +304,12 @@ int afs_vlocation_lookup(struct afs_cell

list_add_tail(&vlocation->link, &cell->vl_list);

-#ifdef AFS_CACHING_SUPPORT
+#ifdef CONFIG_AFS_FSCACHE
/* we want to store it in the cache, plus it might already be
* encached */
- cachefs_acquire_cookie(cell->cache,
- &afs_volume_cache_index_def,
- vlocation,
- &vlocation->cache);
+ vlocation->cache = fscache_acquire_cookie(cell->cache,
+ &afs_vlocation_cache_index_def,
+ vlocation);

if (vlocation->valid)
goto found_in_cache;
@@ -341,7 +344,7 @@ int afs_vlocation_lookup(struct afs_cell
active:
active = 1;

-#ifdef AFS_CACHING_SUPPORT
+#ifdef CONFIG_AFS_FSCACHE
found_in_cache:
#endif
/* try to look up a cached volume in the cell VL databases by ID */
@@ -423,9 +426,9 @@ int afs_vlocation_lookup(struct afs_cell

afs_kafstimod_add_timer(&vlocation->upd_timer, 10 * HZ);

-#ifdef AFS_CACHING_SUPPORT
+#ifdef CONFIG_AFS_FSCACHE
/* update volume entry in local cache */
- cachefs_update_cookie(vlocation->cache);
+ fscache_update_cookie(vlocation->cache);
#endif

*_vlocation = vlocation;
@@ -439,8 +442,8 @@ int afs_vlocation_lookup(struct afs_cell
}
else {
list_del(&vlocation->link);
-#ifdef AFS_CACHING_SUPPORT
- cachefs_relinquish_cookie(vlocation->cache, 0);
+#ifdef CONFIG_AFS_FSCACHE
+ fscache_relinquish_cookie(vlocation->cache, 0);
#endif
afs_put_cell(vlocation->cell);
kfree(vlocation);
@@ -538,8 +541,8 @@ void afs_vlocation_do_timeout(struct afs
}

/* we can now destroy it properly */
-#ifdef AFS_CACHING_SUPPORT
- cachefs_relinquish_cookie(vlocation->cache, 0);
+#ifdef CONFIG_AFS_FSCACHE
+ fscache_relinquish_cookie(vlocation->cache, 0);
#endif
afs_put_cell(cell);

@@ -890,65 +893,103 @@ static void afs_vlocation_update_discard

/*****************************************************************************/
/*
- * match a VLDB record stored in the cache
- * - may also load target from entry
+ * set the key for the index entry
*/
-#ifdef AFS_CACHING_SUPPORT
-static cachefs_match_val_t afs_vlocation_cache_match(void *target,
- const void *entry)
-{
- const struct afs_cache_vlocation *vldb = entry;
- struct afs_vlocation *vlocation = target;
-
- _enter("{%s},{%s}", vlocation->vldb.name, vldb->name);
-
- if (strncmp(vlocation->vldb.name, vldb->name, sizeof(vldb->name)) == 0
- ) {
- if (!vlocation->valid ||
- vlocation->vldb.rtime == vldb->rtime
- ) {
- vlocation->vldb = *vldb;
- vlocation->valid = 1;
- _leave(" = SUCCESS [c->m]");
- return CACHEFS_MATCH_SUCCESS;
- }
- /* need to update cache if cached info differs */
- else if (memcmp(&vlocation->vldb, vldb, sizeof(*vldb)) != 0) {
- /* delete if VIDs for this name differ */
- if (memcmp(&vlocation->vldb.vid,
- &vldb->vid,
- sizeof(vldb->vid)) != 0) {
- _leave(" = DELETE");
- return CACHEFS_MATCH_SUCCESS_DELETE;
- }
+#ifdef CONFIG_AFS_FSCACHE
+static uint16_t afs_vlocation_cache_get_key(const void *cookie_netfs_data,
+ void *buffer, uint16_t bufmax)
+{
+ const struct afs_vlocation *vlocation = cookie_netfs_data;
+ uint16_t klen;

- _leave(" = UPDATE");
- return CACHEFS_MATCH_SUCCESS_UPDATE;
- }
- else {
- _leave(" = SUCCESS");
- return CACHEFS_MATCH_SUCCESS;
- }
- }
+ _enter("{%s},%p,%u", vlocation->vldb.name, buffer, bufmax);
+
+ klen = strnlen(vlocation->vldb.name, sizeof(vlocation->vldb.name));
+ if (klen > bufmax)
+ return 0;
+
+ memcpy(buffer, vlocation->vldb.name, klen);
+
+ _leave(" = %u", klen);
+ return klen;

- _leave(" = FAILED");
- return CACHEFS_MATCH_FAILED;
-} /* end afs_vlocation_cache_match() */
+} /* end afs_vlocation_cache_get_key() */
#endif

/*****************************************************************************/
/*
- * update a VLDB record stored in the cache
+ * provide new auxiliary cache data
*/
-#ifdef AFS_CACHING_SUPPORT
-static void afs_vlocation_cache_update(void *source, void *entry)
+#ifdef CONFIG_AFS_FSCACHE
+static uint16_t afs_vlocation_cache_get_aux(const void *cookie_netfs_data,
+ void *buffer, uint16_t bufmax)
{
- struct afs_cache_vlocation *vldb = entry;
- struct afs_vlocation *vlocation = source;
+ const struct afs_vlocation *vlocation = cookie_netfs_data;
+ uint16_t dlen;
+
+ _enter("{%s},%p,%u", vlocation->vldb.name, buffer, bufmax);
+
+ dlen = sizeof(struct afs_cache_vlocation);
+ dlen -= offsetof(struct afs_cache_vlocation, nservers);
+ if (dlen > bufmax)
+ return 0;
+
+ memcpy(buffer, (uint8_t *)&vlocation->vldb.nservers, dlen);

- _enter("");
+ _leave(" = %u", dlen);
+ return dlen;
+
+} /* end afs_vlocation_cache_get_aux() */
+#endif
+
+/*****************************************************************************/
+/*
+ * check that the auxiliary data indicates that the entry is still valid
+ */
+#ifdef CONFIG_AFS_FSCACHE
+static fscache_checkaux_t afs_vlocation_cache_check_aux(void *cookie_netfs_data,
+ const void *buffer,
+ uint16_t buflen)
+{
+ const struct afs_cache_vlocation *cvldb;
+ struct afs_vlocation *vlocation = cookie_netfs_data;
+ uint16_t dlen;
+
+ _enter("{%s},%p,%u", vlocation->vldb.name, buffer, buflen);
+
+ /* check the size of the data is what we're expecting */
+ dlen = sizeof(struct afs_cache_vlocation);
+ dlen -= offsetof(struct afs_cache_vlocation, nservers);
+ if (dlen != buflen)
+ return FSCACHE_CHECKAUX_OBSOLETE;
+
+ cvldb = container_of(buffer, struct afs_cache_vlocation, nservers);
+
+ /* if what's on disk is more valid than what's in memory, then use the
+ * VL record from the cache */
+ if (!vlocation->valid || vlocation->vldb.rtime == cvldb->rtime) {
+ memcpy((uint8_t *)&vlocation->vldb.nservers, buffer, dlen);
+ vlocation->valid = 1;
+ _leave(" = SUCCESS [c->m]");
+ return FSCACHE_CHECKAUX_OKAY;
+ }
+
+ /* need to update the cache if the cached info differs */
+ if (memcmp(&vlocation->vldb, buffer, dlen) != 0) {
+ /* delete if the volume IDs for this name differ */
+ if (memcmp(&vlocation->vldb.vid, &cvldb->vid,
+ sizeof(cvldb->vid)) != 0
+ ) {
+ _leave(" = OBSOLETE");
+ return FSCACHE_CHECKAUX_OBSOLETE;
+ }
+
+ _leave(" = UPDATE");
+ return FSCACHE_CHECKAUX_NEEDS_UPDATE;
+ }

- *vldb = vlocation->vldb;
+ _leave(" = OKAY");
+ return FSCACHE_CHECKAUX_OKAY;

-} /* end afs_vlocation_cache_update() */
+} /* end afs_vlocation_cache_check_aux() */
#endif
diff -uNrp linux-2.6.14-mm2/fs/afs/vnode.c linux-2.6.14-mm2-cachefs/fs/afs/vnode.c
--- linux-2.6.14-mm2/fs/afs/vnode.c 2004-10-19 10:42:07.000000000 +0100
+++ linux-2.6.14-mm2-cachefs/fs/afs/vnode.c 2005-11-14 16:23:38.000000000 +0000
@@ -29,17 +29,30 @@ struct afs_timer_ops afs_vnode_cb_timed_
.timed_out = afs_vnode_cb_timed_out,
};

-#ifdef AFS_CACHING_SUPPORT
-static cachefs_match_val_t afs_vnode_cache_match(void *target,
- const void *entry);
-static void afs_vnode_cache_update(void *source, void *entry);
-
-struct cachefs_index_def afs_vnode_cache_index_def = {
- .name = "vnode",
- .data_size = sizeof(struct afs_cache_vnode),
- .keys[0] = { CACHEFS_INDEX_KEYS_BIN, 4 },
- .match = afs_vnode_cache_match,
- .update = afs_vnode_cache_update,
+#ifdef CONFIG_AFS_FSCACHE
+static uint16_t afs_vnode_cache_get_key(const void *cookie_netfs_data,
+ void *buffer, uint16_t buflen);
+static void afs_vnode_cache_get_attr(const void *cookie_netfs_data,
+ uint64_t *size);
+static uint16_t afs_vnode_cache_get_aux(const void *cookie_netfs_data,
+ void *buffer, uint16_t buflen);
+static fscache_checkaux_t afs_vnode_cache_check_aux(void *cookie_netfs_data,
+ const void *buffer,
+ uint16_t buflen);
+static void afs_vnode_cache_mark_pages_cached(void *cookie_netfs_data,
+ struct address_space *mapping,
+ struct pagevec *cached_pvec);
+static void afs_vnode_cache_now_uncached(void *cookie_netfs_data);
+
+struct fscache_cookie_def afs_vnode_cache_index_def = {
+ .name = "AFS.vnode",
+ .type = FSCACHE_COOKIE_TYPE_DATAFILE,
+ .get_key = afs_vnode_cache_get_key,
+ .get_attr = afs_vnode_cache_get_attr,
+ .get_aux = afs_vnode_cache_get_aux,
+ .check_aux = afs_vnode_cache_check_aux,
+ .mark_pages_cached = afs_vnode_cache_mark_pages_cached,
+ .now_uncached = afs_vnode_cache_now_uncached,
};
#endif

@@ -189,6 +202,8 @@ int afs_vnode_fetch_status(struct afs_vn

if (vnode->update_cnt > 0) {
/* someone else started a fetch */
+ _debug("conflict");
+
set_current_state(TASK_UNINTERRUPTIBLE);
add_wait_queue(&vnode->update_waitq, &myself);

@@ -220,6 +235,7 @@ int afs_vnode_fetch_status(struct afs_vn
spin_unlock(&vnode->lock);
set_current_state(TASK_RUNNING);

+ _leave(" [conflicted, %d]", !!(vnode->flags & AFS_VNODE_DELETED));
return vnode->flags & AFS_VNODE_DELETED ? -ENOENT : 0;
}

@@ -342,54 +358,197 @@ int afs_vnode_give_up_callback(struct af

/*****************************************************************************/
/*
- * match a vnode record stored in the cache
+ * set the key for the index entry
*/
-#ifdef AFS_CACHING_SUPPORT
-static cachefs_match_val_t afs_vnode_cache_match(void *target,
- const void *entry)
+#ifdef CONFIG_AFS_FSCACHE
+static uint16_t afs_vnode_cache_get_key(const void *cookie_netfs_data,
+ void *buffer, uint16_t bufmax)
{
- const struct afs_cache_vnode *cvnode = entry;
- struct afs_vnode *vnode = target;
+ const struct afs_vnode *vnode = cookie_netfs_data;
+ uint16_t klen;

- _enter("{%x,%x,%Lx},{%x,%x,%Lx}",
- vnode->fid.vnode,
- vnode->fid.unique,
- vnode->status.version,
- cvnode->vnode_id,
- cvnode->vnode_unique,
- cvnode->data_version);
-
- if (vnode->fid.vnode != cvnode->vnode_id) {
- _leave(" = FAILED");
- return CACHEFS_MATCH_FAILED;
- }
+ _enter("{%x,%x,%Lx},%p,%u",
+ vnode->fid.vnode, vnode->fid.unique, vnode->status.version,
+ buffer, bufmax);
+
+ klen = sizeof(vnode->fid.vnode);
+ if (klen > bufmax)
+ return 0;
+
+ memcpy(buffer, &vnode->fid.vnode, sizeof(vnode->fid.vnode));
+
+ _leave(" = %u", klen);
+ return klen;
+
+} /* end afs_vnode_cache_get_key() */
+#endif
+
+/*****************************************************************************/
+/*
+ * provide updated file attributes
+ */
+#ifdef CONFIG_AFS_FSCACHE
+static void afs_vnode_cache_get_attr(const void *cookie_netfs_data,
+ uint64_t *size)
+{
+ const struct afs_vnode *vnode = cookie_netfs_data;
+
+ _enter("{%x,%x,%Lx},",
+ vnode->fid.vnode, vnode->fid.unique, vnode->status.version);
+
+ *size = i_size_read((struct inode *) &vnode->vfs_inode);
+
+} /* end afs_vnode_cache_get_attr() */
+#endif
+
+/*****************************************************************************/
+/*
+ * provide new auxiliary cache data
+ */
+#ifdef CONFIG_AFS_FSCACHE
+static uint16_t afs_vnode_cache_get_aux(const void *cookie_netfs_data,
+ void *buffer, uint16_t bufmax)
+{
+ const struct afs_vnode *vnode = cookie_netfs_data;
+ uint16_t dlen;
+
+ _enter("{%x,%x,%Lx},%p,%u",
+ vnode->fid.vnode, vnode->fid.unique, vnode->status.version,
+ buffer, bufmax);

- if (vnode->fid.unique != cvnode->vnode_unique ||
- vnode->status.version != cvnode->data_version) {
- _leave(" = DELETE");
- return CACHEFS_MATCH_SUCCESS_DELETE;
+ dlen = sizeof(vnode->fid.unique) + sizeof(vnode->status.version);
+ if (dlen > bufmax)
+ return 0;
+
+ memcpy(buffer, &vnode->fid.unique, sizeof(vnode->fid.unique));
+ buffer += sizeof(vnode->fid.unique);
+ memcpy(buffer, &vnode->status.version, sizeof(vnode->status.version));
+
+ _leave(" = %u", dlen);
+ return dlen;
+
+} /* end afs_vnode_cache_get_aux() */
+#endif
+
+/*****************************************************************************/
+/*
+ * check that the auxiliary data indicates that the entry is still valid
+ */
+#ifdef CONFIG_AFS_FSCACHE
+static fscache_checkaux_t afs_vnode_cache_check_aux(void *cookie_netfs_data,
+ const void *buffer,
+ uint16_t buflen)
+{
+ struct afs_vnode *vnode = cookie_netfs_data;
+ uint16_t dlen;
+
+ _enter("{%x,%x,%Lx},%p,%u",
+ vnode->fid.vnode, vnode->fid.unique, vnode->status.version,
+ buffer, buflen);
+
+ /* check the size of the data is what we're expecting */
+ dlen = sizeof(vnode->fid.unique) + sizeof(vnode->status.version);
+ if (dlen != buflen) {
+ _leave(" = OBSOLETE [len %hx != %hx]", dlen, buflen);
+ return FSCACHE_CHECKAUX_OBSOLETE;
+ }
+
+ if (memcmp(buffer,
+ &vnode->fid.unique,
+ sizeof(vnode->fid.unique)
+ ) != 0
+ ) {
+ unsigned unique;
+
+ memcpy(&unique, buffer, sizeof(unique));
+
+ _leave(" = OBSOLETE [uniq %x != %x]",
+ unique, vnode->fid.unique);
+ return FSCACHE_CHECKAUX_OBSOLETE;
+ }
+
+ if (memcmp(buffer + sizeof(vnode->fid.unique),
+ &vnode->status.version,
+ sizeof(vnode->status.version)
+ ) != 0
+ ) {
+ afs_dataversion_t version;
+
+ memcpy(&version, buffer + sizeof(vnode->fid.unique),
+ sizeof(version));
+
+ _leave(" = OBSOLETE [vers %llx != %llx]",
+ version, vnode->status.version);
+ return FSCACHE_CHECKAUX_OBSOLETE;
}

_leave(" = SUCCESS");
- return CACHEFS_MATCH_SUCCESS;
-} /* end afs_vnode_cache_match() */
+ return FSCACHE_CHECKAUX_OKAY;
+
+} /* end afs_vnode_cache_check_aux() */
#endif

/*****************************************************************************/
/*
- * update a vnode record stored in the cache
+ * indication of pages that now have cache metadata retained
+ * - this function should mark the specified pages as now being cached
*/
-#ifdef AFS_CACHING_SUPPORT
-static void afs_vnode_cache_update(void *source, void *entry)
+#ifdef CONFIG_AFS_FSCACHE
+static void afs_vnode_cache_mark_pages_cached(void *cookie_netfs_data,
+ struct address_space *mapping,
+ struct pagevec *cached_pvec)
{
- struct afs_cache_vnode *cvnode = entry;
- struct afs_vnode *vnode = source;
+ unsigned long loop;

- _enter("");
+ for (loop = 0; loop < cached_pvec->nr; loop++) {
+ struct page *page = cached_pvec->pages[loop];

- cvnode->vnode_id = vnode->fid.vnode;
- cvnode->vnode_unique = vnode->fid.unique;
- cvnode->data_version = vnode->status.version;
+ _debug("- mark %p{%lx}", page, page->index);

-} /* end afs_vnode_cache_update() */
+ SetPagePrivate(page);
+ }
+
+} /* end afs_vnode_cache_mark_pages_cached() */
#endif
+
+/*****************************************************************************/
+/*
+ * indication the cookie is no longer cached
+ * - this function is called when the backing store currently caching a cookie
+ * is removed
+ * - the netfs should use this to clean up any markers indicating cached pages
+ * - this is mandatory for any object that may have data
+ */
+static void afs_vnode_cache_now_uncached(void *cookie_netfs_data)
+{
+ struct afs_vnode *vnode = cookie_netfs_data;
+ struct pagevec pvec;
+ pgoff_t first;
+ int loop, nr_pages;
+
+ _enter("{%x,%x,%Lx}",
+ vnode->fid.vnode, vnode->fid.unique, vnode->status.version);
+
+ pagevec_init(&pvec, 0);
+ first = 0;
+
+ for (;;) {
+ /* grab a bunch of pages to clean */
+ nr_pages = find_get_pages(vnode->vfs_inode.i_mapping, first,
+ PAGEVEC_SIZE, pvec.pages);
+ if (!nr_pages)
+ break;
+
+ for (loop = 0; loop < nr_pages; loop++)
+ ClearPagePrivate(pvec.pages[loop]);
+
+ first = pvec.pages[nr_pages - 1]->index + 1;
+
+ pvec.nr = nr_pages;
+ pagevec_release(&pvec);
+ cond_resched();
+ }
+
+ _leave("");
+
+} /* end afs_vnode_cache_now_uncached() */
diff -uNrp linux-2.6.14-mm2/fs/afs/vnode.h linux-2.6.14-mm2-cachefs/fs/afs/vnode.h
--- linux-2.6.14-mm2/fs/afs/vnode.h 2004-06-18 13:41:16.000000000 +0100
+++ linux-2.6.14-mm2-cachefs/fs/afs/vnode.h 2005-11-14 16:23:38.000000000 +0000
@@ -13,9 +13,9 @@
#define _LINUX_AFS_VNODE_H

#include <linux/fs.h>
+#include <linux/fscache.h>
#include "server.h"
#include "kafstimod.h"
-#include "cache.h"

#ifdef __KERNEL__

@@ -32,8 +32,8 @@ struct afs_cache_vnode
afs_dataversion_t data_version; /* data version */
};

-#ifdef AFS_CACHING_SUPPORT
-extern struct cachefs_index_def afs_vnode_cache_index_def;
+#ifdef CONFIG_AFS_FSCACHE
+extern struct fscache_cookie_def afs_vnode_cache_index_def;
#endif

/*****************************************************************************/
@@ -47,8 +47,8 @@ struct afs_vnode
struct afs_volume *volume; /* volume on which vnode resides */
struct afs_fid fid; /* the file identifier for this inode */
struct afs_file_status status; /* AFS status info for this file */
-#ifdef AFS_CACHING_SUPPORT
- struct cachefs_cookie *cache; /* caching cookie */
+#ifdef CONFIG_AFS_FSCACHE
+ struct fscache_cookie *cache; /* caching cookie */
#endif

wait_queue_head_t update_waitq; /* status fetch waitqueue */
diff -uNrp linux-2.6.14-mm2/fs/afs/volume.c linux-2.6.14-mm2-cachefs/fs/afs/volume.c
--- linux-2.6.14-mm2/fs/afs/volume.c 2005-03-02 12:08:35.000000000 +0000
+++ linux-2.6.14-mm2-cachefs/fs/afs/volume.c 2005-11-14 16:23:38.000000000 +0000
@@ -15,10 +15,10 @@
#include <linux/slab.h>
#include <linux/fs.h>
#include <linux/pagemap.h>
+#include <linux/fscache.h>
#include "volume.h"
#include "vnode.h"
#include "cell.h"
-#include "cache.h"
#include "cmservice.h"
#include "fsclient.h"
#include "vlclient.h"
@@ -28,18 +28,14 @@
static const char *afs_voltypes[] = { "R/W", "R/O", "BAK" };
#endif

-#ifdef AFS_CACHING_SUPPORT
-static cachefs_match_val_t afs_volume_cache_match(void *target,
- const void *entry);
-static void afs_volume_cache_update(void *source, void *entry);
-
-struct cachefs_index_def afs_volume_cache_index_def = {
- .name = "volume",
- .data_size = sizeof(struct afs_cache_vhash),
- .keys[0] = { CACHEFS_INDEX_KEYS_BIN, 1 },
- .keys[1] = { CACHEFS_INDEX_KEYS_BIN, 1 },
- .match = afs_volume_cache_match,
- .update = afs_volume_cache_update,
+#ifdef CONFIG_AFS_FSCACHE
+static uint16_t afs_volume_cache_get_key(const void *cookie_netfs_data,
+ void *buffer, uint16_t buflen);
+
+static struct fscache_cookie_def afs_volume_cache_index_def = {
+ .name = "AFS.volume",
+ .type = FSCACHE_COOKIE_TYPE_INDEX,
+ .get_key = afs_volume_cache_get_key,
};
#endif

@@ -214,11 +210,10 @@ int afs_volume_lookup(const char *name,
}

/* attach the cache and volume location */
-#ifdef AFS_CACHING_SUPPORT
- cachefs_acquire_cookie(vlocation->cache,
- &afs_vnode_cache_index_def,
- volume,
- &volume->cache);
+#ifdef CONFIG_AFS_FSCACHE
+ volume->cache = fscache_acquire_cookie(vlocation->cache,
+ &afs_volume_cache_index_def,
+ volume);
#endif

afs_get_vlocation(vlocation);
@@ -286,8 +281,8 @@ void afs_put_volume(struct afs_volume *v
up_write(&vlocation->cell->vl_sem);

/* finish cleaning up the volume */
-#ifdef AFS_CACHING_SUPPORT
- cachefs_relinquish_cookie(volume->cache, 0);
+#ifdef CONFIG_AFS_FSCACHE
+ fscache_relinquish_cookie(volume->cache, 0);
#endif
afs_put_vlocation(vlocation);

@@ -481,40 +476,25 @@ int afs_volume_release_fileserver(struct

/*****************************************************************************/
/*
- * match a volume hash record stored in the cache
+ * set the key for the index entry
*/
-#ifdef AFS_CACHING_SUPPORT
-static cachefs_match_val_t afs_volume_cache_match(void *target,
- const void *entry)
+#ifdef CONFIG_AFS_FSCACHE
+static uint16_t afs_volume_cache_get_key(const void *cookie_netfs_data,
+ void *buffer, uint16_t bufmax)
{
- const struct afs_cache_vhash *vhash = entry;
- struct afs_volume *volume = target;
+ const struct afs_volume *volume = cookie_netfs_data;
+ uint16_t klen;

- _enter("{%u},{%u}", volume->type, vhash->vtype);
+ _enter("{%u},%p,%u", volume->type, buffer, bufmax);

- if (volume->type == vhash->vtype) {
- _leave(" = SUCCESS");
- return CACHEFS_MATCH_SUCCESS;
- }
-
- _leave(" = FAILED");
- return CACHEFS_MATCH_FAILED;
-} /* end afs_volume_cache_match() */
-#endif
-
-/*****************************************************************************/
-/*
- * update a volume hash record stored in the cache
- */
-#ifdef AFS_CACHING_SUPPORT
-static void afs_volume_cache_update(void *source, void *entry)
-{
- struct afs_cache_vhash *vhash = entry;
- struct afs_volume *volume = source;
+ klen = sizeof(volume->type);
+ if (klen > bufmax)
+ return 0;

- _enter("");
+ memcpy(buffer, &volume->type, sizeof(volume->type));

- vhash->vtype = volume->type;
+ _leave(" = %u", klen);
+ return klen;

-} /* end afs_volume_cache_update() */
+} /* end afs_volume_cache_get_key() */
#endif
diff -uNrp linux-2.6.14-mm2/fs/afs/volume.h linux-2.6.14-mm2-cachefs/fs/afs/volume.h
--- linux-2.6.14-mm2/fs/afs/volume.h 2004-10-19 10:42:07.000000000 +0100
+++ linux-2.6.14-mm2-cachefs/fs/afs/volume.h 2005-11-14 16:23:38.000000000 +0000
@@ -12,11 +12,11 @@
#ifndef _LINUX_AFS_VOLUME_H
#define _LINUX_AFS_VOLUME_H

+#include <linux/fscache.h>
#include "types.h"
#include "fsclient.h"
#include "kafstimod.h"
#include "kafsasyncd.h"
-#include "cache.h"

#define __packed __attribute__((packed))

@@ -47,24 +47,6 @@ struct afs_cache_vlocation
time_t rtime; /* last retrieval time */
};

-#ifdef AFS_CACHING_SUPPORT
-extern struct cachefs_index_def afs_vlocation_cache_index_def;
-#endif
-
-/*****************************************************************************/
-/*
- * volume -> vnode hash table entry
- */
-struct afs_cache_vhash
-{
- afs_voltype_t vtype; /* which volume variation */
- uint8_t hash_bucket; /* which hash bucket this represents */
-} __attribute__((packed));
-
-#ifdef AFS_CACHING_SUPPORT
-extern struct cachefs_index_def afs_volume_cache_index_def;
-#endif
-
/*****************************************************************************/
/*
* AFS volume location record
@@ -75,8 +57,8 @@ struct afs_vlocation
struct list_head link; /* link in cell volume location list */
struct afs_timer timeout; /* decaching timer */
struct afs_cell *cell; /* cell to which volume belongs */
-#ifdef AFS_CACHING_SUPPORT
- struct cachefs_cookie *cache; /* caching cookie */
+#ifdef CONFIG_AFS_FSCACHE
+ struct fscache_cookie *cache; /* caching cookie */
#endif
struct afs_cache_vlocation vldb; /* volume information DB record */
struct afs_volume *vols[3]; /* volume access record pointer (index by type) */
@@ -111,8 +93,8 @@ struct afs_volume
atomic_t usage;
struct afs_cell *cell; /* cell to which belongs (unrefd ptr) */
struct afs_vlocation *vlocation; /* volume location */
-#ifdef AFS_CACHING_SUPPORT
- struct cachefs_cookie *cache; /* caching cookie */
+#ifdef CONFIG_AFS_FSCACHE
+ struct fscache_cookie *cache; /* caching cookie */
#endif
afs_volid_t vid; /* volume ID */
afs_voltype_t __packed type; /* type of volume */

2005-11-14 21:57:19

by David Howells

Subject: [PATCH 6/12] FS-Cache: Add a function to replace a page in the pagecache

The attached patch adds a function by which an existing page in the pagecache
may be traded for a new one at the same location without having to allocate or
free any radix tree nodes.

This permits CacheFS to start making a new version of a disk block in memory
when the old version has not yet been written and journalled.
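
As a rough illustration of the swap described above, here is a minimal
userspace sketch. It is not the kernel code: a fixed array stands in for the
radix tree, locking is left out, and every sketch_* name is hypothetical. An
empty slot is treated as "no page present", which is a simplification.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_NSLOTS 8	/* hypothetical size; the kernel uses a radix tree */

struct sketch_mapping;

/* Hypothetical stand-in for struct page. */
struct sketch_page {
	struct sketch_mapping *mapping;	/* owning mapping, or NULL */
	unsigned long index;		/* offset within the mapping */
	int count;			/* reference count */
	int locked;			/* models the page lock bit */
};

/* Hypothetical stand-in for struct address_space. */
struct sketch_mapping {
	struct sketch_page *slots[SKETCH_NSLOTS];
};

/*
 * Swap 'page' into the slot at 'offset', but only if that slot is already
 * populated.  Returns the displaced page, or NULL if nothing was replaced.
 * The kernel function does its work under mapping->tree_lock; locking is
 * omitted in this sketch.
 */
static struct sketch_page *sketch_replace(struct sketch_page *page,
					  struct sketch_mapping *mapping,
					  unsigned long offset)
{
	struct sketch_page *old;

	assert(page != NULL && mapping != NULL);

	if (offset >= SKETCH_NSLOTS || mapping->slots[offset] == NULL)
		return NULL;	/* no existing page: leave the cache alone */

	old = mapping->slots[offset];
	mapping->slots[offset] = page;	/* reuse the slot, no allocation */

	page->count++;		/* the cache now holds a reference */
	page->locked = 1;	/* the new page is left locked */
	page->mapping = mapping;
	page->index = offset;

	old->mapping = NULL;	/* the old page no longer belongs here */
	return old;
}
```

The sketch leaves the displaced page's reference count untouched, as the
patch does, so the caller is presumably responsible for dropping it.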

Signed-Off-By: David Howells <[email protected]>
---
warthog>diffstat -p1 replace-in-pagecache-2614mm2.diff
include/linux/pagemap.h | 3 +++
mm/filemap.c | 33 +++++++++++++++++++++++++++++++++
2 files changed, 36 insertions(+)

diff -uNrp linux-2.6.14-mm2/include/linux/pagemap.h linux-2.6.14-mm2-cachefs/include/linux/pagemap.h
--- linux-2.6.14-mm2/include/linux/pagemap.h 2005-11-14 16:17:59.000000000 +0000
+++ linux-2.6.14-mm2-cachefs/include/linux/pagemap.h 2005-11-14 16:24:47.000000000 +0000
@@ -96,6 +96,9 @@ int add_to_page_cache(struct page *page,
unsigned long index, gfp_t gfp_mask);
int add_to_page_cache_lru(struct page *page, struct address_space *mapping,
unsigned long index, gfp_t gfp_mask);
+extern struct page *replace_in_page_cache(struct page *page,
+ struct address_space *mapping,
+ pgoff_t offset);
extern void remove_from_page_cache(struct page *page);
extern void __remove_from_page_cache(struct page *page);

diff -uNrp linux-2.6.14-mm2/mm/filemap.c linux-2.6.14-mm2-cachefs/mm/filemap.c
--- linux-2.6.14-mm2/mm/filemap.c 2005-11-14 16:18:00.000000000 +0000
+++ linux-2.6.14-mm2-cachefs/mm/filemap.c 2005-11-14 16:23:41.000000000 +0000
@@ -427,6 +427,39 @@ int add_to_page_cache_lru(struct page *p
EXPORT_SYMBOL(add_to_page_cache_lru);

/*
+ * This function replaces a page already in the page cache for a particular
+ * index with another, but only if there is already such a page in the page
+ * cache
+ */
+struct page *replace_in_page_cache(struct page *page,
+ struct address_space *mapping,
+ pgoff_t offset)
+{
+ struct page *old;
+ void **slot;
+
+ write_lock_irq(&mapping->tree_lock);
+
+ slot = radix_tree_lookup_slot(&mapping->page_tree, offset);
+ old = NULL;
+ if (slot) {
+ old = *slot;
+ *slot = page;
+ page_cache_get(page);
+ SetPageLocked(page);
+ page->mapping = mapping;
+ page->index = offset;
+ if (old)
+ old->mapping = NULL;
+ }
+
+ write_unlock_irq(&mapping->tree_lock);
+ return old;
+}
+
+EXPORT_SYMBOL(replace_in_page_cache);
+
+/*
* In order to wait for pages to become available there must be
* waitqueues associated with pages. By using a hash table of
* waitqueues where the bucket discipline is to maintain all

2005-11-14 21:56:07

by David Howells

Subject: [PATCH 8/12] FS-Cache: Add generic filesystem cache core module

The attached patch adds a generic core to which both networking filesystems and
caches may bind. It transfers requests from networking filesystems to
appropriate caches if possible, or else gracefully denies them.

It also:

(*) Adds a facility by which tags can be used to refer to caches, even if
they're not mounted yet.

(*) Keeps track of indexes.

(*) Permits caches to be added and removed dynamically.

(*) Permits network filesystems to annotate cache nodes that belong to them.

(*) Permits cache nodes to be pinned and reservations to be made.

If this facility is disabled in the kernel configuration, then all its
operations will be trivially reducible to nothing by the compiler.
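
One common way to get that effect is to give each operation an empty inline
stub when the option is off. The sketch below shows the general pattern only;
it is an assumption about the approach, not a copy of include/linux/fscache.h,
and SKETCH_CONFIG_CACHE and all sketch_* names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

struct sketch_cookie;	/* opaque handle, hypothetical */

#ifdef SKETCH_CONFIG_CACHE
/* Cache enabled: real implementations would live in the cache core. */
struct sketch_cookie *sketch_acquire_cookie(struct sketch_cookie *parent,
					    void *netfs_data);
void sketch_relinquish_cookie(struct sketch_cookie *cookie, int retire);
#else
/* Cache disabled: stubs the compiler can inline away completely. */
static inline struct sketch_cookie *
sketch_acquire_cookie(struct sketch_cookie *parent, void *netfs_data)
{
	(void)parent;
	(void)netfs_data;
	return NULL;	/* "no cookie": callers treat this as uncached */
}

static inline void sketch_relinquish_cookie(struct sketch_cookie *cookie,
					    int retire)
{
	(void)cookie;
	(void)retire;
}
#endif

/* A caller that needs no conditional compilation of its own. */
struct sketch_file {
	struct sketch_cookie *cache;
};

static void sketch_file_open(struct sketch_file *f)
{
	f->cache = sketch_acquire_cookie(NULL, f);
}

static void sketch_file_close(struct sketch_file *f)
{
	sketch_relinquish_cookie(f->cache, 0);
	f->cache = NULL;
}
```

With SKETCH_CONFIG_CACHE undefined, sketch_file_open() reduces to storing
NULL and sketch_file_close() to clearing the pointer, so the caller pays
nothing for the disabled facility.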

Signed-Off-By: David Howells <[email protected]>
---
warthog>diffstat -p1 fscache-core-2614mm2.diff
fs/Kconfig | 13
fs/Makefile | 1
fs/fscache/Makefile | 13
fs/fscache/cookie.c | 1030 ++++++++++++++++++++++++++++++++++++++++++
fs/fscache/fscache-int.h | 71 ++
fs/fscache/fsdef.c | 113 ++++
fs/fscache/main.c | 112 ++++
fs/fscache/page.c | 521 +++++++++++++++++++++
include/linux/fscache-cache.h | 216 ++++++++
include/linux/fscache.h | 484 +++++++++++++++++++
10 files changed, 2574 insertions(+)

diff -uNrp linux-2.6.14-mm2/fs/Kconfig linux-2.6.14-mm2-cachefs/fs/Kconfig
--- linux-2.6.14-mm2/fs/Kconfig 2005-11-14 16:17:54.000000000 +0000
+++ linux-2.6.14-mm2-cachefs/fs/Kconfig 2005-11-14 16:23:38.000000000 +0000
@@ -511,6 +511,19 @@ config FUSE_FS
If you want to develop a userspace FS, or if you want to use
a filesystem based on FUSE, answer Y or M.

+menu "Caches"
+
+config FSCACHE
+ tristate "General filesystem cache manager"
+ depends on EXPERIMENTAL
+ help
+ This option enables a generic filesystem caching manager that can be
+ used by various network and other filesystems to cache data
+ locally. Different sorts of caches can be plugged in, depending on the
+ resources available.
+
+ See Documentation/filesystems/caching/fscache.txt for more information.
+
menu "CD-ROM/DVD Filesystems"

config ISO9660_FS
diff -uNrp linux-2.6.14-mm2/fs/Makefile linux-2.6.14-mm2-cachefs/fs/Makefile
--- linux-2.6.14-mm2/fs/Makefile 2005-11-14 16:17:54.000000000 +0000
+++ linux-2.6.14-mm2-cachefs/fs/Makefile 2005-11-14 16:23:38.000000000 +0000
@@ -50,6 +50,7 @@ obj-y += devpts/
obj-$(CONFIG_PROFILING) += dcookies.o

# Do not add any filesystems before this line
+obj-$(CONFIG_FSCACHE) += fscache/
obj-$(CONFIG_REISERFS_FS) += reiserfs/
obj-$(CONFIG_REISER4_FS) += reiser4/
obj-$(CONFIG_EXT3_FS) += ext3/ # Before ext2 so root fs can be ext3
diff -uNrp linux-2.6.14-mm2/include/linux/fscache-cache.h linux-2.6.14-mm2-cachefs/include/linux/fscache-cache.h
--- linux-2.6.14-mm2/include/linux/fscache-cache.h 1970-01-01 01:00:00.000000000 +0100
+++ linux-2.6.14-mm2-cachefs/include/linux/fscache-cache.h 2005-11-14 16:23:38.000000000 +0000
@@ -0,0 +1,216 @@
+/* fscache-cache.h: general filesystem caching backing cache interface
+ *
+ * Copyright (C) 2004-5 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells ([email protected])
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#ifndef _LINUX_FSCACHE_CACHE_H
+#define _LINUX_FSCACHE_CACHE_H
+
+#include <linux/fscache.h>
+
+#define NR_MAXCACHES BITS_PER_LONG
+
+struct fscache_cache;
+struct fscache_cache_ops;
+struct fscache_object;
+
+/*
+ * cache tag definition
+ */
+struct fscache_cache_tag {
+ struct list_head link;
+ struct fscache_cache *cache; /* cache referred to by this tag */
+ atomic_t usage;
+ char name[0]; /* tag name */
+};
+
+/*
+ * cache definition
+ */
+struct fscache_cache {
+ struct fscache_cache_ops *ops;
+ struct fscache_cache_tag *tag; /* tag representing this cache */
+ struct list_head link; /* link in list of caches */
+ struct rw_semaphore withdrawal_sem; /* withdrawal control sem */
+ size_t max_index_size; /* maximum size of index data */
+ char identifier[32]; /* cache label */
+
+ /* node management */
+ struct list_head object_list; /* list of data/index objects */
+ spinlock_t object_list_lock;
+ struct fscache_object *fsdef; /* object for the fsdef index */
+};
+
+extern void fscache_init_cache(struct fscache_cache *cache,
+ struct fscache_cache_ops *ops,
+ const char *idfmt,
+ ...) __attribute__ ((format (printf,3,4)));
+
+extern int fscache_add_cache(struct fscache_cache *cache,
+ struct fscache_object *fsdef,
+ const char *tagname);
+extern void fscache_withdraw_cache(struct fscache_cache *cache);
+
+/*****************************************************************************/
+/*
+ * cache operations
+ */
+struct fscache_cache_ops {
+ /* name of cache provider */
+ const char *name;
+
+ /* look up the object for a cookie, creating it on disc if necessary */
+ struct fscache_object *(*lookup_object)(struct fscache_cache *cache,
+ struct fscache_object *parent,
+ struct fscache_cookie *cookie);
+
+ /* increment the usage count on this object (may fail if unmounting) */
+ struct fscache_object *(*grab_object)(struct fscache_object *object);
+
+ /* lock a semaphore on an object */
+ void (*lock_object)(struct fscache_object *object);
+
+ /* unlock a semaphore on an object */
+ void (*unlock_object)(struct fscache_object *object);
+
+ /* pin an object in the cache */
+ int (*pin_object)(struct fscache_object *object);
+
+ /* unpin an object in the cache */
+ void (*unpin_object)(struct fscache_object *object);
+
+ /* store the updated auxiliary data on an object */
+ void (*update_object)(struct fscache_object *object);
+
+ /* dispose of a reference to an object */
+ void (*put_object)(struct fscache_object *object);
+
+ /* sync a cache */
+ void (*sync_cache)(struct fscache_cache *cache);
+
+ /* set the data size of an object */
+ int (*set_i_size)(struct fscache_object *object, loff_t i_size);
+
+ /* reserve space for an object's data and associated metadata */
+ int (*reserve_space)(struct fscache_object *object, loff_t i_size);
+
+ /* request a backing block for a page be read or allocated in the
+ * cache */
+ int (*read_or_alloc_page)(struct fscache_object *object,
+ struct page *page,
+ fscache_rw_complete_t end_io_func,
+ void *end_io_data,
+ unsigned long gfp);
+
+ /* request backing blocks for a list of pages be read or allocated in
+ * the cache */
+ int (*read_or_alloc_pages)(struct fscache_object *object,
+ struct address_space *mapping,
+ struct list_head *pages,
+ unsigned *nr_pages,
+ fscache_rw_complete_t end_io_func,
+ void *end_io_data,
+ unsigned long gfp);
+
+ /* request a backing block for a page be allocated in the cache so that
+ * it can be written directly */
+ int (*allocate_page)(struct fscache_object *object,
+ struct page *page,
+ unsigned long gfp);
+
+ /* write a page to its backing block in the cache */
+ int (*write_page)(struct fscache_object *object,
+ struct page *page,
+ fscache_rw_complete_t end_io_func,
+ void *end_io_data,
+ unsigned long gfp);
+
+ /* write several pages to their backing blocks in the cache */
+ int (*write_pages)(struct fscache_object *object,
+ struct pagevec *pagevec,
+ fscache_rw_complete_t end_io_func,
+ void *end_io_data,
+ unsigned long gfp);
+
+ /* detach backing block from a bunch of pages */
+ void (*uncache_pages)(struct fscache_object *object,
+ struct pagevec *pagevec);
+
+ /* dissociate a cache from all the pages it was backing */
+ void (*dissociate_pages)(struct fscache_cache *cache);
+};
+
+/*****************************************************************************/
+/*
+ * data file or index object cookie
+ * - a file will only appear in one cache
+ * - a request to cache a file may or may not be honoured, subject to
+ * constraints such as disc space
+ * - index files are created on disc just-in-time
+ */
+struct fscache_cookie {
+ atomic_t usage; /* number of users of this cookie */
+ atomic_t children; /* number of children of this cookie */
+ struct rw_semaphore sem; /* list creation vs scan lock */
+ struct hlist_head backing_objects; /* object(s) backing this file/index */
+ struct fscache_cookie_def *def; /* definition */
+ struct fscache_cookie *parent; /* parent of this entry */
+ struct fscache_netfs *netfs; /* owner network fs definition */
+ void *netfs_data; /* back pointer to netfs */
+};
+
+extern struct fscache_cookie fscache_fsdef_index;
+
+/*****************************************************************************/
+/*
+ * on-disc cache file or index handle
+ */
+struct fscache_object {
+ unsigned long flags;
+#define FSCACHE_OBJECT_RELEASING 0 /* T if object is being released */
+#define FSCACHE_OBJECT_RECYCLING 1 /* T if object is being retired */
+#define FSCACHE_OBJECT_WITHDRAWN 2 /* T if object has been withdrawn */
+
+ struct list_head cache_link; /* link in cache->object_list */
+ struct hlist_node cookie_link; /* link in cookie->backing_objects */
+ struct fscache_cache *cache; /* cache that supplied this object */
+ struct fscache_cookie *cookie; /* netfs's file/index object */
+};
+
+static inline
+void fscache_object_init(struct fscache_object *object)
+{
+ object->flags = 0;
+ INIT_LIST_HEAD(&object->cache_link);
+ INIT_HLIST_NODE(&object->cookie_link);
+ object->cache = NULL;
+ object->cookie = NULL;
+}
+
+/* find the parent index object for an object */
+static inline
+struct fscache_object *fscache_find_parent_object(struct fscache_object *object)
+{
+ struct fscache_object *parent;
+ struct fscache_cookie *cookie = object->cookie;
+ struct fscache_cache *cache = object->cache;
+ struct hlist_node *_p;
+
+ hlist_for_each_entry(parent, _p,
+ &cookie->parent->backing_objects,
+ cookie_link
+ ) {
+ if (parent->cache == cache)
+ return parent;
+ }
+
+ return NULL;
+}
+
+#endif /* _LINUX_FSCACHE_CACHE_H */
diff -uNrp linux-2.6.14-mm2/include/linux/fscache.h linux-2.6.14-mm2-cachefs/include/linux/fscache.h
--- linux-2.6.14-mm2/include/linux/fscache.h 1970-01-01 01:00:00.000000000 +0100
+++ linux-2.6.14-mm2-cachefs/include/linux/fscache.h 2005-11-14 16:40:44.000000000 +0000
@@ -0,0 +1,484 @@
+/* fscache.h: general filesystem caching interface
+ *
+ * Copyright (C) 2004-5 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells ([email protected])
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#ifndef _LINUX_FSCACHE_H
+#define _LINUX_FSCACHE_H
+
+#include <linux/config.h>
+#include <linux/fs.h>
+#include <linux/list.h>
+#include <linux/pagemap.h>
+#include <linux/pagevec.h>
+
+#ifdef CONFIG_FSCACHE_MODULE
+#define CONFIG_FSCACHE
+#endif
+
+struct pagevec;
+struct fscache_cache_tag;
+struct fscache_cookie;
+struct fscache_netfs;
+struct fscache_netfs_operations;
+
+#define FSCACHE_NEGATIVE_COOKIE ((struct fscache_cookie *) NULL)
+
+typedef void (*fscache_rw_complete_t)(struct page *page,
+ void *data,
+ int error);
+
+/* result of index entry consultation */
+typedef enum {
+ FSCACHE_CHECKAUX_OKAY, /* entry okay as is */
+ FSCACHE_CHECKAUX_NEEDS_UPDATE, /* entry requires update */
+ FSCACHE_CHECKAUX_OBSOLETE, /* entry requires deletion */
+} fscache_checkaux_t;
+
+/*****************************************************************************/
+/*
+ * fscache cookie definition
+ */
+struct fscache_cookie_def
+{
+ /* name of cookie type */
+ char name[16];
+
+ /* cookie type */
+ uint8_t type;
+#define FSCACHE_COOKIE_TYPE_INDEX 0
+#define FSCACHE_COOKIE_TYPE_DATAFILE 1
+
+ /* select the cache into which to insert an entry in this index
+ * - optional
+ * - should return a cache identifier or NULL to cause the cache to be
+ * inherited from the parent if possible or the first cache picked
+ * for a non-index file if not
+ */
+ struct fscache_cache_tag *(*select_cache)(const void *parent_netfs_data,
+ const void *cookie_netfs_data);
+
+ /* get an index key
+ * - should store the key data in the buffer
+ * - should return the amount of data stored
+ * - not permitted to return an error
+ * - the netfs data from the cookie being used as the source is
+ * presented
+ */
+ uint16_t (*get_key)(const void *cookie_netfs_data,
+ void *buffer,
+ uint16_t bufmax);
+
+ /* get certain file attributes from the netfs data
+ * - this function can be absent for an index
+ * - not permitted to return an error
+ * - the netfs data from the cookie being used as the source is
+ * presented
+ */
+ void (*get_attr)(const void *cookie_netfs_data, uint64_t *size);
+
+ /* get the auxiliary data from netfs data
+ * - this function can be absent if the index carries no state data
+ * - should store the auxiliary data in the buffer
+ * - should return the amount of data stored
+ * - not permitted to return an error
+ * - the netfs data from the cookie being used as the source is
+ * presented
+ */
+ uint16_t (*get_aux)(const void *cookie_netfs_data,
+ void *buffer,
+ uint16_t bufmax);
+
+ /* consult the netfs about the state of an object
+ * - this function can be absent if the index carries no state data
+ * - the netfs data from the cookie being used as the target is
+ * presented, as is the auxiliary data
+ */
+ fscache_checkaux_t (*check_aux)(void *cookie_netfs_data,
+ const void *data,
+ uint16_t datalen);
+
+ /* indicate pages that now have cache metadata retained
+ * - this function should mark the specified pages as now being cached
+ */
+ void (*mark_pages_cached)(void *cookie_netfs_data,
+ struct address_space *mapping,
+ struct pagevec *cached_pvec);
+
+ /* indicate the cookie is no longer cached
+ * - this function is called when the backing store currently caching
+ * a cookie is removed
+ * - the netfs should use this to clean up any markers indicating
+ * cached pages
+ * - this is mandatory for any object that may have data
+ */
+ void (*now_uncached)(void *cookie_netfs_data);
+};
+
+/* pattern used to fill dead space in an index entry */
+#define FSCACHE_INDEX_DEADFILL_PATTERN 0x79
+
+#ifdef CONFIG_FSCACHE
+extern struct fscache_cookie *__fscache_acquire_cookie(struct fscache_cookie *parent,
+ struct fscache_cookie_def *def,
+ void *netfs_data);
+
+extern void __fscache_relinquish_cookie(struct fscache_cookie *cookie,
+ int retire);
+
+extern void __fscache_update_cookie(struct fscache_cookie *cookie);
+#endif
+
+static inline
+struct fscache_cookie *fscache_acquire_cookie(struct fscache_cookie *parent,
+ struct fscache_cookie_def *def,
+ void *netfs_data)
+{
+#ifdef CONFIG_FSCACHE
+ if (parent != FSCACHE_NEGATIVE_COOKIE)
+ return __fscache_acquire_cookie(parent, def, netfs_data);
+#endif
+ return FSCACHE_NEGATIVE_COOKIE;
+}
+
+static inline
+void fscache_relinquish_cookie(struct fscache_cookie *cookie,
+ int retire)
+{
+#ifdef CONFIG_FSCACHE
+ if (cookie != FSCACHE_NEGATIVE_COOKIE)
+ __fscache_relinquish_cookie(cookie, retire);
+#endif
+}
+
+static inline
+void fscache_update_cookie(struct fscache_cookie *cookie)
+{
+#ifdef CONFIG_FSCACHE
+ if (cookie != FSCACHE_NEGATIVE_COOKIE)
+ __fscache_update_cookie(cookie);
+#endif
+}
+
+/*****************************************************************************/
+/*
+ * pin or unpin a cookie in a cache
+ * - only available for data cookies
+ */
+#ifdef CONFIG_FSCACHE
+extern int __fscache_pin_cookie(struct fscache_cookie *cookie);
+extern void __fscache_unpin_cookie(struct fscache_cookie *cookie);
+#endif
+
+static inline
+int fscache_pin_cookie(struct fscache_cookie *cookie)
+{
+#ifdef CONFIG_FSCACHE
+ if (cookie != FSCACHE_NEGATIVE_COOKIE)
+ return __fscache_pin_cookie(cookie);
+#endif
+ return -ENOBUFS;
+}
+
+static inline
+void fscache_unpin_cookie(struct fscache_cookie *cookie)
+{
+#ifdef CONFIG_FSCACHE
+ if (cookie != FSCACHE_NEGATIVE_COOKIE)
+ __fscache_unpin_cookie(cookie);
+#endif
+}
+
+/*****************************************************************************/
+/*
+ * fscache cached network filesystem type
+ * - name, version and ops must be filled in before registration
+ * - all other fields will be set during registration
+ */
+struct fscache_netfs
+{
+ uint32_t version; /* indexing version */
+ const char *name; /* filesystem name */
+ struct fscache_cookie *primary_index;
+ struct fscache_netfs_operations *ops;
+ struct list_head link; /* internal link */
+};
+
+struct fscache_netfs_operations
+{
+};
+
+#ifdef CONFIG_FSCACHE
+extern int __fscache_register_netfs(struct fscache_netfs *netfs);
+extern void __fscache_unregister_netfs(struct fscache_netfs *netfs);
+#endif
+
+static inline
+int fscache_register_netfs(struct fscache_netfs *netfs)
+{
+#ifdef CONFIG_FSCACHE
+ return __fscache_register_netfs(netfs);
+#else
+ return 0;
+#endif
+}
+
+static inline
+void fscache_unregister_netfs(struct fscache_netfs *netfs)
+{
+#ifdef CONFIG_FSCACHE
+ __fscache_unregister_netfs(netfs);
+#endif
+}
+
+/*****************************************************************************/
+/*
+ * look up a cache tag
+ * - cache tags are used to select specific caches in which to cache indexes
+ */
+#ifdef CONFIG_FSCACHE
+extern struct fscache_cache_tag *__fscache_lookup_cache_tag(const char *name);
+extern void __fscache_release_cache_tag(struct fscache_cache_tag *tag);
+#endif
+
+static inline
+struct fscache_cache_tag *fscache_lookup_cache_tag(const char *name)
+{
+#ifdef CONFIG_FSCACHE
+ return __fscache_lookup_cache_tag(name);
+#else
+ return NULL;
+#endif
+}
+
+static inline
+void fscache_release_cache_tag(struct fscache_cache_tag *tag)
+{
+#ifdef CONFIG_FSCACHE
+ __fscache_release_cache_tag(tag);
+#endif
+}
+
+/*****************************************************************************/
+/*
+ * set the data size on a cached object
+ * - no pages beyond the end of the object will be accessible
+ * - returns -ENOBUFS if the file is not backed
+ * - returns -ENOSPC if a pinned file of that size can't be stored
+ * - returns 0 if okay
+ */
+#ifdef CONFIG_FSCACHE
+extern int __fscache_set_i_size(struct fscache_cookie *cookie, loff_t i_size);
+#endif
+
+static inline
+int fscache_set_i_size(struct fscache_cookie *cookie, loff_t i_size)
+{
+#ifdef CONFIG_FSCACHE
+ if (cookie != FSCACHE_NEGATIVE_COOKIE)
+ return __fscache_set_i_size(cookie, i_size);
+#endif
+ return -ENOBUFS;
+}
+
+/*****************************************************************************/
+/*
+ * reserve data space for a cached object
+ * - returns -ENOBUFS if the file is not backed
+ * - returns -ENOSPC if there isn't enough space to honour the reservation
+ * - returns 0 if okay
+ */
+#ifdef CONFIG_FSCACHE
+extern int __fscache_reserve_space(struct fscache_cookie *cookie, loff_t size);
+#endif
+
+static inline
+int fscache_reserve_space(struct fscache_cookie *cookie, loff_t size)
+{
+#ifdef CONFIG_FSCACHE
+ if (cookie != FSCACHE_NEGATIVE_COOKIE)
+ return __fscache_reserve_space(cookie, size);
+#endif
+ return -ENOBUFS;
+}
+
+/*****************************************************************************/
+/*
+ * read a page from the cache or allocate a block in which to store it
+ * - if the page is not backed by a file:
+ * - -ENOBUFS will be returned and nothing more will be done
+ * - else if the page is backed by a block in the cache:
+ * - a read will be started which will call end_io_func on completion
+ * - else if the page is unbacked:
+ * - a block will be allocated
+ * - -ENODATA will be returned
+ */
+#ifdef CONFIG_FSCACHE
+extern int __fscache_read_or_alloc_page(struct fscache_cookie *cookie,
+ struct page *page,
+ fscache_rw_complete_t end_io_func,
+ void *end_io_data,
+ gfp_t gfp);
+#endif
+
+static inline
+int fscache_read_or_alloc_page(struct fscache_cookie *cookie,
+ struct page *page,
+ fscache_rw_complete_t end_io_func,
+ void *end_io_data,
+ gfp_t gfp)
+{
+#ifdef CONFIG_FSCACHE
+ if (cookie != FSCACHE_NEGATIVE_COOKIE)
+ return __fscache_read_or_alloc_page(cookie, page, end_io_func,
+ end_io_data, gfp);
+#endif
+ return -ENOBUFS;
+}
+
+#ifdef CONFIG_FSCACHE
+extern int __fscache_read_or_alloc_pages(struct fscache_cookie *cookie,
+ struct address_space *mapping,
+ struct list_head *pages,
+ unsigned *nr_pages,
+ fscache_rw_complete_t end_io_func,
+ void *end_io_data,
+ gfp_t gfp);
+#endif
+
+static inline
+int fscache_read_or_alloc_pages(struct fscache_cookie *cookie,
+ struct address_space *mapping,
+ struct list_head *pages,
+ unsigned *nr_pages,
+ fscache_rw_complete_t end_io_func,
+ void *end_io_data,
+ gfp_t gfp)
+{
+#ifdef CONFIG_FSCACHE
+ if (cookie != FSCACHE_NEGATIVE_COOKIE)
+ return __fscache_read_or_alloc_pages(cookie, mapping, pages,
+ nr_pages, end_io_func,
+ end_io_data, gfp);
+#endif
+ return -ENOBUFS;
+}
+
+/*
+ * allocate a block in which to store a page
+ * - if the page is not backed by a file:
+ * - -ENOBUFS will be returned and nothing more will be done
+ * - else
+ * - a block will be allocated if there isn't one
+ * - 0 will be returned
+ */
+#ifdef CONFIG_FSCACHE
+extern int __fscache_alloc_page(struct fscache_cookie *cookie,
+ struct page *page,
+ gfp_t gfp);
+#endif
+
+static inline
+int fscache_alloc_page(struct fscache_cookie *cookie,
+ struct page *page,
+ gfp_t gfp)
+{
+#ifdef CONFIG_FSCACHE
+ if (cookie != FSCACHE_NEGATIVE_COOKIE)
+ return __fscache_alloc_page(cookie, page, gfp);
+#endif
+ return -ENOBUFS;
+}
+
+/*
+ * request a page be stored in the cache
+ * - this request may be ignored if no cache block is currently allocated, in
+ * which case it:
+ * - returns -ENOBUFS
+ * - if a cache block was already allocated:
+ * - a BIO will be dispatched to write the page (end_io_func will be called
+ * from the completion function)
+ * - returns 0
+ */
+#ifdef CONFIG_FSCACHE
+extern int __fscache_write_page(struct fscache_cookie *cookie,
+ struct page *page,
+ fscache_rw_complete_t end_io_func,
+ void *end_io_data,
+ gfp_t gfp);
+
+extern int __fscache_write_pages(struct fscache_cookie *cookie,
+ struct pagevec *pagevec,
+ fscache_rw_complete_t end_io_func,
+ void *end_io_data,
+ gfp_t gfp);
+#endif
+
+static inline
+int fscache_write_page(struct fscache_cookie *cookie,
+ struct page *page,
+ fscache_rw_complete_t end_io_func,
+ void *end_io_data,
+ gfp_t gfp)
+{
+#ifdef CONFIG_FSCACHE
+ if (cookie != FSCACHE_NEGATIVE_COOKIE)
+ return __fscache_write_page(cookie, page, end_io_func,
+ end_io_data, gfp);
+#endif
+ return -ENOBUFS;
+}
+
+static inline
+int fscache_write_pages(struct fscache_cookie *cookie,
+ struct pagevec *pagevec,
+ fscache_rw_complete_t end_io_func,
+ void *end_io_data,
+ gfp_t gfp)
+{
+#ifdef CONFIG_FSCACHE
+ if (cookie != FSCACHE_NEGATIVE_COOKIE)
+ return __fscache_write_pages(cookie, pagevec, end_io_func,
+ end_io_data, gfp);
+#endif
+ return -ENOBUFS;
+}
+
+/*
+ * indicate that caching is no longer required on a page
+ * - note: cannot cancel any outstanding BIOs between this page and the cache
+ */
+#ifdef CONFIG_FSCACHE
+extern void __fscache_uncache_page(struct fscache_cookie *cookie,
+ struct page *page);
+extern void __fscache_uncache_pages(struct fscache_cookie *cookie,
+ struct pagevec *pagevec);
+#endif
+
+static inline
+void fscache_uncache_page(struct fscache_cookie *cookie,
+ struct page *page)
+{
+#ifdef CONFIG_FSCACHE
+ if (cookie != FSCACHE_NEGATIVE_COOKIE)
+ __fscache_uncache_page(cookie, page);
+#endif
+}
+
+static inline
+void fscache_uncache_pagevec(struct fscache_cookie *cookie,
+ struct pagevec *pagevec)
+{
+#ifdef CONFIG_FSCACHE
+ if (cookie != FSCACHE_NEGATIVE_COOKIE)
+ __fscache_uncache_pages(cookie, pagevec);
+#endif
+}
+
+#endif /* _LINUX_FSCACHE_H */
diff -uNrp linux-2.6.14-mm2/fs/fscache/cookie.c linux-2.6.14-mm2-cachefs/fs/fscache/cookie.c
--- linux-2.6.14-mm2/fs/fscache/cookie.c 1970-01-01 01:00:00.000000000 +0100
+++ linux-2.6.14-mm2-cachefs/fs/fscache/cookie.c 2005-11-14 16:23:38.000000000 +0000
@@ -0,0 +1,1030 @@
+/* cookie.c: general filesystem cache cookie management
+ *
+ * Copyright (C) 2004-5 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells ([email protected])
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#include <linux/module.h>
+#include "fscache-int.h"
+
+static LIST_HEAD(fscache_cache_tag_list);
+static LIST_HEAD(fscache_cache_list);
+static LIST_HEAD(fscache_netfs_list);
+static DECLARE_RWSEM(fscache_addremove_sem);
+static struct fscache_cache_tag fscache_nomem_tag;
+
+kmem_cache_t *fscache_cookie_jar;
+
+static void fscache_withdraw_object(struct fscache_cache *cache,
+ struct fscache_object *object);
+
+static void __fscache_cookie_put(struct fscache_cookie *cookie);
+
+static inline void fscache_cookie_put(struct fscache_cookie *cookie)
+{
+#ifdef CONFIG_DEBUG_SLAB
+ BUG_ON((atomic_read(&cookie->usage) & 0xffff0000) == 0x6b6b0000);
+#endif
+
+ BUG_ON(atomic_read(&cookie->usage) <= 0);
+
+ if (atomic_dec_and_test(&cookie->usage))
+ __fscache_cookie_put(cookie);
+
+}
+
+/*****************************************************************************/
+/*
+ * look up a cache tag
+ */
+struct fscache_cache_tag *__fscache_lookup_cache_tag(const char *name)
+{
+ struct fscache_cache_tag *tag, *xtag;
+
+ /* firstly check for the existence of the tag under read lock */
+ down_read(&fscache_addremove_sem);
+
+ list_for_each_entry(tag, &fscache_cache_tag_list, link) {
+ if (strcmp(tag->name, name) == 0) {
+ atomic_inc(&tag->usage);
+ up_read(&fscache_addremove_sem);
+ return tag;
+ }
+ }
+
+ up_read(&fscache_addremove_sem);
+
+ /* the tag does not exist - create a candidate */
+ xtag = kmalloc(sizeof(*xtag) + strlen(name) + 1, GFP_KERNEL);
+ if (!xtag) {
+ /* return a dummy tag if out of memory (lock already dropped) */
+ return &fscache_nomem_tag;
+ }
+
+ xtag->cache = NULL;
+ atomic_set(&xtag->usage, 1);
+ strcpy(xtag->name, name);
+
+ /* write lock, search again and add if still not present */
+ down_write(&fscache_addremove_sem);
+
+ list_for_each_entry(tag, &fscache_cache_tag_list, link) {
+ if (strcmp(tag->name, name) == 0) {
+ atomic_inc(&tag->usage);
+ up_write(&fscache_addremove_sem);
+ kfree(xtag);
+ return tag;
+ }
+ }
+
+ list_add_tail(&xtag->link, &fscache_cache_tag_list);
+ up_write(&fscache_addremove_sem);
+ return xtag;
+
+} /* end __fscache_lookup_cache_tag() */
+
+/*****************************************************************************/
+/*
+ * release a reference to a cache tag
+ */
+void __fscache_release_cache_tag(struct fscache_cache_tag *tag)
+{
+ if (tag != &fscache_nomem_tag) {
+ down_write(&fscache_addremove_sem);
+
+ if (atomic_dec_and_test(&tag->usage))
+ list_del_init(&tag->link);
+ else
+ tag = NULL;
+
+ up_write(&fscache_addremove_sem);
+
+ kfree(tag);
+ }
+
+} /* end __fscache_release_cache_tag() */
+
+/*****************************************************************************/
+/*
+ * register a network filesystem for caching
+ */
+int __fscache_register_netfs(struct fscache_netfs *netfs)
+{
+ struct fscache_netfs *ptr;
+ int ret;
+
+ _enter("{%s}", netfs->name);
+
+ INIT_LIST_HEAD(&netfs->link);
+
+ /* allocate a cookie for the primary index */
+ netfs->primary_index =
+ kmem_cache_alloc(fscache_cookie_jar, SLAB_KERNEL);
+
+ if (!netfs->primary_index) {
+ _leave(" = -ENOMEM");
+ return -ENOMEM;
+ }
+
+ /* initialise the primary index cookie */
+ memset(netfs->primary_index, 0, sizeof(*netfs->primary_index));
+
+ atomic_set(&netfs->primary_index->usage, 1);
+ atomic_set(&netfs->primary_index->children, 0);
+
+ netfs->primary_index->def = &fscache_fsdef_netfs_def;
+ netfs->primary_index->parent = &fscache_fsdef_index;
+ netfs->primary_index->netfs = netfs;
+ netfs->primary_index->netfs_data = netfs;
+
+ atomic_inc(&netfs->primary_index->parent->usage);
+ atomic_inc(&netfs->primary_index->parent->children);
+
+ init_rwsem(&netfs->primary_index->sem);
+ INIT_HLIST_HEAD(&netfs->primary_index->backing_objects);
+
+ /* check the netfs type is not already present */
+ down_write(&fscache_addremove_sem);
+
+ ret = -EEXIST;
+ list_for_each_entry(ptr, &fscache_netfs_list, link) {
+ if (strcmp(ptr->name, netfs->name) == 0)
+ goto already_registered;
+ }
+
+ list_add(&netfs->link, &fscache_netfs_list);
+ ret = 0;
+
+ printk("FS-Cache: netfs '%s' registered for caching\n", netfs->name);
+
+already_registered:
+ up_write(&fscache_addremove_sem);
+
+ if (ret < 0) {
+ netfs->primary_index->parent = NULL;
+ __fscache_cookie_put(netfs->primary_index);
+ netfs->primary_index = NULL;
+ }
+
+ _leave(" = %d", ret);
+ return ret;
+
+} /* end __fscache_register_netfs() */
+
+EXPORT_SYMBOL(__fscache_register_netfs);
+
+/*****************************************************************************/
+/*
+ * unregister a network filesystem from the cache
+ * - all cookies must have been released first
+ */
+void __fscache_unregister_netfs(struct fscache_netfs *netfs)
+{
+ _enter("{%s.%u}", netfs->name, netfs->version);
+
+ down_write(&fscache_addremove_sem);
+
+ list_del(&netfs->link);
+ fscache_relinquish_cookie(netfs->primary_index, 0);
+
+ up_write(&fscache_addremove_sem);
+
+ printk("FS-Cache: netfs '%s' unregistered from caching\n",
+ netfs->name);
+
+ _leave("");
+
+} /* end __fscache_unregister_netfs() */
+
+EXPORT_SYMBOL(__fscache_unregister_netfs);
+
+/*****************************************************************************/
+/*
+ * initialise a cache record
+ */
+void fscache_init_cache(struct fscache_cache *cache,
+ struct fscache_cache_ops *ops,
+ const char *idfmt,
+ ...)
+{
+ va_list va;
+
+ memset(cache, 0, sizeof(*cache));
+
+ cache->ops = ops;
+
+ va_start(va, idfmt);
+ vsnprintf(cache->identifier, sizeof(cache->identifier), idfmt, va);
+ va_end(va);
+
+ INIT_LIST_HEAD(&cache->link);
+ INIT_LIST_HEAD(&cache->object_list);
+ spin_lock_init(&cache->object_list_lock);
+ init_rwsem(&cache->withdrawal_sem);
+
+} /* end fscache_init_cache() */
+
+EXPORT_SYMBOL(fscache_init_cache);
+
+/*****************************************************************************/
+/*
+ * declare a mounted cache as being open for business
+ */
+int fscache_add_cache(struct fscache_cache *cache,
+ struct fscache_object *ifsdef,
+ const char *tagname)
+{
+ struct fscache_cache_tag *tag;
+
+ BUG_ON(!cache->ops);
+ BUG_ON(!ifsdef);
+
+ if (!tagname)
+ tagname = cache->identifier;
+
+ BUG_ON(!tagname[0]);
+
+ _enter("{%s.%s},,%s", cache->ops->name, cache->identifier, tagname);
+
+ if (!cache->ops->grab_object(ifsdef))
+ BUG();
+
+ ifsdef->cookie = &fscache_fsdef_index;
+ ifsdef->cache = cache;
+ cache->fsdef = ifsdef;
+
+ down_write(&fscache_addremove_sem);
+
+ /* instantiate or allocate a cache tag */
+ list_for_each_entry(tag, &fscache_cache_tag_list, link) {
+ if (strcmp(tag->name, tagname) == 0) {
+ if (tag->cache) {
+ printk(KERN_ERR
+ "FS-Cache: cache tag '%s' already in use\n",
+ tagname);
+ up_write(&fscache_addremove_sem);
+ return -EEXIST;
+ }
+
+ atomic_inc(&tag->usage);
+ goto found_cache_tag;
+ }
+ }
+
+ tag = kmalloc(sizeof(*tag) + strlen(tagname) + 1, GFP_KERNEL);
+ if (!tag) {
+ up_write(&fscache_addremove_sem);
+ return -ENOMEM;
+ }
+
+ atomic_set(&tag->usage, 1);
+ strcpy(tag->name, tagname);
+ list_add_tail(&tag->link, &fscache_cache_tag_list);
+
+found_cache_tag:
+ tag->cache = cache;
+ cache->tag = tag;
+
+ /* add the cache to the list */
+ list_add(&cache->link, &fscache_cache_list);
+
+ /* add the cache's netfs definition index object to the cache's
+ * list */
+ spin_lock(&cache->object_list_lock);
+ list_add_tail(&ifsdef->cache_link, &cache->object_list);
+ spin_unlock(&cache->object_list_lock);
+
+ /* add the cache's netfs definition index object to the top level index
+ * cookie as a known backing object */
+ down_write(&fscache_fsdef_index.sem);
+
+ hlist_add_head(&ifsdef->cookie_link,
+ &fscache_fsdef_index.backing_objects);
+
+ atomic_inc(&fscache_fsdef_index.usage);
+
+ /* done */
+ up_write(&fscache_fsdef_index.sem);
+ up_write(&fscache_addremove_sem);
+
+ printk(KERN_NOTICE
+ "FS-Cache: Cache \"%s\" added (type %s)\n",
+ cache->tag->name, cache->ops->name);
+
+ _leave(" = 0 [%s]", cache->identifier);
+ return 0;
+
+} /* end fscache_add_cache() */
+
+EXPORT_SYMBOL(fscache_add_cache);
+
+/*****************************************************************************/
+/*
+ * withdraw an unmounted cache from the active service
+ */
+void fscache_withdraw_cache(struct fscache_cache *cache)
+{
+ struct fscache_object *object;
+
+ _enter("");
+
+ printk(KERN_NOTICE
+ "FS-Cache: Withdrawing cache \"%s\"\n",
+ cache->tag->name);
+
+ /* make the cache unavailable for cookie acquisition */
+ down_write(&cache->withdrawal_sem);
+
+ down_write(&fscache_addremove_sem);
+ list_del_init(&cache->link);
+ cache->tag->cache = NULL;
+ up_write(&fscache_addremove_sem);
+
+ /* mark all objects as being withdrawn */
+ spin_lock(&cache->object_list_lock);
+ list_for_each_entry(object, &cache->object_list, cache_link) {
+ set_bit(FSCACHE_OBJECT_WITHDRAWN, &object->flags);
+ }
+ spin_unlock(&cache->object_list_lock);
+
+ /* make sure all pages pinned by operations on behalf of the netfs are
+ * written to disc */
+ cache->ops->sync_cache(cache);
+
+ /* dissociate all the netfs pages backed by this cache from the block
+ * mappings in the cache */
+ cache->ops->dissociate_pages(cache);
+
+ /* we now have to destroy all the active objects pertaining to this
+ * cache */
+ spin_lock(&cache->object_list_lock);
+
+ while (!list_empty(&cache->object_list)) {
+ object = list_entry(cache->object_list.next,
+ struct fscache_object, cache_link);
+ list_del_init(&object->cache_link);
+ spin_unlock(&cache->object_list_lock);
+
+ _debug("withdraw %p", object->cookie);
+
+ /* we've extracted an active object from the tree - now dispose
+ * of it */
+ fscache_withdraw_object(cache, object);
+
+ spin_lock(&cache->object_list_lock);
+ }
+
+ spin_unlock(&cache->object_list_lock);
+
+ fscache_release_cache_tag(cache->tag);
+ cache->tag = NULL;
+
+ _leave("");
+
+} /* end fscache_withdraw_cache() */
+
+EXPORT_SYMBOL(fscache_withdraw_cache);
+
+/*****************************************************************************/
+/*
+ * withdraw an object from active service at the behest of the cache
+ * - need to break the links to a cached object cookie
+ * - called under two situations:
+ * (1) recycler decides to reclaim an in-use object
+ * (2) a cache is unmounted
+ * - have to take care as the cookie can be being relinquished by the netfs
+ * simultaneously
+ * - the active object is pinned by the caller holding a refcount on it
+ */
+static void fscache_withdraw_object(struct fscache_cache *cache,
+ struct fscache_object *object)
+{
+ struct fscache_cookie *cookie, *xcookie = NULL;
+
+ _enter(",%p", object);
+
+ /* first of all we have to break the links between the object and the
+ * cookie
+ * - we have to hold both semaphores BUT we have to get the cookie sem
+ * FIRST
+ */
+ cache->ops->lock_object(object);
+
+ cookie = object->cookie;
+ if (cookie) {
+ /* pin the cookie so that it doesn't escape */
+ atomic_inc(&cookie->usage);
+
+ /* re-order the locks to avoid deadlock */
+ cache->ops->unlock_object(object);
+ down_write(&cookie->sem);
+ cache->ops->lock_object(object);
+
+ /* erase references from the object to the cookie */
+ hlist_del_init(&object->cookie_link);
+
+ xcookie = object->cookie;
+ object->cookie = NULL;
+
+ up_write(&cookie->sem);
+ }
+
+ cache->ops->unlock_object(object);
+
+ /* we've broken the links between cookie and object */
+ if (xcookie) {
+ fscache_cookie_put(xcookie);
+ cache->ops->put_object(object);
+ }
+
+ /* unpin the cookie */
+ if (cookie) {
+ if (cookie->def && cookie->def->now_uncached)
+ cookie->def->now_uncached(cookie->netfs_data);
+ fscache_cookie_put(cookie);
+ }
+
+ _leave("");
+
+} /* end fscache_withdraw_object() */
+
+/*****************************************************************************/
+/*
+ * select a cache on which to store an object
+ * - the cache addremove semaphore must be at least read-locked by the caller
+ * - the object will never be an index
+ */
+static struct fscache_cache *fscache_select_cache_for_object(struct fscache_cookie *cookie)
+{
+ struct fscache_cache_tag *tag;
+ struct fscache_object *object;
+ struct fscache_cache *cache;
+
+ _enter("");
+
+ if (list_empty(&fscache_cache_list)) {
+ _leave(" = NULL [no cache]");
+ return NULL;
+ }
+
+ /* we check the parent to determine the cache to use */
+ down_read(&cookie->parent->sem);
+
+ /* the first in the parent's backing list should be the preferred
+ * cache */
+ if (!hlist_empty(&cookie->parent->backing_objects)) {
+ object = hlist_entry(cookie->parent->backing_objects.first,
+ struct fscache_object, cookie_link);
+
+ cache = object->cache;
+ up_read(&cookie->parent->sem);
+ _leave(" = %p [parent]", cache);
+ return cache;
+ }
+
+ /* the parent is unbacked */
+ if (cookie->parent->def->type != FSCACHE_COOKIE_TYPE_INDEX) {
+ /* parent not an index and is unbacked */
+ up_read(&cookie->parent->sem);
+ _leave(" = NULL [parent ubni]");
+ return NULL;
+ }
+
+ up_read(&cookie->parent->sem);
+
+ if (!cookie->parent->def->select_cache)
+ goto no_preference;
+
+ /* ask the netfs for its preference */
+ tag = cookie->parent->def->select_cache(
+ cookie->parent->parent->netfs_data,
+ cookie->parent->netfs_data);
+
+ if (!tag)
+ goto no_preference;
+
+ if (tag == &fscache_nomem_tag) {
+ _leave(" = NULL [nomem tag]");
+ return NULL;
+ }
+
+ if (!tag->cache) {
+ _leave(" = NULL [unbacked tag]");
+ return NULL;
+ }
+
+ _leave(" = %p [specific]", tag->cache);
+ return tag->cache;
+
+no_preference:
+ /* netfs has no preference - just select first cache */
+ cache = list_entry(fscache_cache_list.next,
+ struct fscache_cache, link);
+ _leave(" = %p [first]", cache);
+ return cache;
+
+} /* end fscache_select_cache_for_object() */
+
+/*****************************************************************************/
+/*
+ * get a backing object for a cookie from the chosen cache
+ * - the cookie must be write-locked by the caller
+ * - all parent indexes will be obtained recursively first
+ */
+static struct fscache_object *fscache_lookup_object(struct fscache_cookie *cookie,
+ struct fscache_cache *cache)
+{
+ struct fscache_cookie *parent = cookie->parent;
+ struct fscache_object *pobject, *object;
+ struct hlist_node *_p;
+
+ _enter("{%s/%s},",
+ parent && parent->def ? parent->def->name : "",
+ cookie->def ? (char *) cookie->def->name : "<file>");
+
+ /* see if we have the backing object for this cookie + cache immediately
+ * to hand
+ */
+ object = NULL;
+ hlist_for_each_entry(object, _p,
+ &cookie->backing_objects, cookie_link
+ ) {
+ if (object->cache == cache)
+ break;
+ }
+
+ if (_p) { /* _p is only non-NULL if we broke out on a match */
+ _leave(" = %p [old]", object);
+ return object;
+ }
+
+ BUG_ON(!parent); /* FSDEF entries don't have a parent */
+
+ /* we don't have a backing object, so we need to consult the object's
+ * parent index in the selected cache and maybe insert an entry
+ * therein; so the first thing to do is make sure that the parent index
+ * is represented on disc
+ */
+ down_read(&parent->sem);
+
+ pobject = NULL;
+ hlist_for_each_entry(pobject, _p,
+ &parent->backing_objects, cookie_link
+ ) {
+ if (pobject->cache == cache)
+ break;
+ }
+
+ if (!_p) {
+ /* the parent has no backing object in this cache yet */
+ up_read(&parent->sem);
+ down_write(&parent->sem);
+
+ pobject = fscache_lookup_object(parent, cache);
+ if (IS_ERR(pobject)) {
+ up_write(&parent->sem);
+ _leave(" = %ld [no ipobj]", PTR_ERR(pobject));
+ return pobject;
+ }
+
+ _debug("pobject=%p", pobject);
+
+ BUG_ON(pobject->cookie != parent);
+
+ downgrade_write(&parent->sem);
+ }
+
+ /* now we can attempt to look up this object in the parent, possibly
+ * creating a representation on disc when we do so
+ */
+ object = cache->ops->lookup_object(cache, pobject, cookie);
+ up_read(&parent->sem);
+
+ if (IS_ERR(object)) {
+ _leave(" = %ld [no obj]", PTR_ERR(object));
+ return object;
+ }
+
+ /* keep track of it */
+ cache->ops->lock_object(object);
+
+ BUG_ON(!hlist_unhashed(&object->cookie_link));
+
+ /* attach to the cache's object list */
+ if (list_empty(&object->cache_link)) {
+ spin_lock(&cache->object_list_lock);
+ list_add(&object->cache_link, &cache->object_list);
+ spin_unlock(&cache->object_list_lock);
+ }
+
+ /* attach to the cookie */
+ object->cookie = cookie;
+ atomic_inc(&cookie->usage);
+ hlist_add_head(&object->cookie_link, &cookie->backing_objects);
+
+ /* done */
+ cache->ops->unlock_object(object);
+ _leave(" = %p [new]", object);
+ return object;
+
+} /* end fscache_lookup_object() */
+
+/*****************************************************************************/
+/*
+ * request a cookie to represent an object (index, datafile, xattr, etc)
+ * - parent specifies the parent object
+ * - the top level index cookie for each netfs is stored in the fscache_netfs
+ * struct upon registration
+ * - def points to the definition
+ * - the netfs_data will be passed to the functions pointed to in *def
+ * - all attached caches will be searched to see if they contain this object
+ * - index objects aren't stored on disk until there's a dependent file that
+ * needs storing
+ * - other objects are stored in a selected cache immediately, and all the
+ * indexes forming the path to it are instantiated if necessary
+ * - we never let on to the netfs about errors
+ * - we may set a negative cookie pointer, but that's okay
+ */
+struct fscache_cookie *__fscache_acquire_cookie(struct fscache_cookie *parent,
+ struct fscache_cookie_def *def,
+ void *netfs_data)
+{
+ struct fscache_cookie *cookie;
+ struct fscache_cache *cache;
+ struct fscache_object *object;
+ int ret = 0;
+
+ BUG_ON(!def);
+
+ _enter("{%s},{%s},%p",
+ parent ? (char *) parent->def->name : "<no-parent>",
+ def->name, netfs_data);
+
+ /* if there's no parent cookie, then we don't create one here either */
+ if (parent == FSCACHE_NEGATIVE_COOKIE) {
+ _leave(" [no parent]");
+ return FSCACHE_NEGATIVE_COOKIE;
+ }
+
+ /* validate the definition */
+ BUG_ON(!def->get_key);
+ BUG_ON(!def->name[0]);
+
+ BUG_ON(def->type == FSCACHE_COOKIE_TYPE_INDEX &&
+ parent->def->type != FSCACHE_COOKIE_TYPE_INDEX);
+
+ /* allocate and initialise a cookie */
+ cookie = kmem_cache_alloc(fscache_cookie_jar, SLAB_KERNEL);
+ if (!cookie) {
+ _leave(" [ENOMEM]");
+ return FSCACHE_NEGATIVE_COOKIE;
+ }
+
+ atomic_set(&cookie->usage, 1);
+ atomic_set(&cookie->children, 0);
+
+ atomic_inc(&parent->usage);
+ atomic_inc(&parent->children);
+
+ cookie->def = def;
+ cookie->parent = parent;
+ cookie->netfs = parent->netfs;
+ cookie->netfs_data = netfs_data;
+
+ /* now we need to see whether the backing objects for this cookie exist
+ * yet; if not, there'll be nothing to search */
+ down_read(&fscache_addremove_sem);
+
+ if (list_empty(&fscache_cache_list)) {
+ up_read(&fscache_addremove_sem);
+ _leave(" = %p [no caches]", cookie);
+ return cookie;
+ }
+
+ /* if the object is an index then we need do nothing more here - we
+ * create indexes on disk when we need them as an index may exist in
+ * multiple caches */
+ if (cookie->def->type != FSCACHE_COOKIE_TYPE_INDEX) {
+ down_write(&cookie->sem);
+
+ /* the object is a file - we need to select a cache in which to
+ * store it */
+ cache = fscache_select_cache_for_object(cookie);
+ if (!cache)
+ goto no_cache; /* couldn't decide on a cache */
+
+ /* create a file index entry on disc, along with all the
+ * indexes required to find it again later */
+ object = fscache_lookup_object(cookie, cache);
+ if (IS_ERR(object)) {
+ ret = PTR_ERR(object);
+ goto error;
+ }
+
+ up_write(&cookie->sem);
+ }
+out:
+ up_read(&fscache_addremove_sem);
+ _leave(" = %p", cookie);
+ return cookie;
+
+no_cache:
+ ret = -ENOMEDIUM;
+error:
+ printk(KERN_ERR "FS-Cache: error from cache: %d\n", ret);
+ if (cookie) {
+ up_write(&cookie->sem);
+ __fscache_cookie_put(cookie);
+ cookie = FSCACHE_NEGATIVE_COOKIE;
+ atomic_dec(&parent->children);
+ }
+
+ goto out;
+
+} /* end __fscache_acquire_cookie() */
+
+EXPORT_SYMBOL(__fscache_acquire_cookie);
+
+/*****************************************************************************/
+/*
+ * release a cookie back to the cache
+ * - the object will be marked as recyclable on disc if retire is true
+ * - all dependents of this cookie must have already been unregistered
+ * (indexes/files/pages)
+ */
+void __fscache_relinquish_cookie(struct fscache_cookie *cookie, int retire)
+{
+ struct fscache_cache *cache;
+ struct fscache_object *object;
+ struct hlist_node *_p;
+
+ if (cookie == FSCACHE_NEGATIVE_COOKIE) {
+ _leave(" [no cookie]");
+ return;
+ }
+
+ _enter("%p{%s},%d", cookie, cookie->def->name, retire);
+
+ if (atomic_read(&cookie->children) != 0) {
+ printk(KERN_ERR "FS-Cache: cookie still has children\n");
+ BUG();
+ }
+
+ /* detach pointers back to the netfs */
+ down_write(&cookie->sem);
+
+ cookie->netfs_data = NULL;
+ cookie->def = NULL;
+
+ /* mark retired objects for recycling */
+ if (retire) {
+ hlist_for_each_entry(object, _p,
+ &cookie->backing_objects,
+ cookie_link
+ ) {
+ set_bit(FSCACHE_OBJECT_RECYCLING, &object->flags);
+ }
+ }
+
+ /* break links with all the active objects */
+ while (!hlist_empty(&cookie->backing_objects)) {
+ object = hlist_entry(cookie->backing_objects.first,
+ struct fscache_object,
+ cookie_link);
+
+ /* detach each cache object from the object cookie */
+ set_bit(FSCACHE_OBJECT_RELEASING, &object->flags);
+
+ hlist_del_init(&object->cookie_link);
+
+ cache = object->cache;
+ cache->ops->lock_object(object);
+ object->cookie = NULL;
+ cache->ops->unlock_object(object);
+
+ if (atomic_dec_and_test(&cookie->usage))
+ /* the cookie refcount shouldn't be reduced to 0 yet */
+ BUG();
+
+ spin_lock(&cache->object_list_lock);
+ list_del_init(&object->cache_link);
+ spin_unlock(&cache->object_list_lock);
+
+ cache->ops->put_object(object);
+ }
+
+ up_write(&cookie->sem);
+
+ if (cookie->parent) {
+#ifdef CONFIG_DEBUG_SLAB
+ BUG_ON((atomic_read(&cookie->parent->children) & 0xffff0000) == 0x6b6b0000);
+#endif
+ atomic_dec(&cookie->parent->children);
+ }
+
+ /* finally dispose of the cookie */
+ fscache_cookie_put(cookie);
+
+ _leave("");
+
+} /* end __fscache_relinquish_cookie() */
+
+EXPORT_SYMBOL(__fscache_relinquish_cookie);
+
+/*****************************************************************************/
+/*
+ * update the index entries backing a cookie
+ */
+void __fscache_update_cookie(struct fscache_cookie *cookie)
+{
+ struct fscache_object *object;
+ struct hlist_node *_p;
+
+ if (cookie == FSCACHE_NEGATIVE_COOKIE) {
+ _leave(" [no cookie]");
+ return;
+ }
+
+ _enter("{%s}", cookie->def->name);
+
+ BUG_ON(!cookie->def->get_aux);
+
+ down_write(&cookie->sem);
+ down_read(&cookie->parent->sem);
+
+ /* update the index entry on disc in each cache backing this cookie */
+ hlist_for_each_entry(object, _p,
+ &cookie->backing_objects, cookie_link
+ ) {
+ object->cache->ops->update_object(object);
+ }
+
+ up_read(&cookie->parent->sem);
+ up_write(&cookie->sem);
+ _leave("");
+
+} /* end __fscache_update_cookie() */
+
+EXPORT_SYMBOL(__fscache_update_cookie);
+
+/*****************************************************************************/
+/*
+ * destroy a cookie
+ */
+static void __fscache_cookie_put(struct fscache_cookie *cookie)
+{
+ struct fscache_cookie *parent;
+
+ _enter("%p", cookie);
+
+ for (;;) {
+ parent = cookie->parent;
+ BUG_ON(!hlist_empty(&cookie->backing_objects));
+ kmem_cache_free(fscache_cookie_jar, cookie);
+
+ if (!parent)
+ break;
+
+ cookie = parent;
+ BUG_ON(atomic_read(&cookie->usage) <= 0);
+ if (!atomic_dec_and_test(&cookie->usage))
+ break;
+ }
+
+ _leave("");
+
+} /* end __fscache_cookie_put() */
+
+/*****************************************************************************/
+/*
+ * initialise a cookie jar slab element prior to any use
+ */
+void fscache_cookie_init_once(void *_cookie, kmem_cache_t *cachep,
+ unsigned long flags)
+{
+ struct fscache_cookie *cookie = _cookie;
+
+ if ((flags & (SLAB_CTOR_VERIFY|SLAB_CTOR_CONSTRUCTOR)) ==
+ SLAB_CTOR_CONSTRUCTOR) {
+ memset(cookie, 0, sizeof(*cookie));
+ init_rwsem(&cookie->sem);
+ INIT_HLIST_HEAD(&cookie->backing_objects);
+ }
+
+} /* end fscache_cookie_init_once() */
+
+/*****************************************************************************/
+/*
+ * pin an object into the cache
+ */
+int __fscache_pin_cookie(struct fscache_cookie *cookie)
+{
+ struct fscache_object *object;
+ int ret;
+
+ _enter("%p", cookie);
+
+ if (hlist_empty(&cookie->backing_objects)) {
+ _leave(" = -ENOBUFS");
+ return -ENOBUFS;
+ }
+
+ /* not supposed to use this for indexes */
+ BUG_ON(cookie->def->type == FSCACHE_COOKIE_TYPE_INDEX);
+
+ /* prevent the file from being uncached whilst we access it and exclude
+ * read and write attempts on pages
+ */
+ down_write(&cookie->sem);
+
+ ret = -ENOBUFS;
+ if (!hlist_empty(&cookie->backing_objects)) {
+ /* get and pin the backing object */
+ object = hlist_entry(cookie->backing_objects.first,
+ struct fscache_object, cookie_link);
+
+ if (!object->cache->ops->pin_object) {
+ ret = -EOPNOTSUPP;
+ goto out;
+ }
+
+ /* prevent the cache from being withdrawn */
+ if (down_read_trylock(&object->cache->withdrawal_sem)) {
+ if (object->cache->ops->grab_object(object)) {
+ /* ask the cache to honour the operation */
+ ret = object->cache->ops->pin_object(object);
+
+ object->cache->ops->put_object(object);
+ }
+
+ up_read(&object->cache->withdrawal_sem);
+ }
+ }
+
+out:
+ up_write(&cookie->sem);
+ _leave(" = %d", ret);
+ return ret;
+
+} /* end __fscache_pin_cookie() */
+
+EXPORT_SYMBOL(__fscache_pin_cookie);
+
+/*****************************************************************************/
+/*
+ * unpin an object in the cache
+ */
+void __fscache_unpin_cookie(struct fscache_cookie *cookie)
+{
+ struct fscache_object *object;
+ int ret;
+
+ _enter("%p", cookie);
+
+ if (hlist_empty(&cookie->backing_objects)) {
+ _leave(" [no obj]");
+ return;
+ }
+
+ /* not supposed to use this for indexes */
+ BUG_ON(cookie->def->type == FSCACHE_COOKIE_TYPE_INDEX);
+
+ /* prevent the file from being uncached whilst we access it and exclude
+ * read and write attempts on pages
+ */
+ down_write(&cookie->sem);
+
+ ret = -ENOBUFS;
+ if (!hlist_empty(&cookie->backing_objects)) {
+ /* get and unpin the backing object */
+ object = hlist_entry(cookie->backing_objects.first,
+ struct fscache_object, cookie_link);
+
+ if (!object->cache->ops->unpin_object)
+ goto out;
+
+ /* prevent the cache from being withdrawn */
+ if (down_read_trylock(&object->cache->withdrawal_sem)) {
+ if (object->cache->ops->grab_object(object)) {
+ /* ask the cache to honour the operation */
+ object->cache->ops->unpin_object(object);
+
+ object->cache->ops->put_object(object);
+ }
+
+ up_read(&object->cache->withdrawal_sem);
+ }
+ }
+
+out:
+ up_write(&cookie->sem);
+ _leave("");
+
+} /* end __fscache_unpin_cookie() */
+
+EXPORT_SYMBOL(__fscache_unpin_cookie);
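
A note on cookie lifetime in cookie.c: __fscache_cookie_put() frees a cookie
and then drops the reference that cookie held on its parent, repeating up the
tree for as long as each parent's count reaches zero. The following is a
minimal userspace sketch of that cascade, not the patch's code. struct
toy_cookie, toy_cookie_new(), toy_cookie_put() and the toy_freed counter are
hypothetical names; plain ints stand in for atomic_t and assert() stands in
for BUG_ON(). It also assumes that fscache_cookie_put(), which is not shown in
this part of the patch, is the usual decrement-and-test wrapper.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for struct fscache_cookie. */
struct toy_cookie {
	int usage;			/* the kernel code uses atomic_t */
	struct toy_cookie *parent;
};

static int toy_freed;	/* counts frees so a caller can observe the cascade */

/* Allocate a cookie that holds one reference on its parent, mirroring the
 * atomic_inc(&parent->usage) in __fscache_acquire_cookie(). */
static struct toy_cookie *toy_cookie_new(struct toy_cookie *parent)
{
	struct toy_cookie *c = malloc(sizeof(*c));

	if (!c)
		return NULL;
	c->usage = 1;
	c->parent = parent;
	if (parent)
		parent->usage++;
	return c;
}

/* Free a cookie whose count has reached zero, then drop the reference it
 * held on its parent; keep walking up while each parent also hits zero. */
static void toy_cookie_destroy(struct toy_cookie *c)
{
	for (;;) {
		struct toy_cookie *parent = c->parent;

		free(c);
		toy_freed++;

		if (!parent)
			break;

		c = parent;
		assert(c->usage > 0);	/* BUG_ON() in the kernel code */
		if (--c->usage != 0)
			break;
	}
}

/* Assumed shape of the decrement-and-test wrapper. */
static void toy_cookie_put(struct toy_cookie *c)
{
	assert(c->usage > 0);
	if (--c->usage == 0)
		toy_cookie_destroy(c);
}
```

With a root -> index -> file chain in which the caller has already dropped its
own references to the root and the index, putting the file frees all three in
one call. If the index is still referenced elsewhere, the walk stops there.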
diff -uNrp linux-2.6.14-mm2/fs/fscache/fscache-int.h linux-2.6.14-mm2-cachefs/fs/fscache/fscache-int.h
--- linux-2.6.14-mm2/fs/fscache/fscache-int.h 1970-01-01 01:00:00.000000000 +0100
+++ linux-2.6.14-mm2-cachefs/fs/fscache/fscache-int.h 2005-11-14 16:23:38.000000000 +0000
@@ -0,0 +1,71 @@
+/* fscache-int.h: internal definitions
+ *
+ * Copyright (C) 2004 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells ([email protected])
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#ifndef _FSCACHE_INT_H
+#define _FSCACHE_INT_H
+
+#include <linux/fscache-cache.h>
+#include <linux/timer.h>
+#include <linux/bio.h>
+
+extern kmem_cache_t *fscache_cookie_jar;
+
+extern struct fscache_cookie fscache_fsdef_index;
+extern struct fscache_cookie_def fscache_fsdef_netfs_def;
+
+extern void fscache_cookie_init_once(void *_cookie, kmem_cache_t *cachep, unsigned long flags);
+
+/*****************************************************************************/
+/*
+ * debug tracing
+ */
+#define dbgprintk(FMT,...) \
+ printk("[%-6.6s] "FMT"\n",current->comm ,##__VA_ARGS__)
+#define _dbprintk(FMT,...) do { } while(0)
+
+#define kenter(FMT,...) dbgprintk("==> %s("FMT")",__FUNCTION__ ,##__VA_ARGS__)
+#define kleave(FMT,...) dbgprintk("<== %s()"FMT"",__FUNCTION__ ,##__VA_ARGS__)
+#define kdebug(FMT,...) dbgprintk(FMT ,##__VA_ARGS__)
+
+#define kjournal(FMT,...) _dbprintk(FMT ,##__VA_ARGS__)
+
+#define dbgfree(ADDR) _dbprintk("%s:%d: FREEING %p",__FILE__,__LINE__,ADDR)
+
+#define dbgpgalloc(PAGE) \
+do { \
+ _dbprintk("PGALLOC %s:%d: %p {%lx,%lu}\n", \
+ __FILE__,__LINE__, \
+ (PAGE),(PAGE)->mapping->host->i_ino,(PAGE)->index \
+ ); \
+} while(0)
+
+#define dbgpgfree(PAGE) \
+do { \
+ if ((PAGE)) \
+ _dbprintk("PGFREE %s:%d: %p {%lx,%lu}\n", \
+ __FILE__,__LINE__, \
+ (PAGE), \
+ (PAGE)->mapping->host->i_ino, \
+ (PAGE)->index \
+ ); \
+} while(0)
+
+#ifdef __KDEBUG
+#define _enter(FMT,...) kenter(FMT,##__VA_ARGS__)
+#define _leave(FMT,...) kleave(FMT,##__VA_ARGS__)
+#define _debug(FMT,...) kdebug(FMT,##__VA_ARGS__)
+#else
+#define _enter(FMT,...) do { } while(0)
+#define _leave(FMT,...) do { } while(0)
+#define _debug(FMT,...) do { } while(0)
+#endif
+
+#endif /* _FSCACHE_INT_H */
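
A note on the tracing macros in fscache-int.h: they are layered so that
kenter()/kleave() always print, while _enter()/_leave() compile away to an
empty statement unless __KDEBUG is defined. The following is a minimal
userspace sketch of that layering under stated assumptions. trace_buf,
toy_leave(), toy_lookup() and TOY_KDEBUG are hypothetical names, snprintf()
into a buffer stands in for printk(), and __func__ replaces __FUNCTION__. The
", ##__VA_ARGS__" form is a GNU extension, as in the header itself.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

static char trace_buf[128];	/* hypothetical sink standing in for printk() */

/* Base printer: appends a newline, as dbgprintk() does in the header. */
#define dbgprintk(FMT, ...) \
	snprintf(trace_buf, sizeof(trace_buf), FMT "\n", ##__VA_ARGS__)

/* Always-on entry/exit tracers that prefix the calling function's name. */
#define kenter(FMT, ...) dbgprintk("==> %s(" FMT ")", __func__, ##__VA_ARGS__)
#define kleave(FMT, ...) dbgprintk("<== %s()" FMT, __func__, ##__VA_ARGS__)

/* Switchable tracer: an empty statement unless TOY_KDEBUG is defined. */
#ifdef TOY_KDEBUG
#define toy_leave(FMT, ...) kleave(FMT, ##__VA_ARGS__)
#else
#define toy_leave(FMT, ...) do { } while (0)
#endif

static int toy_lookup(int key)
{
	int ret;

	assert(key >= 0);		/* stand-in for a BUG_ON() check */
	kenter("%d", key);		/* always recorded */
	ret = key * 2;
	toy_leave(" = %d", ret);	/* compiled out without TOY_KDEBUG */
	return ret;
}
```

Built without TOY_KDEBUG, a call to toy_lookup() leaves only the kenter() line
in trace_buf, because the toy_leave() call expands to nothing.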
diff -uNrp linux-2.6.14-mm2/fs/fscache/fsdef.c linux-2.6.14-mm2-cachefs/fs/fscache/fsdef.c
--- linux-2.6.14-mm2/fs/fscache/fsdef.c 1970-01-01 01:00:00.000000000 +0100
+++ linux-2.6.14-mm2-cachefs/fs/fscache/fsdef.c 2005-11-14 16:23:38.000000000 +0000
@@ -0,0 +1,113 @@
+/* fsdef.c: filesystem index definition
+ *
+ * Copyright (C) 2004 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells ([email protected])
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#include <linux/module.h>
+#include "fscache-int.h"
+
+static uint16_t fscache_fsdef_netfs_get_key(const void *cookie_netfs_data,
+ void *buffer, uint16_t bufmax);
+
+static uint16_t fscache_fsdef_netfs_get_aux(const void *cookie_netfs_data,
+ void *buffer, uint16_t bufmax);
+
+static fscache_checkaux_t fscache_fsdef_netfs_check_aux(void *cookie_netfs_data,
+ const void *data,
+ uint16_t datalen);
+
+struct fscache_cookie_def fscache_fsdef_netfs_def = {
+ .name = "FSDEF.netfs",
+ .type = FSCACHE_COOKIE_TYPE_INDEX,
+ .get_key = fscache_fsdef_netfs_get_key,
+ .get_aux = fscache_fsdef_netfs_get_aux,
+ .check_aux = fscache_fsdef_netfs_check_aux,
+};
+
+struct fscache_cookie fscache_fsdef_index = {
+ .usage = ATOMIC_INIT(1),
+ .def = NULL,
+ .sem = __RWSEM_INITIALIZER(fscache_fsdef_index.sem),
+ .backing_objects = HLIST_HEAD_INIT,
+};
+
+EXPORT_SYMBOL(fscache_fsdef_index);
+
+/*****************************************************************************/
+/*
+ * get the key data for an FSDEF index record
+ */
+static uint16_t fscache_fsdef_netfs_get_key(const void *cookie_netfs_data,
+ void *buffer, uint16_t bufmax)
+{
+ const struct fscache_netfs *netfs = cookie_netfs_data;
+ unsigned klen;
+
+ _enter("{%s.%u},", netfs->name, netfs->version);
+
+ klen = strlen(netfs->name);
+ if (klen > bufmax)
+ return 0;
+
+ memcpy(buffer, netfs->name, klen);
+ return klen;
+
+} /* end fscache_fsdef_netfs_get_key() */
+
+/*****************************************************************************/
+/*
+ * get the auxiliary data for an FSDEF index record
+ */
+static uint16_t fscache_fsdef_netfs_get_aux(const void *cookie_netfs_data,
+ void *buffer, uint16_t bufmax)
+{
+ const struct fscache_netfs *netfs = cookie_netfs_data;
+ unsigned dlen;
+
+ _enter("{%s.%u},", netfs->name, netfs->version);
+
+ dlen = sizeof(uint32_t);
+ if (dlen > bufmax)
+ return 0;
+
+ memcpy(buffer, &netfs->version, dlen);
+ return dlen;
+
+} /* end fscache_fsdef_netfs_get_aux() */
+
+/*****************************************************************************/
+/*
+ * check that the version stored in the auxiliary data is correct
+ */
+static fscache_checkaux_t fscache_fsdef_netfs_check_aux(void *cookie_netfs_data,
+ const void *data,
+ uint16_t datalen)
+{
+ struct fscache_netfs *netfs = cookie_netfs_data;
+ uint32_t version;
+
+ _enter("{%s},,%hu", netfs->name, datalen);
+
+ if (datalen != sizeof(version)) {
+ _leave(" = OBSOLETE [dl=%d v=%zu]",
+ datalen, sizeof(version));
+ return FSCACHE_CHECKAUX_OBSOLETE;
+ }
+
+ memcpy(&version, data, sizeof(version));
+ if (version != netfs->version) {
+ _leave(" = OBSOLETE [ver=%x net=%x]",
+ version, netfs->version);
+ return FSCACHE_CHECKAUX_OBSOLETE;
+ }
+
+ _leave(" = OKAY");
+ return FSCACHE_CHECKAUX_OKAY;
+
+} /* end fscache_fsdef_netfs_check_aux() */
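
A note on the FSDEF callbacks in fsdef.c: get_key copies the netfs name into
the caller's buffer, get_aux copies the 32-bit version, and check_aux treats a
stored record as obsolete when its length or version does not match. The
following is a minimal userspace sketch of that contract, not the patch's
code. struct toy_netfs, enum toy_checkaux and the toy_*() functions are
hypothetical stand-ins for struct fscache_netfs, fscache_checkaux_t and the
static functions above, and the assert() preconditions are additions for
illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for struct fscache_netfs and fscache_checkaux_t. */
struct toy_netfs {
	const char *name;
	uint32_t version;
};

enum toy_checkaux { TOY_CHECKAUX_OKAY, TOY_CHECKAUX_OBSOLETE };

/* Copy the key (the netfs name, without a NUL) into buffer; return its
 * length, or 0 if it does not fit in bufmax bytes. */
static uint16_t toy_get_key(const struct toy_netfs *netfs,
			    void *buffer, uint16_t bufmax)
{
	size_t klen;

	assert(netfs && buffer);
	klen = strlen(netfs->name);
	if (klen > bufmax)
		return 0;

	memcpy(buffer, netfs->name, klen);
	return (uint16_t)klen;
}

/* Copy the auxiliary data (the version) into buffer; return its length,
 * or 0 if it does not fit. */
static uint16_t toy_get_aux(const struct toy_netfs *netfs,
			    void *buffer, uint16_t bufmax)
{
	assert(netfs && buffer);
	if (sizeof(netfs->version) > bufmax)
		return 0;

	memcpy(buffer, &netfs->version, sizeof(netfs->version));
	return (uint16_t)sizeof(netfs->version);
}

/* Compare stored auxiliary data against the live netfs version. */
static enum toy_checkaux toy_check_aux(const struct toy_netfs *netfs,
				       const void *data, uint16_t datalen)
{
	uint32_t version;

	assert(netfs && data);
	if (datalen != sizeof(version))
		return TOY_CHECKAUX_OBSOLETE;

	memcpy(&version, data, sizeof(version));
	return version == netfs->version ?
		TOY_CHECKAUX_OKAY : TOY_CHECKAUX_OBSOLETE;
}
```

Auxiliary data written by one version of a netfs checks as OKAY against that
same version and as OBSOLETE against a netfs that reports a different version
or against a record of the wrong length.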
diff -uNrp linux-2.6.14-mm2/fs/fscache/main.c linux-2.6.14-mm2-cachefs/fs/fscache/main.c
--- linux-2.6.14-mm2/fs/fscache/main.c 1970-01-01 01:00:00.000000000 +0100
+++ linux-2.6.14-mm2-cachefs/fs/fscache/main.c 2005-11-14 16:23:38.000000000 +0000
@@ -0,0 +1,112 @@
+/* main.c: general filesystem caching manager
+ *
+ * Copyright (C) 2004 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells ([email protected])
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#include <linux/module.h>
+#include <linux/init.h>
+#include <linux/sched.h>
+#include <linux/completion.h>
+#include <linux/slab.h>
+#include "fscache-int.h"
+
+int fscache_debug = 0;
+
+static int fscache_init(void);
+static void fscache_exit(void);
+
+fs_initcall(fscache_init);
+module_exit(fscache_exit);
+
+MODULE_DESCRIPTION("FS Cache Manager");
+MODULE_AUTHOR("Red Hat, Inc.");
+MODULE_LICENSE("GPL");
+
+/*****************************************************************************/
+/*
+ * initialise the fs caching module
+ */
+static int __init fscache_init(void)
+{
+ fscache_cookie_jar =
+ kmem_cache_create("fscache_cookie_jar",
+ sizeof(struct fscache_cookie),
+ 0,
+ 0,
+ fscache_cookie_init_once,
+ NULL);
+
+ if (!fscache_cookie_jar) {
+ printk(KERN_NOTICE
+ "FS-Cache: Failed to allocate a cookie jar\n");
+ return -ENOMEM;
+ }
+
+ printk(KERN_NOTICE "FS-Cache: Loaded\n");
+ return 0;
+
+} /* end fscache_init() */
+
+/*****************************************************************************/
+/*
+ * clean up on module removal
+ */
+static void __exit fscache_exit(void)
+{
+ _enter("");
+
+ kmem_cache_destroy(fscache_cookie_jar);
+ printk(KERN_NOTICE "FS-Cache: Unloaded\n");
+
+} /* end fscache_exit() */
+
+/*****************************************************************************/
+/*
+ * clear the dead space between task_struct and kernel stack
+ * - called by supplying -finstrument-functions to gcc
+ */
+#if 0
+void __cyg_profile_func_enter (void *this_fn, void *call_site)
+__attribute__((no_instrument_function));
+
+void __cyg_profile_func_enter (void *this_fn, void *call_site)
+{
+ asm volatile(" movl %%esp,%%edi \n"
+ " andl %0,%%edi \n"
+ " addl %1,%%edi \n"
+ " movl %%esp,%%ecx \n"
+ " subl %%edi,%%ecx \n"
+ " shrl $2,%%ecx \n"
+ " movl $0xedededed,%%eax \n"
+ " rep stosl \n"
+ :
+ : "i"(~(THREAD_SIZE-1)), "i"(sizeof(struct thread_info))
+ : "eax", "ecx", "edi", "memory", "cc"
+ );
+}
+
+void __cyg_profile_func_exit(void *this_fn, void *call_site)
+__attribute__((no_instrument_function));
+
+void __cyg_profile_func_exit(void *this_fn, void *call_site)
+{
+ asm volatile(" movl %%esp,%%edi \n"
+ " andl %0,%%edi \n"
+ " addl %1,%%edi \n"
+ " movl %%esp,%%ecx \n"
+ " subl %%edi,%%ecx \n"
+ " shrl $2,%%ecx \n"
+ " movl $0xdadadada,%%eax \n"
+ " rep stosl \n"
+ :
+ : "i"(~(THREAD_SIZE-1)), "i"(sizeof(struct thread_info))
+ : "eax", "ecx", "edi", "memory", "cc"
+ );
+}
+#endif
diff -uNrp linux-2.6.14-mm2/fs/fscache/Makefile linux-2.6.14-mm2-cachefs/fs/fscache/Makefile
--- linux-2.6.14-mm2/fs/fscache/Makefile 1970-01-01 01:00:00.000000000 +0100
+++ linux-2.6.14-mm2-cachefs/fs/fscache/Makefile 2005-11-14 16:23:38.000000000 +0000
@@ -0,0 +1,13 @@
+#
+# Makefile for general filesystem caching code
+#
+
+#CFLAGS += -finstrument-functions
+
+fscache-objs := \
+ cookie.o \
+ fsdef.o \
+ main.o \
+ page.o
+
+obj-$(CONFIG_FSCACHE) += fscache.o
diff -uNrp linux-2.6.14-mm2/fs/fscache/page.c linux-2.6.14-mm2-cachefs/fs/fscache/page.c
--- linux-2.6.14-mm2/fs/fscache/page.c 1970-01-01 01:00:00.000000000 +0100
+++ linux-2.6.14-mm2-cachefs/fs/fscache/page.c 2005-11-14 16:41:41.000000000 +0000
@@ -0,0 +1,521 @@
+/* page.c: general filesystem cache page management
+ *
+ * Copyright (C) 2004-5 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells ([email protected])
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#include <linux/module.h>
+#include <linux/fscache-cache.h>
+#include <linux/buffer_head.h>
+#include <linux/pagevec.h>
+#include "fscache-int.h"
+
+/*****************************************************************************/
+/*
+ * set the data file size on an object in the cache
+ */
+int __fscache_set_i_size(struct fscache_cookie *cookie, loff_t i_size)
+{
+ struct fscache_object *object;
+ int ret;
+
+ _enter("%p,%llu,", cookie, i_size);
+
+ if (hlist_empty(&cookie->backing_objects)) {
+ _leave(" = -ENOBUFS");
+ return -ENOBUFS;
+ }
+
+ /* not supposed to use this for indexes */
+ BUG_ON(cookie->def->type == FSCACHE_COOKIE_TYPE_INDEX);
+
+ /* prevent the file from being uncached whilst we access it and exclude
+ * read and write attempts on pages
+ */
+ down_write(&cookie->sem);
+
+ ret = -ENOBUFS;
+ if (!hlist_empty(&cookie->backing_objects)) {
+ /* get and pin the backing object */
+ object = hlist_entry(cookie->backing_objects.first,
+ struct fscache_object, cookie_link);
+
+ /* prevent the cache from being withdrawn */
+ if (object->cache->ops->set_i_size &&
+ down_read_trylock(&object->cache->withdrawal_sem)
+ ) {
+ if (object->cache->ops->grab_object(object)) {
+ /* ask the cache to honour the operation */
+ ret = object->cache->ops->set_i_size(object,
+ i_size);
+
+ object->cache->ops->put_object(object);
+ }
+
+ up_read(&object->cache->withdrawal_sem);
+ }
+ }
+
+ up_write(&cookie->sem);
+ _leave(" = %d", ret);
+ return ret;
+
+} /* end __fscache_set_i_size() */
+
+EXPORT_SYMBOL(__fscache_set_i_size);
+
+/*****************************************************************************/
+/*
+ * reserve space for an object
+ */
+int __fscache_reserve_space(struct fscache_cookie *cookie, loff_t size)
+{
+ struct fscache_object *object;
+ int ret;
+
+ _enter("%p,%llu,", cookie, size);
+
+ if (hlist_empty(&cookie->backing_objects)) {
+ _leave(" = -ENOBUFS");
+ return -ENOBUFS;
+ }
+
+ /* not supposed to use this for indexes */
+ BUG_ON(cookie->def->type == FSCACHE_COOKIE_TYPE_INDEX);
+
+ /* prevent the file from being uncached whilst we access it and exclude
+ * read and write attempts on pages
+ */
+ down_write(&cookie->sem);
+
+ ret = -ENOBUFS;
+ if (!hlist_empty(&cookie->backing_objects)) {
+ /* get and pin the backing object */
+ object = hlist_entry(cookie->backing_objects.first,
+ struct fscache_object, cookie_link);
+
+ if (!object->cache->ops->reserve_space) {
+ ret = -EOPNOTSUPP;
+ goto out;
+ }
+
+ /* prevent the cache from being withdrawn */
+ if (down_read_trylock(&object->cache->withdrawal_sem)) {
+ if (object->cache->ops->grab_object(object)) {
+ /* ask the cache to honour the operation */
+ ret = object->cache->ops->reserve_space(object,
+ size);
+
+ object->cache->ops->put_object(object);
+ }
+
+ up_read(&object->cache->withdrawal_sem);
+ }
+ }
+
+out:
+ up_write(&cookie->sem);
+ _leave(" = %d", ret);
+ return ret;
+
+} /* end __fscache_reserve_space() */
+
+EXPORT_SYMBOL(__fscache_reserve_space);
+
+/*****************************************************************************/
+/*
+ * read a page from the cache or allocate a block in which to store it
+ * - we return:
+ * -ENOMEM - out of memory, nothing done
+ * -EINTR - interrupted
+ * -ENOBUFS - no backing object available in which to cache the block
+ * -ENODATA - no data available in the backing object for this block
+ * 0 - dispatched a read - it'll call end_io_func() when finished
+ */
+int __fscache_read_or_alloc_page(struct fscache_cookie *cookie,
+ struct page *page,
+ fscache_rw_complete_t end_io_func,
+ void *end_io_data,
+ gfp_t gfp)
+{
+ struct fscache_object *object;
+ int ret;
+
+ _enter("%p,{%lu},", cookie, page->index);
+
+ if (hlist_empty(&cookie->backing_objects)) {
+ _leave(" -ENOBUFS [no backing objects]");
+ return -ENOBUFS;
+ }
+
+ /* not supposed to use this for indexes */
+ BUG_ON(cookie->def->type == FSCACHE_COOKIE_TYPE_INDEX);
+
+ /* prevent the file from being uncached whilst we access it */
+ down_read(&cookie->sem);
+
+ ret = -ENOBUFS;
+ if (!hlist_empty(&cookie->backing_objects)) {
+ /* get and pin the backing object */
+ object = hlist_entry(cookie->backing_objects.first,
+ struct fscache_object, cookie_link);
+
+ /* prevent the cache from being withdrawn */
+ if (down_read_trylock(&object->cache->withdrawal_sem)) {
+ if (object->cache->ops->grab_object(object)) {
+ /* ask the cache to honour the operation */
+ ret = object->cache->ops->read_or_alloc_page(
+ object,
+ page,
+ end_io_func,
+ end_io_data,
+ gfp);
+
+ object->cache->ops->put_object(object);
+ }
+
+ up_read(&object->cache->withdrawal_sem);
+ }
+ }
+
+ up_read(&cookie->sem);
+ _leave(" = %d", ret);
+ return ret;
+
+} /* end __fscache_read_or_alloc_page() */
+
+EXPORT_SYMBOL(__fscache_read_or_alloc_page);
+
+/*****************************************************************************/
+/*
+ * read a list of pages from the cache or allocate a block in which to store
+ * them
+ * - we return:
+ * -ENOMEM - out of memory, some pages may be being read
+ * -EINTR - interrupted, some pages may be being read
+ * -ENOBUFS - no backing object or space available in which to cache any
+ * pages not being read
+ * -ENODATA - no data available in the backing object for some or all of
+ * the pages
+ * 0 - dispatched a read on all pages
+ *
+ * end_io_func() will be called for each page read from the cache as it
+ * finishes being read
+ *
+ * any pages for which a read is dispatched will be removed from pages and
+ * nr_pages
+ */
+int __fscache_read_or_alloc_pages(struct fscache_cookie *cookie,
+ struct address_space *mapping,
+ struct list_head *pages,
+ unsigned *nr_pages,
+ fscache_rw_complete_t end_io_func,
+ void *end_io_data,
+ gfp_t gfp)
+{
+ struct fscache_object *object;
+ int ret;
+
+ _enter("%p,,%d,,,", cookie, *nr_pages);
+
+ if (hlist_empty(&cookie->backing_objects)) {
+ _leave(" -ENOBUFS [no backing objects]");
+ return -ENOBUFS;
+ }
+
+ /* not supposed to use this for indexes */
+ BUG_ON(cookie->def->type == FSCACHE_COOKIE_TYPE_INDEX);
+ BUG_ON(list_empty(pages));
+ BUG_ON(*nr_pages <= 0);
+
+ /* prevent the file from being uncached whilst we access it */
+ down_read(&cookie->sem);
+
+ ret = -ENOBUFS;
+ if (!hlist_empty(&cookie->backing_objects)) {
+ /* get and pin the backing object */
+ object = hlist_entry(cookie->backing_objects.first,
+ struct fscache_object, cookie_link);
+
+ /* prevent the cache from being withdrawn */
+ if (down_read_trylock(&object->cache->withdrawal_sem)) {
+ if (object->cache->ops->grab_object(object)) {
+ /* ask the cache to honour the operation */
+ ret = object->cache->ops->read_or_alloc_pages(
+ object,
+ mapping,
+ pages,
+ nr_pages,
+ end_io_func,
+ end_io_data,
+ gfp);
+
+ object->cache->ops->put_object(object);
+ }
+
+ up_read(&object->cache->withdrawal_sem);
+ }
+ }
+
+ up_read(&cookie->sem);
+ _leave(" = %d", ret);
+ return ret;
+
+} /* end __fscache_read_or_alloc_pages() */
+
+EXPORT_SYMBOL(__fscache_read_or_alloc_pages);
+
+/*****************************************************************************/
+/*
+ * allocate a block in the cache on which to store a page
+ * - we return:
+ * -ENOMEM - out of memory, nothing done
+ * -EINTR - interrupted
+ * -ENOBUFS - no backing object available in which to cache the block
+ * 0 - block allocated
+ */
+int __fscache_alloc_page(struct fscache_cookie *cookie,
+ struct page *page,
+ gfp_t gfp)
+{
+ struct fscache_object *object;
+ int ret;
+
+ _enter("%p,{%lu},", cookie, page->index);
+
+ if (hlist_empty(&cookie->backing_objects)) {
+ _leave(" -ENOBUFS [no backing objects]");
+ return -ENOBUFS;
+ }
+
+ /* not supposed to use this for indexes */
+ BUG_ON(cookie->def->type == FSCACHE_COOKIE_TYPE_INDEX);
+
+ /* prevent the file from being uncached whilst we access it */
+ down_read(&cookie->sem);
+
+ ret = -ENOBUFS;
+ if (!hlist_empty(&cookie->backing_objects)) {
+ /* get and pin the backing object */
+ object = hlist_entry(cookie->backing_objects.first,
+ struct fscache_object, cookie_link);
+
+ /* prevent the cache from being withdrawn */
+ if (down_read_trylock(&object->cache->withdrawal_sem)) {
+ if (object->cache->ops->grab_object(object)) {
+ /* ask the cache to honour the operation */
+ ret = object->cache->ops->allocate_page(object,
+ page,
+ gfp);
+
+ object->cache->ops->put_object(object);
+ }
+
+ up_read(&object->cache->withdrawal_sem);
+ }
+ }
+
+ up_read(&cookie->sem);
+ _leave(" = %d", ret);
+ return ret;
+
+} /* end __fscache_alloc_page() */
+
+EXPORT_SYMBOL(__fscache_alloc_page);
+
+/*****************************************************************************/
+/*
+ * request a page be stored in the cache
+ * - returns:
+ * -ENOMEM - out of memory, nothing done
+ * -EINTR - interrupted
+ * -ENOBUFS - no backing object available in which to cache the page
+ * 0 - dispatched a write - it'll call end_io_func() when finished
+ */
+int __fscache_write_page(struct fscache_cookie *cookie,
+ struct page *page,
+ fscache_rw_complete_t end_io_func,
+ void *end_io_data,
+ gfp_t gfp)
+{
+ struct fscache_object *object;
+ int ret;
+
+ _enter("%p,{%lu},", cookie, page->index);
+
+ /* not supposed to use this for indexes */
+ BUG_ON(cookie->def->type == FSCACHE_COOKIE_TYPE_INDEX);
+
+ /* prevent the file from being uncached whilst we deal with it */
+ down_read(&cookie->sem);
+
+ ret = -ENOBUFS;
+ if (!hlist_empty(&cookie->backing_objects)) {
+ object = hlist_entry(cookie->backing_objects.first,
+ struct fscache_object, cookie_link);
+
+ /* prevent the cache from being withdrawn */
+ if (down_read_trylock(&object->cache->withdrawal_sem)) {
+ /* ask the cache to honour the operation */
+ ret = object->cache->ops->write_page(object,
+ page,
+ end_io_func,
+ end_io_data,
+ gfp);
+ up_read(&object->cache->withdrawal_sem);
+ }
+ }
+
+ up_read(&cookie->sem);
+ _leave(" = %d", ret);
+ return ret;
+
+} /* end __fscache_write_page() */
+
+EXPORT_SYMBOL(__fscache_write_page);
+
+/*****************************************************************************/
+/*
+ * request several pages be stored in the cache
+ * - returns:
+ * -ENOMEM - out of memory, nothing done
+ * -EINTR - interrupted
+ * -ENOBUFS - no backing object available in which to cache the page
+ * 0 - dispatched a write - it'll call end_io_func() when finished
+ */
+int __fscache_write_pages(struct fscache_cookie *cookie,
+ struct pagevec *pagevec,
+ fscache_rw_complete_t end_io_func,
+ void *end_io_data,
+ gfp_t gfp)
+{
+ struct fscache_object *object;
+ int ret;
+
+ _enter("%p,{%d},", cookie, pagevec->nr);
+
+ /* not supposed to use this for indexes */
+ BUG_ON(cookie->def->type == FSCACHE_COOKIE_TYPE_INDEX);
+
+ /* prevent the file from being uncached whilst we deal with it */
+ down_read(&cookie->sem);
+
+ ret = -ENOBUFS;
+ if (!hlist_empty(&cookie->backing_objects)) {
+ object = hlist_entry(cookie->backing_objects.first,
+ struct fscache_object, cookie_link);
+
+ /* prevent the cache from being withdrawn */
+ if (down_read_trylock(&object->cache->withdrawal_sem)) {
+ /* ask the cache to honour the operation */
+ ret = object->cache->ops->write_pages(object,
+ pagevec,
+ end_io_func,
+ end_io_data,
+ gfp);
+ up_read(&object->cache->withdrawal_sem);
+ }
+ }
+
+ up_read(&cookie->sem);
+ _leave(" = %d", ret);
+ return ret;
+
+} /* end __fscache_write_pages() */
+
+EXPORT_SYMBOL(__fscache_write_pages);
+
+/*****************************************************************************/
+/*
+ * remove a page from the cache
+ */
+void __fscache_uncache_page(struct fscache_cookie *cookie, struct page *page)
+{
+ struct fscache_object *object;
+ struct pagevec pagevec;
+
+ _enter(",{%lu}", page->index);
+
+ /* not supposed to use this for indexes */
+ BUG_ON(cookie->def->type == FSCACHE_COOKIE_TYPE_INDEX);
+
+ if (hlist_empty(&cookie->backing_objects)) {
+ _leave(" [no backing]");
+ return;
+ }
+
+ pagevec_init(&pagevec, 0);
+ pagevec_add(&pagevec, page);
+
+ /* ask the cache to honour the operation */
+ down_read(&cookie->sem);
+
+ if (!hlist_empty(&cookie->backing_objects)) {
+ object = hlist_entry(cookie->backing_objects.first,
+ struct fscache_object, cookie_link);
+
+ /* prevent the cache from being withdrawn */
+ if (down_read_trylock(&object->cache->withdrawal_sem)) {
+ object->cache->ops->uncache_pages(object, &pagevec);
+ up_read(&object->cache->withdrawal_sem);
+ }
+ }
+
+ up_read(&cookie->sem);
+
+ _leave("");
+ return;
+
+} /* end __fscache_uncache_page() */
+
+EXPORT_SYMBOL(__fscache_uncache_page);
+
+/*****************************************************************************/
+/*
+ * remove a bunch of pages from the cache
+ */
+void __fscache_uncache_pages(struct fscache_cookie *cookie,
+ struct pagevec *pagevec)
+{
+ struct fscache_object *object;
+
+ _enter(",{%d}", pagevec->nr);
+
+ BUG_ON(pagevec->nr <= 0);
+ BUG_ON(!pagevec->pages[0]);
+
+ /* not supposed to use this for indexes */
+ BUG_ON(cookie->def->type == FSCACHE_COOKIE_TYPE_INDEX);
+
+ if (hlist_empty(&cookie->backing_objects)) {
+ _leave(" [no backing]");
+ return;
+ }
+
+ /* ask the cache to honour the operation */
+ down_read(&cookie->sem);
+
+ if (!hlist_empty(&cookie->backing_objects)) {
+ object = hlist_entry(cookie->backing_objects.first,
+ struct fscache_object, cookie_link);
+
+ /* prevent the cache from being withdrawn */
+ if (down_read_trylock(&object->cache->withdrawal_sem)) {
+ object->cache->ops->uncache_pages(object, pagevec);
+ up_read(&object->cache->withdrawal_sem);
+ }
+ }
+
+ up_read(&cookie->sem);
+
+ _leave("");
+ return;
+
+} /* end __fscache_uncache_pages() */
+
+EXPORT_SYMBOL(__fscache_uncache_pages);

2005-11-14 21:57:28

by David Howells

Subject: [PATCH 9/12] FS-Cache: Add documentation for FS-Cache and its interfaces

The attached patch adds documentation for FS-Cache in general and its network
filesystem and cache-backend interfaces specifically.

Signed-Off-By: David Howells <[email protected]>
---
warthog>diffstat -p1 fscache-docs-2614mm2.diff
Documentation/filesystems/caching/backend-api.txt | 334 ++++++++++
Documentation/filesystems/caching/fscache.txt | 150 ++++
Documentation/filesystems/caching/netfs-api.txt | 726 ++++++++++++++++++++++
3 files changed, 1210 insertions(+)

diff -uNrp linux-2.6.14-mm2/Documentation/filesystems/caching/fscache.txt linux-2.6.14-mm2-cachefs/Documentation/filesystems/caching/fscache.txt
--- linux-2.6.14-mm2/Documentation/filesystems/caching/fscache.txt 1970-01-01 01:00:00.000000000 +0100
+++ linux-2.6.14-mm2-cachefs/Documentation/filesystems/caching/fscache.txt 2005-11-14 16:23:38.000000000 +0000
@@ -0,0 +1,150 @@
+ ==========================
+ General Filesystem Caching
+ ==========================
+
+========
+OVERVIEW
+========
+
+This facility is a general purpose cache for network filesystems, though it
+could be used for caching other things such as ISO9660 filesystems too.
+
+FS-Cache mediates between cache backends (such as CacheFS) and network
+filesystems:
+
+ +---------+
+ | | +-----------+
+ | NFS |--+ | |
+ | | | +-->| CacheFS |
+ +---------+ | +----------+ | | /dev/hda5 |
+ | | | | +-----------+
+ +---------+ +-->| | |
+ | | | |--+ +-------------+
+ | AFS |----->| FS-Cache | | |
+ | | | |----->| Cache Files |
+ +---------+ +-->| | | /var/cache |
+ | | |--+ +-------------+
+ +---------+ | +----------+ |
+ | | | | +-------------+
+ | ISOFS |--+ | | |
+ | | +-->| ReiserCache |
+ +---------+ | / |
+ +-------------+
+
+FS-Cache does not follow the idea of completely loading every netfs file
+opened in its entirety into a cache before permitting it to be accessed and
+then serving the pages out of that cache rather than the netfs inode because:
+
+ (1) It must be practical to operate without a cache.
+
+ (2) The size of any accessible file must not be limited to the size of the
+ cache.
+
+ (3) The combined size of all opened files (this includes mapped libraries)
+ must not be limited to the size of the cache.
+
+ (4) The user should not be forced to download an entire file just to do a
+ one-off access of a small portion of it (such as might be done with the
+ "file" program).
+
+It instead serves the cache out in PAGE_SIZE chunks as and when requested by
+the netfs('s) using it.
+
+
+FS-Cache provides the following facilities:
+
+ (1) More than one cache can be used at once. Caches can be selected explicitly
+ by use of tags.
+
+ (2) Caches can be added / removed at any time.
+
+ (3) The netfs is provided with an interface that allows either party to
+ withdraw caching facilities from a file (required for (2)).
+
+ (4) The interface to the netfs returns as few errors as possible, preferring
+ rather to let the netfs remain oblivious.
+
+ (5) Cookies are used to represent indexes, files and other objects to the
+ netfs. The simplest cookie is just a NULL pointer - indicating nothing
+ cached there.
+
+ (6) The netfs is allowed to propose - dynamically - any index hierarchy it
+ desires, though it must be aware that the index search function is
+ recursive, stack space is limited, and indexes can only be children of
+ indexes.
+
+ (7) Data I/O is done direct to and from the netfs's pages. The netfs indicates
+ that page A is at index B of the data-file represented by cookie C, and
+ that it should be read or written. The cache backend may or may not start
+ I/O on that page, but if it does, a netfs callback will be invoked to
+ indicate completion. The I/O may be either synchronous or asynchronous.
+
+ (8) Cookies can be "retired" upon release. At this point FS-Cache will mark
+ them as obsolete and the index hierarchy rooted at that point will get
+ recycled.
+
+ (9) The netfs provides a "match" function for index searches. In addition to
+ saying whether a match was made or not, this can also specify that an
+ entry should be updated or deleted.
+
+
+FS-Cache maintains a virtual indexing tree in which all indexes, files, objects
+and pages are kept. Bits of this tree may actually reside in one or more
+caches.
+
+ FSDEF
+ |
+ +------------------------------------+
+ | |
+ NFS AFS
+ | |
+ +--------------------------+ +-----------+
+ | | | |
+ homedir mirror afs.org redhat.com
+ | | |
+ +------------+ +---------------+ +----------+
+ | | | | | |
+ 00001 00002 00007 00125 vol00001 vol00002
+ | | | | |
+ +---+---+ +-----+ +---+ +------+------+ +-----+----+
+ | | | | | | | | | | | | |
+PG0 PG1 PG2 PG0 XATTR PG0 PG1 DIRENT DIRENT DIRENT R/W R/O Bak
+ | |
+ PG0 +-------+
+ | |
+ 00001 00003
+ |
+ +---+---+
+ | | |
+ PG0 PG1 PG2
+
+In the example above, you can see two netfs's being backed: NFS and AFS. These
+have different index hierarchies:
+
+ (*) The NFS primary index contains per-server indexes. Each server index is
+ indexed by NFS file handles to get data file objects. Each data file
+ object can have an array of pages, but may also have further child
+ objects, such as extended attributes and directory entries. Extended
+ attribute objects themselves have page-array contents.
+
+ (*) The AFS primary index contains per-cell indexes. Each cell index contains
+ per-logical-volume indexes. Each volume index contains up to three
+ indexes for the read-write, read-only and backup mirrors of those
+ volumes. Each of these contains vnode data file objects, each of which
+ contains an array of pages.
+
+The very top index is the FS-Cache master index in which individual netfs's
+have entries.
+
+Any index object may reside in more than one cache, provided it only has index
+children. Any index with non-index object children will be assumed to only
+reside in one cache.
+
+
+The netfs API to FS-Cache can be found in:
+
+ Documentation/filesystems/caching/netfs-api.txt
+
+The cache backend API to FS-Cache can be found in:
+
+ Documentation/filesystems/caching/backend-api.txt
diff -uNrp linux-2.6.14-mm2/Documentation/filesystems/caching/netfs-api.txt linux-2.6.14-mm2-cachefs/Documentation/filesystems/caching/netfs-api.txt
--- linux-2.6.14-mm2/Documentation/filesystems/caching/netfs-api.txt 1970-01-01 01:00:00.000000000 +0100
+++ linux-2.6.14-mm2-cachefs/Documentation/filesystems/caching/netfs-api.txt 2005-11-14 17:02:22.000000000 +0000
@@ -0,0 +1,726 @@
+ ===============================
+ FS-CACHE NETWORK FILESYSTEM API
+ ===============================
+
+There's an API by which a network filesystem can make use of the FS-Cache
+facilities. This is based around a number of principles:
+
+ (1) Caches can store a number of different object types. There are two main
+ object types: indexes and files. The first is a special type used by
+ FS-Cache to make finding objects faster and to make retiring of groups of
+ objects easier.
+
+ (2) Every index, file or other object is represented by a cookie. This cookie
+ may or may not have anything associated with it, but the netfs doesn't
+ need to care.
+
+ (3) Barring the top-level index (one entry per cached netfs), the index
+ hierarchy for each netfs is structured according to the whim of the netfs.
+
+This API is declared in <linux/fscache.h>.
+
+This document contains the following sections:
+
+ (1) Network filesystem definition
+ (2) Index definition
+ (3) Object definition
+ (4) Network filesystem (un)registration
+ (5) Cache tag lookup
+ (6) Index registration
+ (7) Data file registration
+ (8) Miscellaneous object registration
+ (9) Setting the data file size
+ (10) Page read/alloc/write
+ (11) Page uncaching
+ (12) Index and data file update
+ (13) Miscellaneous cookie operations
+ (14) Cookie unregistration
+ (15) Index and data file invalidation
+
+
+=============================
+NETWORK FILESYSTEM DEFINITION
+=============================
+
+FS-Cache needs a description of the network filesystem. This is specified using
+a record of the following structure:
+
+ struct fscache_netfs {
+ uint32_t version;
+ const char *name;
+ struct fscache_netfs_operations *ops;
+ struct fscache_cookie *primary_index;
+ ...
+ };
+
+The first three fields should be filled in before registration, and the fourth
+will be filled in by the registration function; any other fields should just be
+ignored and are for internal use only.
+
+The fields are:
+
+ (1) The name of the netfs (used as the key in the toplevel index).
+
+ (2) The version of the netfs (if the name matches but the version doesn't, the
+ entire in-cache hierarchy for this netfs will be scrapped and begun
+ afresh).
+
+ (3) The operations table is defined as follows:
+
+ struct fscache_netfs_operations {
+ };
+
+ Currently there aren't any functions here.
+
+ (4) The cookie representing the primary index will be allocated according to
+ another parameter passed into the registration function.
+
+For example, kAFS (linux/fs/afs/) uses the following definitions to describe
+itself:
+
+ static struct fscache_netfs_operations afs_cache_ops = {
+ };
+
+ struct fscache_netfs afs_cache_netfs = {
+ .version = 0,
+ .name = "afs",
+ .ops = &afs_cache_ops,
+ };
+
+
+================
+INDEX DEFINITION
+================
+
+Indexes are used for two purposes:
+
+ (1) To aid the finding of a file based on a series of keys (such as AFS's
+ "cell", "volume ID", "vnode ID").
+
+ (2) To make it easier to discard a subset of all the files cached based around
+ a particular key - for instance to mirror the removal of an AFS volume.
+
+However, since it's unlikely that any two netfs's are going to want to define
+their index hierarchies in quite the same way, FS-Cache tries to impose as few
+restraints as possible on how an index is structured and where it is placed in
+the tree. The netfs can even mix indexes and data files at the same level, but
+it's not recommended.
+
+Each index entry consists of a key of indeterminate length plus some auxiliary
+data, also of indeterminate length.
+
+There are some limits on indexes:
+
+ (1) Any index containing non-index objects should be restricted to a single
+ cache. Any such objects created within an index will be created in the
+ first cache only. The cache in which an index is created can be controlled
+ by cache tags (see below).
+
+ (2) The entry data must be atomically journallable, so it is limited to about
+ 400 bytes at present. At least 400 bytes will be available.
+
+ (3) The depth of the index tree should be judged with care as the search
+ function is recursive. Too many layers will run the kernel out of stack.
+
+
+=================
+OBJECT DEFINITION
+=================
+
+To define an object, a structure of the following type should be filled out:
+
+ struct fscache_object_def
+ {
+ uint8_t name[16];
+ uint8_t type;
+
+ struct fscache_cache_tag *(*select_cache)(
+ const void *parent_netfs_data,
+ const void *cookie_netfs_data);
+
+ uint16_t (*get_key)(const void *cookie_netfs_data,
+ void *buffer,
+ uint16_t bufmax);
+
+ void (*get_attr)(const void *cookie_netfs_data,
+ uint64_t *size);
+
+ uint16_t (*get_aux)(const void *cookie_netfs_data,
+ void *buffer,
+ uint16_t bufmax);
+
+ fscache_checkaux_t (*check_aux)(void *cookie_netfs_data,
+ const void *data,
+ uint16_t datalen);
+
+ void (*mark_pages_cached)(void *cookie_netfs_data,
+ struct address_space *mapping,
+ struct pagevec *cached_pvec);
+
+ void (*now_uncached)(void *cookie_netfs_data);
+ };
+
+This has the following fields:
+
+ (1) The type of the object [mandatory].
+
+ This is one of the following values:
+
+ (*) FSCACHE_COOKIE_TYPE_INDEX
+
+ This defines an index, which is a special FS-Cache type.
+
+ (*) FSCACHE_COOKIE_TYPE_DATAFILE
+
+ This defines an ordinary data file.
+
+ (*) Any other value between 2 and 255
+
+ This defines an extraordinary object such as an XATTR.
+
+ (2) The name of the object type (NUL terminated unless all 16 chars are used)
+ [optional].
+
+ (3) A function to select the cache in which to store an index [optional].
+
+ This function is invoked when an index needs to be instantiated in a cache
+ during the instantiation of a non-index object. Only the immediate index
+ parent for the non-index object will be queried. Any indexes above that in
+ the hierarchy may be stored in multiple caches. This function does not
+ need to be supplied for any non-index object or any index that will only
+ have index children.
+
+ If this function is not supplied or if it returns NULL then the first
+ cache in the parent's list will be chosen, or failing that, the first
+ cache in the master list.
+
+ (4) A function to retrieve an object's key from the netfs [mandatory].
+
+ This function will be called with the netfs data that was passed to the
+ cookie acquisition function and the maximum length of key data that it may
+ provide. It should write the required key data into the given buffer and
+ return the quantity it wrote.
+
+ (5) A function to retrieve attribute data from the netfs [optional].
+
+ This function will be called with the netfs data that was passed to the
+ cookie acquisition function. It should return the size of the file if this
+ is a data file. The size may be used to govern how much space must be
+ reserved for this file in the cache.
+
+ If the function is absent, a file size of 0 is assumed.
+
+ (6) A function to retrieve auxiliary data from the netfs [optional].
+
+ This function will be called with the netfs data that was passed to the
+ cookie acquisition function and the maximum length of auxiliary data that
+ it may provide. It should write the auxiliary data into the given buffer
+ and return the quantity it wrote.
+
+ If this function is absent, the auxiliary data length will be set to 0.
+
+ The length of the auxiliary data buffer may be dependent on the key
+ length. A netfs mustn't rely on being able to provide more than 400 bytes
+ for both.
+
+ (7) A function to check the auxiliary data [optional].
+
+ This function will be called to check that a match found in the cache for
+ this object is valid. For instance with AFS it could check the auxiliary
+ data against the data version number returned by the server to determine
+ whether the index entry in a cache is still valid.
+
+ If this function is absent, it will be assumed that matching objects in a
+ cache are always valid.
+
+ If present, the function should return one of the following values:
+
+ (*) FSCACHE_CHECKAUX_OKAY - the entry is okay as is
+ (*) FSCACHE_CHECKAUX_NEEDS_UPDATE - the entry requires update
+ (*) FSCACHE_CHECKAUX_OBSOLETE - the entry should be deleted
+
+ This function can also be used to extract data from the auxiliary data in
+ the cache and copy it into the netfs's structures.
+
+ (8) A function to mark a page as retaining cache metadata [mandatory].
+
+ This is called by the cache to indicate that it is retaining in-memory
+ information for this page and that the netfs should uncache the page when
+ it has finished. This does not indicate whether there's data on the disk
+ or not. Note that several pages at once may be presented for marking.
+
+ kAFS and NFS use the PG_private bit on the page structure for this, but
+ that may not be appropriate in all cases.
+
+ This function is not required for indexes as they're not permitted data.
+
+ (9) A function to unmark all the pages retaining cache metadata [mandatory].
+
+ This is called by FS-Cache to indicate that a backing store is being
+ unbound from a cookie and that all the marks on the pages should be
+ cleared to prevent confusion. Note that the cache will have torn down all
+ its tracking information so that the pages don't need to be explicitly
+ uncached.
+
+ This function is not required for indexes as they're not permitted data.
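As a rough illustration of callbacks (4), (6) and (7), here is a userspace sketch of what a netfs might supply. Only the callback signatures and the FSCACHE_CHECKAUX_* names come from the text above. The demo_vnode record, the demo_* function names and the numeric values of the enum are assumptions made for this example, and returning 0 when the buffer is too small is this sketch's own choice.

```c
#include <stdint.h>
#include <string.h>

/* Stand-in for the kernel type; the numeric values are assumed. */
typedef enum {
	FSCACHE_CHECKAUX_OKAY,
	FSCACHE_CHECKAUX_NEEDS_UPDATE,
	FSCACHE_CHECKAUX_OBSOLETE,
} fscache_checkaux_t;

/* Hypothetical per-file netfs record handed over as cookie_netfs_data. */
struct demo_vnode {
	uint32_t vnode_id;	/* identifies the file: used as the key */
	uint32_t data_version;	/* changes when the server copy changes */
};

/* get_key: write the key into buffer, return the number of bytes written. */
uint16_t demo_get_key(const void *cookie_netfs_data,
		      void *buffer, uint16_t bufmax)
{
	const struct demo_vnode *vnode = cookie_netfs_data;

	if (bufmax < sizeof(vnode->vnode_id))
		return 0;	/* sketch's choice: no room, supply no key */
	memcpy(buffer, &vnode->vnode_id, sizeof(vnode->vnode_id));
	return sizeof(vnode->vnode_id);
}

/* get_aux: store the data version so a later lookup can be validated. */
uint16_t demo_get_aux(const void *cookie_netfs_data,
		      void *buffer, uint16_t bufmax)
{
	const struct demo_vnode *vnode = cookie_netfs_data;

	if (bufmax < sizeof(vnode->data_version))
		return 0;	/* sketch's choice: no room, supply no aux data */
	memcpy(buffer, &vnode->data_version, sizeof(vnode->data_version));
	return sizeof(vnode->data_version);
}

/* check_aux: compare the cached data version with the current one. */
fscache_checkaux_t demo_check_aux(void *cookie_netfs_data,
				  const void *data, uint16_t datalen)
{
	const struct demo_vnode *vnode = cookie_netfs_data;
	uint32_t cached_version;

	if (datalen != sizeof(cached_version))
		return FSCACHE_CHECKAUX_OBSOLETE;
	memcpy(&cached_version, data, sizeof(cached_version));
	return cached_version == vnode->data_version ?
		FSCACHE_CHECKAUX_OKAY : FSCACHE_CHECKAUX_OBSOLETE;
}
```

The key and auxiliary data here total 8 bytes, comfortably inside the 400 bytes that the text says a netfs must not exceed for both.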
+
+
+===================================
+NETWORK FILESYSTEM (UN)REGISTRATION
+===================================
+
+The first step is to declare the network filesystem to the cache. This also
+involves specifying the layout of the primary index (for AFS, this would be the
+"cell" level).
+
+The registration function is:
+
+ int fscache_register_netfs(struct fscache_netfs *netfs);
+
+It just takes a pointer to the netfs definition. It returns 0 or an error as
+appropriate.
+
+For kAFS, registration is done as follows:
+
+ ret = fscache_register_netfs(&afs_cache_netfs);
+
+The last step is, of course, unregistration:
+
+ void fscache_unregister_netfs(struct fscache_netfs *netfs);
+
+
+================
+CACHE TAG LOOKUP
+================
+
+FS-Cache permits the use of more than one cache. To permit particular index
+subtrees to be bound to particular caches, the second step is to look up cache
+representation tags. This step is optional; it can be left entirely up to
+FS-Cache as to which cache should be used. The problem with doing that is that
+FS-Cache will always pick the first cache that was registered.
+
+To get the representation for a named tag:
+
+ struct fscache_cache_tag *fscache_lookup_cache_tag(const char *name);
+
+This takes a text string as the name and returns a representation of a tag. It
+will never return an error. It may return a dummy tag, however, if it runs out
+of memory; this will inhibit caching with this tag.
+
+Any representation so obtained must be released by passing it to this function:
+
+ void fscache_release_cache_tag(struct fscache_cache_tag *tag);
+
+The tag will be retrieved by FS-Cache when it calls the object definition
+operation select_cache().
+
+
+==================
+INDEX REGISTRATION
+==================
+
+The third step is to inform FS-Cache about part of an index hierarchy that can
+be used to locate files. This is done by requesting a cookie for each index in
+the path to the file:
+
+ struct fscache_cookie *
+ fscache_acquire_cookie(struct fscache_cookie *parent,
+ struct fscache_object_def *def,
+ void *netfs_data);
+
+This function creates an index entry in the index represented by parent,
+filling in the index entry by calling the operations pointed to by def.
+
+Note that this function never returns an error - all errors are handled
+internally. It may also return FSCACHE_NEGATIVE_COOKIE. It is quite acceptable
+to pass this token back to this function as the parent to another acquisition
+(or even to the relinquish cookie, read page and write page functions - see
+below).
+
+Note also that no indexes are actually created in a cache until a non-index
+object needs to be created somewhere down the hierarchy. Furthermore, an index
+may be created in several different caches independently at different
+times. This is all handled transparently, and the netfs doesn't see any of it.
+
+For example, with AFS, a cell would be added to the primary index. This index
+entry would have a dependent inode containing a volume location index for the
+volume mappings within this cell:
+
+ cell->cache =
+ fscache_acquire_cookie(afs_cache_netfs.primary_index,
+ &afs_cell_cache_index_def,
+ cell);
+
+Then when a volume location was accessed, it would be entered into the cell's
+index and an inode would be allocated that acts as a volume type and hash chain
+combination:
+
+ vlocation->cache =
+ fscache_acquire_cookie(cell->cache,
+ &afs_vlocation_cache_index_def,
+ vlocation);
+
+And then a particular flavour of volume (R/O for example) could be added to
+that index, creating another index for vnodes (AFS inode equivalents):
+
+ volume->cache =
+ fscache_acquire_cookie(vlocation->cache,
+ &afs_volume_cache_index_def,
+ volume);
+
+
+======================
+DATA FILE REGISTRATION
+======================
+
+The fourth step is to request a data file be created in the cache. This is
+identical to index cookie acquisition. The only difference is that the type in
+the object definition should be something other than index type.
+
+ vnode->cache =
+ fscache_acquire_cookie(volume->cache,
+ &afs_vnode_cache_object_def,
+ vnode);
+
+
+=================================
+MISCELLANEOUS OBJECT REGISTRATION
+=================================
+
+An optional step is to request an object of miscellaneous type be created in
+the cache. This is almost identical to index cookie acquisition. The only
+difference is that the type in the object definition should be something other
+than index type. Whilst the parent object could be an index, it's more likely
+it would be some other type of object such as a data file.
+
+ xattr->cache =
+ fscache_acquire_cookie(vnode->cache,
+ &afs_xattr_cache_object_def,
+ xattr);
+
+Miscellaneous objects might be used to store extended attributes or directory
+entries for example.
+
+
+==========================
+SETTING THE DATA FILE SIZE
+==========================
+
+The fifth step is to set the size of the file. This doesn't automatically
+reserve any space in the cache, but permits the cache to adjust its metadata
+for data tracking appropriately:
+
+ int fscache_set_i_size(struct fscache_cookie *cookie, loff_t i_size);
+
+The cache will return -ENOBUFS if there is no backing cache or if there is no
+space to allocate any extra metadata required in the cache.
+
+Note that attempts to read or write data pages in the cache over this size may
+be rebuffed with -ENOBUFS.
+
+
+=====================
+PAGE READ/ALLOC/WRITE
+=====================
+
+And the sixth step is to store and retrieve pages in the cache. There are three
+functions that are used to do this.
+
+Note:
+
+ (1) A page should not be re-read or re-allocated without uncaching it first.
+
+ (2) A read or allocated page must be uncached when the netfs page is released
+ from the pagecache.
+
+ (3) A page should only be written to the cache if previously read or allocated.
+
+This permits the cache to maintain its page tracking in proper order.
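The three rules above amount to a small per-page state machine. The userspace sketch below shows one way a netfs might track this for itself. The demo_page structure and demo_* helpers are hypothetical, and -EINVAL is the sketch's own choice of error, not something FS-Cache is documented to return.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical per-page tracking state kept by a netfs. */
struct demo_page {
	bool tracked;	/* set by read/alloc, cleared by uncache */
};

/* Rule (1): a page may not be re-read or re-allocated while tracked. */
int demo_read_or_alloc(struct demo_page *page)
{
	if (page->tracked)
		return -EINVAL;	/* must be uncached first */
	page->tracked = true;
	return 0;
}

/* Rule (3): only a previously read or allocated page may be written. */
int demo_write(const struct demo_page *page)
{
	return page->tracked ? 0 : -EINVAL;
}

/* Rule (2): call this when the page is released from the pagecache. */
void demo_uncache(struct demo_page *page)
{
	page->tracked = false;
}
```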
+
+
+PAGE READ
+---------
+
+Firstly, the netfs should ask FS-Cache to examine the caches and read the
+contents cached for a particular page of a particular file if present, or else
+allocate space to store the contents if not:
+
+ typedef
+ void (*fscache_rw_complete_t)(void *cookie_data,
+ struct page *page,
+ void *end_io_data,
+ int error);
+
+ int fscache_read_or_alloc_page(struct fscache_cookie *cookie,
+ struct page *page,
+ fscache_rw_complete_t end_io_func,
+ void *end_io_data,
+ gfp_t gfp);
+
+The cookie argument must specify a cookie for an object that isn't an index,
+the page specified will have the data loaded into it (and is also used to
+specify the page number), and the gfp argument is used to control how any
+memory allocations made are satisfied.
+
+If the cookie indicates the inode is not cached:
+
+ (1) The function will return -ENOBUFS.
+
+Else if there's a copy of the page resident in the cache:
+
+ (1) The mark_pages_cached() cookie operation will be called on that page.
+
+ (2) The function will submit a request to read the data from the cache's
+ backing device directly into the page specified.
+
+ (3) The function will return 0.
+
+ (4) When the read is complete, end_io_func() will be invoked with:
+
+ (*) The netfs data supplied when the cookie was created.
+
+ (*) The page descriptor.
+
+ (*) The end_io_data argument passed to the above function.
+
+ (*) An argument that's 0 on success or negative for an error code.
+
+ If an error occurs, it should be assumed that the page contains no usable
+ data.
+
+Otherwise, if there's not a copy available in cache, but the cache may be able
+to store the page:
+
+ (1) The mark_pages_cached() cookie operation will be called on that page.
+
+ (2) A block may be reserved in the cache and attached to the object at the
+ appropriate place.
+
+ (3) The function will return -ENODATA.
+
+This function may also return -ENOMEM or -EINTR, in which case it won't have
+read any data from the cache.
+
+
+PAGE ALLOCATE
+-------------
+
+Alternatively, if there's not expected to be any data in the cache for a page
+because the file has been extended, a block can simply be allocated instead:
+
+ int fscache_alloc_page(struct fscache_cookie *cookie,
+ struct page *page,
+ gfp_t gfp);
+
+This is similar to the fscache_read_or_alloc_page() function, except that it
+never reads from the cache. It will return 0 if a block has been allocated,
+rather than -ENODATA as the other would. One or the other must be performed
+before writing to the cache.
+
+The mark_pages_cached() cookie operation will be called on the page if
+successful.
+
+
+PAGE WRITE
+----------
+
+Secondly, if the netfs changes the contents of the page (either due to an
+initial download or if a user performs a write), then the page should be
+written back to the cache:
+
+ int fscache_write_page(struct fscache_cookie *cookie,
+ struct page *page,
+ fscache_rw_complete_t end_io_func,
+ void *end_io_data,
+ gfp_t gfp);
+
+The cookie argument must specify a data file cookie, the page specified should
+contain the data to be written (and is also used to specify the page number),
+and the gfp argument is used to control how any memory allocations made are
+satisfied.
+
+The page must have first been read or allocated successfully and must not have
+been uncached before writing is performed.
+
+If the cookie indicates the inode is not cached then:
+
+ (1) The function will return -ENOBUFS.
+
+Else if space can be allocated in the cache to hold this page:
+
+ (1) The function will submit a request to write the data to the cache's
+ backing device directly from the page specified.
+
+ (2) The function will return 0.
+
+ (3) When the write is complete, end_io_func() will be invoked with:
+
+ (*) The netfs data supplied when the cookie was created.
+
+ (*) The page descriptor.
+
+ (*) The end_io_data argument passed to the function.
+
+ (*) An argument that's 0 on success or negative for an error.
+
+ If an error occurs, it can be assumed that the page has not been written
+ to the cache, and that either there's a block containing the old data or
+ no block at all in the cache.
+
+Else if there's no space available in the cache, -ENOBUFS will be returned.
+
+
+MULTIPLE PAGE READ
+------------------
+
+A facility is provided to read several pages at once, as requested by the
+readpages() address space operation:
+
+ int fscache_read_or_alloc_pages(struct fscache_cookie *cookie,
+ struct address_space *mapping,
+ struct list_head *pages,
+ int *nr_pages,
+ fscache_rw_complete_t end_io_func,
+ void *end_io_data,
+ gfp_t gfp);
+
+This works in a similar way to fscache_read_or_alloc_page(), except:
+
+ (1) Any page it can retrieve data for is removed from the pages list, counted
+ off *nr_pages and dispatched for reading from the disk. Reads of adjacent
+ pages on disk may be merged for greater efficiency.
+
+ (2) The mark_pages_cached() cookie operation will be called on several pages
+ at once if they're being read or allocated.
+
+ (3) If there was a general error, then that error will be returned.
+
+ Else if some pages couldn't be allocated or read, then -ENOBUFS will be
+ returned.
+
+ Else if some pages couldn't be read but were allocated, then -ENODATA will
+ be returned.
+
+ Otherwise, if all pages had reads dispatched, then 0 will be returned, the
+ list will be empty and *nr_pages will be 0.
+
+ (4) end_io_func will be called once for each page being read as the reads
+ complete.
+
+Note that a return of -ENODATA, -ENOBUFS or any other error does not preclude
+some of the pages being read and some being allocated. Those pages will have
+been marked appropriately and will need uncaching.
+
+
+==============
+PAGE UNCACHING
+==============
+
+To uncache a page, this function should be called:
+
+ void fscache_uncache_page(struct fscache_cookie *cookie,
+ struct page *page);
+
+This function permits the cache to release any in-memory representation it
+might be holding for this netfs page. This function must be called once for
+each page on which the read or write page functions above have been called to
+make sure the cache's in-memory tracking information gets torn down.
+
+Note that pages can't be explicitly deleted from a data file. The whole
+data file must be retired (see the relinquish cookie function below).
+
+Furthermore, note that this does not cancel the asynchronous read or write
+operation started by the read/alloc and write functions.
+
+There is another unbinding operation similar to the above that takes a set of
+pages to unbind in one go:
+
+ void fscache_uncache_pagevec(struct fscache_cookie *cookie,
+ struct pagevec *pagevec);
+
+
+==========================
+INDEX AND DATA FILE UPDATE
+==========================
+
+To request an update of the index data for an index or other object, the
+following function should be called:
+
+ void fscache_update_cookie(struct fscache_cookie *cookie);
+
+This function will refer back to the netfs_data pointer stored in the cookie by
+the acquisition function to obtain the data to write into each revised index
+entry. The update method in the parent index definition will be called to
+transfer the data.
+
+Note that partial updates may happen automatically at other times, such as when
+data blocks are added to a data file object.
+
+
+===============================
+MISCELLANEOUS COOKIE OPERATIONS
+===============================
+
+There are a number of operations that can be used to control cookies:
+
+ (*) Cookie pinning:
+
+ int fscache_pin_cookie(struct fscache_cookie *cookie);
+ void fscache_unpin_cookie(struct fscache_cookie *cookie);
+
+ These operations permit data cookies to be pinned into the cache and to
+ have the pinning removed. They are not permitted on index cookies.
+
+ The pinning function will return 0 if successful, -ENOBUFS if the cookie
+ isn't backed by a cache, -EOPNOTSUPP if the cache doesn't support pinning,
+ -ENOSPC if there isn't enough space to honour the operation, -ENOMEM or
+ -EIO if there's any other problem.
+
+ (*) Data space reservation:
+
+ int fscache_reserve_space(struct fscache_cookie *cookie, loff_t size);
+
+ This permits a netfs to request cache space be reserved to store up to the
+ given amount of a file. It is permitted to ask for more than the current
+ size of the file to allow for future file expansion.
+
+ If size is given as zero then the reservation will be cancelled.
+
+ The function will return 0 if successful, -ENOBUFS if the cookie isn't
+ backed by a cache, -EOPNOTSUPP if the cache doesn't support reservations,
+ -ENOSPC if there isn't enough space to honour the operation, -ENOMEM or
+ -EIO if there's any other problem.
+
+ Note that this doesn't pin an object in a cache; it can still be culled to
+ make space if it's not in use.
+
+
+=====================
+COOKIE UNREGISTRATION
+=====================
+
+To get rid of a cookie, this function should be called:
+
+ void fscache_relinquish_cookie(struct fscache_cookie *cookie,
+ int retire);
+
+If retire is non-zero, then the object will be marked for recycling, and all
+copies of it will be removed from all active caches in which it is present. Not
+only that but all child objects will also be retired.
+
+If retire is zero, then the object may be available again when next the
+acquisition function is called. Retirement here will overrule the pinning on a
+cookie.
+
+One very important note - relinquish must NOT be called for a cookie unless all
+the cookies for "child" indexes, objects and pages have been relinquished
+first.
+
+
+================================
+INDEX AND DATA FILE INVALIDATION
+================================
+
+There is no direct way to invalidate an index subtree or a data file. To do
+this, the caller should relinquish and retire the cookie they have, and then
+acquire a new one.
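
The return values documented for fscache_read_or_alloc_page() in the netfs API text above can be read as a small decision table for a netfs readpage path. What follows is a minimal userspace sketch under stated assumptions, not code from this series: the enum, its values and netfs_plan_read() are hypothetical names invented for illustration, and a real netfs would act on each case (wait for end_io_func(), fetch from the server, call fscache_write_page()) rather than just classify it.

```c
#include <assert.h>
#include <errno.h>

/*
 * Hypothetical action codes for illustration only; they are not part of
 * the FS-Cache API.
 */
enum netfs_read_action {
	NETFS_WAIT_FOR_CACHE,	/* read queued; end_io_func() will be called */
	NETFS_FETCH_AND_STORE,	/* block allocated; fetch, then write to cache */
	NETFS_FETCH_ONLY,	/* object not cached; fetch from the server only */
	NETFS_FAIL,		/* nothing was read from the cache */
};

/*
 * Map the documented return values of fscache_read_or_alloc_page() onto
 * the next step a netfs might take.
 */
static enum netfs_read_action netfs_plan_read(int ret)
{
	switch (ret) {
	case 0:
		return NETFS_WAIT_FOR_CACHE;
	case -ENODATA:
		return NETFS_FETCH_AND_STORE;
	case -ENOBUFS:
		return NETFS_FETCH_ONLY;
	default:		/* -ENOMEM, -EINTR or anything unexpected */
		return NETFS_FAIL;
	}
}
```

Whether a page then needs fscache_uncache_page() is signalled through the mark_pages_cached() cookie operation, so the sketch deliberately does not try to infer that from the return value.
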
diff -uNrp linux-2.6.14-mm2/Documentation/filesystems/caching/backend-api.txt linux-2.6.14-mm2-cachefs/Documentation/filesystems/caching/backend-api.txt
--- linux-2.6.14-mm2/Documentation/filesystems/caching/backend-api.txt 1970-01-01 01:00:00.000000000 +0100
+++ linux-2.6.14-mm2-cachefs/Documentation/filesystems/caching/backend-api.txt 2005-11-14 17:03:15.000000000 +0000
@@ -0,0 +1,334 @@
+ ==========================
+ FS-CACHE CACHE BACKEND API
+ ==========================
+
+The FS-Cache system provides an API by which actual caches can be supplied to
+FS-Cache for it to then serve out to network filesystems and other interested
+parties.
+
+This API is declared in <linux/fscache-cache.h>.
+
+
+====================================
+INITIALISING AND REGISTERING A CACHE
+====================================
+
+To start off, a cache definition must be initialised and registered for each
+cache the backend wants to make available. For instance, CacheFS does this in
+the fill_super() operation on mounting.
+
+The cache definition (struct fscache_cache) should be initialised by calling:
+
+ void fscache_init_cache(struct fscache_cache *cache,
+ struct fscache_cache_ops *ops,
+ const char *idfmt,
+ ...)
+
+Where:
+
+ (*) "cache" is a pointer to the cache definition;
+
+ (*) "ops" is a pointer to the table of operations that the backend supports on
+ this cache;
+
+ (*) and a format and printf-style arguments for constructing a label for the
+ cache.
+
+
+The cache should then be registered with FS-Cache by passing a pointer to the
+previously initialised cache definition to:
+
+ int fscache_add_cache(struct fscache_cache *cache,
+ struct fscache_object *fsdef,
+ const char *tagname);
+
+Two extra arguments should also be supplied:
+
+ (*) "fsdef" which should point to the object representation for the FS-Cache
+ master index in this cache. Netfs primary index entries will be created
+ here.
+
+ (*) "tagname" which, if given, should be a text string naming this cache. If
+ this is NULL, the identifier will be used instead. For CacheFS, the
+ identifier is set to name the underlying block device and the tag can be
+ supplied by mount.
+
+This function may return -ENOMEM if it ran out of memory or -EEXIST if the tag
+is already in use. 0 will be returned on success.
+
+
+=====================
+UNREGISTERING A CACHE
+=====================
+
+A cache can be withdrawn from the system by calling this function with a
+pointer to the cache definition:
+
+ void fscache_withdraw_cache(struct fscache_cache *cache)
+
+In CacheFS's case, this is called by put_super().
+
+
+==================
+FS-CACHE UTILITIES
+==================
+
+FS-Cache provides some utilities that a cache backend may make use of:
+
+ (*) Find the parent of an object:
+
+ struct fscache_object *
+ fscache_find_parent_object(struct fscache_object *object)
+
+ This allows a backend to find the logical parent of an index or data file
+ in the cache hierarchy.
+
+
+========================
+RELEVANT DATA STRUCTURES
+========================
+
+ (*) Index/Data file FS-Cache representation cookie.
+
+ struct fscache_cookie {
+ struct fscache_object_def *def;
+ struct fscache_netfs *netfs;
+ void *netfs_data;
+ ...
+ };
+
+ The fields that might be of use to the backend describe the object
+ definition, the netfs definition and the netfs's data for this
+ cookie. The object definition contains functions supplied by the netfs for
+ loading and matching index entries; these are required to provide some of
+ the cache operations.
+
+ (*) In-cache object representation.
+
+ struct fscache_object {
+ struct fscache_cache *cache;
+ struct fscache_cookie *cookie;
+ unsigned long flags;
+ #define FSCACHE_OBJECT_RECYCLING 1
+ ...
+ };
+
+ Structures of this type should be allocated by the cache backend and
+ passed to FS-Cache when requested by the appropriate cache operation. In
+ the case of CacheFS, they're embedded in CacheFS's internal object
+ structures.
+
+ Each object contains a pointer to the cookie that represents the object it
+ is backing. It also contains a flag that indicates whether this is an
+ index or not. This should be initialised by calling
+ fscache_object_init(object).
+
+
+================
+CACHE OPERATIONS
+================
+
+The cache backend provides FS-Cache with a table of operations that can be
+performed on the denizens of the cache. These are held in a structure of type:
+
+ struct fscache_cache_ops
+
+ (*) Name of cache provider [mandatory].
+
+ const char *name
+
+ This isn't strictly an operation, but should be pointed at a string naming
+ the backend.
+
+ (*) Object lookup [mandatory].
+
+ struct fscache_object *(*lookup_object)(struct fscache_cache *cache,
+ struct fscache_object *parent,
+ struct fscache_cookie *cookie)
+
+ This method is used to look up an object in the specified cache, given a
+ pointer to the parent object and the cookie to which the object will be
+ attached. This should instantiate that object in the cache if it can, or
+ return -ENOBUFS or -ENOMEM if it can't.
+
+ (*) Increment object refcount [mandatory].
+
+ struct fscache_object *(*grab_object)(struct fscache_object *object)
+
+ This method is called to increment the reference count on an object. It
+ may fail (for instance if the cache is being withdrawn) by returning
+ NULL. It should return the object pointer if successful.
+
+ (*) Lock/Unlock object [mandatory].
+
+ void (*lock_object)(struct fscache_object *object)
+ void (*unlock_object)(struct fscache_object *object)
+
+ These methods are used to exclusively lock an object. It must be possible
+ to schedule with the lock held, so a spinlock isn't sufficient.
+
+ (*) Pin/Unpin object [optional].
+
+ int (*pin_object)(struct fscache_object *object)
+ void (*unpin_object)(struct fscache_object *object)
+
+ These methods are used to pin an object into the cache. Once pinned an
+ object cannot be reclaimed to make space. Return -ENOSPC if there's not
+ enough space in the cache to permit this.
+
+ (*) Update object [mandatory].
+
+ int (*update_object)(struct fscache_object *object)
+
+ This is called to update the index entry for the specified object. The new
+ information should be in object->cookie->netfs_data. This can be obtained
+ by calling object->cookie->def->get_aux()/get_attr().
+
+ (*) Release object reference [mandatory].
+
+ void (*put_object)(struct fscache_object *object)
+
+ This method is used to discard a reference to an object. The object may
+ be destroyed when all the references held by FS-Cache are released.
+
+ (*) Synchronise a cache [mandatory].
+
+ void (*sync)(struct fscache_cache *cache)
+
+ This is called to ask the backend to synchronise a cache with its backing
+ device.
+
+ (*) Dissociate a cache [mandatory].
+
+ void (*dissociate_pages)(struct fscache_cache *cache)
+
+ This is called to ask a cache to perform any page dissociations as part of
+ cache withdrawal.
+
+ (*) Set the data size on a cache file [mandatory].
+
+ int (*set_i_size)(struct fscache_object *object, loff_t i_size);
+
+ This is called to indicate to the cache the maximum size a file may
+ reach. The cache may use this to reserve space on the cache. It may also
+ return -ENOBUFS to indicate that insufficient space is available to expand
+ the metadata used to track the data. It should return 0 if successful or
+ -ENOMEM or -EIO on error.
+
+ (*) Reserve cache space for an object's data [optional].
+
+ int (*reserve_space)(struct fscache_object *object, loff_t size);
+
+ This is called to request that cache space be reserved to hold the data
+ for an object and the metadata used to track it. Zero size should be taken
+ as a request to cancel a reservation.
+
+ This should return 0 if successful, -ENOSPC if there isn't enough space
+ available, or -ENOMEM or -EIO on other errors.
+
+ The reservation may exceed the size of the object, thus permitting future
+ expansion. If the amount of space consumed by an object would exceed the
+ reservation, it's permitted to refuse requests to allocate pages, but not
+ required. An object may be pruned down to its reservation size if larger
+ than that already.
+
+ (*) Request page be read from cache [mandatory].
+
+ int (*read_or_alloc_page)(struct fscache_object *object,
+ struct page *page,
+ fscache_rw_complete_t end_io_func,
+ void *end_io_data,
+ gfp_t gfp)
+
+ This is called to attempt to read a netfs page from the cache, or to
+ reserve a backing block if not. FS-Cache will have done as much checking
+ as it can before calling, but most of the work belongs to the backend.
+
+ If there's no page in the cache, then -ENODATA should be returned if the
+ backend managed to reserve a backing block; -ENOBUFS, -ENOMEM or -EIO if
+ it didn't.
+
+ If there is a page in the cache, then a read operation should be queued
+ and 0 returned. When the read finishes, end_io_func() should be called
+ with the following arguments:
+
+ (*end_io_func)(object->cookie->netfs_data,
+ page,
+ end_io_data,
+ error);
+
+ The mark_pages_cached() cookie operation should be called for the page if
+ any cache metadata is retained. This will indicate to the netfs that the
+ page needs explicit uncaching. This operation takes a pagevec, thus
+ allowing several pages to be marked at once.
+
+ (*) Request pages be read from cache [mandatory].
+
+ int (*read_or_alloc_pages)(struct fscache_object *object,
+ struct address_space *mapping,
+ struct list_head *pages,
+ unsigned *nr_pages,
+ fscache_rw_complete_t end_io_func,
+ void *end_io_data,
+ gfp_t gfp)
+
+ This is like the previous operation, except it will be handed a list of
+ pages instead of one page. Any pages on which a read operation is started
+ must be added to the page cache for the specified mapping and also to the
+ LRU. Such pages must also be removed from the pages list and nr_pages
+ decremented per page.
+
+ If there was an error such as -ENOMEM, then that should be returned; else
+ if one or more pages couldn't be read or allocated, then -ENOBUFS should
+ be returned; else if one or more pages couldn't be read, then -ENODATA
+ should be returned. If all the pages are dispatched then 0 should be
+ returned.
+
+ (*) Request page be allocated in the cache [mandatory].
+
+ int (*allocate_page)(struct fscache_object *object,
+ struct page *page,
+ gfp_t gfp)
+
+ This is like read_or_alloc_page(), except that it shouldn't read from the
+ cache, even if there's data there that could be retrieved. It should,
+ however, set up any internal metadata required such that write_page() can
+ write to the cache.
+
+ If there's no backing block available, then -ENOBUFS should be returned
+ (or -ENOMEM or -EIO if there were other problems). If a block is
+ successfully allocated, then the netfs page should be marked and 0
+ returned.
+
+ (*) Request page be written to cache [mandatory].
+
+ int (*write_page)(struct fscache_object *object,
+ struct page *page,
+ fscache_rw_complete_t end_io_func,
+ void *end_io_data,
+ gfp_t gfp)
+
+ This is called to write from a page on which there was a previously
+ successful read_or_alloc_page() call. FS-Cache filters out pages that
+ don't have mappings.
+
+ If there's no backing block available, then -ENOBUFS should be returned
+ (or -ENOMEM or -EIO if there were other problems).
+
+ If the write operation could be queued, then 0 should be returned. When
+ the write completes, end_io_func() should be called with the following
+ arguments:
+
+ (*end_io_func)(object->cookie->netfs_data,
+ page,
+ end_io_data,
+ error);
+
+ (*) Discard retained per-page metadata [mandatory].
+
+ void (*uncache_pages)(struct fscache_object *object,
+ struct pagevec *pagevec)
+
+ This is called when one or more netfs pages are being evicted from the
+ pagecache. The cache backend should tear down any internal representation
+ or tracking it maintains.
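
The precedence of return values that backend-api.txt above gives for the read_or_alloc_pages() operation (a general error first, then -ENOBUFS, then -ENODATA, then 0) can be written down as one small function. This is a minimal userspace sketch, assuming the backend keeps per-call counts of pages it could neither read nor allocate and of pages it allocated but could not read; cache_pages_result() and its parameters are hypothetical names, not something CacheFS or FS-Cache defines.

```c
#include <assert.h>
#include <errno.h>

/*
 * Hypothetical helper; the name and parameters are illustrative only.
 *
 * general_error: 0, or a negative errno such as -ENOMEM.
 * nr_unbacked:   pages that could be neither read nor allocated.
 * nr_allocated:  pages that had a block allocated but no data to read.
 */
static int cache_pages_result(int general_error,
			      unsigned int nr_unbacked,
			      unsigned int nr_allocated)
{
	if (general_error)
		return general_error;	/* e.g. -ENOMEM takes precedence */
	if (nr_unbacked)
		return -ENOBUFS;	/* some pages got no backing at all */
	if (nr_allocated)
		return -ENODATA;	/* some pages must be fetched by the netfs */
	return 0;			/* every page had a read dispatched */
}
```

Any non-zero result may still leave some pages with reads in flight or blocks allocated, which is why the netfs-side text says such pages will have been marked and will need uncaching.
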

2005-11-14 21:57:55

by David Howells

[permalink] [raw]
Subject: [PATCH 12/12] FS-Cache: CacheFS: Add Documentation

The attached patch adds documentation for CacheFS.

Signed-Off-By: David Howells <[email protected]>
---
warthog>diffstat -p1 fscache-cachefs-docs-2614mm2.diff
Documentation/filesystems/caching/cachefs.txt | 375 ++++++++++++++++++++++++++
1 files changed, 375 insertions(+)

diff -uNrp linux-2.6.14-mm2/Documentation/filesystems/caching/cachefs.txt linux-2.6.14-mm2-cachefs/Documentation/filesystems/caching/cachefs.txt
--- linux-2.6.14-mm2/Documentation/filesystems/caching/cachefs.txt 1970-01-01 01:00:00.000000000 +0100
+++ linux-2.6.14-mm2-cachefs/Documentation/filesystems/caching/cachefs.txt 2005-11-14 16:23:38.000000000 +0000
@@ -0,0 +1,375 @@
+ ===========================
+ CacheFS: Caching Filesystem
+ ===========================
+
+========
+OVERVIEW
+========
+
+CacheFS is a backend for the general filesystem cache facility.
+
+CacheFS uses a block device directly rather than a bunch of files under an
+already mounted filesystem. For why this is so, see further on. If necessary,
+however, a file can be loopback mounted as a cache.
+
+CacheFS is based on a wandering tree approach. This means that data already on
+the disk are not changed (more or less), only replaced. This means that
+CacheFS provides both metadata integrity and data integrity. There is a small,
+simple journal that tracks the state of the tree and the block allocation
+management. Should the power be cut to a computer, or should it crash, all
+changes made to the cache since the last time the journal was cranked will be
+lost; but a valid tree will remain, albeit slightly out of date.
+
+
+========
+MOUNTING
+========
+
+Since CacheFS is actually a quasi-filesystem, it requires a block device behind
+it. The way to give it one is to mount it as cachefs type on a directory
+somewhere. The mounted filesystem will then present the user with a single file
+describing the current cache management status.
+
+There are a number of mount options that can be provided when the cache is
+mounted:
+
+ (*) -o tag=<name>
+
+ This tells FS-Cache the name by which netfs's will refer to the cache.
+ This is not strictly a necessity; if it's not given, a tag will be
+ invented based on the major and minor numbers of the block device. If the
+ netfs doesn't give FS-Cache any specific instructions, the first cache in
+ the list will be picked by default.
+
+ (*) -o wander=<n>
+
+ Set the wander timer so that CacheFS will commit the journal that long
+ after a change is made if nothing else causes the tree to wander.
+
+ n may be in the range 0 to 3600. If n is 0 then automatic wandering will
+ be disabled, otherwise it's a number of seconds. The tree is also forced
+ to wander by allocator underrun, sync and unmounting the cache.
+
+ A smaller number means that the cache will be more up to date if the power
+ fails, but that the allocator will cycle faster and blocks will be
+ replaced more often, lowering performance.
+
+ (*) -o autodel
+
+ All files should be deleted when the last reference to them is dropped.
+ This is primarily for debugging purposes.
+
+For instance, the cache might be mounted thus:
+
+ root>mount -t cachefs /dev/hdg9 /cache-hdg9 -o tag=mycache
+ root>ls -1 /cache-hdg9
+ status
+
+However, a block device that's going to be used for a cache must be prepared
+before it can be mounted initially. This is done very simply by:
+
+ echo "cachefs___" >/dev/hdg9
+
+During the initial mount, the basic structure will be written into the cache
+and then the journal will be replayed as during a normal mount.
+
+Note that trying to mount a cache read only will result in an error.
+
+
+=============================================
+WHY A BLOCK DEVICE? WHY NOT A BUNCH OF FILES?
+=============================================
+
+CacheFS is backed by a block device rather than being backed by a bunch of
+files on a filesystem. This confers several advantages:
+
+ (1) Performance.
+
+ Going directly to a block device means that we can DMA directly to/from
+ the netfs's pages. If another filesystem was managing the backing
+ store, everything would have to be copied between pages. Whilst DirectIO
+ does exist, it doesn't appear easy to make use of in this situation.
+
+ New address space or file operations could be added to make it possible to
+ persuade a backing diskfs to generate block I/O directly to/from disk
+ blocks under its control, but that then means the diskfs has to keep track
+ of I/O requests to pages not under its control.
+
+ Furthermore, we only have to do one lot of readahead calculations, not
+ two; in the diskfs backing case, the netfs would do one and the diskfs
+ would also do one.
+
+ (2) Memory.
+
+ Using a block device means that we have a lower memory usage - all data
+ pages belong to the netfs we're backing. If we used a filesystem, we would
+ have twice as many pages at certain points - one from the netfs and one
+ from the backing diskfs. In the backing diskfs model, under situations of
+ memory pressure, we'd have to allocate or keep around a diskfs page to be
+ able to write out a netfs page; or else we'd need to be able to punch a
+ hole in the backing file.
+
+ Furthermore, whilst we have to keep a certain amount of memory around for
+ every netfs inode we're backing, a backing diskfs would have to keep the
+ inode, dentry and possibly a file struct, in addition to FS-specific
+ stuff, thus adding to the burden.
+
+ (3) Holes.
+
+ The cache uses holes in files to indicate to the netfs that it hasn't yet
+ downloaded the data for that page.
+
+ Since CacheFS is its own filesystem, it can support holes in files
+ trivially. Running on top of another diskfs would limit us to using ones
+ that can support holes.
+
+ Furthermore, it would have to be made possible to detect holes in a diskfs
+ file, rather than just seeing zero filled blocks.
+
+ (4) Integrity.
+
+ CacheFS maintains filesystem integrity through its use of a wandering
+ tree. It (for the most part) replaces blocks that need updating rather
+ than overwriting them in place. That said, certain non-structural changes
+ - such as the updating of atimes - are done in place.
+
+ CacheFS gets data integrity for free - more or less - by treating the
+ data exactly as it treats the metadata. Data blocks that need changing
+ are simply replaced. Whilst this does mean that the meta data pointing to
+ it also needs updating, quite often these changes coalesce between journal
+ updates.
+
+ Knowing that your cache is in a good state is vitally important if you,
+ say, put /usr on AFS. Some organisations put everything barring /etc,
+ /sbin, /lib and /var on AFS and have an enormous cache on every
+ computer. Imagine if the power goes out and renders every cache
+ inconsistent, requiring all the computers to re-initialise their caches
+ when the power comes back on...
+
+ (5) Disk Space.
+
+ Whilst the block device does set a hard ceiling on the amount of space
+ available, CacheFS can guarantee that all that space will be available to
+ the cache. On a diskfs-backed cache, the administrator would probably want
+ to set a cache size limit, but the system wouldn't be able to guarantee that
+ all that space would be available to the cache - not unless that cache was
+ on a partition of its own.
+
+ Furthermore, with a diskfs-backed cache, if the recycler starts to reclaim
+ cache files to make space, the freed blocks may just be eaten directly by
+ userspace programs, potentially resulting in the entire cache being
+ consumed. Alternatively, netfs operations may end up being held up because
+ the cache can't get blocks on which to store the data.
+
+ (6) Users.
+
+ Users can't so easily go into CacheFS and run amok. The worst they can do
+ is cause bits of the cache to be recycled early. With a diskfs-backed
+ cache, they can do all sorts of bad things to the files belonging to the
+ cache, and they can do this quite by accident.
+
+
+On the other hand, there would be some advantages to using a file-based cache
+rather than a blockdev-based cache:
+
+ (1) Having to copy to a diskfs's page would mean that a netfs could just make
+ the copy and then assume its own page is ready to go.
+
+ (2) Backing onto a diskfs wouldn't require a committed block device. You would
+ just nominate a directory and go from there. With CacheFS you have to
+ repartition or install an extra drive to make use of it in an existing
+ system (though the loopback device offers a way out).
+
+ (3) You could easily make your cache bigger if the diskfs has plenty of space;
+ you could even go across multiple mountpoints. This last isn't so much of
+ a problem as you can have multiple caches.
+
+
+======================
+CACHEFS ON-DISK LAYOUT
+======================
+
+The filesystem is divided into a number of parts:
+
+ 0 +---------------------------+
+ | Superblock |
+ 1 +---------------------------+
+ | Journal |
+ 17 +---------------------------+
+ | |
+ | Data |
+ | |
+ END +---------------------------+
+
+The superblock contains the filesystem ID tags and pointers to all the other
+regions. All blocks are PAGE_SIZE in size and the blocks are numbered, starting
+with the superblock as 0. Using 32-bit block pointers, a maximum number of
+0xffffffff blocks can be accessed, meaning that the maximum cache size is ~16TB
+for 4KB pages.
+
+CacheFS will use the endianness and block size details from the kernel that
+created a cache, and it will not permit the cache to be mounted if these
+details differ from what was written on disk.
+
+The journal consists of a set of entries of sector size that keep track of the
+current root block of the tree and the various recycling and allocation lists.
+
+The journal is scanned on mounting to find the most recent fully committed tree
+root, and that will be used. Any changes that were made but not connected to
+the tree rooted in the latest journal entry will be lost.
+
+The data region holds a number of things:
+
+ (1) The Metadata Tree
+
+ As previously mentioned, CacheFS is tree based. The journal points to the
+ current committed root of the tree. The structure of this tree is
+ discussed below.
+
+ (2) Data Blocks
+
+ These are fragments of files that are cached on behalf of network
+ filesystems.
+
+ (3) Allocation, Recycling and Reclamation Stacks and Free Blocks
+
+ The free blocks of the filesystem are kept either in the two allocation
+ stacks if they're laundered and ready to be used, or the reclamation
+ stack if they'll be ready once the journal has ticked over. Note that
+ stacks are used in order that committed stack nodes don't have to be
+ changed - we can just add another block on the front and change the stack
+ top pointer in the journal.
+
+ There are also two stacks associated with recycling trees of data blocks
+ from deleted nodes. These are processed in the background by kcachefsd
+ and their components all get transferred to the reclamation stacks and
+ thence to the allocation stacks.
+
+
+============================
+CACHEFS METADATA NODE LAYOUT
+============================
+
+The CacheFS metadata tree has its layout based around the filesystem block size
+(PAGE_SIZE) and the sector size of the underlying device (512 bytes normally).
+
+Each "node" in the tree is mapped onto a single block and contains a number of
+slots of sector size, aligned on sector boundaries:
+
+ +-------+----------+
+ | | SLOT 0 |
+ | +----------+
+ | | SLOT 1 |
+ | +----------+
+ | | SLOT 2 |
+ | +----------+
+ | | SLOT 3 |
+ | NODE +----------+
+ | | SLOT 4 |
+ | +----------+
+ | | SLOT 5 |
+ | +----------+
+ | | SLOT 6 |
+ | +----------+
+ | | SLOT 7 |
+ +-------+----------+
+
+Each slot can either be empty or it can hold a "leaf". There are a number of
+types of leaves:
+
+ (1) Index Object Leaf.
+ (2) Data File Object Leaf.
+ (3) Other Object Leaf.
+
+ These three all look exactly the same on disk. They have the following
+ attributes:
+
+ - The object type.
+ - A unique object ID.
+ - The parent's object ID.
+ - A netfs key and key length.
+ - A digest of the netfs key, parent object ID and netfs key length.
+ - Netfs auxiliary data.
+ - The inode maximum size.
+ - The last time this object was accessed.
+ - The netfs's name for the class of this object.
+ - A tree of data pages, the depth of that tree and the number of blocks
+ it contains.
+
+ In general, any type of object can have data or child objects; however,
+ indexes aren't permitted data and non-indexes aren't permitted indexes as
+ children.
+
+ (4) Pointer Leaf.
+
+ This is simply a leaf that is entirely given over to pointers to other
+ blocks (it can also contain null pointers).
+
+ (5) Shortcut Leaf.
+
+ This is a leaf that permits a chunk of keyspace to be skipped, allowing
+ the path through the tree to be shortened in some extreme cases.
+
+Note that pointer leaves can be distinguished from other leaf types by the
+second pointer slot in the leaf. If this points into the journal, then it
+actually indicates the type of one of the other types of leaf.
+
+
+===============================
+CACHEFS METADATA TREE STRUCTURE
+===============================
+
+The CacheFS metadata tree is navigated by rigidly partitioned key space. For a
+4KB page size, each step along the path of the tree consumes 10 bits of the key
+(a "subkey"), assuming bit 0 of byte 0 to be the first bit of the key:
+
+ LEVEL 0 LEVEL 1 LEVEL 2 LEVEL 3 LEVEL 4
+
+ +-------+ +-------+ +-------+ +-------+ +-------+
+ | PTR |------>| PTR |------>| PTR |------>| PTR |------>| LEAF |
+ | LEAF | | | | | | PTR |---+ | LEAF |
+ | PTR | | | | | | | | | LEAF |
+ +-------+ +-------+ +-------+ +-------+ | +-------+
+ |
+ | +-------+
+ +-->| LEAF |
+ | |
+ | |
+ +-------+
+
+Whilst this would seem to be horribly inefficient, there are a number of
+optimisations that help to make the scheme much more efficient:
+
+ (1) No path is longer than it has to be. A node can hold more than one leaf,
+ so we don't bother fanning out a node that isn't full to overflowing.
+
+ +-------+ +-------+ +-------+ +-------+ +-------+
+ | PTR |------>| PTR |------>| PTR |------>| PTR |------>| LEAF |
+ | LEAF | | | | | | LEAF | | LEAF |
+ | PTR | | | | | | | | LEAF |
+ +-------+ +-------+ +-------+ +-------+ +-------+
+
+ (2) If a path to the point at which a number of nodes can be distinguished is
+ made up of a line of nodes, each of which contains one pointer, then part
+ of the path can be made up of a shortcut leaf pointing to a node. The
+ shortcut represents several adjacent nodes.
+
+ +-------+ +-------+ +-------+
+ | SHORT |------>| PTR |------>| LEAF |
+ | LEAF | | LEAF | | LEAF |
+ | PTR | | | | LEAF |
+ +-------+ +-------+ +-------+
+
+ (3) With a digest function that produces a reasonably even distribution based
+ on the key set presented, you get, for the most part, the shortest paths
+ everywhere.
+
+ (4) With a digest function that produces a certain amount of clumping, bits of
+ the tree wind up staying in memory longer because they're referred to more
+ often. Also with a certain amount of clumping the tree ends up being less
+ sparse and thus occupies less disk space.
+
+ (5) It is assumed that if a node has a non-null pointer in a pointer leaf at
+ the location indexed by the subkey for that level of the key you're
+ looking for, then the leaf must lie behind that pointer, if it exists.
+ Otherwise you just have to look in the current node and no further.
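
The key partitioning described in the tree-structure documentation above can be sketched in userspace C. This is a minimal illustration under stated assumptions, not code from the patch: subkey_at_level() and the SKETCH_* constants are hypothetical names, the packing of the ten extracted bits into the result (first key bit becomes bit 0 of the subkey) is an assumption, and so is treating bits past the end of the key as zero. The fan-out of 1024 is simply what 32-bit pointers in a 4KB block would give, which happens to match the 10 bits per level quoted in the text.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_BLOCK_SIZE  4096u   /* assumed PAGE_SIZE */
#define SKETCH_PTR_SIZE    4u      /* 32-bit block pointers */
#define SKETCH_FANOUT      (SKETCH_BLOCK_SIZE / SKETCH_PTR_SIZE)  /* 1024 */
#define SKETCH_SUBKEY_BITS 10u     /* log2(1024) */

/*
 * Extract the subkey consumed at a given level of the tree.
 * Key bit N lives in bit (N % 8) of byte (N / 8), so bit 0 of byte 0 is
 * the first bit of the key.  Bits beyond keylen are read as zero
 * (an assumption made for this sketch only).
 */
static unsigned int subkey_at_level(const uint8_t *key, size_t keylen,
                                    unsigned int level)
{
	size_t first = (size_t)level * SKETCH_SUBKEY_BITS;
	unsigned int subkey = 0;

	for (unsigned int i = 0; i < SKETCH_SUBKEY_BITS; i++) {
		size_t bit = first + i;
		size_t byte = bit / 8;

		if (byte < keylen && ((key[byte] >> (bit % 8)) & 1u))
			subkey |= 1u << i;
	}
	return subkey;
}
```

A lookup would call subkey_at_level() once per level and use the result to index the pointers held in that node, stopping as soon as the indexed pointer is null, as optimisation (5) describes.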

2005-11-14 21:58:16

by David Howells

Subject: [PATCH 4/12] FS-Cache: Permit pre-allocation of radix-tree nodes

The attached patch permits advance allocation of radix-tree nodes on a per-task
basis to make sure that ENOMEM doesn't crop up at an inconvenient moment in
CacheFS.

Signed-Off-By: David Howells <[email protected]>
---
warthog>diffstat -p1 radix-cache-2614mm2.diff
include/linux/radix-tree.h | 14 ++++
include/linux/sched.h | 2
kernel/exit.c | 1
kernel/fork.c | 2
lib/radix-tree.c | 134 ++++++++++++++++++++++++++++++++++-----------
5 files changed, 122 insertions(+), 31 deletions(-)

diff -uNrp linux-2.6.14-mm2/include/linux/radix-tree.h linux-2.6.14-mm2-cachefs/include/linux/radix-tree.h
--- linux-2.6.14-mm2/include/linux/radix-tree.h 2005-11-14 16:17:59.000000000 +0000
+++ linux-2.6.14-mm2-cachefs/include/linux/radix-tree.h 2005-11-14 16:23:41.000000000 +0000
@@ -51,7 +51,6 @@ void *radix_tree_delete(struct radix_tre
unsigned int
radix_tree_gang_lookup(struct radix_tree_root *root, void **results,
unsigned long first_index, unsigned int max_items);
-int radix_tree_preload(gfp_t gfp_mask);
void radix_tree_init(void);
void *radix_tree_tag_set(struct radix_tree_root *root,
unsigned long index, int tag);
@@ -64,6 +63,19 @@ radix_tree_gang_lookup_tag(struct radix_
unsigned long first_index, unsigned int max_items, int tag);
int radix_tree_tagged(struct radix_tree_root *root, int tag);

+/*
+ * radix tree advance loading
+ */
+struct radix_tree_preload {
+ int count;
+ struct radix_tree_node *nodes;
+};
+
+int radix_tree_preload(gfp_t gfp_mask);
+
+extern int radix_tree_preload_task(unsigned int gfp_mask, int nitems);
+extern void radix_tree_preload_drain_task(void);
+
static inline void radix_tree_preload_end(void)
{
preempt_enable();
diff -uNrp linux-2.6.14-mm2/include/linux/sched.h linux-2.6.14-mm2-cachefs/include/linux/sched.h
--- linux-2.6.14-mm2/include/linux/sched.h 2005-11-14 16:17:59.000000000 +0000
+++ linux-2.6.14-mm2-cachefs/include/linux/sched.h 2005-11-14 16:25:59.000000000 +0000
@@ -35,6 +35,7 @@
#include <linux/topology.h>
#include <linux/seccomp.h>
#include <linux/rcupdate.h>
+#include <linux/radix-tree.h>

#include <linux/auxvec.h> /* For AT_VECTOR_SIZE */

@@ -863,6 +864,7 @@ struct task_struct {

/* VM state */
struct reclaim_state *reclaim_state;
+ struct radix_tree_preload radix_preload;

struct dentry *proc_dentry;
struct backing_dev_info *backing_dev_info;
diff -uNrp linux-2.6.14-mm2/kernel/exit.c linux-2.6.14-mm2-cachefs/kernel/exit.c
--- linux-2.6.14-mm2/kernel/exit.c 2005-11-14 16:17:59.000000000 +0000
+++ linux-2.6.14-mm2-cachefs/kernel/exit.c 2005-11-14 16:23:41.000000000 +0000
@@ -64,6 +64,7 @@ void release_task(struct task_struct * p
struct dentry *proc_dentry;

repeat:
+ radix_tree_preload_drain_task();
atomic_dec(&p->user->processes);
spin_lock(&p->proc_lock);
proc_dentry = proc_pid_unhash(p);
diff -uNrp linux-2.6.14-mm2/kernel/fork.c linux-2.6.14-mm2-cachefs/kernel/fork.c
--- linux-2.6.14-mm2/kernel/fork.c 2005-11-14 16:17:59.000000000 +0000
+++ linux-2.6.14-mm2-cachefs/kernel/fork.c 2005-11-14 16:23:41.000000000 +0000
@@ -122,6 +122,7 @@ void __put_task_struct(struct task_struc
if (!profile_handoff_task(tsk))
free_task(tsk);
}
+EXPORT_SYMBOL(__put_task_struct);

void __init fork_init(unsigned long mempages)
{
@@ -940,6 +941,7 @@ static task_t *copy_process(unsigned lon
goto bad_fork_cleanup;

p->proc_dentry = NULL;
+ memset(&p->radix_preload, 0, sizeof(p->radix_preload));

INIT_LIST_HEAD(&p->children);
INIT_LIST_HEAD(&p->sibling);
diff -uNrp linux-2.6.14-mm2/lib/radix-tree.c linux-2.6.14-mm2-cachefs/lib/radix-tree.c
--- linux-2.6.14-mm2/lib/radix-tree.c 2005-11-14 16:18:00.000000000 +0000
+++ linux-2.6.14-mm2-cachefs/lib/radix-tree.c 2005-11-14 16:47:06.000000000 +0000
@@ -30,6 +30,7 @@
#include <linux/gfp.h>
#include <linux/string.h>
#include <linux/bitops.h>
+#include <linux/sched.h>


#ifdef __KERNEL__
@@ -69,10 +70,6 @@ static kmem_cache_t *radix_tree_node_cac
/*
* Per-cpu pool of preloaded nodes
*/
-struct radix_tree_preload {
- int nr;
- struct radix_tree_node *nodes[RADIX_TREE_MAX_PATH];
-};
DEFINE_PER_CPU(struct radix_tree_preload, radix_tree_preloads) = { 0, };

/*
@@ -89,10 +86,12 @@ radix_tree_node_alloc(struct radix_tree_
struct radix_tree_preload *rtp;

rtp = &__get_cpu_var(radix_tree_preloads);
- if (rtp->nr) {
- ret = rtp->nodes[rtp->nr - 1];
- rtp->nodes[rtp->nr - 1] = NULL;
- rtp->nr--;
+ ret = rtp->nodes;
+ if (ret) {
+ rtp->nodes = ret->slots[0];
+ if (rtp->nodes)
+ ret->slots[0] = NULL;
+ rtp->count--;
}
}
return ret;
@@ -113,29 +112,89 @@ radix_tree_node_free(struct radix_tree_n
int radix_tree_preload(gfp_t gfp_mask)
{
struct radix_tree_preload *rtp;
- struct radix_tree_node *node;
- int ret = -ENOMEM;
+ struct radix_tree_node *node, *sp;
+ int ret = -ENOMEM, n;

preempt_disable();
+
rtp = &__get_cpu_var(radix_tree_preloads);
- while (rtp->nr < ARRAY_SIZE(rtp->nodes)) {
- preempt_enable();
- node = kmem_cache_alloc(radix_tree_node_cachep, gfp_mask);
- if (node == NULL)
- goto out;
- preempt_disable();
- rtp = &__get_cpu_var(radix_tree_preloads);
- if (rtp->nr < ARRAY_SIZE(rtp->nodes))
- rtp->nodes[rtp->nr++] = node;
- else
- kmem_cache_free(radix_tree_node_cachep, node);
+ if (rtp->count < RADIX_TREE_MAX_PATH) {
+ /* load up from the per-task cache first */
+ n = current->radix_preload.count;
+ if (n > 0) {
+ if (RADIX_TREE_MAX_PATH - rtp->count < n)
+ n = RADIX_TREE_MAX_PATH - rtp->count;
+ current->radix_preload.count -= n;
+ rtp->count += n;
+
+ sp = current->radix_preload.nodes;
+
+ for (; n > 0; n--) {
+ node = sp;
+ sp = node->slots[0];
+ node->slots[0] = rtp->nodes;
+ rtp->nodes = node;
+ }
+
+ current->radix_preload.nodes = sp;
+ }
+
+ /* then load up from the slab */
+ while (rtp->count < RADIX_TREE_MAX_PATH) {
+ preempt_enable();
+ node = kmem_cache_alloc(radix_tree_node_cachep,
+ gfp_mask);
+ if (node == NULL)
+ goto out;
+ preempt_disable();
+ rtp = &__get_cpu_var(radix_tree_preloads);
+
+ if (rtp->count < RADIX_TREE_MAX_PATH) {
+ node->slots[0] = rtp->nodes;
+ rtp->nodes = node;
+ rtp->count++;
+ } else {
+ kmem_cache_free(radix_tree_node_cachep, node);
+ }
+ }
}
+
ret = 0;
out:
return ret;
}
EXPORT_SYMBOL(radix_tree_preload);

+/*
+ * Load up an auxiliary cache with sufficient objects to ensure a number of
+ * items may be added to the radix tree
+ */
+int radix_tree_preload_task(unsigned int __nocast gfp_mask,
+ int nitems)
+{
+ struct radix_tree_preload *rtp = &current->radix_preload;
+ struct radix_tree_node *node;
+
+ nitems *= RADIX_TREE_MAX_PATH;
+
+ while (rtp->count < nitems) {
+ node = kmem_cache_alloc(radix_tree_node_cachep, gfp_mask);
+ if (node == NULL)
+ goto nomem;
+
+ node->slots[0] = rtp->nodes;
+ rtp->nodes = node;
+ rtp->count++;
+ }
+ return 0;
+
+nomem:
+ radix_tree_preload_drain_task();
+ return -ENOMEM;
+}
+
+EXPORT_SYMBOL(radix_tree_preload_task);
+
static inline void tag_set(struct radix_tree_node *node, int tag, int offset)
{
__set_bit(offset, node->tags[tag]);
@@ -834,6 +893,28 @@ static __init void radix_tree_init_maxin
height_to_maxindex[i] = __maxindex(i);
}

+/*
+ * drain a preload cache back to the slab from whence the nodes came
+ */
+static void radix_tree_preload_drain(struct radix_tree_preload *rtp)
+{
+ while (rtp->nodes) {
+ struct radix_tree_node *node = rtp->nodes;
+ rtp->nodes = node->slots[0];
+ rtp->count--;
+ kmem_cache_free(radix_tree_node_cachep, node);
+ }
+
+ BUG_ON(rtp->count != 0);
+}
+
+void radix_tree_preload_drain_task(void)
+{
+ radix_tree_preload_drain(&current->radix_preload);
+}
+
+EXPORT_SYMBOL(radix_tree_preload_drain_task);
+
#ifdef CONFIG_HOTPLUG_CPU
static int radix_tree_callback(struct notifier_block *nfb,
unsigned long action,
@@ -843,15 +924,8 @@ static int radix_tree_callback(struct no
struct radix_tree_preload *rtp;

/* Free per-cpu pool of perloaded nodes */
- if (action == CPU_DEAD) {
- rtp = &per_cpu(radix_tree_preloads, cpu);
- while (rtp->nr) {
- kmem_cache_free(radix_tree_node_cachep,
- rtp->nodes[rtp->nr-1]);
- rtp->nodes[rtp->nr-1] = NULL;
- rtp->nr--;
- }
- }
+ if (action == CPU_DEAD)
+ radix_tree_preload_drain(&per_cpu(radix_tree_preloads, cpu));
return NOTIFY_OK;
}
#endif /* CONFIG_HOTPLUG_CPU */
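
The patch above replaces the fixed array in struct radix_tree_preload with a counted list chained through slots[0] of each spare node. The following userspace sketch shows that data structure on its own. It is an illustration under assumptions, not kernel code: struct node, the pool_*() helpers and SKETCH_SLOTS are hypothetical names, calloc()/free() stand in for the slab allocator, and there is no preemption or per-CPU handling.

```c
#include <assert.h>
#include <stdlib.h>

#define SKETCH_SLOTS 64		/* stand-in for the real node fan-out */

struct node {
	void *slots[SKETCH_SLOTS];
};

struct preload_pool {
	int count;		/* number of spare nodes on the list */
	struct node *nodes;	/* list head, chained through slots[0] */
};

/* Push a spare node onto the front of the pool. */
static void pool_push(struct preload_pool *pool, struct node *n)
{
	n->slots[0] = pool->nodes;
	pool->nodes = n;
	pool->count++;
}

/* Pop a spare node, clearing the link so the node is clean for use. */
static struct node *pool_pop(struct preload_pool *pool)
{
	struct node *n = pool->nodes;

	if (n) {
		pool->nodes = n->slots[0];
		n->slots[0] = NULL;
		pool->count--;
	}
	return n;
}

/* Give every spare node back to the allocator. */
static void pool_drain(struct preload_pool *pool)
{
	struct node *n;

	while ((n = pool_pop(pool)) != NULL)
		free(n);
	assert(pool->count == 0);
}

/* Top the pool up to 'want' nodes; drain it and fail if memory runs out. */
static int pool_fill(struct preload_pool *pool, int want)
{
	while (pool->count < want) {
		struct node *n = calloc(1, sizeof(*n));

		if (!n) {
			pool_drain(pool);
			return -1;
		}
		pool_push(pool, n);
	}
	return 0;
}

/* Move nodes from src to dst until dst holds 'limit' nodes or src is empty. */
static int pool_transfer(struct preload_pool *dst, struct preload_pool *src,
			 int limit)
{
	int moved = 0;

	while (dst->count < limit && src->nodes) {
		pool_push(dst, pool_pop(src));
		moved++;
	}
	return moved;
}
```

In this sketch pool_fill() plays the role of radix_tree_preload_task(), and pool_transfer() mirrors the "load up from the per-task cache first" step in radix_tree_preload(). The list form keeps the head at two words, at the cost of touching each node when it is dequeued.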

2005-11-14 21:59:06

by David Howells

Subject: [PATCH 5/12] FS-Cache: Release page->private in failed readahead

The attached patch causes read_cache_pages() to release page-private data on a
page for which add_to_page_cache() fails or the filler function fails. This
permits pages with caching references associated with them to be cleaned up.

Signed-Off-By: David Howells <[email protected]>
---
warthog>diffstat -p1 readahead-release-private-2614mm2.diff
mm/readahead.c | 16 ++++++++++++++++
1 files changed, 16 insertions(+)

diff -uNrp linux-2.6.14-mm2/mm/readahead.c linux-2.6.14-mm2-cachefs/mm/readahead.c
--- linux-2.6.14-mm2/mm/readahead.c 2005-11-14 16:18:00.000000000 +0000
+++ linux-2.6.14-mm2-cachefs/mm/readahead.c 2005-11-14 16:23:46.000000000 +0000
@@ -131,6 +131,12 @@ int read_cache_pages(struct address_spac
page = list_to_page(pages);
list_del(&page->lru);
if (add_to_page_cache(page, mapping, page->index, GFP_KERNEL)) {
+ if (PagePrivate(page) && mapping->a_ops->releasepage) {
+ page->mapping = mapping;
+ mapping->a_ops->releasepage(page, GFP_KERNEL);
+ page->mapping = NULL;
+ }
+
page_cache_release(page);
continue;
}
@@ -143,6 +149,16 @@ int read_cache_pages(struct address_spac

victim = list_to_page(pages);
list_del(&victim->lru);
+
+ if (PagePrivate(victim) &&
+ mapping->a_ops->releasepage
+ ) {
+ victim->mapping = mapping;
+ mapping->a_ops->releasepage(
+ victim, GFP_KERNEL);
+ victim->mapping = NULL;
+ }
+
page_cache_release(victim);
}
break;

2005-11-14 21:58:12

by David Howells

Subject: [PATCH 2/12] FS-Cache: Permit multiple inclusion of linux/pagevec.h

The attached patch makes it possible to include linux/pagevec.h multiple times
without incurring errors due to duplicate definitions.

Signed-Off-By: David Howells <[email protected]>
---
warthog>diffstat -p1 pagevec-hdr-ifndef-2614mm2.diff
include/linux/pagevec.h | 5 +++++
1 files changed, 5 insertions(+)

diff -uNrp linux-2.6.14-mm2/include/linux/pagevec.h linux-2.6.14-mm2-cachefs/include/linux/pagevec.h
--- linux-2.6.14-mm2/include/linux/pagevec.h 2005-01-04 11:13:55.000000000 +0000
+++ linux-2.6.14-mm2-cachefs/include/linux/pagevec.h 2005-11-14 16:23:41.000000000 +0000
@@ -5,6 +5,9 @@
* pages. A pagevec is a multipage container which is used for that.
*/

+#ifndef _LINUX_PAGEVEC_H
+#define _LINUX_PAGEVEC_H
+
/* 14 pointers + two long's align the pagevec structure to a power of two */
#define PAGEVEC_SIZE 14

@@ -83,3 +86,5 @@ static inline void pagevec_lru_add(struc
if (pagevec_count(pvec))
__pagevec_lru_add(pvec);
}
+
+#endif /* _LINUX_PAGEVEC_H */

2005-11-14 22:46:00

by Linus Torvalds

Subject: Re: [PATCH 0/12] FS-Cache: Generic filesystem caching facility



On Mon, 14 Nov 2005, David Howells wrote:
>
> This series of patches does four things:

Ok, interesting, and I like most of what I see.. But why do you have that
horrible "FSCACHE_NEGATIVE_COOKIE" thing?

We normally call that thing "NULL", and we test for it with code like "if
(!cookie)" instead of making up really nasty (and apparently misleading)
names.

The reason I say "misleading" is that a real negative cache entry doesn't
mean that the entry isn't cached, it means that it positively does not
exist. Which is something different from what you seem to be saying if I
read the patch right. From my quick reading, it looks like you use that
"NEGATIVE" not as a negative cache, but as a "I don't have this cached"
cache. Which is not negative at all.

(The difference is like the difference between a "hole" in a file and a
"don't know what this page is". One is real knowledge - and in UNIX
means that it's filled with zero - and the other one means that you have
to go look what the contents are).

And if it _is_ properly named (ie it really does mean "this entry
positively does not exist") then it shouldn't have the same representation
as NULL, because NULL really is traditionally used for "unknown" rather
than "known to not exist".

So depending on which it is, I really think you should either have

- just use NULL for "don't know"

or

- use #define FSCACHE_NEGATIVE_COOKIE ((struct fscache_cookie *)-1)
for "this is known to not exist".

(and quite often, you might well want to have both).

True negative caches are fairly unusual, but the "file hole" thing is one
example, and a negative dentry (dentry->d_inode = NULL) is another. It's a
very useful concept, and it's very distinct from "not in the cache".

Linus
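
The distinction drawn in the message above, between "don't know" and "known not to exist", can be sketched as follows. This is an illustration only, not the FS-Cache API: struct cookie, SKETCH_NEGATIVE_COOKIE, enum cookie_state and classify() are hypothetical names, and the (pointer)-1 sentinel relies on the usual assumption that no valid object lives at that address.

```c
#include <stddef.h>

struct cookie {
	int id;
};

/* Sentinel distinct from NULL and from any valid cookie (assumed). */
#define SKETCH_NEGATIVE_COOKIE ((struct cookie *)-1)

enum cookie_state {
	COOKIE_UNKNOWN,		/* NULL: nothing is known, go and look */
	COOKIE_NEGATIVE,	/* sentinel: known not to exist */
	COOKIE_VALID		/* a real cookie */
};

/* Classify a cookie pointer into one of the three states. */
static enum cookie_state classify(const struct cookie *c)
{
	if (!c)
		return COOKIE_UNKNOWN;
	if (c == SKETCH_NEGATIVE_COOKIE)
		return COOKIE_NEGATIVE;
	return COOKIE_VALID;
}
```

If only two states are needed, the sentinel disappears and the plain "if (!cookie)" test is all that remains.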

2005-11-14 23:03:42

by Andrew Morton

Subject: Re: [PATCH 0/12] FS-Cache: Generic filesystem caching facility

Linus Torvalds <[email protected]> wrote:
>
> On Mon, 14 Nov 2005, David Howells wrote:
> >
> > This series of patches does four things:
>
> Ok, interesting, and I like most of what I see..

Less impressed. It (still) adds a very large amount of tricksy code which
pokes around in core pagecache functions, slows down the radix-tree
hotpath, exports mysterious symbols. And that's on a 60-second scan.

It'll be a sizeable job going through it in detail. Not as sizeable as
writing it though ;)

All of this for an undisclosed speedup of AFS!

I think we need an NFS implementation and some numbers which make it
interesting. Or at least, some AFS numbers, some explanation as to why
they can be extrapolated to NFS and some degree of interest from the NFS
guys. Ditto CIFS.

Because it _is_ a lot of code.

2005-11-14 23:17:55

by Trond Myklebust

Subject: Re: [PATCH 0/12] FS-Cache: Generic filesystem caching facility

On Mon, 2005-11-14 at 15:03 -0800, Andrew Morton wrote:

> I think we need an NFS implementation and some numbers which make it
> interesting. Or at least, some AFS numbers, some explanation as to why
> they can be extrapolated to NFS and some degree of interest from the NFS
> guys. Ditto CIFS.

There is a lot of interest from the HPC community for this sort of thing
on NFS. Basically, it will help server scalability for projects that
have large numbers of read-only files accessed by large numbers of
clients.

AFAIK, Steve Dickson ([email protected]) is working on the NFS hooks for
FS-Cache.

Cheers,
Trond

2005-11-15 00:07:24

by Andrew Morton

Subject: Re: [PATCH] FS-Cache: Make NFS use FS-Cache

Steve Dickson <[email protected]> wrote:
>
> Here is a NFS patch that incorporates the FS-Cache hooks. The
> caching is done on a per filesystem basis using the new
> -o fsc mount flag (i.e. mount -o fsc server:/export /mnt/export)

OK, thanks. What's the maturity of this patch?

Are you in a position to publish performance testing results?

2005-11-15 02:32:04

by Steve Dickson

Subject: Re: [PATCH] FS-Cache: Make NFS use FS-Cache

Andrew Morton wrote:
> Steve Dickson <[email protected]> wrote:
>
>>Here is a NFS patch that incorporates the FS-Cache hooks. The
>>caching is done on a per filesystem basis using the new
>>-o fsc mount flag (i.e. mount -o fsc server:/export /mnt/export)
>
>
> OK, thanks. What's the maturity of this patch?
Well, it's as mature as it can be at this point. The
patch has followed along with David's cachefs patches since
2.6.10 (I believe) and I've taken a number of precautions
and done quite a bit of testing to ensure there are not any
regressions when the -o fsc mount is not used.

>
> Are you in a position to publish performance testing results?
Not really... or at least I haven't... maybe David has some...

We've been mostly working on stability and data integrity, in our
copious spare time of course ;-) . And to be quite honest
there are still some stability issues to iron out... But we both felt,
at this point, the best thing to do is get more eyeballs on the
code so we can ensure it moves in the right direction....

steved.

2005-11-15 08:57:26

by Jeff Garzik

Subject: Re: [Linux-cachefs] Re: [PATCH 0/12] FS-Cache: Generic filesystem caching facility

On Mon, Nov 14, 2005 at 06:17:33PM -0500, Trond Myklebust wrote:
> On Mon, 2005-11-14 at 15:03 -0800, Andrew Morton wrote:
>
> > I think we need an NFS implementation and some numbers which make it
> > interesting. Or at least, some AFS numbers, some explanation as to why
> > they can be extrapolated to NFS and some degree of interest from the NFS
> > guys. Ditto CIFS.
>
> There is a lot of interest from the HPC community for this sort of thing
> on NFS. Basically, it will help server scalability for projects that
> have large numbers of read-only files accessed by large numbers of
> clients.
>
> AFAIK, Steve Dickson ([email protected]) is working on the NFS hooks for
> FS-Cache.

Well, I'm not in the HPC community, but I have a lot of interest in seeing
cachefs + nfs working in the upstream kernel.

Jeff, misses cachefs from the Solaris days



2005-11-15 12:26:03

by Nick Piggin

Subject: Re: [PATCH 0/12] FS-Cache: Generic filesystem caching facility

David Howells wrote:
> This series of patches does four things:
>

Missing patch 11/12 here. Can you post a link or try resending?

--
SUSE Labs, Novell Inc.


2005-11-15 13:20:21

by David Howells

Subject: [PATCHES 0-12/12] FS-Cache: Generic filesystem caching facility


The patches are available from:

http://people.redhat.com/~dhowells/cachefs/

There's a patch-list.txt file in there with the patch descriptions and patch
names in order as well.

Also in there is dump-cachefs.c which can be used to peer into the depths of a
block device that has had cachefs mounted on it to see what's what.

David

2005-11-15 13:51:56

by David Howells

Subject: Re: [PATCH 0/12] FS-Cache: Generic filesystem caching facility

Linus Torvalds <[email protected]> wrote:

> > This series of patches does four things:
>
> Ok, interesting, and I like most of what I see.. But why do you have that
> horrible "FSCACHE_NEGATIVE_COOKIE" thing?
>
> We normally call that thing "NULL", and we test for it with code like "if
> (!cookie)" instead of making up really nasty (and apparently misleading)
> names.
>
> The reason I say "misleading" is that a real negative cache entry doesn't
> mean that the entry isn't cached, it means that it positively does not
> exist. Which is something different from what you seem to be saying if I
> read the patch right. From my quick reading, it looks like you use that
> "NEGATIVE" not as a negative cache, but as a "I don't have this cached"
> cache. Which is not negative at all.

Not exactly. Whilst it is a case of what you want is not in the cache (or was
discarded when examined), I'm also refusing to add it to the cache for some
reason or other (ENOMEM, no cache, I/O error on cache, insufficient available
space, etc).

If it was not in the cache, but I am going to let you cache it then I'll
return a cookie instead. All reads on that cookie will return ENODATA until
such time as data has been stored in the cache and can then be retrieved.

All attempts to perform accesses using the "negative" cookie will then fail
gracefully. A "negative" cookie will not be instantiated. You have to get a
new cookie.

> (The difference is like the difference between a "hole" in a file and a
> "don't know what this page is". One is real knowledge - and in UNIX
> means that it's filled with zero - and the other one means that you have
> to go look what the contents are).

But UNIX normally allows you to subsequently go and fill a hole (subject to
space constraints, etc)...

> And if it _is_ properly named (ie it really does mean "this entry
> positively does not exist") then it shouldn't have the same representation
> as NULL, because NULL really is traditionally used for "unknown" rather
> than "known to not exist".
>
> So depending on which it is, I really think you should either have
>
> - just use NULL for "don't know"

It isn't a "don't know" exactly. It's a "no".

> or
>
> - use #define FSCACHE_NEGATIVE_COOKIE ((struct fscache_cookie *)-1)
> for "this is known to not exist".

Hmmm... Which is possibly less efficient because CPUs generally are better at
determining 0 than -1.

> (and quite often, you might well want to have both).

I don't think I have a need for both. Either I give you a cookie (for which
there may be nothing in the cache); or I give you the "negative" cookie for
which there's definitely nothing in the cache, and gracefully refuse to
service it.


So, would you still rather I used NULL? If so, I can change it easily enough.

David

2005-11-15 14:06:09

by J. Bruce Fields

Subject: Re: [Linux-cachefs] [PATCHES 0-12/12] FS-Cache: Generic filesystem caching facility

On Tue, Nov 15, 2005 at 01:20:11PM +0000, David Howells wrote:
>
> The patches are available from:
>
> http://people.redhat.com/~dhowells/cachefs/
>
> There's a patch-list.txt file in there with the patch descriptions and patch
> names in order as well.
>
> Also in there is dump-cachefs.c which can be used to peer into the depths of a
> block device that has had cachefs mounted on it to see what's what.

"Forbidden
You don't have permission to access /~dhowells/cachefs/patch-list.txt on
this server."

--b.

2005-11-15 16:24:41

by David Howells

Subject: Re: [Linux-cachefs] [PATCHES 0-12/12] FS-Cache: Generic filesystem caching facility


J. Bruce Fields <[email protected]> wrote:

> "Forbidden
> You don't have permission to access /~dhowells/cachefs/patch-list.txt on
> this server."

That should be fixed now.

David

2005-11-15 16:33:06

by Jamie Lokier

Subject: Re: [PATCH 0/12] FS-Cache: Generic filesystem caching facility

Linus Torvalds wrote:
> And if it _is_ properly named (ie it really does mean "this entry
> positively does not exist") then it shouldn't have the same
> representation as NULL, because NULL really is traditionally used
> for "unknown" rather than "known to not exist".

You mean like:

> a negative dentry (dentry->d_inode = NULL) is another.

? :)

-- Jamie

2005-11-15 16:54:37

by Linus Torvalds

Subject: Re: [PATCH 0/12] FS-Cache: Generic filesystem caching facility



On Tue, 15 Nov 2005, Jamie Lokier wrote:

> Linus Torvalds wrote:
> > And if it _is_ properly named (ie it really does mean "this entry
> > positively does not exist") then it shouldn't have the same
> > representation as NULL, because NULL really is traditionally used
> > for "unknown" rather than "known to not exist".
>
> You mean like:
>
> > a negative dentry (dentry->d_inode = NULL) is another.
>
> ? :)

The _dentry_ is negative, and it is not NULL. It has an explicit flag
saying that it's negative.

We do not have negative inode caches.

Linus

2005-11-15 17:05:20

by Linus Torvalds

Subject: Re: [PATCH 0/12] FS-Cache: Generic filesystem caching facility



On Tue, 15 Nov 2005, David Howells wrote:
>
> I don't think I have a need for both. Either I give you a cookie (for which
> there may be nothing in the cache); or I give you the "negative" cookie for
> which there's definitely nothing in the cache, and gracefully refuse to
> service it.
>
> So, would you still rather I used NULL? If so, I can change it easily enough.

Yes, if you don't have real negative cookies, then just use NULL.

Think of malloc(). It doesn't return MALLOC_OUT_OF_MEMORY_COOKIE when it
won't give you any more memory. It returns NULL.

The advantage of NULL is that people know what it is, and that the C
language _defines_ that you can do "if (xyzzy)" to test for non-NULL.
Conversely, the disadvantage of using a special cookie (that just happens
to be NULL) is that the test for NULL still _works_, so now you have two
ways of doing something and the compiler will never warn.

So in a very real sense, NULL _always_ exists. You can't make it go away
by defining it to another name, and by using another name you just confuse
things (if they are in fact the same).

Linus

2005-11-15 17:59:14

by David Howells

Subject: Re: [PATCH 0/12] FS-Cache: Generic filesystem caching facility

Andrew Morton <[email protected]> wrote:

> > > This series of patches does four things:
> >
> > Ok, interesting, and I like most of what I see..
>
> Less impressed. It (still) adds a very large amount of tricksy code which
> pokes around in core pagecache functions,

What I'm trying to do is actually fairly simple in concept:

(1) Have a metadata inode (imeta) that covers the block device.

(2) Metadata pages are attached to imeta at page indexes corresponding to the
block indexes on disk.

(3) Metadata blocks on disk are to be considered invariant[*] once attached
to the on-disk metadata tree rooted in the journal.

[*] atimes and netfs metadata can be updated in place. They fit into a
single sector, and so we assume changing them is atomic.

(4) A new metadata tree is constructed by replacing the disk blocks that need
to be modified with newly allocated blocks, then attaching the old
unchanged branches to that new block. Think of RCU or LISP.

The data then needs to be copied from the old block to the new, and the
old block discarded when the journal is advanced.

(5) If a copy of the old block is resident on a page in memory and is up to
date with respect to the old block on disk, then we really don't want to
have to copy it to another page in memory. This would require allocation
of an extra page, as well as requiring a full-page copy, thus thrashing
the data cache.

What we do is to detach the page from imeta at the old block index and
reattach it at the new block index (having allocated a new block first).

However, this means that we potentially have to allocate new radix tree
bits, and so we're subject to ENOMEM. But once we've allocated a new
block, we don't want to have to try and roll the allocation back or
recycle the block because that would potentially incur ENOMEM also...

In fact, we don't want to take any errors at all (EIO we can deal with
because that means the blockdev is screwed).

So we pre-allocate sufficient radix tree bits and attach them to the task
so that we can (a) sleep and (b) evade ENOMEM between allocating a new
block and attaching it to the tree's superstructure.

Using buffer heads doesn't help, and using the blockdev's inode to hold pages
doesn't help. This way, I can keep my metadata in the pagecache and the VM will
schedule it to be written out. I also want to avoid bufferheads because they
use up a big additional chunk of memory I'd prefer to avoid having to pin.
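
Step (4) above, building a new tree by replacing only the blocks on the modified path and reattaching the unchanged branches, is the classic path-copying technique. A minimal in-memory sketch follows. It is an illustration under assumptions rather than CacheFS code: struct tnode, cow_update() and SKETCH_FANOUT are hypothetical names, malloc() stands in for block allocation, and nothing here models the journal or the page cache.

```c
#include <stdlib.h>
#include <string.h>

#define SKETCH_FANOUT 4

struct tnode {
	struct tnode *child[SKETCH_FANOUT];
	int value;
};

/*
 * Return the root of a new tree in which the node reached by following
 * path[0..depth-1] from 'old' carries 'newval'.  Only the nodes on that
 * path are copied; every other branch is shared with the old tree, which
 * is left untouched.  Returns NULL if the path does not exist or an
 * allocation fails, in which case any copies already made are freed.
 */
static struct tnode *cow_update(const struct tnode *old, const unsigned *path,
				unsigned depth, int newval)
{
	struct tnode *copy;

	if (!old)
		return NULL;

	copy = malloc(sizeof(*copy));
	if (!copy)
		return NULL;
	memcpy(copy, old, sizeof(*copy));	/* inherit unchanged branches */

	if (depth == 0) {
		copy->value = newval;
		return copy;
	}

	if (path[0] >= SKETCH_FANOUT) {
		free(copy);
		return NULL;
	}

	copy->child[path[0]] = cow_update(old->child[path[0]], path + 1,
					  depth - 1, newval);
	if (!copy->child[path[0]]) {
		free(copy);
		return NULL;
	}
	return copy;
}
```

Until the caller publishes the returned root (in CacheFS terms, until the journal is advanced), readers of the old root still see a complete and consistent tree, which is why an allocation failure partway through can simply be abandoned in this sketch.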

> slows down the radix-tree hotpath,

What I wanted was to be able to supply sufficient radix tree nodes in advance
that I wouldn't incur ENOMEM from that source, but I also needed to be able to
sleep after having loaded the cache, which meant I couldn't just shove the
extra in the per-CPU cache.

Admittedly, this is going to slow things down, and there's not a lot I can do
about that without adding full rollback support, which would be a lot of work,
particularly coping with the case of there being insufficient memory for the
cause.

Another way to deal with this would be to provide alternate
add_to_page_cache*() and radix_tree_preload() or radix_tree_insert() functions
that could be given a cache from which to allocate radix tree nodes.

I could also separate out the two sorts of cache. I changed the form of the
radix tree cache to a linked list with a counter instead of an array. This uses
less memory at the head (which we want for adding to task_struct), but may well
be slower when dequeuing elements as the CPU data cache can't help. I could revert the
per-CPU cache to the original form, whilst keeping the per-task cache in the
less-intrusive form.
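As a rough userspace model of the "linked list with a counter" form mentioned above, the following sketch fills a reserve up front, while failure is still harmless, and then takes nodes from it without any further allocation. The names `struct rnode`, `struct node_reserve`, `reserve_fill()`, `reserve_take()` and `reserve_drain()` are assumptions made for illustration and do not correspond to the radix tree code in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a radix tree node. */
struct rnode {
	struct rnode *next;	/* link used while the node sits in a reserve */
};

/* Small head: one pointer and a counter, cheap to embed per task. */
struct node_reserve {
	struct rnode *head;
	unsigned int count;
};

/*
 * Top the reserve up to 'want' nodes.  This is the only step that can
 * fail, so it is done before any block is allocated.
 * Returns 0 on success or -1 if memory ran out.
 */
static int reserve_fill(struct node_reserve *r, unsigned int want)
{
	while (r->count < want) {
		struct rnode *n = malloc(sizeof(*n));

		if (!n)
			return -1;
		n->next = r->head;
		r->head = n;
		r->count++;
	}
	return 0;
}

/* Take a node from the reserve; cannot fail while count > 0. */
static struct rnode *reserve_take(struct node_reserve *r)
{
	struct rnode *n = r->head;

	if (n) {
		r->head = n->next;
		n->next = NULL;
		r->count--;
	}
	return n;
}

/* Release whatever was not needed. */
static void reserve_drain(struct node_reserve *r)
{
	struct rnode *n;

	while ((n = reserve_take(r)) != NULL)
		free(n);
}
```

The head costs one pointer and one counter regardless of how many nodes are held, which is the memory saving referred to above; the price is a pointer chase on every dequeue instead of an indexed array access.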

I could even remove the metadata from the pagecache entirely. I'd rather not do
that, though. Using the page cache has a lot of advantages, and they mostly
outweigh its disadvantages. Probably the biggest advantage is that I can leave
a metadata block I'm not using at the moment lying around in the pagecache; the
VM can discard it if it likes, but if not, it'll be there when I need it again.

> exports mysterious symbols. And that's on a 60-second scan.

(*) clear_page_dirty_for_io()

Used in mpage.c, mpage_writepages() which I have a very simplified
version of.

(*) lru_cache_add()

Normally called indirectly via add_to_page_cache_lru(), but I wanted
to call add_to_page_cache(), and sometimes use it for multiple pages
with pagevec_lru_add() which is exported, and sometimes on a single
page, which means using lru_cache_add().

> It'll be a sizeable job going through it in detail. Not as sizeable as
> writing it though ;)

:-)

> All of this for an undisclosed speedup of AFS!

What about NFS?

> I think we need an NFS implementation and some numbers which make it
> interesting. Or at least, some AFS numbers,

I'll generate some, at least for AFS.

> some explanation as to why they can be extrapolated to NFS and some degree of
> interest from the NFS guys. Ditto CIFS.
>
> Because it _is_ a lot of code.

Yes, I noticed that too:-)

David

2005-11-15 19:25:36

by Andrew Morton

Subject: Re: [PATCH 0/12] FS-Cache: Generic filesystem caching facility

David Howells wrote:
>
> What I'm trying to do is actually fairly simple in concept:
>
> (1) Have a metadata inode (imeta) that covers the block device.
>

Can you remind me again why it requires a blockdev rather than a regular file?

coz people are just going to go and use a loopback mount to get their
blockdev, which is a bit sad.

2005-11-15 23:46:18

by Kyle Moffett

Subject: Re: [PATCH 0/12] FS-Cache: Generic filesystem caching facility

On Nov 15, 2005, at 14:25, Andrew Morton wrote:
> Can you remind me again why it requires a blockdev rather than a
> regular file?
>
> coz people are just going to go and use a loopback mount to get
> their blockdev, which is a bit sad.

FS-Cache != CacheFS, although the names are a bit confusing. FS-Cache
is a generic cache frontend for filesystems, while CacheFS is a
provider backend that uses a block device internally. You could
_also_ use cache files. If you look at the [0/12] in the list, you
can see a diagram explaining this all in detail.

Cheers,
Kyle Moffett

--
There are two ways of constructing a software design. One way is to
make it so simple that there are obviously no deficiencies. And the
other way is to make it so complicated that there are no obvious
deficiencies. The first method is far more difficult.
-- C.A.R. Hoare


2005-11-16 11:27:05

by David Howells

Subject: Re: [PATCH 0/12] FS-Cache: Generic filesystem caching facility

Andrew Morton wrote:

> > What I'm trying to do is actually fairly simple in concept:
> >
> > (1) Have a metadata inode (imeta) that covers the block device.
> >
>
> Can you remind me again why it requires a blockdev rather than a regular file?

That's the third time you've asked:-) You should be able to find the previous
conversations in LKML archives.

I presume directing you to go and look at the CacheFS documentation in patch
12/12 would be out of the question? Particularly the section entitled "Why a
block device? Why not a bunch of files?"...

Look also at Documentation/filesystem/caching/fscache.txt provided by patch
9/12 for the constraints I've set, in particular:

(1) It must be practical to operate without a cache.

(2) The size of any accessible file must not be limited to the size of the
cache.

(3) The combined size of all opened files (this includes mapped libraries)
must not be limited to the size of the cache.

(4) The user should not be forced to download an entire file just to do a
one-off access of a small portion of it (such as might be done with the
"file" program).

To which I wish to add:

(5) The netfs pages must remain owned by the netfs, so that there is no
difference between the netfs operating with a cache and it operating
without a cache. This means I/O must be done to/from the netfs pages
directly from/to the cache.


I have a start of a cache-on-files facility (called, most imaginatively,
CacheFiles) which works as another backend to FS-Cache. Of the underlying
filesystem, it requires:

(*) O_DIRECT

(*) Reads and writes on arbitrary kernel pages

(*) Reads on holes must return short or ENODATA. This requires an extra
O_XXXX flag to be supplied when opening a file or the struct file or
inode to be flagged appropriately.

(*) The ability to issue FS operations such as rename, open, setxattr, mkdir
from kernel space.

This facility isn't well advanced yet, and will initially only be available on
EXT2/3. It will also require a userspace component to clean up dead nodes.


Are you willing to at least carry the FS-Cache core and the AFS usage of it?
They haven't changed for a long time, and hopefully shouldn't need to:

(*) Subject: [PATCH 8/12] FS-Cache: Add generic filesystem cache core module
Patch: fscache-core-2614mm2.diff

(*) Subject: [PATCH 9/12] FS-Cache: Add documentation for FS-Cache and its interfaces
Patch: fscache-docs-2614mm2.diff

(*) Subject: [PATCH 10/12] FS-Cache: Make kAFS use FS-Cache
Patch: fscache-afs-2614mm2.diff

Once I've updated them for Linus's comments, that is...

All the other patches have to do with CacheFS.

David

2005-11-16 11:57:12

by Andrew Morton

Subject: Re: [PATCH 0/12] FS-Cache: Generic filesystem caching facility

David Howells wrote:
>
> Andrew Morton wrote:
>
> > > What I'm trying to do is actually fairly simple in concept:
> > >
> > > (1) Have a metadata inode (imeta) that covers the block device.
> > >
> >
> > Can you remind me again why it requires a blockdev rather than a regular file?
>
> That's the third time you've asked:-)

Maybe on the fourth or fifth time it'll occur to you to put it into the
changelog.

> ...
> Look also at Documentation/filesystem/caching/fscache.txt provided by patch
> 9/12 for the constraints I've set, in particular:
>
> (1) It must be practical to operate without a cache.
>
> (2) The size of any accessible file must not be limited to the size of the
> cache.
>
> (3) The combined size of all opened files (this includes mapped libraries)
> must not be limited to the size of the cache.
>
> (4) The user should not be forced to download an entire file just to do a
> one-off access of a small portion of it (such as might be done with the
> "file" program).
>
> To which I wish to add:
>
> (5) The netfs pages must remain owned by the netfs, so that there is no
> difference between the netfs operating with a cache and it operating
> without a cache. This means I/O must be done to/from the netfs pages
> directly from/to the cache.

None of that appears to be relevant.

A blockdev is just a big, fixed-sized file. Why cannot it be backed by a
big, fixed-sized file?

<looks>

OK, it's doing submit_bio() directly.

>
> I have a start of a cache-on-files facility (called, most imaginatively,
> CacheFiles) which works as another backend to FS-Cache. Of the underlying
> filesystem, it requires:
>
> (*) O_DIRECT
>
> (*) Reads and writes on arbitrary kernel pages
>
> (*) Reads on holes must return short or ENODATA. This requires an extra
> O_XXXX flag to be supplied when opening a file or the struct file or
> inode to be flagged appropriately.
>
> (*) The ability to issue FS operations such as rename, open, setxattr, mkdir
> from kernel space.
>
> This facility isn't well advanced yet, and will initially only be available on
> EXT2/3. It will also require a userspace component to clean up dead nodes.

I'd have thought that a decent intermediate step would be
cache-on-single-file using a_ops.direct_IO, as you're implying above. Then
all the direct-to-blockdev code can go away. It'll take some tweaking of
the core direct-io code, but nothing terribly serious.

>
> Are you willing to at least carry the FS-Cache core and the AFS usage of it?
>

fs-cache won't do anything without a backing store such as cachefs will it?

Those names are rather confusing, btw.

2005-11-17 19:28:23

by David Howells

Subject: Re: [PATCH 0/12] FS-Cache: Generic filesystem caching facility

Andrew Morton wrote:

> > That's the third time you've asked:-)
>
> Maybe on the fourth or fifth time it'll occur to you to put it into the
> changelog.

But that's not what's changed.

So if/when I produce a CacheFiles patch as well, you'll expect a critique of
why that's better than everything else in the changelog for that?

> None of that appears to be relevant.

It rules out the use of i_mapping...

> A blockdev is just a big, fixed-sized file. Why cannot it be backed by a
> big, fixed-sized file?
>
> <looks>
>
> OK, it's doing submit_bio() directly.

Using a big fixed-sized file also means that you've got two layout managers
and two transaction managers and two metadata managers on top of each other.

> > This facility isn't well advanced yet, and will initially only be
> > available on EXT2/3. It will also require a userspace component to clean
> > up dead nodes.
>
> I'd have thought that a decent intermediate step would be
> cache-on-single-file using a_ops.direct_IO, as you're implying above.

That's really the worst of both worlds. If you can access files, then you're
best off doing so on a one cache-file per netfs-file basis, *if* you can get
notification of completion on an asynchronous operation.

If you try to do this, the caching backend will try to lay the blocks out in
a manner that will then be undone because the underlying filesystem will then
put the blocks or parts thereof where *it* wishes.

Furthermore, it would seem that whilst undertaking direct I/O on an inode,
that inode is locked against other direct I/O operations. This could end up
serialising all I/O operations on the cache (see dio_complete() in
fs/direct-io.c).

> Then all the direct-to-blockdev code can go away. It'll take some tweaking
> of the core direct-io code, but nothing terribly serious.

The direct-to-blockdev code should get you better performance than going
through a single file on a filesystem: with your suggestion, you end up adding
the latency of the cache-to-single-file to that of the underlying filesystem.


There are five main problems that need solving for cachefiles that I can see:

(1) Reading of holes must return ENODATA or a short read. I have a patch to
do this for O_DIRECT (attached).

(2) It must be possible to do O_DIRECT reads/writes directly to/from kernel
pages. This may be possible without modification, but I'm not certain of
that; looking at dio_refill_pages() it may not be - that accesses the
current->mm to get more pages.

(3) It must be possible to do these reads and writes asynchronously and to
get notification of their completion. I'm not sure how easy this is, but
it looks like it should be possible, perhaps using a kiocb. The routines
in fs/direct-io.c don't seem to be able to do asynchronicity, except
through AIO.

(4) It must be possible to maintain structural integrity in the cache. This
should be possible simply by relying on the underlying filesystem.

(5) It must be possible to maintain a certain level of data integrity in the
cache. We really don't want to have to blow the entire cache away if the
power goes out or the cache isn't laid to rest correctly.

It may end up being necessary to have a parallel to fs/direct-io.c for doing
I/O asynchronously to/from kernel pages.

Also, fs/direct-io.c seems to assume the filesystem on which it's running uses
buffer_heads - but not all of them do.

David


diff -uNr linux-2.6.12-rc2-mm1/fs/direct-io.c linux-2.6.12-rc2-mm1-cachefs/fs/direct-io.c
--- linux-2.6.12-rc2-mm1/fs/direct-io.c 2005-04-06 13:48:23.000000000 +0100
+++ linux-2.6.12-rc2-mm1-cachefs/fs/direct-io.c 2005-04-08 10:34:36.778872220 +0100
@@ -790,7 +790,7 @@
 	struct page *page;
 	unsigned block_in_page;
 	struct buffer_head *map_bh = &dio->map_bh;
-	int ret = 0;
+	int ret = 0, sent = 0;

 	/* The I/O can start at any block offset within the first page */
 	block_in_page = dio->first_block_in_page;
@@ -861,6 +861,14 @@
 					page_cache_release(page);
 					return -ENOTBLK;
 				}
+				else if (dio->iocb->ki_filp->f_flags &
+					 O_NOREADHOLE
+					 ) {
+					page_cache_release(page);
+					if (sent)
+						return 0;
+					return -ENODATA;
+				}

 				if (dio->block_in_file >=
 					i_size_read(dio->inode)>>blkbits) {
@@ -907,6 +915,7 @@
 				page_cache_release(page);
 				goto out;
 			}
+			sent = 1;
 			dio->next_block_for_io += this_chunk_blocks;

 			dio->block_in_file += this_chunk_blocks;
diff -uNr linux-2.6.12-rc2-mm1/include/asm-i386/fcntl.h linux-2.6.12-rc2-mm1-cachefs/include/asm-i386/fcntl.h
--- linux-2.6.12-rc2-mm1/include/asm-i386/fcntl.h 2004-09-16 12:06:17.000000000 +0100
+++ linux-2.6.12-rc2-mm1-cachefs/include/asm-i386/fcntl.h 2005-04-07 15:46:30.000000000 +0100
@@ -21,6 +21,7 @@
#define O_DIRECTORY 0200000 /* must be a directory */
#define O_NOFOLLOW 0400000 /* don't follow links */
#define O_NOATIME 01000000
+#define O_NOREADHOLE 02000000 /* give short read or ENODATA on a hole */

#define F_DUPFD 0 /* dup */
#define F_GETFD 1 /* get close_on_exec */
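
The hole semantics that the patch above is after can be modelled in a few lines of userspace C. This is a simplified sketch, not the kernel code path: the `mapped[]` array and the function `read_blocks_noreadhole()` are invented for illustration, and unlike do_direct_IO() (which returns 0 once something has been sent) the model returns the number of blocks transferred so that the short read is visible in the return value.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * 'mapped[i]' says whether block i of the file has backing storage.
 * Read up to 'count' blocks starting at 'start' and stop at the first
 * hole.  Returns the number of blocks read before the hole (a short
 * read), or -ENODATA if the very first block requested is a hole.
 */
static long read_blocks_noreadhole(const bool *mapped, size_t nblocks,
				   size_t start, size_t count)
{
	long sent = 0;
	size_t i;

	for (i = start; i < start + count && i < nblocks; i++) {
		if (!mapped[i])
			return sent ? sent : -ENODATA;
		sent++;
	}
	return sent;
}
```

With this behaviour a cache backend can tell "block not yet fetched from the server" apart from "block cached and full of zeros", which a filesystem that silently returns zeros for holes cannot express.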

2005-11-17 21:29:19

by Andrew Morton

Subject: Re: [PATCH 0/12] FS-Cache: Generic filesystem caching facility

David Howells wrote:
>
> Andrew Morton wrote:
>
> > > That's the third time you've asked:-)
> >
> > Maybe on the fourth or fifth time it'll occur to you to put it into the
> > changelog.
>
> But that's not what's changed.

It's an FAQ and it's an obvious shortcoming of the entire proposal. It
should be covered in the overall description.

> > OK, it's doing submit_bio() directly.
>
> Using a big fixed-sized file also means that you've got two layout managers
> and two transaction managers and two metadata managers on top of each other.

That's a choice users can make. Right now, they'll use loopback, which is
worse. But perhaps acceptably.

> > > This facility isn't well advanced yet, and will initially only be
> > > available on EXT2/3. It will also require a userspace component to clean
> > > up dead nodes.
> >
> > I'd have thought that a decent intermediate step would be
> > cache-on-single-file using a_ops.direct_IO, as you're implying above.
>
> That's really the worst of both worlds. If you can access files, then you're
> best off doing so on a one cache-file per netfs-file basis, *if* you can get
> notification of completion on an asynchronous operation.
>
> If you try to do this, the caching backend will try to lay the blocks out in
> a manner that will then be undone because the underlying filesystem will then
> put the blocks or parts thereof where *it* wishes.

The same will happen with a loopback mount, no?

> Furthermore, it would seem that whilst undertaking direct I/O on an inode,
> that inode is locked against other direct I/O operations. This could end up
> serialising all I/O operations on the cache (see dio_complete() in
> fs/direct-io.c).

No, direct-io has (almost) complete parallelism for readers and writers.
We use locking in there to protect certain filesystem metadata operations
but the locks can be dropped prior to performing the bulk IO.


> > Then all the direct-to-blockdev code can go away. It'll take some tweaking
> > of the core direct-io code, but nothing terribly serious.
>
> The direct-to-blockdev code should get you better performance than going
> through a single file on a filesystem: with your suggestion, you end up adding
> the latency of the cache-to-single-file to that of the underlying filesystem.

hmm? direct-io to a file is only a few percent slower than direct-io to
blockdev iirc.

>
> There are five main problems that need solving for cachefiles that I can see:
>
> (1) Reading of holes must return ENODATA or a short read. I have a patch to
> do this for O_DIRECT (attached).

Should we be exposing O_NOREADHOLE to userspace like this?

> (2) It must be possible to do O_DIRECT reads/writes directly to/from kernel
> pages. This may be possible without modification, but I'm not certain of
> that; looking at dio_refill_pages() it may not be - that accesses the
> current->mm to get more pages.

No, we'll need to modify direct-io.

My suggestion would be to modify address_space_operations.direct_IO so that
it no longer takes a zillion arguments - instead, pass in a new `struct
direct_io_cb' thing which has all the args, pass that up and down the
stack.

Within blockdev_direct_IO, stick that direct_io_cb* into struct dio.

Add sufficient fields in direct_io_cb so that dio_refill_pages() can
populate dio->pages with kernel pages rather than using get_user_pages(),
if it was called for in-kernel direct-io.
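
To make the suggestion above concrete, here is one possible shape for such an argument block, written as a userspace sketch. Nothing here is an existing kernel interface: every field of `struct direct_io_cb`, and the helpers `dio_cb_is_kernel()` and `dio_cb_refill_kernel()`, are hypothetical names chosen for the example, and `struct page` is left opaque.

```c
#include <assert.h>
#include <stddef.h>

struct page;	/* opaque here; stands in for the kernel's struct page */

/*
 * Hypothetical argument block, loosely following the suggestion above.
 * Every field name is invented for this sketch.
 */
struct direct_io_cb {
	int rw;				/* read or write */
	long long offset;		/* file offset of the I/O */
	/* Userspace-originated I/O: a user buffer to be pinned. */
	void *user_buf;
	size_t user_len;
	/* In-kernel I/O: pages supplied directly by the caller. */
	struct page **kernel_pages;
	size_t nr_kernel_pages;
	size_t next_kernel_page;	/* refill cursor */
};

/* Nonzero if the caller supplied kernel pages rather than a user buffer. */
static int dio_cb_is_kernel(const struct direct_io_cb *cb)
{
	return cb->kernel_pages != NULL;
}

/*
 * Model of the refill step for the in-kernel case: hand over up to
 * 'max' caller-supplied page pointers in 'out' instead of pinning
 * user memory.  Returns the number of pages handed over.
 */
static size_t dio_cb_refill_kernel(struct direct_io_cb *cb,
				   struct page **out, size_t max)
{
	size_t n = 0;

	while (n < max && cb->next_kernel_page < cb->nr_kernel_pages)
		out[n++] = cb->kernel_pages[cb->next_kernel_page++];
	return n;
}
```

The point of bundling the arguments is that adding the in-kernel case only grows the structure; the functions that pass it up and down the stack keep the same signature.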

> (3) It must be possible to do these reads and writes asynchronously and to
> get notification of their completion. I'm not sure how easy this is, but
> it looks like it should be possible, perhaps using a kiocb. The routines
> in fs/direct-io.c don't seem to be able to do asynchronicity, except
> through AIO.

Harder. Yes, perhaps we could use AIO infrastructure.

> (4) It must be possible to maintain structural integrity in the cache. This
> should be possible simply by relying on the underlying filesystem.
>
> (5) It must be possible to maintain a certain level of data integrity in the
> cache. We really don't want to have to blow the entire cache away if the
> power goes out or the cache isn't laid to rest correctly.
>
> It may end up being necessary to have a parallel to fs/direct-io.c for doing
> I/O asynchronously to/from kernel pages.
>
> Also, fs/direct-io.c seems to assume the filesystem on which it's running uses
> buffer_heads - but not all of them do.

Nope, buffer_heads are only in there because it happens to be the container
in which the get_block[s]() callback returns the file offset -> block
number mapping information.

2005-11-18 08:44:16

by Paul Jackson

Subject: Re: [PATCH 0/12] FS-Cache: Generic filesystem caching facility

David wrote:
> > Maybe on the fourth or fifth time it'll occur to you to put it into the
> > changelog.
>
> But that's not what's changed.

Just a guess - perhaps the following clarifies this point of confusion:

The "changelog" isn't so much for what you've changed, relative to the
previous version of that patch. It is for what will go into the Linux
change history for this patch, when accepted.

To quote Documentation/SubmittingPatches, which calls this the
"explanation body":

The explanation body will be committed to the permanent source
changelog, so should make sense to a competent reader who has long
since forgotten the immediate details of the discussion that might
have led to this patch.

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson 1.925.600.0401

2005-11-18 22:07:56

by Troy Benjegerdes

Subject: Re: [Linux-cachefs] Re: [PATCH 0/12] FS-Cache: Generic filesystem caching facility

On Mon, Nov 14, 2005 at 06:17:33PM -0500, Trond Myklebust wrote:
> On Mon, 2005-11-14 at 15:03 -0800, Andrew Morton wrote:
>
> > I think we need an NFS implementation and some numbers which make it
> > interesting. Or at least, some AFS numbers, some explanation as to why
> > they can be extrapolated to NFS and some degree of interest from the NFS
> > guys. Ditto CIFS.
>
> There is a lot of interest from the HPC community for this sort of thing
> on NFS. Basically, it will help server scalability for projects that
> have large numbers of read-only files accessed by large numbers of
> clients.

I'm currently running a root filesystem for a cluster using the OpenAFS
client for precisely this reason.

And in regards to (other) comments about code size, if you think this is big,
take a look at the OpenAFS kernel module.