2006-11-14 20:08:52

by David Howells

Subject: [PATCH 00/19] Permit filesystem local caching and NFS superblock sharing



These patches add local caching for network filesystems such as NFS and AFS.

I've addressed all but one of Christoph's objections. The one objection I
haven't yet dealt with is the addition of a new operation for writing a page of
the file, but that's controversial, and I'd like feedback on what I've done so
far.

The main thing that I've dealt with is the security issue in CacheFiles:

CacheFiles now calls the VFS wrappers for invoking filesystem operations, such
as vfs_mkdir(). It does not call any of the inode operations directly anymore.

This means, however, that it has to deal with SELinux, especially when in
enforcing mode.

The cachefilesd RPM installs a policy for labelling and accessing the files in
the cache, and for providing a security ID for the cachefilesd daemon to run
as. It _also_ provides a security ID for the cachefiles module to act as when
it is accessing the filesystem on behalf of the process.

The way this is done is that one of the patches:

CacheFiles: Add an act-as SID override in task_security_struct

permits the module to temporarily change the security ID that a process
_acts_ as. It does _not_ change the process's actual security ID, and so does
not interfere with signals, ptraces or /proc/pid/ accesses aimed at that process.

Furthermore, following consultations with Stephen Smalley and others of the
SELinux project, it was deemed correct to override fsuid and fsgid of the host
process whilst accessing the cache, so these are also modified temporarily by
the module during such accesses.

Finally, the file creation SID is also temporarily overridden whilst the module
accesses the cache.
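
As a rough illustration of the save/override/restore pattern described above,
here is a minimal user-space sketch. It is not the code from the patches: the
structure, field and function names (task_sec, sec_override,
cache_begin_access, cache_end_access) are all hypothetical, and the IDs the
module switches to are simply passed in by the caller.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a task's security state (not the kernel's layout). */
struct task_sec {
	uint32_t sid;        /* actual SID, seen by signals, ptrace, /proc/pid/ */
	uint32_t act_as_sid; /* SID the task acts as when it accesses objects */
	uint32_t create_sid; /* SID given to files the task creates */
	uint32_t fsuid;
	uint32_t fsgid;
};

/* The four values that are overridden for the duration of a cache access. */
struct sec_override {
	uint32_t act_as_sid;
	uint32_t create_sid;
	uint32_t fsuid;
	uint32_t fsgid;
};

/* Save the task's current values, then switch to the cache's values. */
static void cache_begin_access(struct task_sec *t, struct sec_override *saved,
			       const struct sec_override *cache_ids)
{
	assert(t && saved && cache_ids);

	saved->act_as_sid = t->act_as_sid;
	saved->create_sid = t->create_sid;
	saved->fsuid = t->fsuid;
	saved->fsgid = t->fsgid;

	t->act_as_sid = cache_ids->act_as_sid;
	t->create_sid = cache_ids->create_sid;
	t->fsuid = cache_ids->fsuid;
	t->fsgid = cache_ids->fsgid;
	/* t->sid is deliberately left untouched */
}

/* Put back whatever cache_begin_access() saved. */
static void cache_end_access(struct task_sec *t,
			     const struct sec_override *saved)
{
	assert(t && saved);

	t->act_as_sid = saved->act_as_sid;
	t->create_sid = saved->create_sid;
	t->fsuid = saved->fsuid;
	t->fsgid = saved->fsgid;
}
```

The point of the sketch is only that the actual SID is never written, so
anything aimed at the process keeps seeing its real identity while the
overrides are in force.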

All this permits the ownership and accessibility of files in the cache to
preclude ordinary access by all processes running on the system, with the
exception of cachefilesd. When SELinux is in enforcing mode, the policy also
prevents the daemon itself from reading or writing the files in the cache.

Thanks to Dan Walsh and Karl MacMillan for helping me get the SELinux policy
working. There is documentation on this in the patches and in the cachefilesd
SRPM/tarball.


Additionally, sysctls are no longer used. Some parameters are modifiable via
sysfs. CacheFiles parameters pertaining to the cache are still only modifiable
by sending commands to the module over its control interface, though the
control interface is now a character device (following GregKH's suggestion).

---
The kernel patches are committed to a GIT tree based on Linus's:

git://git.infradead.org/users/dhowells/fscache-2.6.git

Which can be viewed through:

http://git.infradead.org/?p=users/dhowells/fscache-2.6.git;a=summary

A tarball of patches is available at:

http://people.redhat.com/~dhowells/fscache/patches/nfs+fscache-19.tar.bz2


To use this version of CacheFiles, cachefilesd-0.8 is also required. It
is available as an SRPM:

http://people.redhat.com/~dhowells/fscache/cachefilesd-0.8-15.fc7.src.rpm

Or as individual bits:

http://people.redhat.com/~dhowells/fscache/cachefilesd-0.8.tar.bz2
http://people.redhat.com/~dhowells/fscache/cachefilesd.fc
http://people.redhat.com/~dhowells/fscache/cachefilesd.if
http://people.redhat.com/~dhowells/fscache/cachefilesd.te
http://people.redhat.com/~dhowells/fscache/cachefilesd.spec

David


2006-11-14 20:10:01

by David Howells

Subject: [PATCH 14/19] CacheFiles: Permit an inode's security ID to be obtained

Permit an inode's security ID to be obtained by the CacheFiles module. This is
then used as the SID with which files and directories will be created in the
cache.
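
The hook added below follows the usual pattern of an operations table with a
dummy fallback. A minimal user-space sketch of that pattern, assuming
simplified stand-in types (inode_model, sec_ops) and hypothetical function
names rather than the kernel's own, might look like this:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for an inode: just a "private" flag and a security ID. */
struct inode_model {
	int is_private;
	uint32_t sid;
};

/* Stand-in for the security operations table, reduced to the one hook. */
struct sec_ops {
	uint32_t (*inode_get_secid)(const struct inode_model *inode);
};

/* Fallback used when a security module does not supply the hook. */
static uint32_t dummy_inode_get_secid(const struct inode_model *inode)
{
	(void)inode;
	return 0;
}

/* A module that labels inodes returns the ID stored with the inode. */
static uint32_t labelled_inode_get_secid(const struct inode_model *inode)
{
	return inode->sid;
}

/* Fill in the dummy for a missing hook so callers never see NULL. */
static void fixup_ops(struct sec_ops *ops)
{
	if (ops->inode_get_secid == NULL)
		ops->inode_get_secid = dummy_inode_get_secid;
}

/* Wrapper: private inodes always report 0, others go through the hook. */
static uint32_t get_inode_secid(const struct sec_ops *ops,
				const struct inode_model *inode)
{
	if (inode->is_private)
		return 0;
	return ops->inode_get_secid(inode);
}
```

A caller such as the cache module would only ever use the wrapper, so it
needs no knowledge of which security module, if any, is active.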

Signed-Off-By: David Howells <[email protected]>
---

include/linux/security.h | 13 +++++++++++++
security/dummy.c | 6 ++++++
security/selinux/hooks.c | 8 ++++++++
3 files changed, 27 insertions(+), 0 deletions(-)

diff --git a/include/linux/security.h b/include/linux/security.h
index 63617e4..5913ae7 100644
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -1259,6 +1259,7 @@ struct security_operations {
int (*inode_getsecurity)(const struct inode *inode, const char *name, void *buffer, size_t size, int err);
int (*inode_setsecurity)(struct inode *inode, const char *name, const void *value, size_t size, int flags);
int (*inode_listsecurity)(struct inode *inode, char *buffer, size_t buffer_size);
+ u32 (*inode_get_secid)(struct inode *inode);

int (*file_permission) (struct file * file, int mask);
int (*file_alloc_security) (struct file * file);
@@ -1821,6 +1822,13 @@ static inline int security_inode_listsec
return security_ops->inode_listsecurity(inode, buffer, buffer_size);
}

+static inline u32 security_inode_get_secid(struct inode *inode)
+{
+ if (unlikely(IS_PRIVATE(inode)))
+ return 0;
+ return security_ops->inode_get_secid(inode);
+}
+
static inline int security_file_permission (struct file *file, int mask)
{
return security_ops->file_permission (file, mask);
@@ -2518,6 +2526,11 @@ static inline int security_inode_listsec
return 0;
}

+static inline u32 security_inode_get_secid(struct inode *inode)
+{
+ return 0;
+}
+
static inline int security_file_permission (struct file *file, int mask)
{
return 0;
diff --git a/security/dummy.c b/security/dummy.c
index f7b47a9..3401ea3 100644
--- a/security/dummy.c
+++ b/security/dummy.c
@@ -392,6 +392,11 @@ static int dummy_inode_listsecurity(stru
return 0;
}

+static u32 dummy_inode_get_secid(struct inode *inode)
+{
+ return 0;
+}
+
static const char *dummy_inode_xattr_getsuffix(void)
{
return NULL;
@@ -1039,6 +1044,7 @@ void security_fixup_ops (struct security
set_to_dummy_if_null(ops, inode_getsecurity);
set_to_dummy_if_null(ops, inode_setsecurity);
set_to_dummy_if_null(ops, inode_listsecurity);
+ set_to_dummy_if_null(ops, inode_get_secid);
set_to_dummy_if_null(ops, file_permission);
set_to_dummy_if_null(ops, file_alloc_security);
set_to_dummy_if_null(ops, file_free_security);
diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index 09def09..ddac1bc 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -2415,6 +2415,13 @@ static int selinux_inode_listsecurity(st
return len;
}

+static u32 selinux_inode_get_secid(struct inode *inode)
+{
+ struct inode_security_struct *isec = inode->i_security;
+
+ return isec->sid;
+}
+
/* file security operations */

static int selinux_file_permission(struct file *file, int mask)
@@ -4690,6 +4697,7 @@ static struct security_operations selinu
.inode_getsecurity = selinux_inode_getsecurity,
.inode_setsecurity = selinux_inode_setsecurity,
.inode_listsecurity = selinux_inode_listsecurity,
+ .inode_get_secid = selinux_inode_get_secid,

.file_permission = selinux_file_permission,
.file_alloc_security = selinux_file_alloc_security,

2006-11-14 20:09:22

by David Howells

Subject: [PATCH 02/19] FS-Cache: Provide a filesystem-specific sync'able page bit

The attached patch provides a filesystem-specific page bit that a filesystem
can synchronise upon. This can be used, for example, by a netfs to synchronise
with CacheFS writing its pages to disk.

The PG_checked bit is replaced with PG_fs_misc, and various operations are
provided based upon that. The *PageChecked() macros have also been replaced.
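
To make the intended life cycle of the bit concrete, here is a small
user-space model using C11 atomics. It is only a sketch: page_model and the
lower-case helper names are hypothetical stand-ins for struct page and the
macros in the diff, the bit number is copied from the patch, and the sleeping
and waking of waiters is left out entirely.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define PG_FS_MISC 8	/* bit number taken from the patch */
#define FS_MISC_MASK (1UL << PG_FS_MISC)

/* Stand-in for struct page: only the flags word is modelled. */
struct page_model {
	atomic_ulong flags;
};

/* Is the filesystem-specific bit currently set? */
static bool page_fs_misc(struct page_model *p)
{
	return (atomic_load(&p->flags) & FS_MISC_MASK) != 0;
}

/* Set the bit and report whether it was already set. */
static bool test_set_page_fs_misc(struct page_model *p)
{
	unsigned long old = atomic_fetch_or(&p->flags, FS_MISC_MASK);

	return (old & FS_MISC_MASK) != 0;
}

/* Clear the bit and report whether it had been set. */
static bool test_clear_page_fs_misc(struct page_model *p)
{
	unsigned long old = atomic_fetch_and(&p->flags, ~FS_MISC_MASK);

	return (old & FS_MISC_MASK) != 0;
}

/*
 * Model of the completion side: clear the bit when the background write
 * has finished.  The kernel version calls BUG() if the bit was not set and
 * then wakes anyone waiting on it; this model just returns false instead.
 */
static bool end_page_fs_misc_model(struct page_model *p)
{
	return test_clear_page_fs_misc(p);
}
```

In this model a netfs would set the bit before handing a page to the cache,
the cache's completion callback would call the end function, and anything
wanting to modify the page would first check (in the kernel, wait on) the bit.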

Signed-Off-By: David Howells <[email protected]>
---

fs/afs/dir.c | 5 +----
fs/ext2/dir.c | 6 +++---
fs/ext3/inode.c | 10 +++++-----
fs/freevxfs/vxfs_subr.c | 2 +-
fs/reiserfs/inode.c | 10 +++++-----
fs/ufs/dir.c | 6 +++---
include/linux/page-flags.h | 15 ++++++++++-----
include/linux/pagemap.h | 11 +++++++++++
mm/filemap.c | 17 +++++++++++++++++
mm/migrate.c | 4 ++--
mm/page_alloc.c | 2 +-
11 files changed, 59 insertions(+), 29 deletions(-)

diff --git a/fs/afs/dir.c b/fs/afs/dir.c
index a6ec75c..84a2167 100644
--- a/fs/afs/dir.c
+++ b/fs/afs/dir.c
@@ -155,11 +155,9 @@ #endif
}
}

- SetPageChecked(page);
return;

error:
- SetPageChecked(page);
SetPageError(page);

} /* end afs_dir_check_page() */
@@ -191,8 +189,7 @@ static struct page *afs_dir_get_page(str
kmap(page);
if (!PageUptodate(page))
goto fail;
- if (!PageChecked(page))
- afs_dir_check_page(dir, page);
+ afs_dir_check_page(dir, page);
if (PageError(page))
goto fail;
}
diff --git a/fs/ext2/dir.c b/fs/ext2/dir.c
index 3e7a84a..89f8318 100644
--- a/fs/ext2/dir.c
+++ b/fs/ext2/dir.c
@@ -112,7 +112,7 @@ static void ext2_check_page(struct page
if (offs != limit)
goto Eend;
out:
- SetPageChecked(page);
+ SetPageFsMisc(page);
return;

/* Too bad, we had an error */
@@ -152,7 +152,7 @@ Eend:
dir->i_ino, (page->index<<PAGE_CACHE_SHIFT)+offs,
(unsigned long) le32_to_cpu(p->inode));
fail:
- SetPageChecked(page);
+ SetPageFsMisc(page);
SetPageError(page);
}

@@ -165,7 +165,7 @@ static struct page * ext2_get_page(struc
kmap(page);
if (!PageUptodate(page))
goto fail;
- if (!PageChecked(page))
+ if (!PageFsMisc(page))
ext2_check_page(page);
if (PageError(page))
goto fail;
diff --git a/fs/ext3/inode.c b/fs/ext3/inode.c
index 03ba5bc..1d6a61a 100644
--- a/fs/ext3/inode.c
+++ b/fs/ext3/inode.c
@@ -1531,12 +1531,12 @@ static int ext3_journalled_writepage(str
goto no_write;
}

- if (!page_has_buffers(page) || PageChecked(page)) {
+ if (!page_has_buffers(page) || PageFsMisc(page)) {
/*
* It's mmapped pagecache. Add buffers and journal it. There
* doesn't seem much point in redirtying the page here.
*/
- ClearPageChecked(page);
+ ClearPageFsMisc(page);
ret = block_prepare_write(page, 0, PAGE_CACHE_SIZE,
ext3_get_block);
if (ret != 0) {
@@ -1593,7 +1593,7 @@ static void ext3_invalidatepage(struct p
* If it's a full truncate we just forget about the pending dirtying
*/
if (offset == 0)
- ClearPageChecked(page);
+ ClearPageFsMisc(page);

journal_invalidatepage(journal, page, offset);
}
@@ -1602,7 +1602,7 @@ static int ext3_releasepage(struct page
{
journal_t *journal = EXT3_JOURNAL(page->mapping->host);

- WARN_ON(PageChecked(page));
+ WARN_ON(PageFsMisc(page));
if (!page_has_buffers(page))
return 0;
return journal_try_to_free_buffers(journal, page, wait);
@@ -1698,7 +1698,7 @@ out:
*/
static int ext3_journalled_set_page_dirty(struct page *page)
{
- SetPageChecked(page);
+ SetPageFsMisc(page);
return __set_page_dirty_nobuffers(page);
}

diff --git a/fs/freevxfs/vxfs_subr.c b/fs/freevxfs/vxfs_subr.c
index decac62..805bbb2 100644
--- a/fs/freevxfs/vxfs_subr.c
+++ b/fs/freevxfs/vxfs_subr.c
@@ -78,7 +78,7 @@ vxfs_get_page(struct address_space *mapp
kmap(pp);
if (!PageUptodate(pp))
goto fail;
- /** if (!PageChecked(pp)) **/
+ /** if (!PageFsMisc(pp)) **/
/** vxfs_check_page(pp); **/
if (PageError(pp))
goto fail;
diff --git a/fs/reiserfs/inode.c b/fs/reiserfs/inode.c
index 9c69bca..a290c26 100644
--- a/fs/reiserfs/inode.c
+++ b/fs/reiserfs/inode.c
@@ -2342,7 +2342,7 @@ static int reiserfs_write_full_page(stru
struct buffer_head *head, *bh;
int partial = 0;
int nr = 0;
- int checked = PageChecked(page);
+ int checked = PageFsMisc(page);
struct reiserfs_transaction_handle th;
struct super_block *s = inode->i_sb;
int bh_per_page = PAGE_CACHE_SIZE / s->s_blocksize;
@@ -2420,7 +2420,7 @@ static int reiserfs_write_full_page(stru
* blocks we're going to log
*/
if (checked) {
- ClearPageChecked(page);
+ ClearPageFsMisc(page);
reiserfs_write_lock(s);
error = journal_begin(&th, s, bh_per_page + 1);
if (error) {
@@ -2801,7 +2801,7 @@ static void reiserfs_invalidatepage(stru
BUG_ON(!PageLocked(page));

if (offset == 0)
- ClearPageChecked(page);
+ ClearPageFsMisc(page);

if (!page_has_buffers(page))
goto out;
@@ -2842,7 +2842,7 @@ static int reiserfs_set_page_dirty(struc
{
struct inode *inode = page->mapping->host;
if (reiserfs_file_data_log(inode)) {
- SetPageChecked(page);
+ SetPageFsMisc(page);
return __set_page_dirty_nobuffers(page);
}
return __set_page_dirty_buffers(page);
@@ -2865,7 +2865,7 @@ static int reiserfs_releasepage(struct p
struct buffer_head *bh;
int ret = 1;

- WARN_ON(PageChecked(page));
+ WARN_ON(PageFsMisc(page));
spin_lock(&j->j_dirty_buffers_lock);
head = page_buffers(page);
bh = head;
diff --git a/fs/ufs/dir.c b/fs/ufs/dir.c
index 7f0a0aa..e04327c 100644
--- a/fs/ufs/dir.c
+++ b/fs/ufs/dir.c
@@ -135,7 +135,7 @@ static void ufs_check_page(struct page *
if (offs != limit)
goto Eend;
out:
- SetPageChecked(page);
+ SetPageFsMisc(page);
return;

/* Too bad, we had an error */
@@ -173,7 +173,7 @@ Eend:
"offset=%lu",
dir->i_ino, (page->index<<PAGE_CACHE_SHIFT)+offs);
fail:
- SetPageChecked(page);
+ SetPageFsMisc(page);
SetPageError(page);
}

@@ -187,7 +187,7 @@ static struct page *ufs_get_page(struct
kmap(page);
if (!PageUptodate(page))
goto fail;
- if (!PageChecked(page))
+ if (!PageFsMisc(page))
ufs_check_page(page);
if (PageError(page))
goto fail;
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 4830a3b..e7b5bbf 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -76,7 +76,7 @@ #define PG_lru 5
#define PG_active 6
#define PG_slab 7 /* slab debug (Suparna wants this) */

-#define PG_checked 8 /* kill me in 2.5.<early>. */
+#define PG_fs_misc 8
#define PG_arch_1 9
#define PG_reserved 10
#define PG_private 11 /* If pagecache, has fs-private data */
@@ -165,10 +165,6 @@ #else
#define PageHighMem(page) 0 /* needed to optimize away at compile time */
#endif

-#define PageChecked(page) test_bit(PG_checked, &(page)->flags)
-#define SetPageChecked(page) set_bit(PG_checked, &(page)->flags)
-#define ClearPageChecked(page) clear_bit(PG_checked, &(page)->flags)
-
#define PageReserved(page) test_bit(PG_reserved, &(page)->flags)
#define SetPageReserved(page) set_bit(PG_reserved, &(page)->flags)
#define ClearPageReserved(page) clear_bit(PG_reserved, &(page)->flags)
@@ -267,4 +263,13 @@ static inline void set_page_writeback(st
test_set_page_writeback(page);
}

+/*
+ * Filesystem-specific page bit testing
+ */
+#define PageFsMisc(page) test_bit(PG_fs_misc, &(page)->flags)
+#define SetPageFsMisc(page) set_bit(PG_fs_misc, &(page)->flags)
+#define TestSetPageFsMisc(page) test_and_set_bit(PG_fs_misc, &(page)->flags)
+#define ClearPageFsMisc(page) clear_bit(PG_fs_misc, &(page)->flags)
+#define TestClearPageFsMisc(page) test_and_clear_bit(PG_fs_misc, &(page)->flags)
+
#endif /* PAGE_FLAGS_H */
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index c3e255b..24fdc48 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -189,6 +189,17 @@ static inline void wait_on_page_writebac
extern void end_page_writeback(struct page *page);

/*
+ * Wait for filesystem-specific page synchronisation to complete
+ */
+static inline void wait_on_page_fs_misc(struct page *page)
+{
+ if (PageFsMisc(page))
+ wait_on_page_bit(page, PG_fs_misc);
+}
+
+extern void fastcall end_page_fs_misc(struct page *page);
+
+/*
* Fault a userspace page into pagetables. Return non-zero on a fault.
*
* This assumes that two userspace pages are always sufficient. That's
diff --git a/mm/filemap.c b/mm/filemap.c
index 7b84dc8..1b73d3a 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -584,6 +584,23 @@ void fastcall __lock_page_nosync(struct
TASK_UNINTERRUPTIBLE);
}

+/*
+ * Note completion of filesystem specific page synchronisation
+ *
+ * This is used to allow a page to be written to a filesystem cache in the
+ * background without holding up the completion of readpage
+ */
+void fastcall end_page_fs_misc(struct page *page)
+{
+ smp_mb__before_clear_bit();
+ if (!TestClearPageFsMisc(page))
+ BUG();
+ smp_mb__after_clear_bit();
+ __wake_up_bit(page_waitqueue(page), &page->flags, PG_fs_misc);
+}
+
+EXPORT_SYMBOL(end_page_fs_misc);
+
/**
* find_get_page - find and get a page reference
* @mapping: the address_space to search
diff --git a/mm/migrate.c b/mm/migrate.c
index b4979d4..bc278d7 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -348,8 +348,8 @@ static void migrate_page_copy(struct pag
SetPageUptodate(newpage);
if (PageActive(page))
SetPageActive(newpage);
- if (PageChecked(page))
- SetPageChecked(newpage);
+ if (PageFsMisc(page))
+ SetPageFsMisc(newpage);
if (PageMappedToDisk(page))
SetPageMappedToDisk(newpage);

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index bf2f6cf..b4353dd 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -602,7 +602,7 @@ static int prep_new_page(struct page *pa

page->flags &= ~(1 << PG_uptodate | 1 << PG_error |
1 << PG_referenced | 1 << PG_arch_1 |
- 1 << PG_checked | 1 << PG_mappedtodisk);
+ 1 << PG_fs_misc | 1 << PG_mappedtodisk);
set_page_private(page, 0);
set_page_refcounted(page);
kernel_map_pages(page, 1 << order, 1);

2006-11-14 20:09:25

by David Howells

Subject: [PATCH 04/19] FS-Cache: Make kAFS use FS-Cache

The attached patch makes the kAFS filesystem in fs/afs/ use FS-Cache, and
through it any attached caches. The kAFS filesystem will use caching
automatically if it's available.
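
Much of the conversion consists of supplying callbacks that describe each
object to the cache. As a hedged, user-space sketch of the key callback style
used for cells in the diff below (cell_model and cell_get_key are hypothetical
names, and the length is held in a size_t here rather than a uint16_t), the
callback copies the cell name into a caller-supplied buffer and returns the
key length, or 0 if the name does not fit:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for the AFS cell record: only the name is modelled. */
struct cell_model {
	const char *name;
};

/*
 * Copy the cell name (without the trailing NUL) into the buffer as the
 * index key.  Returns the number of bytes written, or 0 if the name is
 * longer than the space available.
 */
static uint16_t cell_get_key(const void *netfs_data, void *buffer,
			     uint16_t bufmax)
{
	const struct cell_model *cell = netfs_data;
	size_t klen = strlen(cell->name);

	if (klen > bufmax)
		return 0;

	memcpy(buffer, cell->name, klen);
	return (uint16_t)klen;
}
```

Because the callback receives an opaque pointer and a bounded buffer, the
cache never needs to know the layout of the filesystem's own structures.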

Signed-Off-By: David Howells <[email protected]>
---

fs/Kconfig | 11 ++
fs/afs/cache.h | 27 -----
fs/afs/cell.c | 109 +++++++++++++--------
fs/afs/cell.h | 16 +--
fs/afs/cmservice.c | 2
fs/afs/dir.c | 10 +-
fs/afs/file.c | 265 ++++++++++++++++++++++++++++++++++------------------
fs/afs/fsclient.c | 4 +
fs/afs/inode.c | 45 ++++++---
fs/afs/internal.h | 25 ++---
fs/afs/main.c | 24 ++---
fs/afs/mntpt.c | 12 +-
fs/afs/proc.c | 1
fs/afs/server.c | 3 -
fs/afs/vlocation.c | 179 ++++++++++++++++++++++-------------
fs/afs/vnode.c | 250 ++++++++++++++++++++++++++++++++++++++++---------
fs/afs/vnode.h | 10 +-
fs/afs/volume.c | 78 ++++++---------
fs/afs/volume.h | 28 +----
19 files changed, 673 insertions(+), 426 deletions(-)

diff --git a/fs/Kconfig b/fs/Kconfig
index 9fbbe3e..aa6fad1 100644
--- a/fs/Kconfig
+++ b/fs/Kconfig
@@ -2075,8 +2075,7 @@ config CODA_FS_OLD_API
For most cases you probably want to say N.

config AFS_FS
-# for fs/nls/Config.in
- tristate "Andrew File System support (AFS) (Experimental)"
+ tristate "Andrew File System support (AFS) (EXPERIMENTAL)"
depends on INET && EXPERIMENTAL
select RXRPC
help
@@ -2087,6 +2086,14 @@ # for fs/nls/Config.in

If unsure, say N.

+config AFS_FSCACHE
+ bool "Provide AFS client caching support (EXPERIMENTAL)"
+ depends on EXPERIMENTAL
+ depends on AFS_FS=m && FSCACHE || AFS_FS=y && FSCACHE=y
+ help
+ Say Y here if you want AFS data to be cached locally on disk through
+ the generic filesystem cache manager
+
config RXRPC
tristate

diff --git a/fs/afs/cache.h b/fs/afs/cache.h
deleted file mode 100644
index 9eb7722..0000000
--- a/fs/afs/cache.h
+++ /dev/null
@@ -1,27 +0,0 @@
-/* cache.h: AFS local cache management interface
- *
- * Copyright (C) 2002 Red Hat, Inc. All Rights Reserved.
- * Written by David Howells ([email protected])
- *
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of the GNU General Public License
- * as published by the Free Software Foundation; either version
- * 2 of the License, or (at your option) any later version.
- */
-
-#ifndef _LINUX_AFS_CACHE_H
-#define _LINUX_AFS_CACHE_H
-
-#undef AFS_CACHING_SUPPORT
-
-#include <linux/mm.h>
-#ifdef AFS_CACHING_SUPPORT
-#include <linux/cachefs.h>
-#endif
-#include "types.h"
-
-#ifdef __KERNEL__
-
-#endif /* __KERNEL__ */
-
-#endif /* _LINUX_AFS_CACHE_H */
diff --git a/fs/afs/cell.c b/fs/afs/cell.c
index bfc1fd2..3aaeada 100644
--- a/fs/afs/cell.c
+++ b/fs/afs/cell.c
@@ -31,17 +31,21 @@ static DEFINE_RWLOCK(afs_cells_lock);
static DECLARE_RWSEM(afs_cells_sem); /* add/remove serialisation */
static struct afs_cell *afs_cell_root;

-#ifdef AFS_CACHING_SUPPORT
-static cachefs_match_val_t afs_cell_cache_match(void *target,
- const void *entry);
-static void afs_cell_cache_update(void *source, void *entry);
-
-struct cachefs_index_def afs_cache_cell_index_def = {
- .name = "cell_ix",
- .data_size = sizeof(struct afs_cache_cell),
- .keys[0] = { CACHEFS_INDEX_KEYS_ASCIIZ, 64 },
- .match = afs_cell_cache_match,
- .update = afs_cell_cache_update,
+#ifdef CONFIG_AFS_FSCACHE
+static uint16_t afs_cell_cache_get_key(const void *cookie_netfs_data,
+ void *buffer, uint16_t buflen);
+static uint16_t afs_cell_cache_get_aux(const void *cookie_netfs_data,
+ void *buffer, uint16_t buflen);
+static fscache_checkaux_t afs_cell_cache_check_aux(void *cookie_netfs_data,
+ const void *buffer,
+ uint16_t buflen);
+
+static struct fscache_cookie_def afs_cell_cache_index_def = {
+ .name = "AFS cell",
+ .type = FSCACHE_COOKIE_TYPE_INDEX,
+ .get_key = afs_cell_cache_get_key,
+ .get_aux = afs_cell_cache_get_aux,
+ .check_aux = afs_cell_cache_check_aux,
};
#endif

@@ -115,12 +119,11 @@ int afs_cell_create(const char *name, ch
if (ret < 0)
goto error;

-#ifdef AFS_CACHING_SUPPORT
- /* put it up for caching */
- cachefs_acquire_cookie(afs_cache_netfs.primary_index,
- &afs_vlocation_cache_index_def,
- cell,
- &cell->cache);
+#ifdef CONFIG_AFS_FSCACHE
+ /* put it up for caching (this never returns an error) */
+ cell->cache = fscache_acquire_cookie(afs_cache_netfs.primary_index,
+ &afs_cell_cache_index_def,
+ cell);
#endif

/* add to the cell lists */
@@ -345,8 +348,8 @@ static void afs_cell_destroy(struct afs_
list_del_init(&cell->proc_link);
up_write(&afs_proc_cells_sem);

-#ifdef AFS_CACHING_SUPPORT
- cachefs_relinquish_cookie(cell->cache, 0);
+#ifdef CONFIG_AFS_FSCACHE
+ fscache_relinquish_cookie(cell->cache, 0);
#endif

up_write(&afs_cells_sem);
@@ -525,44 +528,62 @@ void afs_cell_purge(void)

/*****************************************************************************/
/*
- * match a cell record obtained from the cache
+ * set the key for the index entry
*/
-#ifdef AFS_CACHING_SUPPORT
-static cachefs_match_val_t afs_cell_cache_match(void *target,
- const void *entry)
+#ifdef CONFIG_AFS_FSCACHE
+static uint16_t afs_cell_cache_get_key(const void *cookie_netfs_data,
+ void *buffer, uint16_t bufmax)
{
- const struct afs_cache_cell *ccell = entry;
- struct afs_cell *cell = target;
+ const struct afs_cell *cell = cookie_netfs_data;
+ uint16_t klen;

- _enter("{%s},{%s}", ccell->name, cell->name);
+ _enter("%p,%p,%u", cell, buffer, bufmax);

- if (strncmp(ccell->name, cell->name, sizeof(ccell->name)) == 0) {
- _leave(" = SUCCESS");
- return CACHEFS_MATCH_SUCCESS;
- }
+ klen = strlen(cell->name);
+ if (klen > bufmax)
+ return 0;
+
+ memcpy(buffer, cell->name, klen);
+ return klen;

- _leave(" = FAILED");
- return CACHEFS_MATCH_FAILED;
-} /* end afs_cell_cache_match() */
+} /* end afs_cell_cache_get_key() */
#endif

/*****************************************************************************/
/*
- * update a cell record in the cache
+ * provide new auxilliary cache data
*/
-#ifdef AFS_CACHING_SUPPORT
-static void afs_cell_cache_update(void *source, void *entry)
+#ifdef CONFIG_AFS_FSCACHE
+static uint16_t afs_cell_cache_get_aux(const void *cookie_netfs_data,
+ void *buffer, uint16_t bufmax)
{
- struct afs_cache_cell *ccell = entry;
- struct afs_cell *cell = source;
+ const struct afs_cell *cell = cookie_netfs_data;
+ uint16_t dlen;

- _enter("%p,%p", source, entry);
+ _enter("%p,%p,%u", cell, buffer, bufmax);

- strncpy(ccell->name, cell->name, sizeof(ccell->name));
+ dlen = cell->vl_naddrs * sizeof(cell->vl_addrs[0]);
+ dlen = min(dlen, bufmax);
+ dlen &= ~(sizeof(cell->vl_addrs[0]) - 1);

- memcpy(ccell->vl_servers,
- cell->vl_addrs,
- min(sizeof(ccell->vl_servers), sizeof(cell->vl_addrs)));
+ memcpy(buffer, cell->vl_addrs, dlen);
+
+ return dlen;
+
+} /* end afs_cell_cache_get_aux() */
+#endif
+
+/*****************************************************************************/
+/*
+ * check that the auxiliary data indicates that the entry is still valid
+ */
+#ifdef CONFIG_AFS_FSCACHE
+static fscache_checkaux_t afs_cell_cache_check_aux(void *cookie_netfs_data,
+ const void *buffer,
+ uint16_t buflen)
+{
+ _leave(" = OKAY");
+ return FSCACHE_CHECKAUX_OKAY;

-} /* end afs_cell_cache_update() */
+} /* end afs_cell_cache_check_aux() */
#endif
diff --git a/fs/afs/cell.h b/fs/afs/cell.h
index 4834910..d670502 100644
--- a/fs/afs/cell.h
+++ b/fs/afs/cell.h
@@ -13,7 +13,7 @@ #ifndef _LINUX_AFS_CELL_H
#define _LINUX_AFS_CELL_H

#include "types.h"
-#include "cache.h"
+#include <linux/fscache.h>

#define AFS_CELL_MAX_ADDRS 15

@@ -21,16 +21,6 @@ extern volatile int afs_cells_being_purg

/*****************************************************************************/
/*
- * entry in the cached cell catalogue
- */
-struct afs_cache_cell
-{
- char name[64]; /* cell name (padded with NULs) */
- struct in_addr vl_servers[15]; /* cached cell VL servers */
-};
-
-/*****************************************************************************/
-/*
* AFS cell record
*/
struct afs_cell
@@ -39,8 +29,8 @@ struct afs_cell
struct list_head link; /* main cell list link */
struct list_head proc_link; /* /proc cell list link */
struct proc_dir_entry *proc_dir; /* /proc dir for this cell */
-#ifdef AFS_CACHING_SUPPORT
- struct cachefs_cookie *cache; /* caching cookie */
+#ifdef CONFIG_AFS_FSCACHE
+ struct fscache_cookie *cache; /* caching cookie */
#endif

/* server record management */
diff --git a/fs/afs/cmservice.c b/fs/afs/cmservice.c
index 3d097fd..f87d5a7 100644
--- a/fs/afs/cmservice.c
+++ b/fs/afs/cmservice.c
@@ -24,7 +24,7 @@ #include "cmservice.h"
#include "internal.h"

static unsigned afscm_usage; /* AFS cache manager usage count */
-static struct rw_semaphore afscm_sem; /* AFS cache manager start/stop semaphore */
+static DECLARE_RWSEM(afscm_sem); /* AFS cache manager start/stop semaphore */

static int afscm_new_call(struct rxrpc_call *call);
static void afscm_attention(struct rxrpc_call *call);
diff --git a/fs/afs/dir.c b/fs/afs/dir.c
index 84a2167..2a89c20 100644
--- a/fs/afs/dir.c
+++ b/fs/afs/dir.c
@@ -145,7 +145,7 @@ #endif
qty /= sizeof(union afs_dir_block);

/* check them */
- dbuf = page_address(page);
+ dbuf = kmap_atomic(page, KM_USER0);
for (tmp = 0; tmp < qty; tmp++) {
if (dbuf->blocks[tmp].pagehdr.magic != AFS_DIR_MAGIC) {
printk("kAFS: %s(%lu): bad magic %d/%d is %04hx\n",
@@ -154,10 +154,12 @@ #endif
goto error;
}
}
+ kunmap_atomic(dbuf, KM_USER0);

return;

error:
+ kunmap_atomic(dbuf, KM_USER0);
SetPageError(page);

} /* end afs_dir_check_page() */
@@ -168,7 +170,6 @@ #endif
*/
static inline void afs_dir_put_page(struct page *page)
{
- kunmap(page);
page_cache_release(page);

} /* end afs_dir_put_page() */
@@ -186,7 +187,6 @@ static struct page *afs_dir_get_page(str
page = read_mapping_page(dir->i_mapping, index, NULL);
if (!IS_ERR(page)) {
wait_on_page_locked(page);
- kmap(page);
if (!PageUptodate(page))
goto fail;
afs_dir_check_page(dir, page);
@@ -354,7 +354,7 @@ static int afs_dir_iterate(struct inode

limit = blkoff & ~(PAGE_SIZE - 1);

- dbuf = page_address(page);
+ dbuf = kmap_atomic(page, KM_USER0);

/* deal with the individual blocks stashed on this page */
do {
@@ -363,6 +363,7 @@ static int afs_dir_iterate(struct inode
ret = afs_dir_iterate_block(fpos, dblock, blkoff,
cookie, filldir);
if (ret != 1) {
+ kunmap_atomic(dbuf, KM_USER0);
afs_dir_put_page(page);
goto out;
}
@@ -371,6 +372,7 @@ static int afs_dir_iterate(struct inode

} while (*fpos < dir->i_size && blkoff < limit);

+ kunmap_atomic(dbuf, KM_USER0);
afs_dir_put_page(page);
ret = 0;
}
diff --git a/fs/afs/file.c b/fs/afs/file.c
index 2e8c426..7c6458c 100644
--- a/fs/afs/file.c
+++ b/fs/afs/file.c
@@ -21,6 +21,8 @@ #include "vnode.h"
#include <rxrpc/call.h>
#include "internal.h"

+#define list_to_page(head) (list_entry((head)->prev, struct page, lru))
+
#if 0
static int afs_file_open(struct inode *inode, struct file *file);
static int afs_file_release(struct inode *inode, struct file *file);
@@ -29,54 +31,91 @@ #endif
static int afs_file_readpage(struct file *file, struct page *page);
static void afs_file_invalidatepage(struct page *page, unsigned long offset);
static int afs_file_releasepage(struct page *page, gfp_t gfp_flags);
+static int afs_file_mmap(struct file * file, struct vm_area_struct * vma);
+
+#ifdef CONFIG_AFS_FSCACHE
+static int afs_file_readpages(struct file *filp, struct address_space *mapping,
+ struct list_head *pages, unsigned nr_pages);
+static int afs_file_page_mkwrite(struct vm_area_struct *vma, struct page *page);
+#endif

struct inode_operations afs_file_inode_operations = {
.getattr = afs_inode_getattr,
};

+const struct file_operations afs_file_file_operations = {
+ .llseek = generic_file_llseek,
+ .mmap = afs_file_mmap,
+ .sendfile = generic_file_sendfile,
+};
+
const struct address_space_operations afs_fs_aops = {
.readpage = afs_file_readpage,
+#ifdef CONFIG_AFS_FSCACHE
+ .readpages = afs_file_readpages,
+#endif
.set_page_dirty = __set_page_dirty_nobuffers,
.releasepage = afs_file_releasepage,
.invalidatepage = afs_file_invalidatepage,
};

+static struct vm_operations_struct afs_fs_vm_operations = {
+ .nopage = filemap_nopage,
+ .populate = filemap_populate,
+#ifdef CONFIG_AFS_FSCACHE
+ .page_mkwrite = afs_file_page_mkwrite,
+#endif
+};
+
+/*****************************************************************************/
+/*
+ * set up a memory mapping on an AFS file
+ * - we set our own VMA ops so that we can catch the page becoming writable for
+ * userspace for shared-writable mmap
+ */
+static int afs_file_mmap(struct file *file, struct vm_area_struct *vma)
+{
+ _enter("");
+
+ file_accessed(file);
+ vma->vm_ops = &afs_fs_vm_operations;
+ return 0;
+}
+
/*****************************************************************************/
/*
* deal with notification that a page was read from the cache
*/
-#ifdef AFS_CACHING_SUPPORT
-static void afs_file_readpage_read_complete(void *cookie_data,
- struct page *page,
+#ifdef CONFIG_AFS_FSCACHE
+static void afs_file_readpage_read_complete(struct page *page,
void *data,
int error)
{
- _enter("%p,%p,%p,%d", cookie_data, page, data, error);
+ _enter("%p,%p,%d", page, data, error);

- if (error)
- SetPageError(page);
- else
+ /* if the read completes with an error, we just unlock the page and let
+ * the VM reissue the readpage */
+ if (!error)
SetPageUptodate(page);
unlock_page(page);
-
-} /* end afs_file_readpage_read_complete() */
+}
#endif

/*****************************************************************************/
/*
* deal with notification that a page was written to the cache
*/
-#ifdef AFS_CACHING_SUPPORT
-static void afs_file_readpage_write_complete(void *cookie_data,
- struct page *page,
+#ifdef CONFIG_AFS_FSCACHE
+static void afs_file_readpage_write_complete(struct page *page,
void *data,
int error)
{
- _enter("%p,%p,%p,%d", cookie_data, page, data, error);
-
- unlock_page(page);
+ _enter("%p,%p,%d", page, data, error);

-} /* end afs_file_readpage_write_complete() */
+ /* note that the page has been written to the cache and can now be
+ * modified */
+ end_page_fs_misc(page);
+}
#endif

/*****************************************************************************/
@@ -86,16 +125,13 @@ #endif
static int afs_file_readpage(struct file *file, struct page *page)
{
struct afs_rxfs_fetch_descriptor desc;
-#ifdef AFS_CACHING_SUPPORT
- struct cachefs_page *pageio;
-#endif
struct afs_vnode *vnode;
struct inode *inode;
int ret;

inode = page->mapping->host;

- _enter("{%lu},{%lu}", inode->i_ino, page->index);
+ _enter("{%lu},%p{%lu}", inode->i_ino, page, page->index);

vnode = AFS_FS_I(inode);

@@ -105,13 +141,9 @@ #endif
if (vnode->flags & AFS_VNODE_DELETED)
goto error;

-#ifdef AFS_CACHING_SUPPORT
- ret = cachefs_page_get_private(page, &pageio, GFP_NOIO);
- if (ret < 0)
- goto error;
-
+#ifdef CONFIG_AFS_FSCACHE
/* is it cached? */
- ret = cachefs_read_or_alloc_page(vnode->cache,
+ ret = fscache_read_or_alloc_page(vnode->cache,
page,
afs_file_readpage_read_complete,
NULL,
@@ -121,18 +153,20 @@ #else
#endif

switch (ret) {
- /* read BIO submitted and wb-journal entry found */
- case 1:
- BUG(); // TODO - handle wb-journal match
-
/* read BIO submitted (page in cache) */
case 0:
break;

- /* no page available in cache */
- case -ENOBUFS:
+ /* page not yet cached */
case -ENODATA:
+ _debug("cache said ENODATA");
+ goto go_on;
+
+ /* page will not be cached */
+ case -ENOBUFS:
+ _debug("cache said ENOBUFS");
default:
+ go_on:
desc.fid = vnode->fid;
desc.offset = page->index << PAGE_CACHE_SHIFT;
desc.size = min((size_t) (inode->i_size - desc.offset),
@@ -146,34 +180,40 @@ #endif
ret = afs_vnode_fetch_data(vnode, &desc);
kunmap(page);
if (ret < 0) {
- if (ret==-ENOENT) {
- _debug("got NOENT from server"
+ if (ret == -ENOENT) {
+ kdebug("got NOENT from server"
" - marking file deleted and stale");
vnode->flags |= AFS_VNODE_DELETED;
ret = -ESTALE;
}

-#ifdef AFS_CACHING_SUPPORT
- cachefs_uncache_page(vnode->cache, page);
+#ifdef CONFIG_AFS_FSCACHE
+ fscache_uncache_page(vnode->cache, page);
+ ClearPagePrivate(page);
#endif
goto error;
}

SetPageUptodate(page);

-#ifdef AFS_CACHING_SUPPORT
- if (cachefs_write_page(vnode->cache,
- page,
- afs_file_readpage_write_complete,
- NULL,
- GFP_KERNEL) != 0
- ) {
- cachefs_uncache_page(vnode->cache, page);
- unlock_page(page);
+ /* send the page to the cache */
+#ifdef CONFIG_AFS_FSCACHE
+ if (PagePrivate(page)) {
+ if (TestSetPageFsMisc(page))
+ BUG();
+ if (fscache_write_page(vnode->cache,
+ page,
+ afs_file_readpage_write_complete,
+ NULL,
+ GFP_KERNEL) != 0
+ ) {
+ fscache_uncache_page(vnode->cache, page);
+ ClearPagePrivate(page);
+ end_page_fs_misc(page);
+ }
}
-#else
- unlock_page(page);
#endif
+ unlock_page(page);
}

_leave(" = 0");
@@ -185,87 +225,124 @@ #endif

_leave(" = %d", ret);
return ret;
-
-} /* end afs_file_readpage() */
+}

/*****************************************************************************/
/*
- * get a page cookie for the specified page
+ * read a set of pages
*/
-#ifdef AFS_CACHING_SUPPORT
-int afs_cache_get_page_cookie(struct page *page,
- struct cachefs_page **_page_cookie)
+#ifdef CONFIG_AFS_FSCACHE
+static int afs_file_readpages(struct file *filp, struct address_space *mapping,
+ struct list_head *pages, unsigned nr_pages)
{
- int ret;
+ struct afs_vnode *vnode;
+ int ret = 0;

- _enter("");
- ret = cachefs_page_get_private(page,_page_cookie, GFP_NOIO);
+ _enter(",{%lu},,%d", mapping->host->i_ino, nr_pages);

- _leave(" = %d", ret);
+ vnode = AFS_FS_I(mapping->host);
+ if (vnode->flags & AFS_VNODE_DELETED) {
+ _leave(" = -ESTALE");
+ return -ESTALE;
+ }
+
+ /* attempt to read as many of the pages as possible */
+ ret = fscache_read_or_alloc_pages(vnode->cache,
+ mapping,
+ pages,
+ &nr_pages,
+ afs_file_readpage_read_complete,
+ NULL,
+ mapping_gfp_mask(mapping));
+
+ switch (ret) {
+ /* all pages are being read from the cache */
+ case 0:
+ BUG_ON(!list_empty(pages));
+ BUG_ON(nr_pages != 0);
+ _leave(" = 0 [reading all]");
+ return 0;
+
+ /* there were pages that couldn't be read from the cache */
+ case -ENODATA:
+ case -ENOBUFS:
+ break;
+
+ /* other error */
+ default:
+ _leave(" = %d", ret);
+ return ret;
+ }
+
+ /* load the missing pages from the network */
+ ret = read_cache_pages(mapping, pages,
+ (void *) afs_file_readpage, NULL);
+
+ _leave(" = %d [netting]", ret);
return ret;
-} /* end afs_cache_get_page_cookie() */
+}
#endif

/*****************************************************************************/
/*
* invalidate part or all of a page
+ * - release a page and clean up its private data if offset is 0 (indicating
+ * the entire page)
*/
static void afs_file_invalidatepage(struct page *page, unsigned long offset)
{
- int ret = 1;
-
_enter("{%lu},%lu", page->index, offset);

BUG_ON(!PageLocked(page));

if (PagePrivate(page)) {
-#ifdef AFS_CACHING_SUPPORT
- struct afs_vnode *vnode = AFS_FS_I(page->mapping->host);
- cachefs_uncache_page(vnode->cache,page);
+ /* we clean up only if the entire page is being invalidated */
+ if (offset == 0 && !PageWriteback(page)) {
+#ifdef CONFIG_AFS_FSCACHE
+ wait_on_page_fs_misc(page);
+ fscache_uncache_page(
+ AFS_FS_I(page->mapping->host)->cache, page);
+ ClearPagePrivate(page);
#endif
-
- /* We release buffers only if the entire page is being
- * invalidated.
- * The get_block cached value has been unconditionally
- * invalidated, so real IO is not possible anymore.
- */
- if (offset == 0) {
- BUG_ON(!PageLocked(page));
-
- ret = 0;
- if (!PageWriteback(page))
- ret = page->mapping->a_ops->releasepage(page,
- 0);
- /* possibly should BUG_ON(!ret); - neilb */
}
}

- _leave(" = %d", ret);
-} /* end afs_file_invalidatepage() */
+ _leave("");
+}

/*****************************************************************************/
/*
- * release a page and cleanup its private data
+ * release a page and clean up its private state if it's not busy
+ * - return true if the page can now be released, false if not
*/
static int afs_file_releasepage(struct page *page, gfp_t gfp_flags)
{
- struct cachefs_page *pageio;
-
_enter("{%lu},%x", page->index, gfp_flags);

- if (PagePrivate(page)) {
-#ifdef AFS_CACHING_SUPPORT
- struct afs_vnode *vnode = AFS_FS_I(page->mapping->host);
- cachefs_uncache_page(vnode->cache, page);
-#endif
+#ifdef CONFIG_AFS_FSCACHE
+ /* deny if page is being written to the cache */
+ if (PageFsMisc(page)) {
+ _leave(" = F");
+ return 0;
+ }

- pageio = (struct cachefs_page *) page_private(page);
- set_page_private(page, 0);
- ClearPagePrivate(page);
+ fscache_uncache_page(AFS_FS_I(page->mapping->host)->cache, page);
+#endif

- kfree(pageio);
- }
+ /* indicate that the page can be released */
+ _leave(" = T");
+ return 1;
+}

- _leave(" = 0");
+/*****************************************************************************/
+/*
+ * wait for the disc cache to finish writing before permitting modification of
+ * our page in the page cache
+ */
+#ifdef CONFIG_AFS_FSCACHE
+static int afs_file_page_mkwrite(struct vm_area_struct *vma, struct page *page)
+{
+ wait_on_page_fs_misc(page);
return 0;
-} /* end afs_file_releasepage() */
+}
+#endif
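
Reviewer's note on the readpage path above: the decision that afs_file_readpage() makes after asking the cache for a page can be modelled in plain userspace C. This is only a sketch of the control flow, not kernel code. The names plan_read, struct read_plan and page_marked_cached are hypothetical; page_marked_cached stands in for the PagePrivate(page) test in the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model types; none of these exist in the patch. */
enum page_source { FROM_CACHE, FROM_SERVER };

struct read_plan {
	enum page_source source;   /* where the page data comes from */
	bool store_in_cache;       /* write the page to the cache after fetching? */
};

/*
 * cache_ret models the return of fscache_read_or_alloc_page():
 *   0        - read submitted, the cache will fill the page
 *   -ENODATA - page not yet cached
 *   -ENOBUFS - page will not be cached
 * As in the patch, every non-zero result falls through to a server fetch,
 * and the page is only sent to the cache if it was marked as cached.
 */
static struct read_plan plan_read(int cache_ret, bool page_marked_cached)
{
	struct read_plan plan;

	assert(cache_ret <= 0);    /* assumption: 0 or a negative errno */

	if (cache_ret == 0) {
		plan.source = FROM_CACHE;
		plan.store_in_cache = false;
	} else {
		plan.source = FROM_SERVER;
		plan.store_in_cache = page_marked_cached;
	}
	return plan;
}
```

In this model -ENODATA and -ENOBUFS both lead to a network fetch. They differ only in whether the page was marked for caching beforehand, which is why the patch checks PagePrivate(page) rather than the return code before calling fscache_write_page().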
diff --git a/fs/afs/fsclient.c b/fs/afs/fsclient.c
index 61bc371..c88c41a 100644
--- a/fs/afs/fsclient.c
+++ b/fs/afs/fsclient.c
@@ -398,6 +398,8 @@ int afs_rxfs_fetch_file_status(struct af
bp++; /* spare6 */
}

+ _debug("Data Version %llx", vnode->status.version);
+
/* success */
ret = 0;

@@ -408,7 +410,7 @@ int afs_rxfs_fetch_file_status(struct af
out_put_conn:
afs_server_release_callslot(server, &callslot);
out:
- _leave("");
+ _leave(" = %d", ret);
return ret;

abort:
diff --git a/fs/afs/inode.c b/fs/afs/inode.c
index 6f37754..efbf020 100644
--- a/fs/afs/inode.c
+++ b/fs/afs/inode.c
@@ -49,7 +49,7 @@ static int afs_inode_map_status(struct a
case AFS_FTYPE_FILE:
inode->i_mode = S_IFREG | vnode->status.mode;
inode->i_op = &afs_file_inode_operations;
- inode->i_fop = &generic_ro_fops;
+ inode->i_fop = &afs_file_file_operations;
break;
case AFS_FTYPE_DIR:
inode->i_mode = S_IFDIR | vnode->status.mode;
@@ -65,6 +65,11 @@ static int afs_inode_map_status(struct a
return -EBADMSG;
}

+#ifdef CONFIG_AFS_FSCACHE
+ if (vnode->status.size != inode->i_size)
+ fscache_set_i_size(vnode->cache, vnode->status.size);
+#endif
+
inode->i_nlink = vnode->status.nlink;
inode->i_uid = vnode->status.owner;
inode->i_gid = 0;
@@ -100,13 +105,33 @@ static int afs_inode_fetch_status(struct
struct afs_vnode *vnode;
int ret;

+ _enter("");
+
vnode = AFS_FS_I(inode);

ret = afs_vnode_fetch_status(vnode);

- if (ret == 0)
+ if (ret == 0) {
+#ifdef CONFIG_AFS_FSCACHE
+ if (!vnode->cache) {
+ vnode->cache =
+ fscache_acquire_cookie(vnode->volume->cache,
+ &afs_vnode_cache_index_def,
+ vnode);
+ if (!vnode->cache)
+ printk("Negative\n");
+ }
+#endif
ret = afs_inode_map_status(vnode);
+#ifdef CONFIG_AFS_FSCACHE
+ if (ret < 0) {
+ fscache_relinquish_cookie(vnode->cache, 0);
+ vnode->cache = NULL;
+ }
+#endif
+ }

+ _leave(" = %d", ret);
return ret;

} /* end afs_inode_fetch_status() */
@@ -121,6 +146,7 @@ static int afs_iget5_test(struct inode *

return inode->i_ino == data->fid.vnode &&
inode->i_version == data->fid.unique;
+
} /* end afs_iget5_test() */

/*****************************************************************************/
@@ -178,20 +204,11 @@ inline int afs_iget(struct super_block *
return ret;
}

-#ifdef AFS_CACHING_SUPPORT
- /* set up caching before reading the status, as fetch-status reads the
- * first page of symlinks to see if they're really mntpts */
- cachefs_acquire_cookie(vnode->volume->cache,
- NULL,
- vnode,
- &vnode->cache);
-#endif
-
/* okay... it's a new inode */
inode->i_flags |= S_NOATIME;
vnode->flags |= AFS_VNODE_CHANGED;
ret = afs_inode_fetch_status(inode);
- if (ret<0)
+ if (ret < 0)
goto bad_inode;

/* success */
@@ -277,8 +294,8 @@ void afs_clear_inode(struct inode *inode

afs_vnode_give_up_callback(vnode);

-#ifdef AFS_CACHING_SUPPORT
- cachefs_relinquish_cookie(vnode->cache, 0);
+#ifdef CONFIG_AFS_FSCACHE
+ fscache_relinquish_cookie(vnode->cache, 0);
vnode->cache = NULL;
#endif

diff --git a/fs/afs/internal.h b/fs/afs/internal.h
index e88b3b6..482dbd1 100644
--- a/fs/afs/internal.h
+++ b/fs/afs/internal.h
@@ -16,15 +16,17 @@ #include <linux/compiler.h>
#include <linux/kernel.h>
#include <linux/fs.h>
#include <linux/pagemap.h>
+#include <linux/fscache.h>

/*
* debug tracing
*/
-#define kenter(FMT, a...) printk("==> %s("FMT")\n",__FUNCTION__ , ## a)
-#define kleave(FMT, a...) printk("<== %s()"FMT"\n",__FUNCTION__ , ## a)
-#define kdebug(FMT, a...) printk(FMT"\n" , ## a)
-#define kproto(FMT, a...) printk("### "FMT"\n" , ## a)
-#define knet(FMT, a...) printk(FMT"\n" , ## a)
+#define __kdbg(FMT, a...) printk("[%05d] "FMT"\n", current->pid , ## a)
+#define kenter(FMT, a...) __kdbg("==> %s("FMT")", __FUNCTION__ , ## a)
+#define kleave(FMT, a...) __kdbg("<== %s()"FMT, __FUNCTION__ , ## a)
+#define kdebug(FMT, a...) __kdbg(FMT , ## a)
+#define kproto(FMT, a...) __kdbg("### "FMT , ## a)
+#define knet(FMT, a...) __kdbg(FMT , ## a)

#ifdef __KDEBUG
#define _enter(FMT, a...) kenter(FMT , ## a)
@@ -56,9 +58,6 @@ static inline void afs_discard_my_signal
*/
extern struct rw_semaphore afs_proc_cells_sem;
extern struct list_head afs_proc_cells;
-#ifdef AFS_CACHING_SUPPORT
-extern struct cachefs_index_def afs_cache_cell_index_def;
-#endif

/*
* dir.c
@@ -71,11 +70,7 @@ extern const struct file_operations afs_
*/
extern const struct address_space_operations afs_fs_aops;
extern struct inode_operations afs_file_inode_operations;
-
-#ifdef AFS_CACHING_SUPPORT
-extern int afs_cache_get_page_cookie(struct page *page,
- struct cachefs_page **_page_cookie);
-#endif
+extern const struct file_operations afs_file_file_operations;

/*
* inode.c
@@ -97,8 +92,8 @@ #endif
/*
* main.c
*/
-#ifdef AFS_CACHING_SUPPORT
-extern struct cachefs_netfs afs_cache_netfs;
+#ifdef CONFIG_AFS_FSCACHE
+extern struct fscache_netfs afs_cache_netfs;
#endif

/*
diff --git a/fs/afs/main.c b/fs/afs/main.c
index 913c689..5840bb2 100644
--- a/fs/afs/main.c
+++ b/fs/afs/main.c
@@ -1,6 +1,6 @@
/* main.c: AFS client file system
*
- * Copyright (C) 2002 Red Hat, Inc. All Rights Reserved.
+ * Copyright (C) 2002,5 Red Hat, Inc. All Rights Reserved.
* Written by David Howells ([email protected])
*
* This program is free software; you can redistribute it and/or
@@ -14,11 +14,11 @@ #include <linux/moduleparam.h>
#include <linux/init.h>
#include <linux/sched.h>
#include <linux/completion.h>
+#include <linux/fscache.h>
#include <rxrpc/rxrpc.h>
#include <rxrpc/transport.h>
#include <rxrpc/call.h>
#include <rxrpc/peer.h>
-#include "cache.h"
#include "cell.h"
#include "server.h"
#include "fsclient.h"
@@ -51,12 +51,11 @@ static struct rxrpc_peer_ops afs_peer_op
struct list_head afs_cb_hash_tbl[AFS_CB_HASH_COUNT];
DEFINE_SPINLOCK(afs_cb_hash_lock);

-#ifdef AFS_CACHING_SUPPORT
-static struct cachefs_netfs_operations afs_cache_ops = {
- .get_page_cookie = afs_cache_get_page_cookie,
+#ifdef CONFIG_AFS_FSCACHE
+static struct fscache_netfs_operations afs_cache_ops = {
};

-struct cachefs_netfs afs_cache_netfs = {
+struct fscache_netfs afs_cache_netfs = {
.name = "afs",
.version = 0,
.ops = &afs_cache_ops,
@@ -83,10 +82,9 @@ static int __init afs_init(void)
if (ret < 0)
return ret;

-#ifdef AFS_CACHING_SUPPORT
+#ifdef CONFIG_AFS_FSCACHE
/* we want to be able to cache */
- ret = cachefs_register_netfs(&afs_cache_netfs,
- &afs_cache_cell_index_def);
+ ret = fscache_register_netfs(&afs_cache_netfs);
if (ret < 0)
goto error;
#endif
@@ -137,8 +135,8 @@ #ifdef CONFIG_KEYS_TURNED_OFF
afs_key_unregister();
error_cache:
#endif
-#ifdef AFS_CACHING_SUPPORT
- cachefs_unregister_netfs(&afs_cache_netfs);
+#ifdef CONFIG_AFS_FSCACHE
+ fscache_unregister_netfs(&afs_cache_netfs);
error:
#endif
afs_cell_purge();
@@ -167,8 +165,8 @@ static void __exit afs_exit(void)
#ifdef CONFIG_KEYS_TURNED_OFF
afs_key_unregister();
#endif
-#ifdef AFS_CACHING_SUPPORT
- cachefs_unregister_netfs(&afs_cache_netfs);
+#ifdef CONFIG_AFS_FSCACHE
+ fscache_unregister_netfs(&afs_cache_netfs);
#endif
afs_proc_cleanup();

diff --git a/fs/afs/mntpt.c b/fs/afs/mntpt.c
index 99785a7..2a53d51 100644
--- a/fs/afs/mntpt.c
+++ b/fs/afs/mntpt.c
@@ -78,7 +78,7 @@ int afs_mntpt_check_symlink(struct afs_v

ret = -EIO;
wait_on_page_locked(page);
- buf = kmap(page);
+ buf = kmap_atomic(page, KM_USER0);
if (!PageUptodate(page))
goto out_free;
if (PageError(page))
@@ -101,7 +101,7 @@ int afs_mntpt_check_symlink(struct afs_v
ret = 0;

out_free:
- kunmap(page);
+ kunmap_atomic(buf, KM_USER0);
page_cache_release(page);
out:
_leave(" = %d", ret);
@@ -188,9 +188,9 @@ static struct vfsmount *afs_mntpt_do_aut
if (!PageUptodate(page) || PageError(page))
goto error;

- buf = kmap(page);
+ buf = kmap_atomic(page, KM_USER0);
memcpy(devname, buf, size);
- kunmap(page);
+ kunmap_atomic(buf, KM_USER0);
page_cache_release(page);
page = NULL;

@@ -269,12 +269,12 @@ static void *afs_mntpt_follow_link(struc
*/
static void afs_mntpt_expiry_timed_out(struct afs_timer *timer)
{
- kenter("");
+// kenter("");

mark_mounts_for_expiry(&afs_vfsmounts);

afs_kafstimod_add_timer(&afs_mntpt_expiry_timer,
afs_mntpt_expiry_timeout * HZ);

- kleave("");
+// kleave("");
} /* end afs_mntpt_expiry_timed_out() */
diff --git a/fs/afs/proc.c b/fs/afs/proc.c
index 86463ec..57793aa 100644
--- a/fs/afs/proc.c
+++ b/fs/afs/proc.c
@@ -177,6 +177,7 @@ int afs_proc_init(void)
*/
void afs_proc_cleanup(void)
{
+ remove_proc_entry("rootcell", proc_afs);
remove_proc_entry("cells", proc_afs);

remove_proc_entry("fs/afs", NULL);
diff --git a/fs/afs/server.c b/fs/afs/server.c
index 22afaae..e94628c 100644
--- a/fs/afs/server.c
+++ b/fs/afs/server.c
@@ -375,7 +375,6 @@ int afs_server_request_callslot(struct a
else if (list_empty(&server->fs_callq)) {
/* no one waiting */
server->fs_conn_cnt[nconn]++;
- spin_unlock(&server->fs_lock);
}
else {
/* someone's waiting - dequeue them and wake them up */
@@ -393,9 +392,9 @@ int afs_server_request_callslot(struct a
}
pcallslot->ready = 1;
wake_up_process(pcallslot->task);
- spin_unlock(&server->fs_lock);
}

+ spin_unlock(&server->fs_lock);
rxrpc_put_connection(callslot->conn);
callslot->conn = NULL;

diff --git a/fs/afs/vlocation.c b/fs/afs/vlocation.c
index 782ee7c..d55205d 100644
--- a/fs/afs/vlocation.c
+++ b/fs/afs/vlocation.c
@@ -59,17 +59,21 @@ static LIST_HEAD(afs_vlocation_update_pe
static struct afs_vlocation *afs_vlocation_update; /* VL currently being updated */
static DEFINE_SPINLOCK(afs_vlocation_update_lock); /* lock guarding update queue */

-#ifdef AFS_CACHING_SUPPORT
-static cachefs_match_val_t afs_vlocation_cache_match(void *target,
- const void *entry);
-static void afs_vlocation_cache_update(void *source, void *entry);
-
-struct cachefs_index_def afs_vlocation_cache_index_def = {
- .name = "vldb",
- .data_size = sizeof(struct afs_cache_vlocation),
- .keys[0] = { CACHEFS_INDEX_KEYS_ASCIIZ, 64 },
- .match = afs_vlocation_cache_match,
- .update = afs_vlocation_cache_update,
+#ifdef CONFIG_AFS_FSCACHE
+static uint16_t afs_vlocation_cache_get_key(const void *cookie_netfs_data,
+ void *buffer, uint16_t buflen);
+static uint16_t afs_vlocation_cache_get_aux(const void *cookie_netfs_data,
+ void *buffer, uint16_t buflen);
+static fscache_checkaux_t afs_vlocation_cache_check_aux(void *cookie_netfs_data,
+ const void *buffer,
+ uint16_t buflen);
+
+static struct fscache_cookie_def afs_vlocation_cache_index_def = {
+ .name = "AFS.vldb",
+ .type = FSCACHE_COOKIE_TYPE_INDEX,
+ .get_key = afs_vlocation_cache_get_key,
+ .get_aux = afs_vlocation_cache_get_aux,
+ .check_aux = afs_vlocation_cache_check_aux,
};
#endif

@@ -299,13 +303,12 @@ int afs_vlocation_lookup(struct afs_cell

list_add_tail(&vlocation->link, &cell->vl_list);

-#ifdef AFS_CACHING_SUPPORT
+#ifdef CONFIG_AFS_FSCACHE
/* we want to store it in the cache, plus it might already be
* encached */
- cachefs_acquire_cookie(cell->cache,
- &afs_volume_cache_index_def,
- vlocation,
- &vlocation->cache);
+ vlocation->cache = fscache_acquire_cookie(cell->cache,
+ &afs_vlocation_cache_index_def,
+ vlocation);

if (vlocation->valid)
goto found_in_cache;
@@ -339,7 +342,7 @@ #endif
active:
active = 1;

-#ifdef AFS_CACHING_SUPPORT
+#ifdef CONFIG_AFS_FSCACHE
found_in_cache:
#endif
/* try to look up a cached volume in the cell VL databases by ID */
@@ -421,9 +424,9 @@ #endif

afs_kafstimod_add_timer(&vlocation->upd_timer, 10 * HZ);

-#ifdef AFS_CACHING_SUPPORT
+#ifdef CONFIG_AFS_FSCACHE
/* update volume entry in local cache */
- cachefs_update_cookie(vlocation->cache);
+ fscache_update_cookie(vlocation->cache);
#endif

*_vlocation = vlocation;
@@ -437,8 +440,8 @@ #endif
}
else {
list_del(&vlocation->link);
-#ifdef AFS_CACHING_SUPPORT
- cachefs_relinquish_cookie(vlocation->cache, 0);
+#ifdef CONFIG_AFS_FSCACHE
+ fscache_relinquish_cookie(vlocation->cache, 0);
#endif
afs_put_cell(vlocation->cell);
kfree(vlocation);
@@ -535,8 +538,8 @@ void afs_vlocation_do_timeout(struct afs
}

/* we can now destroy it properly */
-#ifdef AFS_CACHING_SUPPORT
- cachefs_relinquish_cookie(vlocation->cache, 0);
+#ifdef CONFIG_AFS_FSCACHE
+ fscache_relinquish_cookie(vlocation->cache, 0);
#endif
afs_put_cell(cell);

@@ -887,65 +890,103 @@ static void afs_vlocation_update_discard

/*****************************************************************************/
/*
- * match a VLDB record stored in the cache
- * - may also load target from entry
+ * set the key for the index entry
*/
-#ifdef AFS_CACHING_SUPPORT
-static cachefs_match_val_t afs_vlocation_cache_match(void *target,
- const void *entry)
+#ifdef CONFIG_AFS_FSCACHE
+static uint16_t afs_vlocation_cache_get_key(const void *cookie_netfs_data,
+ void *buffer, uint16_t bufmax)
{
- const struct afs_cache_vlocation *vldb = entry;
- struct afs_vlocation *vlocation = target;
+ const struct afs_vlocation *vlocation = cookie_netfs_data;
+ uint16_t klen;

- _enter("{%s},{%s}", vlocation->vldb.name, vldb->name);
+ _enter("{%s},%p,%u", vlocation->vldb.name, buffer, bufmax);

- if (strncmp(vlocation->vldb.name, vldb->name, sizeof(vldb->name)) == 0
- ) {
- if (!vlocation->valid ||
- vlocation->vldb.rtime == vldb->rtime
- ) {
- vlocation->vldb = *vldb;
- vlocation->valid = 1;
- _leave(" = SUCCESS [c->m]");
- return CACHEFS_MATCH_SUCCESS;
- }
- /* need to update cache if cached info differs */
- else if (memcmp(&vlocation->vldb, vldb, sizeof(*vldb)) != 0) {
- /* delete if VIDs for this name differ */
- if (memcmp(&vlocation->vldb.vid,
- &vldb->vid,
- sizeof(vldb->vid)) != 0) {
- _leave(" = DELETE");
- return CACHEFS_MATCH_SUCCESS_DELETE;
- }
+ klen = strnlen(vlocation->vldb.name, sizeof(vlocation->vldb.name));
+ if (klen > bufmax)
+ return 0;

- _leave(" = UPDATE");
- return CACHEFS_MATCH_SUCCESS_UPDATE;
- }
- else {
- _leave(" = SUCCESS");
- return CACHEFS_MATCH_SUCCESS;
- }
- }
+ memcpy(buffer, vlocation->vldb.name, klen);
+
+ _leave(" = %u", klen);
+ return klen;

- _leave(" = FAILED");
- return CACHEFS_MATCH_FAILED;
-} /* end afs_vlocation_cache_match() */
+} /* end afs_vlocation_cache_get_key() */
#endif

/*****************************************************************************/
/*
- * update a VLDB record stored in the cache
+ * provide new auxiliary cache data
*/
-#ifdef AFS_CACHING_SUPPORT
-static void afs_vlocation_cache_update(void *source, void *entry)
+#ifdef CONFIG_AFS_FSCACHE
+static uint16_t afs_vlocation_cache_get_aux(const void *cookie_netfs_data,
+ void *buffer, uint16_t bufmax)
{
- struct afs_cache_vlocation *vldb = entry;
- struct afs_vlocation *vlocation = source;
+ const struct afs_vlocation *vlocation = cookie_netfs_data;
+ uint16_t dlen;

- _enter("");
+ _enter("{%s},%p,%u", vlocation->vldb.name, buffer, bufmax);
+
+ dlen = sizeof(struct afs_cache_vlocation);
+ dlen -= offsetof(struct afs_cache_vlocation, nservers);
+ if (dlen > bufmax)
+ return 0;
+
+ memcpy(buffer, (uint8_t *)&vlocation->vldb.nservers, dlen);
+
+ _leave(" = %u", dlen);
+ return dlen;
+
+} /* end afs_vlocation_cache_get_aux() */
+#endif
+
+/*****************************************************************************/
+/*
+ * check that the auxiliary data indicates that the entry is still valid
+ */
+#ifdef CONFIG_AFS_FSCACHE
+static fscache_checkaux_t afs_vlocation_cache_check_aux(void *cookie_netfs_data,
+ const void *buffer,
+ uint16_t buflen)
+{
+ const struct afs_cache_vlocation *cvldb;
+ struct afs_vlocation *vlocation = cookie_netfs_data;
+ uint16_t dlen;
+
+ _enter("{%s},%p,%u", vlocation->vldb.name, buffer, buflen);
+
+ /* check the size of the data is what we're expecting */
+ dlen = sizeof(struct afs_cache_vlocation);
+ dlen -= offsetof(struct afs_cache_vlocation, nservers);
+ if (dlen != buflen)
+ return FSCACHE_CHECKAUX_OBSOLETE;
+
+ cvldb = container_of(buffer, struct afs_cache_vlocation, nservers);
+
+ /* if what's on disk is more valid than what's in memory, then use the
+ * VL record from the cache */
+ if (!vlocation->valid || vlocation->vldb.rtime == cvldb->rtime) {
+ memcpy((uint8_t *)&vlocation->vldb.nservers, buffer, dlen);
+ vlocation->valid = 1;
+ _leave(" = SUCCESS [c->m]");
+ return FSCACHE_CHECKAUX_OKAY;
+ }
+
+ /* need to update the cache if the cached info differs */
+ if (memcmp(&vlocation->vldb.nservers, buffer, dlen) != 0) {
+ /* delete if the volume IDs for this name differ */
+ if (memcmp(&vlocation->vldb.vid, &cvldb->vid,
+ sizeof(cvldb->vid)) != 0
+ ) {
+ _leave(" = OBSOLETE");
+ return FSCACHE_CHECKAUX_OBSOLETE;
+ }
+
+ _leave(" = UPDATE");
+ return FSCACHE_CHECKAUX_NEEDS_UPDATE;
+ }

- *vldb = vlocation->vldb;
+ _leave(" = OKAY");
+ return FSCACHE_CHECKAUX_OKAY;

-} /* end afs_vlocation_cache_update() */
+} /* end afs_vlocation_cache_check_aux() */
#endif
diff --git a/fs/afs/vnode.c b/fs/afs/vnode.c
index cf62da5..cd72674 100644
--- a/fs/afs/vnode.c
+++ b/fs/afs/vnode.c
@@ -29,17 +29,30 @@ struct afs_timer_ops afs_vnode_cb_timed_
.timed_out = afs_vnode_cb_timed_out,
};

-#ifdef AFS_CACHING_SUPPORT
-static cachefs_match_val_t afs_vnode_cache_match(void *target,
- const void *entry);
-static void afs_vnode_cache_update(void *source, void *entry);
-
-struct cachefs_index_def afs_vnode_cache_index_def = {
- .name = "vnode",
- .data_size = sizeof(struct afs_cache_vnode),
- .keys[0] = { CACHEFS_INDEX_KEYS_BIN, 4 },
- .match = afs_vnode_cache_match,
- .update = afs_vnode_cache_update,
+#ifdef CONFIG_AFS_FSCACHE
+static uint16_t afs_vnode_cache_get_key(const void *cookie_netfs_data,
+ void *buffer, uint16_t buflen);
+static void afs_vnode_cache_get_attr(const void *cookie_netfs_data,
+ uint64_t *size);
+static uint16_t afs_vnode_cache_get_aux(const void *cookie_netfs_data,
+ void *buffer, uint16_t buflen);
+static fscache_checkaux_t afs_vnode_cache_check_aux(void *cookie_netfs_data,
+ const void *buffer,
+ uint16_t buflen);
+static void afs_vnode_cache_mark_pages_cached(void *cookie_netfs_data,
+ struct address_space *mapping,
+ struct pagevec *cached_pvec);
+static void afs_vnode_cache_now_uncached(void *cookie_netfs_data);
+
+struct fscache_cookie_def afs_vnode_cache_index_def = {
+ .name = "AFS.vnode",
+ .type = FSCACHE_COOKIE_TYPE_DATAFILE,
+ .get_key = afs_vnode_cache_get_key,
+ .get_attr = afs_vnode_cache_get_attr,
+ .get_aux = afs_vnode_cache_get_aux,
+ .check_aux = afs_vnode_cache_check_aux,
+ .mark_pages_cached = afs_vnode_cache_mark_pages_cached,
+ .now_uncached = afs_vnode_cache_now_uncached,
};
#endif

@@ -188,6 +201,8 @@ int afs_vnode_fetch_status(struct afs_vn

if (vnode->update_cnt > 0) {
/* someone else started a fetch */
+ _debug("conflict");
+
set_current_state(TASK_UNINTERRUPTIBLE);
add_wait_queue(&vnode->update_waitq, &myself);

@@ -219,6 +234,7 @@ int afs_vnode_fetch_status(struct afs_vn
spin_unlock(&vnode->lock);
set_current_state(TASK_RUNNING);

+ _leave(" [conflicted, %d]", !!(vnode->flags & AFS_VNODE_DELETED));
return vnode->flags & AFS_VNODE_DELETED ? -ENOENT : 0;
}

@@ -341,54 +357,200 @@ int afs_vnode_give_up_callback(struct af

/*****************************************************************************/
/*
- * match a vnode record stored in the cache
+ * set the key for the index entry
*/
-#ifdef AFS_CACHING_SUPPORT
-static cachefs_match_val_t afs_vnode_cache_match(void *target,
- const void *entry)
+#ifdef CONFIG_AFS_FSCACHE
+static uint16_t afs_vnode_cache_get_key(const void *cookie_netfs_data,
+ void *buffer, uint16_t bufmax)
{
- const struct afs_cache_vnode *cvnode = entry;
- struct afs_vnode *vnode = target;
+ const struct afs_vnode *vnode = cookie_netfs_data;
+ uint16_t klen;

- _enter("{%x,%x,%Lx},{%x,%x,%Lx}",
- vnode->fid.vnode,
- vnode->fid.unique,
- vnode->status.version,
- cvnode->vnode_id,
- cvnode->vnode_unique,
- cvnode->data_version);
-
- if (vnode->fid.vnode != cvnode->vnode_id) {
- _leave(" = FAILED");
- return CACHEFS_MATCH_FAILED;
+ _enter("{%x,%x,%Lx},%p,%u",
+ vnode->fid.vnode, vnode->fid.unique, vnode->status.version,
+ buffer, bufmax);
+
+ klen = sizeof(vnode->fid.vnode);
+ if (klen > bufmax)
+ return 0;
+
+ memcpy(buffer, &vnode->fid.vnode, sizeof(vnode->fid.vnode));
+
+ _leave(" = %u", klen);
+ return klen;
+
+} /* end afs_vnode_cache_get_key() */
+#endif
+
+/*****************************************************************************/
+/*
+ * provide updated file attributes
+ */
+#ifdef CONFIG_AFS_FSCACHE
+static void afs_vnode_cache_get_attr(const void *cookie_netfs_data,
+ uint64_t *size)
+{
+ const struct afs_vnode *vnode = cookie_netfs_data;
+
+ _enter("{%x,%x,%Lx},",
+ vnode->fid.vnode, vnode->fid.unique, vnode->status.version);
+
+ *size = i_size_read((struct inode *) &vnode->vfs_inode);
+
+} /* end afs_vnode_cache_get_attr() */
+#endif
+
+/*****************************************************************************/
+/*
+ * provide new auxiliary cache data
+ */
+#ifdef CONFIG_AFS_FSCACHE
+static uint16_t afs_vnode_cache_get_aux(const void *cookie_netfs_data,
+ void *buffer, uint16_t bufmax)
+{
+ const struct afs_vnode *vnode = cookie_netfs_data;
+ uint16_t dlen;
+
+ _enter("{%x,%x,%Lx},%p,%u",
+ vnode->fid.vnode, vnode->fid.unique, vnode->status.version,
+ buffer, bufmax);
+
+ dlen = sizeof(vnode->fid.unique) + sizeof(vnode->status.version);
+ if (dlen > bufmax)
+ return 0;
+
+ memcpy(buffer, &vnode->fid.unique, sizeof(vnode->fid.unique));
+ buffer += sizeof(vnode->fid.unique);
+ memcpy(buffer, &vnode->status.version, sizeof(vnode->status.version));
+
+ _leave(" = %u", dlen);
+ return dlen;
+
+} /* end afs_vnode_cache_get_aux() */
+#endif
+
+/*****************************************************************************/
+/*
+ * check that the auxiliary data indicates that the entry is still valid
+ */
+#ifdef CONFIG_AFS_FSCACHE
+static fscache_checkaux_t afs_vnode_cache_check_aux(void *cookie_netfs_data,
+ const void *buffer,
+ uint16_t buflen)
+{
+ struct afs_vnode *vnode = cookie_netfs_data;
+ uint16_t dlen;
+
+ _enter("{%x,%x,%Lx},%p,%u",
+ vnode->fid.vnode, vnode->fid.unique, vnode->status.version,
+ buffer, buflen);
+
+ /* check the size of the data is what we're expecting */
+ dlen = sizeof(vnode->fid.unique) + sizeof(vnode->status.version);
+ if (dlen != buflen) {
+ _leave(" = OBSOLETE [len %hx != %hx]", dlen, buflen);
+ return FSCACHE_CHECKAUX_OBSOLETE;
}

- if (vnode->fid.unique != cvnode->vnode_unique ||
- vnode->status.version != cvnode->data_version) {
- _leave(" = DELETE");
- return CACHEFS_MATCH_SUCCESS_DELETE;
+ if (memcmp(buffer,
+ &vnode->fid.unique,
+ sizeof(vnode->fid.unique)
+ ) != 0
+ ) {
+ unsigned unique;
+
+ memcpy(&unique, buffer, sizeof(unique));
+
+ _leave(" = OBSOLETE [uniq %x != %x]",
+ unique, vnode->fid.unique);
+ return FSCACHE_CHECKAUX_OBSOLETE;
+ }
+
+ if (memcmp(buffer + sizeof(vnode->fid.unique),
+ &vnode->status.version,
+ sizeof(vnode->status.version)
+ ) != 0
+ ) {
+ afs_dataversion_t version;
+
+ memcpy(&version, buffer + sizeof(vnode->fid.unique),
+ sizeof(version));
+
+ _leave(" = OBSOLETE [vers %llx != %llx]",
+ version, vnode->status.version);
+ return FSCACHE_CHECKAUX_OBSOLETE;
}

_leave(" = SUCCESS");
- return CACHEFS_MATCH_SUCCESS;
-} /* end afs_vnode_cache_match() */
+ return FSCACHE_CHECKAUX_OKAY;
+
+} /* end afs_vnode_cache_check_aux() */
#endif

/*****************************************************************************/
/*
- * update a vnode record stored in the cache
+ * indication of pages that now have cache metadata retained
+ * - this function should mark the specified pages as now being cached
*/
-#ifdef AFS_CACHING_SUPPORT
-static void afs_vnode_cache_update(void *source, void *entry)
+#ifdef CONFIG_AFS_FSCACHE
+static void afs_vnode_cache_mark_pages_cached(void *cookie_netfs_data,
+ struct address_space *mapping,
+ struct pagevec *cached_pvec)
{
- struct afs_cache_vnode *cvnode = entry;
- struct afs_vnode *vnode = source;
+ unsigned long loop;
+
+ for (loop = 0; loop < cached_pvec->nr; loop++) {
+ struct page *page = cached_pvec->pages[loop];

- _enter("");
+ _debug("- mark %p{%lx}", page, page->index);

- cvnode->vnode_id = vnode->fid.vnode;
- cvnode->vnode_unique = vnode->fid.unique;
- cvnode->data_version = vnode->status.version;
+ SetPagePrivate(page);
+ }
+
+} /* end afs_vnode_cache_mark_pages_cached() */
+#endif
+
+/*****************************************************************************/
+/*
+ * indication the cookie is no longer cached
+ * - this function is called when the backing store currently caching a cookie
+ * is removed
+ * - the netfs should use this to clean up any markers indicating cached pages
+ * - this is mandatory for any object that may have data
+ */
+#ifdef CONFIG_AFS_FSCACHE
+static void afs_vnode_cache_now_uncached(void *cookie_netfs_data)
+{
+ struct afs_vnode *vnode = cookie_netfs_data;
+ struct pagevec pvec;
+ pgoff_t first;
+ int loop, nr_pages;
+
+ _enter("{%x,%x,%Lx}",
+ vnode->fid.vnode, vnode->fid.unique, vnode->status.version);
+
+ pagevec_init(&pvec, 0);
+ first = 0;
+
+ for (;;) {
+ /* grab a bunch of pages to clean */
+ nr_pages = pagevec_lookup(&pvec, vnode->vfs_inode.i_mapping,
+ first,
+ PAGEVEC_SIZE - pagevec_count(&pvec));
+ if (!nr_pages)
+ break;
+
+ for (loop = 0; loop < nr_pages; loop++)
+ ClearPagePrivate(pvec.pages[loop]);
+
+ first = pvec.pages[nr_pages - 1]->index + 1;
+
+ pvec.nr = nr_pages;
+ pagevec_release(&pvec);
+ cond_resched();
+ }
+
+ _leave("");

-} /* end afs_vnode_cache_update() */
+} /* end afs_vnode_cache_now_uncached() */
#endif
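
Reviewer's note on the vnode aux callbacks above: afs_vnode_cache_get_aux() and afs_vnode_cache_check_aux() pack and then compare the uniquifier and the data version. The round trip can be sketched in userspace C as below. This is a minimal model under stated assumptions, not the kernel implementation: struct vnode_id, enum checkaux, pack_aux and check_aux are hypothetical names, and a 32-bit uniquifier and a 64-bit data version are assumed.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for the fscache_checkaux_t results. */
enum checkaux { CHECKAUX_OKAY, CHECKAUX_OBSOLETE };

/* Hypothetical reduction of struct afs_vnode to the two fields used. */
struct vnode_id {
	uint32_t unique;    /* plays the role of vnode->fid.unique */
	uint64_t version;   /* plays the role of vnode->status.version */
};

#define AUX_LEN (sizeof(uint32_t) + sizeof(uint64_t))

/* Pack uniquifier then data version; return the length, or 0 if it won't fit. */
static uint16_t pack_aux(const struct vnode_id *v, void *buffer, uint16_t bufmax)
{
	uint8_t *p = buffer;

	assert(v != NULL && buffer != NULL);
	if (AUX_LEN > bufmax)
		return 0;

	memcpy(p, &v->unique, sizeof(v->unique));
	memcpy(p + sizeof(v->unique), &v->version, sizeof(v->version));
	return (uint16_t)AUX_LEN;
}

/* Any mismatch in length, uniquifier or data version makes the entry obsolete. */
static enum checkaux check_aux(const struct vnode_id *v,
			       const void *buffer, uint16_t buflen)
{
	const uint8_t *p = buffer;

	if (buflen != AUX_LEN)
		return CHECKAUX_OBSOLETE;
	if (memcmp(p, &v->unique, sizeof(v->unique)) != 0)
		return CHECKAUX_OBSOLETE;
	if (memcmp(p + sizeof(v->unique), &v->version, sizeof(v->version)) != 0)
		return CHECKAUX_OBSOLETE;
	return CHECKAUX_OKAY;
}
```

The fields are copied individually rather than as a whole struct so that compiler padding never reaches the stored data, which matches the field-by-field memcpy() calls in the patch.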
diff --git a/fs/afs/vnode.h b/fs/afs/vnode.h
index b86a971..3f0602d 100644
--- a/fs/afs/vnode.h
+++ b/fs/afs/vnode.h
@@ -13,9 +13,9 @@ #ifndef _LINUX_AFS_VNODE_H
#define _LINUX_AFS_VNODE_H

#include <linux/fs.h>
+#include <linux/fscache.h>
#include "server.h"
#include "kafstimod.h"
-#include "cache.h"

#ifdef __KERNEL__

@@ -32,8 +32,8 @@ struct afs_cache_vnode
afs_dataversion_t data_version; /* data version */
};

-#ifdef AFS_CACHING_SUPPORT
-extern struct cachefs_index_def afs_vnode_cache_index_def;
+#ifdef CONFIG_AFS_FSCACHE
+extern struct fscache_cookie_def afs_vnode_cache_index_def;
#endif

/*****************************************************************************/
@@ -47,8 +47,8 @@ struct afs_vnode
struct afs_volume *volume; /* volume on which vnode resides */
struct afs_fid fid; /* the file identifier for this inode */
struct afs_file_status status; /* AFS status info for this file */
-#ifdef AFS_CACHING_SUPPORT
- struct cachefs_cookie *cache; /* caching cookie */
+#ifdef CONFIG_AFS_FSCACHE
+ struct fscache_cookie *cache; /* caching cookie */
#endif

wait_queue_head_t update_waitq; /* status fetch waitqueue */
diff --git a/fs/afs/volume.c b/fs/afs/volume.c
index 768c6db..7a380ac 100644
--- a/fs/afs/volume.c
+++ b/fs/afs/volume.c
@@ -15,10 +15,10 @@ #include <linux/init.h>
#include <linux/slab.h>
#include <linux/fs.h>
#include <linux/pagemap.h>
+#include <linux/fscache.h>
#include "volume.h"
#include "vnode.h"
#include "cell.h"
-#include "cache.h"
#include "cmservice.h"
#include "fsclient.h"
#include "vlclient.h"
@@ -28,18 +28,14 @@ #ifdef __KDEBUG
static const char *afs_voltypes[] = { "R/W", "R/O", "BAK" };
#endif

-#ifdef AFS_CACHING_SUPPORT
-static cachefs_match_val_t afs_volume_cache_match(void *target,
- const void *entry);
-static void afs_volume_cache_update(void *source, void *entry);
-
-struct cachefs_index_def afs_volume_cache_index_def = {
- .name = "volume",
- .data_size = sizeof(struct afs_cache_vhash),
- .keys[0] = { CACHEFS_INDEX_KEYS_BIN, 1 },
- .keys[1] = { CACHEFS_INDEX_KEYS_BIN, 1 },
- .match = afs_volume_cache_match,
- .update = afs_volume_cache_update,
+#ifdef CONFIG_AFS_FSCACHE
+static uint16_t afs_volume_cache_get_key(const void *cookie_netfs_data,
+ void *buffer, uint16_t bufmax);
+
+static struct fscache_cookie_def afs_volume_cache_index_def = {
+ .name = "AFS.volume",
+ .type = FSCACHE_COOKIE_TYPE_INDEX,
+ .get_key = afs_volume_cache_get_key,
};
#endif

@@ -213,11 +209,10 @@ int afs_volume_lookup(const char *name,
}

/* attach the cache and volume location */
-#ifdef AFS_CACHING_SUPPORT
- cachefs_acquire_cookie(vlocation->cache,
- &afs_vnode_cache_index_def,
- volume,
- &volume->cache);
+#ifdef CONFIG_AFS_FSCACHE
+ volume->cache = fscache_acquire_cookie(vlocation->cache,
+ &afs_volume_cache_index_def,
+ volume);
#endif

afs_get_vlocation(vlocation);
@@ -285,8 +280,8 @@ void afs_put_volume(struct afs_volume *v
up_write(&vlocation->cell->vl_sem);

/* finish cleaning up the volume */
-#ifdef AFS_CACHING_SUPPORT
- cachefs_relinquish_cookie(volume->cache, 0);
+#ifdef CONFIG_AFS_FSCACHE
+ fscache_relinquish_cookie(volume->cache, 0);
#endif
afs_put_vlocation(vlocation);

@@ -480,40 +475,25 @@ int afs_volume_release_fileserver(struct

/*****************************************************************************/
/*
- * match a volume hash record stored in the cache
+ * set the key for the index entry
*/
-#ifdef AFS_CACHING_SUPPORT
-static cachefs_match_val_t afs_volume_cache_match(void *target,
- const void *entry)
+#ifdef CONFIG_AFS_FSCACHE
+static uint16_t afs_volume_cache_get_key(const void *cookie_netfs_data,
+ void *buffer, uint16_t bufmax)
{
- const struct afs_cache_vhash *vhash = entry;
- struct afs_volume *volume = target;
-
- _enter("{%u},{%u}", volume->type, vhash->vtype);
+ const struct afs_volume *volume = cookie_netfs_data;
+ uint16_t klen;

- if (volume->type == vhash->vtype) {
- _leave(" = SUCCESS");
- return CACHEFS_MATCH_SUCCESS;
- }
-
- _leave(" = FAILED");
- return CACHEFS_MATCH_FAILED;
-} /* end afs_volume_cache_match() */
-#endif
+ _enter("{%u},%p,%u", volume->type, buffer, bufmax);

-/*****************************************************************************/
-/*
- * update a volume hash record stored in the cache
- */
-#ifdef AFS_CACHING_SUPPORT
-static void afs_volume_cache_update(void *source, void *entry)
-{
- struct afs_cache_vhash *vhash = entry;
- struct afs_volume *volume = source;
+ klen = sizeof(volume->type);
+ if (klen > bufmax)
+ return 0;

- _enter("");
+ memcpy(buffer, &volume->type, sizeof(volume->type));

- vhash->vtype = volume->type;
+ _leave(" = %u", klen);
+ return klen;

-} /* end afs_volume_cache_update() */
+} /* end afs_volume_cache_get_key() */
#endif
diff --git a/fs/afs/volume.h b/fs/afs/volume.h
index bfdcf19..fc9895a 100644
--- a/fs/afs/volume.h
+++ b/fs/afs/volume.h
@@ -12,11 +12,11 @@
#ifndef _LINUX_AFS_VOLUME_H
#define _LINUX_AFS_VOLUME_H

+#include <linux/fscache.h>
#include "types.h"
#include "fsclient.h"
#include "kafstimod.h"
#include "kafsasyncd.h"
-#include "cache.h"

typedef enum {
AFS_VLUPD_SLEEP, /* sleeping waiting for update timer to fire */
@@ -45,24 +45,6 @@ #define AFS_VOL_VTM_BAK 0x04 /* backup v
time_t rtime; /* last retrieval time */
};

-#ifdef AFS_CACHING_SUPPORT
-extern struct cachefs_index_def afs_vlocation_cache_index_def;
-#endif
-
-/*****************************************************************************/
-/*
- * volume -> vnode hash table entry
- */
-struct afs_cache_vhash
-{
- afs_voltype_t vtype; /* which volume variation */
- uint8_t hash_bucket; /* which hash bucket this represents */
-} __attribute__((packed));
-
-#ifdef AFS_CACHING_SUPPORT
-extern struct cachefs_index_def afs_volume_cache_index_def;
-#endif
-
/*****************************************************************************/
/*
* AFS volume location record
@@ -73,8 +55,8 @@ struct afs_vlocation
struct list_head link; /* link in cell volume location list */
struct afs_timer timeout; /* decaching timer */
struct afs_cell *cell; /* cell to which volume belongs */
-#ifdef AFS_CACHING_SUPPORT
- struct cachefs_cookie *cache; /* caching cookie */
+#ifdef CONFIG_AFS_FSCACHE
+ struct fscache_cookie *cache; /* caching cookie */
#endif
struct afs_cache_vlocation vldb; /* volume information DB record */
struct afs_volume *vols[3]; /* volume access record pointer (index by type) */
@@ -109,8 +91,8 @@ struct afs_volume
atomic_t usage;
struct afs_cell *cell; /* cell to which belongs (unrefd ptr) */
struct afs_vlocation *vlocation; /* volume location */
-#ifdef AFS_CACHING_SUPPORT
- struct cachefs_cookie *cache; /* caching cookie */
+#ifdef CONFIG_AFS_FSCACHE
+ struct fscache_cookie *cache; /* caching cookie */
#endif
afs_volid_t vid; /* volume ID */
afs_voltype_t type; /* type of volume */
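
The get_key callback in the fs/afs/volume.c hunk above follows a simple contract: copy the key into the supplied buffer and return its length, or return 0 if the buffer is too small. A minimal userspace sketch of that contract is given below. It is a model only: struct model_volume and model_get_key() are hypothetical names chosen for illustration and are not part of the FS-Cache API.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* hypothetical stand-in for the netfs object behind a cookie */
struct model_volume {
	uint8_t type;	/* e.g. 0 = R/W, 1 = R/O, 2 = backup */
};

/* write the index key into buffer; return its length, or 0 if it won't fit */
static uint16_t model_get_key(const void *netfs_data,
			      void *buffer, uint16_t bufmax)
{
	const struct model_volume *volume = netfs_data;
	uint16_t klen;

	assert(volume != NULL && buffer != NULL);

	klen = sizeof(volume->type);
	if (klen > bufmax)
		return 0;	/* key does not fit: report "no key" */

	memcpy(buffer, &volume->type, klen);
	return klen;
}
```

Checking the length before the memcpy() means a short buffer is never overrun; the caller only has to treat a return of 0 as failure.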

2006-11-14 20:10:00

by David Howells

Subject: [PATCH 17/19] CacheFiles: Use the VFS wrappers for inode ops

Make CacheFiles use the appropriate VFS wrappers for inode operations (such as
vfs_mkdir()) rather than calling through the ops table directly. The direct
calls were used to bypass security checks, but now that the cachefiles module
has its own security label and the means to act as that label on behalf of
another process, the bypass is no longer necessary.
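
As a rough illustration of why the wrappers matter, the userspace model below contrasts a raw ops-table entry with a wrapper that performs a check before the call and a notification after it. Everything in it is hypothetical (struct model_inode, model_vfs_mkdir() and so on); it only mirrors the general shape of a wrapper such as vfs_mkdir() and is not kernel code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct model_inode;

/* per-filesystem operations table, standing in for inode_operations */
struct model_inode_ops {
	int (*mkdir)(struct model_inode *dir, const char *name);
};

struct model_inode {
	const struct model_inode_ops *ops;
	int may_write;		/* stands in for permission/security checks */
	int notifications;	/* stands in for change notifications */
	int children;
};

/* the filesystem's own implementation: no checks, no notifications */
static int model_fs_mkdir(struct model_inode *dir, const char *name)
{
	(void)name;
	dir->children++;
	return 0;
}

static const struct model_inode_ops model_fs_ops = {
	.mkdir = model_fs_mkdir,
};

/* wrapper: check first, call the op, then notify on success */
static int model_vfs_mkdir(struct model_inode *dir, const char *name)
{
	int ret;

	assert(dir != NULL && name != NULL);

	if (!dir->ops || !dir->ops->mkdir)
		return -EPERM;
	if (!dir->may_write)
		return -EACCES;

	ret = dir->ops->mkdir(dir, name);
	if (ret == 0)
		dir->notifications++;
	return ret;
}
```

A caller that goes through model_vfs_mkdir() gets the check and the notification without open-coding them, which is the same reason the hunks below can drop their explicit DQUOT_INIT() and fsnotify_*() calls.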

Signed-Off-By: David Howells <[email protected]>
---

fs/cachefiles/cf-bind.c | 2 +
fs/cachefiles/cf-interface.c | 3 +-
fs/cachefiles/cf-namei.c | 44 +++++-------------------------
fs/cachefiles/cf-security.c | 9 ++----
fs/cachefiles/cf-xattr.c | 61 ++++++++++++------------------------------
5 files changed, 31 insertions(+), 88 deletions(-)

diff --git a/fs/cachefiles/cf-bind.c b/fs/cachefiles/cf-bind.c
index 0ac3a6b..1d1fd14 100644
--- a/fs/cachefiles/cf-bind.c
+++ b/fs/cachefiles/cf-bind.c
@@ -151,7 +151,7 @@ static int cachefiles_daemon_add_cache(s
goto error_unsupported;

/* get the cache size and blocksize */
- ret = root->d_sb->s_op->statfs(root, &stats);
+ ret = vfs_statfs(root, &stats);
if (ret < 0)
goto error_unsupported;

diff --git a/fs/cachefiles/cf-interface.c b/fs/cachefiles/cf-interface.c
index f467058..7a3d085 100644
--- a/fs/cachefiles/cf-interface.c
+++ b/fs/cachefiles/cf-interface.c
@@ -355,10 +355,11 @@ int cachefiles_has_space(struct cachefil
/* find out how many pages of blockdev are available */
memset(&stats, 0, sizeof(stats));

- ret = cache->mnt->mnt_sb->s_op->statfs(cache->mnt->mnt_root, &stats);
+ ret = vfs_statfs(cache->mnt->mnt_root, &stats);
if (ret < 0) {
if (ret == -EIO)
cachefiles_io_error(cache, "statfs failed");
+ _leave(" = %d", ret);
return ret;
}

diff --git a/fs/cachefiles/cf-namei.c b/fs/cachefiles/cf-namei.c
index 80c9b66..5508fa2 100644
--- a/fs/cachefiles/cf-namei.c
+++ b/fs/cachefiles/cf-namei.c
@@ -72,7 +72,6 @@ static int cachefiles_bury_object(struct
{
struct dentry *grave, *alt, *trap;
struct qstr name;
- const char *old_name;
char nbuffer[8 + 8 + 1];
int ret;

@@ -83,16 +82,12 @@ static int cachefiles_bury_object(struct
/* non-directories can just be unlinked */
if (!S_ISDIR(rep->d_inode->i_mode)) {
_debug("unlink stale object");
- ret = dir->d_inode->i_op->unlink(dir->d_inode, rep);
+ ret = vfs_unlink(dir->d_inode, rep);

mutex_unlock(&dir->d_inode->i_mutex);

- if (ret == 0) {
- _debug("d_delete");
- d_delete(rep);
- } else if (ret == -EIO) {
+ if (ret == -EIO)
cachefiles_io_error(cache, "Unlink failed");
- }

_leave(" = %d", ret);
return ret;
@@ -213,24 +208,9 @@ try_again:
}

/* attempt the rename */
- DQUOT_INIT(dir->d_inode);
- DQUOT_INIT(cache->graveyard->d_inode);
-
- old_name = fsnotify_oldname_init(rep->d_name.name);
-
- ret = dir->d_inode->i_op->rename(dir->d_inode, rep,
- cache->graveyard->d_inode, grave);
-
- if (ret == 0) {
- d_move(rep, grave);
- fsnotify_move(dir->d_inode, cache->graveyard->d_inode,
- old_name, rep->d_name.name, 1,
- grave->d_inode, rep->d_inode);
- } else if (ret != -ENOMEM) {
+ ret = vfs_rename(dir->d_inode, rep, cache->graveyard->d_inode, grave);
+ if (ret != 0 && ret != -ENOMEM)
cachefiles_io_error(cache, "Rename failed with error %d", ret);
- }
-
- fsnotify_oldname_free(old_name);

unlock_rename(cache->graveyard, dir);
dput(grave);
@@ -372,15 +352,12 @@ lookup_again:
if (ret < 0)
goto create_error;

- DQUOT_INIT(dir->d_inode);
- ret = dir->d_inode->i_op->mkdir(dir->d_inode, next, 0);
+ ret = vfs_mkdir(dir->d_inode, next, 0);
if (ret < 0)
goto create_error;

ASSERT(next->d_inode);

- fsnotify_mkdir(dir->d_inode, next);
-
_debug("mkdir -> %p{%p{ino=%lu}}",
next, next->d_inode, next->d_inode->i_ino);

@@ -398,16 +375,12 @@ lookup_again:
if (ret < 0)
goto create_error;

- DQUOT_INIT(dir->d_inode);
- ret = dir->d_inode->i_op->create(dir->d_inode, next,
- S_IFREG, NULL);
+ ret = vfs_create(dir->d_inode, next, S_IFREG, NULL);
if (ret < 0)
goto create_error;

ASSERT(next->d_inode);

- fsnotify_create(dir->d_inode, next);
-
_debug("create -> %p{%p{ino=%lu}}",
next, next->d_inode, next->d_inode->i_ino);

@@ -605,15 +578,12 @@ struct dentry *cachefiles_get_directory(
if (ret < 0)
goto mkdir_error;

- DQUOT_INIT(dir->d_inode);
- ret = dir->d_inode->i_op->mkdir(dir->d_inode, subdir, 0700);
+ ret = vfs_mkdir(dir->d_inode, subdir, 0700);
if (ret < 0)
goto mkdir_error;

ASSERT(subdir->d_inode);

- fsnotify_mkdir(dir->d_inode, subdir);
-
_debug("mkdir -> %p{%p{ino=%lu}}",
subdir,
subdir->d_inode,
diff --git a/fs/cachefiles/cf-security.c b/fs/cachefiles/cf-security.c
index 4c5f052..d7c1473 100644
--- a/fs/cachefiles/cf-security.c
+++ b/fs/cachefiles/cf-security.c
@@ -60,6 +60,7 @@ error:

/*
* check the security details of the on-disk cache
+ * - must be called with security imposed
*/
int cachefiles_check_security(struct cachefiles_cache *cache,
struct dentry *root)
@@ -82,14 +83,12 @@ int cachefiles_check_security(struct cac

/* check that we have permission to create files and directories with
* the security ID we've been given */
- security_act_as_secid(cache->access_sid);
-
ret = security_inode_mkdir(root->d_inode, root, 0);
if (ret < 0) {
printk(KERN_ERR "CacheFiles:"
" Security denies permission to make dirs: error %d",
ret);
- goto error2;
+ goto error;
}

ret = security_inode_create(root->d_inode, root, 0);
@@ -97,11 +96,9 @@ int cachefiles_check_security(struct cac
printk(KERN_ERR "CacheFiles:"
" Security denies permission to create files: error %d",
ret);
- goto error2;
+ goto error;
}

-error2:
- security_act_as_self();
error:
if (ret == -EOPNOTSUPP)
ret = 0;
diff --git a/fs/cachefiles/cf-xattr.c b/fs/cachefiles/cf-xattr.c
index 5017715..209b813 100644
--- a/fs/cachefiles/cf-xattr.c
+++ b/fs/cachefiles/cf-xattr.c
@@ -32,9 +32,6 @@ int cachefiles_check_object_type(struct

ASSERT(dentry);
ASSERT(dentry->d_inode);
- ASSERT(dentry->d_inode->i_op);
- ASSERT(dentry->d_inode->i_op->setxattr);
- ASSERT(dentry->d_inode->i_op->getxattr);

if (!object->fscache.cookie)
strcpy(type, "C3");
@@ -43,14 +40,11 @@ int cachefiles_check_object_type(struct

_enter("%p{%s}", object, type);

- mutex_lock(&dentry->d_inode->i_mutex);
-
/* attempt to install a type label directly */
- ret = dentry->d_inode->i_op->setxattr(dentry, cachefiles_xattr_cache,
- type, 2, XATTR_CREATE);
+ ret = vfs_setxattr(dentry, cachefiles_xattr_cache, type, 2,
+ XATTR_CREATE);
if (ret == 0) {
_debug("SET"); /* we succeeded */
- fsnotify_xattr(dentry);
goto error;
}

@@ -63,8 +57,7 @@ int cachefiles_check_object_type(struct
}

/* read the current type label */
- ret = dentry->d_inode->i_op->getxattr(dentry, cachefiles_xattr_cache,
- xtype, 3);
+ ret = vfs_getxattr(dentry, cachefiles_xattr_cache, xtype, 3);
if (ret < 0) {
if (ret == -ERANGE)
goto bad_type_length;
@@ -86,7 +79,6 @@ int cachefiles_check_object_type(struct
ret = 0;

error:
- mutex_unlock(&dentry->d_inode->i_mutex);
_leave(" = %d", ret);
return ret;

@@ -117,26 +109,22 @@ int cachefiles_set_object_xattr(struct c

ASSERT(object->fscache.cookie);
ASSERT(dentry);
- ASSERT(dentry->d_inode->i_op->setxattr);

_enter("%p,#%d", object, auxdata->len);

/* attempt to install the cache metadata directly */
- mutex_lock(&dentry->d_inode->i_mutex);
-
_debug("SET %s #%u", object->fscache.cookie->def->name, auxdata->len);

- ret = dentry->d_inode->i_op->setxattr(dentry, cachefiles_xattr_cache,
- &auxdata->type, auxdata->len,
- XATTR_CREATE);
- if (ret == 0)
- fsnotify_xattr(dentry);
- else if (ret != -ENOMEM)
- cachefiles_io_error_obj(object,
- "Failed to set xattr with error %d",
- ret);
+ ret = vfs_setxattr(dentry, cachefiles_xattr_cache,
+ &auxdata->type, auxdata->len,
+ XATTR_CREATE);
+ if (ret < 0) {
+ if (ret != -ENOMEM)
+ cachefiles_io_error_obj(
+ object,
+ "Failed to set xattr with error %d", ret);
+ }

- mutex_unlock(&dentry->d_inode->i_mutex);
_leave(" = %d", ret);
return ret;
}
@@ -156,8 +144,6 @@ int cachefiles_check_object_xattr(struct

ASSERT(dentry);
ASSERT(dentry->d_inode);
- ASSERT(dentry->d_inode->i_op->setxattr);
- ASSERT(dentry->d_inode->i_op->getxattr);

auxbuf = kmalloc(sizeof(struct cachefiles_xattr) + 512, GFP_KERNEL);
if (!auxbuf) {
@@ -165,11 +151,9 @@ int cachefiles_check_object_xattr(struct
return -ENOMEM;
}

- mutex_lock(&dentry->d_inode->i_mutex);
-
/* read the current type label */
- ret = dentry->d_inode->i_op->getxattr(dentry, cachefiles_xattr_cache,
- &auxbuf->type, 512 + 1);
+ ret = vfs_getxattr(dentry, cachefiles_xattr_cache,
+ &auxbuf->type, 512 + 1);
if (ret < 0) {
if (ret == -ENODATA)
goto stale; /* no attribute - power went off
@@ -225,11 +209,9 @@ int cachefiles_check_object_xattr(struct
}

/* update the current label */
- ret = dentry->d_inode->i_op->setxattr(dentry,
- cachefiles_xattr_cache,
- &auxdata->type,
- auxdata->len,
- XATTR_REPLACE);
+ ret = vfs_setxattr(dentry, cachefiles_xattr_cache,
+ &auxdata->type, auxdata->len,
+ XATTR_REPLACE);
if (ret < 0) {
cachefiles_io_error_obj(object,
"Can't update xattr on %lu"
@@ -243,7 +225,6 @@ okay:
ret = 0;

error:
- mutex_unlock(&dentry->d_inode->i_mutex);
kfree(auxbuf);
_leave(" = %d", ret);
return ret;
@@ -267,13 +248,7 @@ int cachefiles_remove_object_xattr(struc
{
int ret;

- mutex_lock(&dentry->d_inode->i_mutex);
-
- ret = dentry->d_inode->i_op->removexattr(dentry,
- cachefiles_xattr_cache);
-
- mutex_unlock(&dentry->d_inode->i_mutex);
-
+ ret = vfs_removexattr(dentry, cachefiles_xattr_cache);
if (ret < 0) {
if (ret == -ENOENT || ret == -ENODATA)
ret = 0;

2006-11-14 20:08:53

by David Howells

Subject: [PATCH 03/19] FS-Cache: Release page->private after failed readahead

The attached patch causes read_cache_pages() to release page-private data on a
page for which add_to_page_cache() fails or the filler function fails. This
permits pages with caching references associated with them to be cleaned up.

The invalidatepage() address space op is called (indirectly) to do the honours.
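
The cleanup pattern can be sketched in userspace as follows. This is a model under stated assumptions, not the kernel code: struct model_page, model_invalidate() and model_read_pages() are hypothetical names, a plain integer stands in for the page reference count, and a flag stands in for PG_private.

```c
#include <assert.h>
#include <stddef.h>

struct model_page {
	int has_private;		/* stands in for PG_private */
	int refcount;
	struct model_page *next;	/* simple singly linked list */
};

static int model_invalidations;	/* counts cleanup callbacks made */

/* stands in for the filesystem's invalidatepage(): drop private state */
static void model_invalidate(struct model_page *page)
{
	page->has_private = 0;
	model_invalidations++;
}

/* release one page, giving the filesystem a chance to clean up first */
static void model_release_page(struct model_page *page)
{
	assert(page->refcount > 0);

	if (page->has_private)
		model_invalidate(page);
	page->refcount--;
}

/* run filler over the list; on failure release every remaining page */
static int model_read_pages(struct model_page *list,
			    int (*filler)(struct model_page *))
{
	int ret = 0;

	while (list) {
		struct model_page *page = list;

		list = page->next;
		page->next = NULL;

		ret = filler(page);
		if (ret) {
			while (list) {
				struct model_page *victim = list;

				list = victim->next;
				victim->next = NULL;
				model_release_page(victim);
			}
			break;
		}
	}
	return ret;
}

/* example filler that always fails */
static int model_failing_filler(struct model_page *page)
{
	(void)page;
	return -1;
}
```

Only pages that still carry private state trigger the cleanup callback; the others are simply released, as before the patch.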

Signed-Off-By: David Howells <[email protected]>
---

mm/readahead.c | 46 ++++++++++++++++++++++++++++++++++++++--------
1 files changed, 38 insertions(+), 8 deletions(-)

diff --git a/mm/readahead.c b/mm/readahead.c
index 23cb61a..c64f366 100644
--- a/mm/readahead.c
+++ b/mm/readahead.c
@@ -14,6 +14,7 @@ #include <linux/module.h>
#include <linux/blkdev.h>
#include <linux/backing-dev.h>
#include <linux/pagevec.h>
+#include <linux/buffer_head.h>

void default_unplug_io_fn(struct backing_dev_info *bdi, struct page *page)
{
@@ -118,6 +119,41 @@ static inline unsigned long get_next_ra_

#define list_to_page(head) (list_entry((head)->prev, struct page, lru))

+/*
+ * see if a page needs releasing upon read_cache_pages() failure
+ * - the caller of read_cache_pages() may have set PG_private before calling,
+ * such as the NFS fs marking pages that are cached locally on disk, thus we
+ * need to give the fs a chance to clean up in the event of an error
+ */
+static void read_cache_pages_invalidate_page(struct address_space *mapping,
+ struct page *page)
+{
+ if (PagePrivate(page)) {
+ if (TestSetPageLocked(page))
+ BUG();
+ page->mapping = mapping;
+ do_invalidatepage(page, 0);
+ page->mapping = NULL;
+ unlock_page(page);
+ }
+ page_cache_release(page);
+}
+
+/*
+ * release a list of pages, invalidating them first if need be
+ */
+static void read_cache_pages_invalidate_pages(struct address_space *mapping,
+ struct list_head *pages)
+{
+ struct page *victim;
+
+ while (!list_empty(pages)) {
+ victim = list_to_page(pages);
+ list_del(&victim->lru);
+ read_cache_pages_invalidate_page(mapping, victim);
+ }
+}
+
/**
* read_cache_pages - populate an address space with some pages & start reads against them
* @mapping: the address_space
@@ -141,20 +177,14 @@ int read_cache_pages(struct address_spac
page = list_to_page(pages);
list_del(&page->lru);
if (add_to_page_cache(page, mapping, page->index, GFP_KERNEL)) {
- page_cache_release(page);
+ read_cache_pages_invalidate_page(mapping, page);
continue;
}
ret = filler(data, page);
if (!pagevec_add(&lru_pvec, page))
__pagevec_lru_add(&lru_pvec);
if (ret) {
- while (!list_empty(pages)) {
- struct page *victim;
-
- victim = list_to_page(pages);
- list_del(&victim->lru);
- page_cache_release(victim);
- }
+ read_cache_pages_invalidate_pages(mapping, pages);
break;
}
}

2006-11-14 20:10:05

by David Howells

Subject: [PATCH 07/19] CacheFiles: Add missing copy_page export for ia64

This one-line patch adds the missing export of copy_page, which the CacheFiles
patches need. It is not yet upstream, but is required for CacheFiles on ia64.
It will be pushed upstream when CacheFiles goes upstream.

Signed-off-by: Prarit Bhargava <[email protected]>
Signed-Off-By: David Howells <[email protected]>
---

arch/ia64/kernel/ia64_ksyms.c | 1 +
1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/arch/ia64/kernel/ia64_ksyms.c b/arch/ia64/kernel/ia64_ksyms.c
index 879c181..a5a98dc 100644
--- a/arch/ia64/kernel/ia64_ksyms.c
+++ b/arch/ia64/kernel/ia64_ksyms.c
@@ -42,6 +42,7 @@ EXPORT_SYMBOL(__do_clear_user);
EXPORT_SYMBOL(__strlen_user);
EXPORT_SYMBOL(__strncpy_from_user);
EXPORT_SYMBOL(__strnlen_user);
+EXPORT_SYMBOL(copy_page);

/* from arch/ia64/lib */
extern void __divsi3(void);

2006-11-14 20:10:48

by David Howells

Subject: [PATCH 05/19] NFS: Use local caching

The attached patch makes it possible for the NFS filesystem to make use of the
network filesystem local caching service (FS-Cache).

To be able to use this, an updated mount program is required. This can be
obtained from:

http://people.redhat.com/steved/cachefs/util-linux/

To mount an NFS filesystem to use caching, add an "fsc" option to the mount:

mount warthog:/ /a -o fsc
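
Once caching is enabled, the patch keeps a cached file coherent by storing a small block of auxiliary data (mtime, ctime and size) with the cache object and comparing it against the inode on later use; see nfs_fh_get_aux() and nfs_fh_check_aux() in fs/nfs/fscache.c below. A userspace sketch of that comparison follows. The names (struct model_auxdata, model_get_aux(), model_check_aux()) are hypothetical, and the model uses three 64-bit fields so that the structure has no padding bytes for memcmp() to trip over.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* hypothetical stand-in for the auxiliary data kept with a cached file */
struct model_auxdata {
	int64_t mtime_sec;
	int64_t ctime_sec;
	int64_t size;
};

enum model_checkaux { MODEL_CHECKAUX_OKAY, MODEL_CHECKAUX_OBSOLETE };

/* copy the current attributes into buffer, truncating to fit */
static uint16_t model_get_aux(const struct model_auxdata *cur,
			      void *buffer, uint16_t bufmax)
{
	assert(cur != NULL && buffer != NULL);

	if (bufmax > sizeof(*cur))
		bufmax = sizeof(*cur);
	memcpy(buffer, cur, bufmax);
	return bufmax;
}

/* compare stored auxiliary data against the current attributes */
static enum model_checkaux model_check_aux(const struct model_auxdata *cur,
					   const void *data, uint16_t datalen)
{
	assert(cur != NULL && data != NULL);

	if (datalen > sizeof(*cur))
		return MODEL_CHECKAUX_OBSOLETE;	/* unknown, larger format */
	if (memcmp(data, cur, datalen) != 0)
		return MODEL_CHECKAUX_OBSOLETE;	/* attributes have changed */
	return MODEL_CHECKAUX_OKAY;
}
```

Any change in the recorded attributes makes the stored copy compare unequal, so the cached data is treated as obsolete rather than served stale.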

Signed-Off-By: David Howells <[email protected]>
---

fs/Kconfig | 8 +
fs/nfs/Makefile | 1
fs/nfs/client.c | 11 +
fs/nfs/file.c | 49 ++++-
fs/nfs/fscache.c | 347 ++++++++++++++++++++++++++++++++
fs/nfs/fscache.h | 471 ++++++++++++++++++++++++++++++++++++++++++++
fs/nfs/inode.c | 21 ++
fs/nfs/internal.h | 32 +++
fs/nfs/pagelist.c | 3
fs/nfs/read.c | 30 +++
fs/nfs/super.c | 1
fs/nfs/sysctl.c | 43 ++++
fs/nfs/write.c | 11 +
include/linux/nfs4_mount.h | 1
include/linux/nfs_fs.h | 4
include/linux/nfs_fs_sb.h | 5
include/linux/nfs_mount.h | 1
17 files changed, 1029 insertions(+), 10 deletions(-)

diff --git a/fs/Kconfig b/fs/Kconfig
index aa6fad1..04bfc27 100644
--- a/fs/Kconfig
+++ b/fs/Kconfig
@@ -1648,6 +1648,14 @@ config NFS_V4

If unsure, say N.

+config NFS_FSCACHE
+ bool "Provide NFS client caching support (EXPERIMENTAL)"
+ depends on EXPERIMENTAL
+ depends on NFS_FS=m && FSCACHE || NFS_FS=y && FSCACHE=y
+ help
+ Say Y here if you want NFS data to be cached locally on disc through
+ the general filesystem cache manager.
+
config NFS_DIRECTIO
bool "Allow direct I/O on NFS files"
depends on NFS_FS
diff --git a/fs/nfs/Makefile b/fs/nfs/Makefile
index f4580b4..2af6f22 100644
--- a/fs/nfs/Makefile
+++ b/fs/nfs/Makefile
@@ -16,4 +16,5 @@ nfs-$(CONFIG_NFS_V4) += nfs4proc.o nfs4x
nfs4namespace.o
nfs-$(CONFIG_NFS_DIRECTIO) += direct.o
nfs-$(CONFIG_SYSCTL) += sysctl.o
+nfs-$(CONFIG_NFS_FSCACHE) += fscache.o
nfs-objs := $(nfs-y)
diff --git a/fs/nfs/client.c b/fs/nfs/client.c
index 5fea638..6e19b28 100644
--- a/fs/nfs/client.c
+++ b/fs/nfs/client.c
@@ -149,6 +149,8 @@ #ifdef CONFIG_NFS_V4
clp->cl_state = 1 << NFS4CLNT_LEASE_EXPIRED;
#endif

+ nfs_fscache_get_client_cookie(clp);
+
return clp;

error_3:
@@ -192,6 +194,8 @@ static void nfs_free_client(struct nfs_c

nfs4_shutdown_client(clp);

+ nfs_fscache_release_client_cookie(clp);
+
/* -EIO all pending I/O */
if (!IS_ERR(clp->cl_rpcclient))
rpc_shutdown_client(clp->cl_rpcclient);
@@ -1368,7 +1372,7 @@ static int nfs_volume_list_show(struct s

/* display header on line 1 */
if (v == SEQ_START_TOKEN) {
- seq_puts(m, "NV SERVER PORT DEV FSID\n");
+ seq_puts(m, "NV SERVER PORT DEV FSID FSC\n");
return 0;
}
/* display one transport per line on subsequent lines */
@@ -1382,12 +1386,13 @@ static int nfs_volume_list_show(struct s
(unsigned long long) server->fsid.major,
(unsigned long long) server->fsid.minor);

- seq_printf(m, "v%d %02x%02x%02x%02x %4hx %-7s %-17s\n",
+ seq_printf(m, "v%d %02x%02x%02x%02x %4hx %-7s %-17s %s\n",
clp->cl_nfsversion,
NIPQUAD(clp->cl_addr.sin_addr),
ntohs(clp->cl_addr.sin_port),
dev,
- fsid);
+ fsid,
+ nfs_server_fscache_state(server));

return 0;
}
diff --git a/fs/nfs/file.c b/fs/nfs/file.c
index cc93865..9da03ec 100644
--- a/fs/nfs/file.c
+++ b/fs/nfs/file.c
@@ -27,12 +27,14 @@ #include <linux/mm.h>
#include <linux/slab.h>
#include <linux/pagemap.h>
#include <linux/smp_lock.h>
+#include <linux/buffer_head.h>

#include <asm/uaccess.h>
#include <asm/system.h>

#include "delegation.h"
#include "iostat.h"
+#include "internal.h"

#define NFSDBG_FACILITY NFSDBG_FILE

@@ -253,6 +255,10 @@ nfs_file_mmap(struct file * file, struct
status = nfs_revalidate_mapping(inode, file->f_mapping);
if (!status)
status = generic_file_mmap(file, vma);
+
+ if (status == 0)
+ nfs_fscache_install_vm_ops(inode, vma);
+
return status;
}

@@ -305,6 +311,12 @@ static int nfs_commit_write(struct file
return status;
}

+/*
+ * partially or wholly invalidate a page
+ * - release the private state associated with a page if undergoing complete
+ * page invalidation
+ * - caller holds page lock
+ */
static void nfs_invalidate_page(struct page *page, unsigned long offset)
{
struct inode *inode = page->mapping->host;
@@ -312,19 +324,47 @@ static void nfs_invalidate_page(struct p
/* Cancel any unstarted writes on this page */
if (offset == 0)
nfs_sync_inode_wait(inode, page->index, 1, FLUSH_INVALIDATE);
+
+ nfs_fscache_invalidate_page(page, inode, offset);
+
+ /* we can do this here as the bits are only set with the page lock
+ * held, and our caller is holding that */
+ if (!page->private)
+ ClearPagePrivate(page);
}

+/*
+ * release the private state associated with a page, if the page isn't busy
+ * - caller holds page lock
+ * - return true (may release) or false (may not)
+ */
static int nfs_release_page(struct page *page, gfp_t gfp)
{
- if (gfp & __GFP_FS)
- return !nfs_wb_page(page->mapping->host, page);
- else
+ if ((gfp & __GFP_FS) == 0) {
/*
* Avoid deadlock on nfs_wait_on_request().
*/
return 0;
+ }
+
+ if (nfs_wb_page(page->mapping->host, page) < 0)
+ return 0;
+
+ if (nfs_fscache_release_page(page) < 0)
+ return 0;
+
+ /* PG_private may have been set due to either caching or writing */
+ BUG_ON(page->private != 0);
+ ClearPagePrivate(page);
+
+ return 1;
}

+/*
+ * Since we use page->private for our own nefarious purposes when using
+ * fscache, we have to override extra address space ops to prevent fs/buffer.c
+ * from getting confused, even though we may not have asked its opinion
+ */
const struct address_space_operations nfs_file_aops = {
.readpage = nfs_readpage,
.readpages = nfs_readpages,
@@ -338,6 +378,9 @@ const struct address_space_operations nf
#ifdef CONFIG_NFS_DIRECTIO
.direct_IO = nfs_direct_IO,
#endif
+#ifdef CONFIG_NFS_FSCACHE
+ .sync_page = block_sync_page,
+#endif
};

static ssize_t nfs_file_write(struct kiocb *iocb, const struct iovec *iov,
diff --git a/fs/nfs/fscache.c b/fs/nfs/fscache.c
new file mode 100644
index 0000000..81286f6
--- /dev/null
+++ b/fs/nfs/fscache.c
@@ -0,0 +1,347 @@
+/* fscache.c: NFS filesystem cache interface
+ *
+ * Copyright (C) 2006 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells ([email protected])
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+
+#include <linux/init.h>
+#include <linux/kernel.h>
+#include <linux/mm.h>
+#include <linux/nfs_fs.h>
+#include <linux/nfs_fs_sb.h>
+#include <linux/in6.h>
+
+#include "internal.h"
+
+/*
+ * Sysctl variables
+ */
+atomic_t nfs_fscache_to_pages;
+atomic_t nfs_fscache_from_pages;
+atomic_t nfs_fscache_uncache_page;
+int nfs_fscache_from_error;
+int nfs_fscache_to_error;
+
+#define NFSDBG_FACILITY NFSDBG_FSCACHE
+
+/* the auxiliary data in the cache (used for coherency management) */
+struct nfs_fh_auxdata {
+ struct timespec i_mtime;
+ struct timespec i_ctime;
+ loff_t i_size;
+};
+
+static struct fscache_netfs_operations nfs_cache_ops = {
+};
+
+struct fscache_netfs nfs_cache_netfs = {
+ .name = "nfs",
+ .version = 0,
+ .ops = &nfs_cache_ops,
+};
+
+static const uint8_t nfs_cache_ipv6_wrapper_for_ipv4[12] = {
+ [0 ... 9] = 0x00,
+ [10 ... 11] = 0xff
+};
+
+struct nfs_server_key {
+ uint16_t nfsversion;
+ uint16_t port;
+ union {
+ struct {
+ uint8_t ipv6wrapper[12];
+ struct in_addr addr;
+ } ipv4_addr;
+ struct in6_addr ipv6_addr;
+ };
+};
+
+static uint16_t nfs_server_get_key(const void *cookie_netfs_data,
+ void *buffer, uint16_t bufmax)
+{
+ const struct nfs_client *clp = cookie_netfs_data;
+ struct nfs_server_key *key = buffer;
+ uint16_t len = 0;
+
+ key->nfsversion = clp->cl_nfsversion;
+
+ switch (clp->cl_addr.sin_family) {
+ case AF_INET:
+ key->port = clp->cl_addr.sin_port;
+
+ memcpy(&key->ipv4_addr.ipv6wrapper,
+ &nfs_cache_ipv6_wrapper_for_ipv4,
+ sizeof(key->ipv4_addr.ipv6wrapper));
+ memcpy(&key->ipv4_addr.addr,
+ &clp->cl_addr.sin_addr,
+ sizeof(key->ipv4_addr.addr));
+ len = sizeof(struct nfs_server_key);
+ break;
+
+ case AF_INET6:
+ key->port = clp->cl_addr.sin_port;
+
+ memcpy(&key->ipv6_addr,
+ &clp->cl_addr.sin_addr,
+ sizeof(key->ipv6_addr));
+ len = sizeof(struct nfs_server_key);
+ break;
+
+ default:
+ len = 0;
+ printk(KERN_WARNING "NFS: Unknown network family '%d'\n",
+ clp->cl_addr.sin_family);
+ break;
+ }
+
+ return len;
+}
+
+/*
+ * the root index for the filesystem is defined by nfsd IP address and ports
+ */
+struct fscache_cookie_def nfs_cache_server_index_def = {
+ .name = "NFS.servers",
+ .type = FSCACHE_COOKIE_TYPE_INDEX,
+ .get_key = nfs_server_get_key,
+};
+
+static uint16_t nfs_fh_get_key(const void *cookie_netfs_data,
+ void *buffer, uint16_t bufmax)
+{
+ const struct nfs_inode *nfsi = cookie_netfs_data;
+ uint16_t nsize;
+
+ /* set the file handle */
+ nsize = nfsi->fh.size;
+ memcpy(buffer, nfsi->fh.data, nsize);
+ return nsize;
+}
+
+/*
+ * indication of pages that now have cache metadata retained
+ * - this function should mark the specified pages as now being cached
+ */
+static void nfs_fh_mark_pages_cached(void *cookie_netfs_data,
+ struct address_space *mapping,
+ struct pagevec *cached_pvec)
+{
+ struct nfs_inode *nfsi = cookie_netfs_data;
+ unsigned long loop;
+
+ dprintk("NFS: nfs_fh_mark_pages_cached: nfs_inode 0x%p pages %ld\n",
+ nfsi, cached_pvec->nr);
+
+ BUG_ON(!nfsi->fscache);
+
+ for (loop = 0; loop < cached_pvec->nr; loop++)
+ SetPageNfsCached(cached_pvec->pages[loop]);
+}
+
+/*
+ * get an extra reference on a read context
+ * - this function can be absent if the completion function doesn't
+ * require a context
+ */
+static void nfs_fh_get_context(void *cookie_netfs_data, void *context)
+{
+ get_nfs_open_context(context);
+}
+
+/*
+ * release an extra reference on a read context
+ * - this function can be absent if the completion function doesn't
+ * require a context
+ */
+static void nfs_fh_put_context(void *cookie_netfs_data, void *context)
+{
+ if (context)
+ put_nfs_open_context(context);
+}
+
+/*
+ * indication the cookie is no longer cached
+ * - this function is called when the backing store currently caching a cookie
+ * is removed
+ * - the netfs should use this to clean up any markers indicating cached pages
+ * - this is mandatory for any object that may have data
+ */
+static void nfs_fh_now_uncached(void *cookie_netfs_data)
+{
+ struct nfs_inode *nfsi = cookie_netfs_data;
+ struct pagevec pvec;
+ pgoff_t first;
+ int loop, nr_pages;
+
+ pagevec_init(&pvec, 0);
+ first = 0;
+
+ dprintk("NFS: nfs_fh_now_uncached: nfs_inode 0x%p\n", nfsi);
+
+ for (;;) {
+ /* grab a bunch of pages to clean */
+ nr_pages = pagevec_lookup(&pvec,
+ nfsi->vfs_inode.i_mapping,
+ first,
+ PAGEVEC_SIZE - pagevec_count(&pvec));
+ if (!nr_pages)
+ break;
+
+ for (loop = 0; loop < nr_pages; loop++)
+ ClearPageNfsCached(pvec.pages[loop]);
+
+ first = pvec.pages[nr_pages - 1]->index + 1;
+
+ pvec.nr = nr_pages;
+ pagevec_release(&pvec);
+ cond_resched();
+ }
+}
+
+/*
+ * get certain file attributes from the netfs data
+ * - this function can be absent for an index
+ * - not permitted to return an error
+ * - the netfs data from the cookie being used as the source is
+ * presented
+ */
+static void nfs_fh_get_attr(const void *cookie_netfs_data, uint64_t *size)
+{
+ const struct nfs_inode *nfsi = cookie_netfs_data;
+
+ *size = nfsi->vfs_inode.i_size;
+}
+
+/*
+ * get the auxiliary data from netfs data
+ * - this function can be absent if the index carries no state data
+ * - should store the auxiliary data in the buffer
+ * - should return the amount of data stored
+ * - not permitted to return an error
+ * - the netfs data from the cookie being used as the source is
+ * presented
+ */
+static uint16_t nfs_fh_get_aux(const void *cookie_netfs_data,
+ void *buffer, uint16_t bufmax)
+{
+ struct nfs_fh_auxdata auxdata;
+ const struct nfs_inode *nfsi = cookie_netfs_data;
+
+ auxdata.i_size = nfsi->vfs_inode.i_size;
+ auxdata.i_mtime = nfsi->vfs_inode.i_mtime;
+ auxdata.i_ctime = nfsi->vfs_inode.i_ctime;
+
+ if (bufmax > sizeof(auxdata))
+ bufmax = sizeof(auxdata);
+
+ memcpy(buffer, &auxdata, bufmax);
+ return bufmax;
+}
+
+/*
+ * consult the netfs about the state of an object
+ * - this function can be absent if the index carries no state data
+ * - the netfs data from the cookie being used as the target is
+ * presented, as is the auxiliary data
+ */
+static fscache_checkaux_t nfs_fh_check_aux(void *cookie_netfs_data,
+ const void *data, uint16_t datalen)
+{
+ struct nfs_fh_auxdata auxdata;
+ struct nfs_inode *nfsi = cookie_netfs_data;
+
+ if (datalen > sizeof(auxdata))
+ return FSCACHE_CHECKAUX_OBSOLETE;
+
+ auxdata.i_size = nfsi->vfs_inode.i_size;
+ auxdata.i_mtime = nfsi->vfs_inode.i_mtime;
+ auxdata.i_ctime = nfsi->vfs_inode.i_ctime;
+
+ if (memcmp(data, &auxdata, datalen) != 0)
+ return FSCACHE_CHECKAUX_OBSOLETE;
+
+ return FSCACHE_CHECKAUX_OKAY;
+}
+
+/*
+ * the primary index for each server is simply made up of a series of NFS file
+ * handles
+ */
+struct fscache_cookie_def nfs_cache_fh_index_def = {
+ .name = "NFS.fh",
+ .type = FSCACHE_COOKIE_TYPE_DATAFILE,
+ .get_key = nfs_fh_get_key,
+ .get_attr = nfs_fh_get_attr,
+ .get_aux = nfs_fh_get_aux,
+ .check_aux = nfs_fh_check_aux,
+ .get_context = nfs_fh_get_context,
+ .put_context = nfs_fh_put_context,
+ .mark_pages_cached = nfs_fh_mark_pages_cached,
+ .now_uncached = nfs_fh_now_uncached,
+};
+
+static int nfs_file_page_mkwrite(struct vm_area_struct *vma, struct page *page)
+{
+ wait_on_page_fs_misc(page);
+ return 0;
+}
+
+struct vm_operations_struct nfs_fs_vm_operations = {
+ .nopage = filemap_nopage,
+ .populate = filemap_populate,
+ .page_mkwrite = nfs_file_page_mkwrite,
+};
+
+/*
+ * handle completion of a page being stored in the cache
+ */
+void nfs_readpage_to_fscache_complete(struct page *page, void *data, int error)
+{
+ dfprintk(FSCACHE,
+ "NFS: readpage_to_fscache_complete (p:%p(i:%lx f:%lx)/%d)\n",
+ page, page->index, page->flags, error);
+
+ end_page_fs_misc(page);
+}
+
+/*
+ * handle completion of a page being read from the cache
+ * - called in process (keventd) context
+ */
+void nfs_readpage_from_fscache_complete(struct page *page,
+ void *context,
+ int error)
+{
+ dfprintk(FSCACHE,
+ "NFS: readpage_from_fscache_complete (0x%p/0x%p/%d)\n",
+ page, context, error);
+
+ /* if the read from the cache fails, fall back to fetching the page from
+ * the server; unlock the page only if that fetch cannot be started */
+ if (!error) {
+ SetPageUptodate(page);
+ unlock_page(page);
+ } else {
+ error = nfs_readpage_async(context, page->mapping->host, page);
+ if (error)
+ unlock_page(page);
+ }
+}
+
+/*
+ * handle completion of a page being written to the cache
+ * - really need to synchronise the end of writeback, probably using a page
+ * flag, but for the moment we disable caching on writable files
+ */
+void nfs_writepage_to_fscache_complete(struct page *page,
+ void *data,
+ int error)
+{
+}
diff --git a/fs/nfs/fscache.h b/fs/nfs/fscache.h
new file mode 100644
index 0000000..00a2c07
--- /dev/null
+++ b/fs/nfs/fscache.h
@@ -0,0 +1,471 @@
+/* fscache.h: NFS filesystem cache interface definitions
+ *
+ * Copyright (C) 2006 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells ([email protected])
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#ifndef _NFS_FSCACHE_H
+#define _NFS_FSCACHE_H
+
+#include <linux/nfs_fs.h>
+#include <linux/nfs_mount.h>
+#include <linux/nfs4_mount.h>
+
+#ifdef CONFIG_NFS_FSCACHE
+#include <linux/fscache.h>
+
+extern struct fscache_netfs nfs_cache_netfs;
+extern struct fscache_cookie_def nfs_cache_server_index_def;
+extern struct fscache_cookie_def nfs_cache_fh_index_def;
+extern struct vm_operations_struct nfs_fs_vm_operations;
+
+extern void nfs_invalidatepage(struct page *, unsigned long);
+extern int nfs_releasepage(struct page *, gfp_t);
+
+extern atomic_t nfs_fscache_to_pages;
+extern atomic_t nfs_fscache_from_pages;
+extern atomic_t nfs_fscache_uncache_page;
+extern int nfs_fscache_from_error;
+extern int nfs_fscache_to_error;
+
+/*
+ * register NFS for caching
+ */
+static inline int nfs_fscache_register(void)
+{
+ return fscache_register_netfs(&nfs_cache_netfs);
+}
+
+/*
+ * unregister NFS for caching
+ */
+static inline void nfs_fscache_unregister(void)
+{
+ fscache_unregister_netfs(&nfs_cache_netfs);
+}
+
+/*
+ * get the per-client index cookie for an NFS client if the appropriate mount
+ * flag was set
+ * - we always try and get an index cookie for the client, but get filehandle
+ * cookies on a per-superblock basis, depending on the mount flags
+ */
+static inline void nfs_fscache_get_client_cookie(struct nfs_client *clp)
+{
+ /* create a cache index for looking up filehandles */
+ clp->fscache = fscache_acquire_cookie(nfs_cache_netfs.primary_index,
+ &nfs_cache_server_index_def,
+ clp);
+ dfprintk(FSCACHE,"NFS: get client cookie (0x%p/0x%p)\n",
+ clp, clp->fscache);
+}
+
+/*
+ * dispose of a per-client cookie
+ */
+static inline void nfs_fscache_release_client_cookie(struct nfs_client *clp)
+{
+ dfprintk(FSCACHE,"NFS: releasing client cookie (0x%p/0x%p)\n",
+ clp, clp->fscache);
+
+ fscache_relinquish_cookie(clp->fscache, 0);
+ clp->fscache = NULL;
+}
+
+/*
+ * indicate the client caching state as readable text
+ */
+static inline const char *nfs_server_fscache_state(struct nfs_server *server)
+{
+ if (server->nfs_client->fscache && (server->flags & NFS_MOUNT_FSCACHE))
+ return "yes";
+ return "no ";
+}
+
+/*
+ * get the per-filehandle cookie for an NFS inode
+ */
+static inline void nfs_fscache_get_fh_cookie(struct inode *inode,
+ int maycache)
+{
+ struct super_block *sb = inode->i_sb;
+ struct nfs_inode *nfsi = NFS_I(inode);
+
+ nfsi->fscache = NULL;
+ if (maycache && (NFS_SB(sb)->flags & NFS_MOUNT_FSCACHE)) {
+ nfsi->fscache = fscache_acquire_cookie(
+ NFS_SB(sb)->nfs_client->fscache,
+ &nfs_cache_fh_index_def,
+ nfsi);
+
+ fscache_set_i_size(nfsi->fscache, nfsi->vfs_inode.i_size);
+
+ dfprintk(FSCACHE, "NFS: get FH cookie (0x%p/0x%p/0x%p)\n",
+ sb, nfsi, nfsi->fscache);
+ }
+}
+
+/*
+ * change the filesize associated with a per-filehandle cookie
+ */
+static inline void nfs_fscache_set_size(struct inode *inode)
+{
+ fscache_set_i_size(NFS_I(inode)->fscache, inode->i_size);
+}
+
+/*
+ * replace a per-filehandle cookie due to revalidation detecting a file having
+ * changed on the server
+ */
+static inline void nfs_fscache_renew_fh_cookie(struct inode *inode)
+{
+ struct nfs_inode *nfsi = NFS_I(inode);
+ struct nfs_server *server = NFS_SERVER(inode);
+ struct fscache_cookie *old = nfsi->fscache;
+
+ if (nfsi->fscache) {
+ /* retire the current fscache cookie and get a new one */
+ fscache_relinquish_cookie(nfsi->fscache, 1);
+
+ nfsi->fscache = fscache_acquire_cookie(
+ server->nfs_client->fscache,
+ &nfs_cache_fh_index_def,
+ nfsi);
+ fscache_set_i_size(nfsi->fscache, nfsi->vfs_inode.i_size);
+
+ dfprintk(FSCACHE,
+ "NFS: revalidation new cookie (0x%p/0x%p/0x%p/0x%p)\n",
+ server, nfsi, old, nfsi->fscache);
+ }
+}
+
+/*
+ * release a per-filehandle cookie
+ */
+static inline void nfs_fscache_release_fh_cookie(struct inode *inode)
+{
+ struct nfs_inode *nfsi = NFS_I(inode);
+
+ dfprintk(FSCACHE, "NFS: clear cookie (0x%p/0x%p)\n",
+ nfsi, nfsi->fscache);
+
+ fscache_relinquish_cookie(nfsi->fscache, 0);
+ nfsi->fscache = NULL;
+}
+
+/*
+ * retire a per-filehandle cookie, destroying the data attached to it
+ */
+static inline void nfs_fscache_zap_fh_cookie(struct inode *inode)
+{
+ struct nfs_inode *nfsi = NFS_I(inode);
+
+ dfprintk(FSCACHE,"NFS: zapping cookie (0x%p/0x%p)\n",
+ nfsi, nfsi->fscache);
+
+ fscache_relinquish_cookie(nfsi->fscache, 1);
+ nfsi->fscache = NULL;
+}
+
+/*
+ * turn off the cache with regard to a filehandle cookie if opened for writing,
+ * invalidating all the pages in the page cache relating to the associated
+ * inode to clear the per-page caching
+ */
+static inline void nfs_fscache_disable_fh_cookie(struct inode *inode)
+{
+ if (NFS_I(inode)->fscache) {
+ dfprintk(FSCACHE,
+ "NFS: nfsi 0x%p turning cache off\n", NFS_I(inode));
+
+ /* Need to invalidate any mapped pages that were read in before
+ * turning off the cache.
+ */
+ if (inode->i_mapping && inode->i_mapping->nrpages)
+ invalidate_inode_pages2(inode->i_mapping);
+
+ nfs_fscache_zap_fh_cookie(inode);
+ }
+}
+
+/*
+ * install the VM ops for mmap() of an NFS file so that we can hold up writes
+ * to pages on shared writable mappings until the store to the cache is
+ * complete
+ */
+static inline void nfs_fscache_install_vm_ops(struct inode *inode,
+ struct vm_area_struct *vma)
+{
+ if (NFS_I(inode)->fscache)
+ vma->vm_ops = &nfs_fs_vm_operations;
+}
+
+/*
+ * release the caching state associated with a page, if the page isn't busy
+ * interacting with the cache
+ */
+static inline int nfs_fscache_release_page(struct page *page)
+{
+ if (PageFsMisc(page))
+ return -EBUSY;
+
+ if (PageNfsCached(page)) {
+ struct nfs_inode *nfsi = NFS_I(page->mapping->host);
+
+ BUG_ON(!nfsi->fscache);
+
+ dfprintk(FSCACHE, "NFS: fscache releasepage (0x%p/0x%p/0x%p)\n",
+ nfsi->fscache, page, nfsi);
+
+ fscache_uncache_page(nfsi->fscache, page);
+ atomic_inc(&nfs_fscache_uncache_page);
+ ClearPageNfsCached(page);
+ }
+
+ return 0;
+}
+
+/*
+ * release the caching state associated with a page if undergoing complete page
+ * invalidation
+ */
+static inline void nfs_fscache_invalidate_page(struct page *page,
+ struct inode *inode,
+ unsigned long offset)
+{
+ struct nfs_inode *nfsi = NFS_I(page->mapping->host);
+
+ if (PageNfsCached(page)) {
+ BUG_ON(!nfsi->fscache);
+
+ dfprintk(FSCACHE,
+ "NFS: fscache invalidatepage (0x%p/0x%p/0x%p)\n",
+ nfsi->fscache, page, nfsi);
+
+ wait_on_page_fs_misc(page);
+
+ if (offset == 0) {
+ BUG_ON(!PageLocked(page));
+ if (!PageWriteback(page)) {
+ fscache_uncache_page(nfsi->fscache, page);
+ atomic_inc(&nfs_fscache_uncache_page);
+ ClearPageNfsCached(page);
+ }
+ }
+ }
+}
+
+/*
+ * store a newly fetched page in fscache
+ */
+extern void nfs_readpage_to_fscache_complete(struct page *, void *, int);
+
+static inline void nfs_readpage_to_fscache(struct inode *inode,
+ struct page *page,
+ int sync)
+{
+ int ret;
+
+ if (PageNfsCached(page)) {
+ dfprintk(FSCACHE,
+ "NFS: "
+ "readpage_to_fscache(fsc:%p/p:%p(i:%lx f:%lx)/%d)\n",
+ NFS_I(inode)->fscache, page, page->index, page->flags,
+ sync);
+
+ if (TestSetPageFsMisc(page))
+ BUG();
+
+ ret = fscache_write_page(NFS_I(inode)->fscache, page,
+ nfs_readpage_to_fscache_complete,
+ NULL, GFP_KERNEL);
+ dfprintk(FSCACHE,
+ "NFS: "
+ "readpage_to_fscache: p:%p(i:%lu f:%lx) ret %d\n",
+ page, page->index, page->flags, ret);
+
+ if (ret != 0) {
+ fscache_uncache_page(NFS_I(inode)->fscache, page);
+ atomic_inc(&nfs_fscache_uncache_page);
+ ClearPageNfsCached(page);
+ end_page_fs_misc(page);
+ nfs_fscache_to_error = ret;
+ } else {
+ atomic_inc(&nfs_fscache_to_pages);
+ }
+ }
+}
+
+/*
+ * retrieve a page from fscache
+ */
+extern void nfs_readpage_from_fscache_complete(struct page *, void *, int);
+
+static inline
+int nfs_readpage_from_fscache(struct nfs_open_context *ctx,
+ struct inode *inode,
+ struct page *page)
+{
+ int ret;
+
+ if (!NFS_I(inode)->fscache)
+ return 1;
+
+ dfprintk(FSCACHE,
+ "NFS: readpage_from_fscache(fsc:%p/p:%p(i:%lx f:%lx)/0x%p)\n",
+ NFS_I(inode)->fscache, page, page->index, page->flags, inode);
+
+ ret = fscache_read_or_alloc_page(NFS_I(inode)->fscache,
+ page,
+ nfs_readpage_from_fscache_complete,
+ ctx,
+ GFP_KERNEL);
+
+ switch (ret) {
+ case 0: /* read BIO submitted (page in fscache) */
+ dfprintk(FSCACHE,
+ "NFS: readpage_from_fscache: BIO submitted\n");
+ atomic_inc(&nfs_fscache_from_pages);
+ return ret;
+
+ case -ENOBUFS: /* inode not in cache */
+ case -ENODATA: /* page not in cache */
+ dfprintk(FSCACHE,
+ "NFS: readpage_from_fscache error %d\n", ret);
+ return 1;
+
+ default:
+ dfprintk(FSCACHE, "NFS: readpage_from_fscache %d\n", ret);
+ nfs_fscache_from_error = ret;
+ }
+ return ret;
+}
+
+/*
+ * retrieve a set of pages from fscache
+ */
+static inline int nfs_readpages_from_fscache(struct nfs_open_context *ctx,
+ struct inode *inode,
+ struct address_space *mapping,
+ struct list_head *pages,
+ unsigned *nr_pages)
+{
+ int ret, npages = *nr_pages;
+
+ if (!NFS_I(inode)->fscache)
+ return 1;
+
+ dfprintk(FSCACHE,
+ "NFS: nfs_getpages_from_fscache (0x%p/%u/0x%p)\n",
+ NFS_I(inode)->fscache, *nr_pages, inode);
+
+ ret = fscache_read_or_alloc_pages(NFS_I(inode)->fscache,
+ mapping, pages, nr_pages,
+ nfs_readpage_from_fscache_complete,
+ ctx,
+ mapping_gfp_mask(mapping));
+
+
+ switch (ret) {
+ case 0: /* read BIO submitted (page in fscache) */
+ BUG_ON(!list_empty(pages));
+ BUG_ON(*nr_pages != 0);
+ dfprintk(FSCACHE,
+ "NFS: nfs_getpages_from_fscache: BIO submitted\n");
+
+ atomic_add(npages, &nfs_fscache_from_pages);
+ return ret;
+
+ case -ENOBUFS: /* inode not in cache */
+ case -ENODATA: /* page not in cache */
+ dfprintk(FSCACHE,
+ "NFS: nfs_getpages_from_fscache: no page: %d\n", ret);
+ return 1;
+
+ default:
+ dfprintk(FSCACHE,
+ "NFS: nfs_getpages_from_fscache: ret %d\n", ret);
+ nfs_fscache_from_error = ret;
+ }
+
+ return ret;
+}
+
+/*
+ * store an updated page in fscache
+ */
+extern void nfs_writepage_to_fscache_complete(struct page *page, void *data, int error);
+
+static inline void nfs_writepage_to_fscache(struct inode *inode,
+ struct page *page)
+{
+ int error;
+
+ if (PageNfsCached(page) && NFS_I(inode)->fscache) {
+ dfprintk(FSCACHE,
+ "NFS: writepage_to_fscache (0x%p/0x%p/0x%p)\n",
+ NFS_I(inode)->fscache, page, inode);
+
+ error = fscache_write_page(NFS_I(inode)->fscache, page,
+ nfs_writepage_to_fscache_complete,
+ NULL, GFP_KERNEL);
+ if (error != 0) {
+ dfprintk(FSCACHE,
+ "NFS: fscache_write_page error %d\n",
+ error);
+ fscache_uncache_page(NFS_I(inode)->fscache, page);
+ }
+ }
+}
+
+#else /* CONFIG_NFS_FSCACHE */
+static inline int nfs_fscache_register(void) { return 0; }
+static inline void nfs_fscache_unregister(void) {}
+static inline void nfs_fscache_get_client_cookie(struct nfs_client *clp) {}
+static inline void nfs4_fscache_get_client_cookie(struct nfs_client *clp) {}
+static inline void nfs_fscache_release_client_cookie(struct nfs_client *clp) {}
+static inline const char *nfs_server_fscache_state(struct nfs_server *server) { return "no "; }
+
+static inline void nfs_fscache_get_fh_cookie(struct inode *inode, int maycache) {}
+static inline void nfs_fscache_set_size(struct inode *inode) {}
+static inline void nfs_fscache_release_fh_cookie(struct inode *inode) {}
+static inline void nfs_fscache_zap_fh_cookie(struct inode *inode) {}
+static inline void nfs_fscache_renew_fh_cookie(struct inode *inode) {}
+static inline void nfs_fscache_disable_fh_cookie(struct inode *inode) {}
+static inline void nfs_fscache_install_vm_ops(struct inode *inode, struct vm_area_struct *vma) {}
+static inline int nfs_fscache_release_page(struct page *page)
+{
+ return 1; /* True: may release page */
+}
+static inline void nfs_fscache_invalidate_page(struct page *page,
+ struct inode *inode,
+ unsigned long offset)
+{
+}
+static inline void nfs_readpage_to_fscache(struct inode *inode, struct page *page, int sync) {}
+static inline int nfs_readpage_from_fscache(struct nfs_open_context *ctx,
+ struct inode *inode, struct page *page)
+{
+ return -ENOBUFS;
+}
+static inline int nfs_readpages_from_fscache(struct nfs_open_context *ctx,
+ struct inode *inode,
+ struct address_space *mapping,
+ struct list_head *pages,
+ unsigned *nr_pages)
+{
+ return -ENOBUFS;
+}
+
+static inline void nfs_writepage_to_fscache(struct inode *inode, struct page *page)
+{
+ BUG_ON(PageNfsCached(page));
+}
+
+#endif /* CONFIG_NFS_FSCACHE */
+#endif /* _NFS_FSCACHE_H */
diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
index 08cc4c5..56acba0 100644
--- a/fs/nfs/inode.c
+++ b/fs/nfs/inode.c
@@ -84,6 +84,7 @@ void nfs_clear_inode(struct inode *inode
BUG_ON(atomic_read(&NFS_I(inode)->data_updates) != 0);
nfs_zap_acl_cache(inode);
nfs_access_zap_cache(inode);
+ nfs_fscache_release_fh_cookie(inode);
}

/**
@@ -129,6 +130,8 @@ void nfs_zap_caches(struct inode *inode)
spin_lock(&inode->i_lock);
nfs_zap_caches_locked(inode);
spin_unlock(&inode->i_lock);
+
+ nfs_fscache_zap_fh_cookie(inode);
}

void nfs_zap_mapping(struct inode *inode, struct address_space *mapping)
@@ -216,6 +219,7 @@ nfs_fhget(struct super_block *sb, struct
};
struct inode *inode = ERR_PTR(-ENOENT);
unsigned long hash;
+ int maycache = 1;

if ((fattr->valid & NFS_ATTR_FATTR) == 0)
goto out_no_inode;
@@ -264,6 +268,7 @@ nfs_fhget(struct super_block *sb, struct
else
inode->i_op = &nfs_mountpoint_inode_operations;
inode->i_fop = NULL;
+ maycache = 0;
}
} else if (S_ISLNK(inode->i_mode))
inode->i_op = &nfs_symlink_inode_operations;
@@ -294,6 +299,8 @@ nfs_fhget(struct super_block *sb, struct
memset(nfsi->cookieverf, 0, sizeof(nfsi->cookieverf));
nfsi->access_cache = RB_ROOT;

+ nfs_fscache_get_fh_cookie(inode, maycache);
+
unlock_new_inode(inode);
} else
nfs_refresh_inode(inode, fattr);
@@ -376,6 +383,7 @@ void nfs_setattr_update_inode(struct ino
if ((attr->ia_valid & ATTR_SIZE) != 0) {
nfs_inc_stats(inode, NFSIOS_SETATTRTRUNC);
inode->i_size = attr->ia_size;
+ nfs_fscache_set_size(inode);
vmtruncate(inode, attr->ia_size);
}
}
@@ -558,6 +566,8 @@ int nfs_open(struct inode *inode, struct
ctx->mode = filp->f_mode;
nfs_file_set_open_context(filp, ctx);
put_nfs_open_context(ctx);
+ if ((filp->f_flags & O_ACCMODE) != O_RDONLY)
+ nfs_fscache_disable_fh_cookie(inode);
return 0;
}

@@ -704,6 +714,8 @@ int nfs_revalidate_mapping(struct inode
spin_unlock(&inode->i_lock);

nfs_inc_stats(inode, NFSIOS_DATAINVALIDATE);
+ nfs_fscache_renew_fh_cookie(inode);
+
dfprintk(PAGECACHE, "NFS: (%s/%Ld) data cache invalidated\n",
inode->i_sb->s_id,
(long long)NFS_FILEID(inode));
@@ -942,11 +954,13 @@ static int nfs_update_inode(struct inode
if (data_stable) {
inode->i_size = new_isize;
invalid |= NFS_INO_INVALID_DATA;
+ nfs_fscache_set_size(inode);
}
invalid |= NFS_INO_INVALID_ATTR;
} else if (new_isize > cur_isize) {
inode->i_size = new_isize;
invalid |= NFS_INO_INVALID_ATTR|NFS_INO_INVALID_DATA;
+ nfs_fscache_set_size(inode);
}
nfsi->cache_change_attribute = jiffies;
dprintk("NFS: isize change on server for file %s/%ld\n",
@@ -1158,6 +1172,10 @@ static int __init init_nfs_fs(void)
{
int err;

+ err = nfs_fscache_register();
+ if (err < 0)
+ goto out6;
+
err = nfs_fs_proc_init();
if (err)
goto out5;
@@ -1204,6 +1222,8 @@ out3:
out4:
nfs_fs_proc_exit();
out5:
+ nfs_fscache_unregister();
+out6:
return err;
}

@@ -1214,6 +1234,7 @@ static void __exit exit_nfs_fs(void)
nfs_destroy_readpagecache();
nfs_destroy_inodecache();
nfs_destroy_nfspagecache();
+ nfs_fscache_unregister();
#ifdef CONFIG_PROC_FS
rpc_proc_unregister("nfs");
#endif
diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
index d205466..51b82d1 100644
--- a/fs/nfs/internal.h
+++ b/fs/nfs/internal.h
@@ -4,6 +4,30 @@

#include <linux/mount.h>

+#define NFS_PAGE_WRITING 0
+#define NFS_PAGE_CACHED 1
+
+#define PageNfsBit(bit, page) test_bit(bit, &(page)->private)
+
+#define SetPageNfsBit(bit, page) \
+do { \
+ SetPagePrivate((page)); \
+ set_bit(bit, &(page)->private); \
+} while(0)
+
+#define ClearPageNfsBit(bit, page) \
+do { \
+ clear_bit(bit, &(page)->private); \
+} while(0)
+
+#define PageNfsWriting(page) PageNfsBit(NFS_PAGE_WRITING, (page))
+#define SetPageNfsWriting(page) SetPageNfsBit(NFS_PAGE_WRITING, (page))
+#define ClearPageNfsWriting(page) ClearPageNfsBit(NFS_PAGE_WRITING, (page))
+
+#define PageNfsCached(page) PageNfsBit(NFS_PAGE_CACHED, (page))
+#define SetPageNfsCached(page) SetPageNfsBit(NFS_PAGE_CACHED, (page))
+#define ClearPageNfsCached(page) ClearPageNfsBit(NFS_PAGE_CACHED, (page))
+
struct nfs_string;
struct nfs_mount_data;
struct nfs4_mount_data;
@@ -27,6 +51,11 @@ struct nfs_clone_mount {
rpc_authflavor_t authflavor;
};

+/*
+ * include filesystem caching stuff here
+ */
+#include "fscache.h"
+
/* client.c */
extern struct rpc_program nfs_program;

@@ -153,6 +182,9 @@ extern int nfs4_path_walk(struct nfs_ser
const char *path);
#endif

+/* read.c */
+extern int nfs_readpage_async(struct nfs_open_context *, struct inode *, struct page *);
+
/*
* Determine the device name as a string
*/
diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c
index 829af32..a40c052 100644
--- a/fs/nfs/pagelist.c
+++ b/fs/nfs/pagelist.c
@@ -17,6 +17,7 @@ #include <linux/nfs4.h>
#include <linux/nfs_page.h>
#include <linux/nfs_fs.h>
#include <linux/nfs_mount.h>
+#include "internal.h"

#define NFS_PARANOIA 1

@@ -84,7 +85,7 @@ nfs_create_request(struct nfs_open_conte
atomic_set(&req->wb_complete, 0);
req->wb_index = page->index;
page_cache_get(page);
- BUG_ON(PagePrivate(page));
+ BUG_ON(PageNfsWriting(page));
BUG_ON(!PageLocked(page));
BUG_ON(page->mapping->host != inode);
req->wb_offset = offset;
diff --git a/fs/nfs/read.c b/fs/nfs/read.c
index c2e49c3..d8e4b3b 100644
--- a/fs/nfs/read.c
+++ b/fs/nfs/read.c
@@ -26,11 +26,13 @@ #include <linux/pagemap.h>
#include <linux/sunrpc/clnt.h>
#include <linux/nfs_fs.h>
#include <linux/nfs_page.h>
+#include <linux/nfs_mount.h>
#include <linux/smp_lock.h>

#include <asm/system.h>

#include "iostat.h"
+#include "internal.h"

#define NFSDBG_FACILITY NFSDBG_PAGECACHE

@@ -211,13 +213,18 @@ static int nfs_readpage_sync(struct nfs_
}
result = 0;

+ nfs_readpage_to_fscache(inode, page, 1);
+ unlock_page(page);
+
+ return result;
+
io_error:
unlock_page(page);
nfs_readdata_free(rdata);
return result;
}

-static int nfs_readpage_async(struct nfs_open_context *ctx, struct inode *inode,
+int nfs_readpage_async(struct nfs_open_context *ctx, struct inode *inode,
struct page *page)
{
LIST_HEAD(one_request);
@@ -242,6 +249,11 @@ static int nfs_readpage_async(struct nfs

static void nfs_readpage_release(struct nfs_page *req)
{
+ struct inode *d_inode = req->wb_context->dentry->d_inode;
+
+ if (PageUptodate(req->wb_page))
+ nfs_readpage_to_fscache(d_inode, req->wb_page, 0);
+
unlock_page(req->wb_page);

dprintk("NFS: read done (%s/%Ld %d@%Ld)\n",
@@ -633,6 +645,10 @@ int nfs_readpage(struct file *file, stru
ctx = get_nfs_open_context((struct nfs_open_context *)
file->private_data);
if (!IS_SYNC(inode)) {
+ error = nfs_readpage_from_fscache(ctx, inode, page);
+ if (error == 0)
+ goto out;
+
error = nfs_readpage_async(ctx, inode, page);
goto out;
}
@@ -663,6 +679,7 @@ readpage_async_filler(void *data, struct
unsigned int len;

nfs_wb_page(inode, page);
+
len = nfs_page_length(inode, page);
if (len == 0)
return nfs_return_empty_page(page);
@@ -705,6 +722,17 @@ int nfs_readpages(struct file *filp, str
} else
desc.ctx = get_nfs_open_context((struct nfs_open_context *)
filp->private_data);
+
+ /* attempt to read as many of the pages as possible from the cache
+ * - this returns -ENOBUFS immediately if the cookie is negative
+ */
+ ret = nfs_readpages_from_fscache(desc.ctx, inode, mapping,
+ pages, &nr_pages);
+ if (ret == 0) {
+ put_nfs_open_context(desc.ctx);
+ return ret; /* all read */
+ }
+
ret = read_cache_pages(mapping, pages, readpage_async_filler, &desc);
if (!list_empty(&head)) {
int err = nfs_pagein_list(&head, server->rpages);
diff --git a/fs/nfs/super.c b/fs/nfs/super.c
index 28108c8..59b0c33 100644
--- a/fs/nfs/super.c
+++ b/fs/nfs/super.c
@@ -290,6 +290,7 @@ static void nfs_show_mount_options(struc
{ NFS_MOUNT_NOAC, ",noac", "" },
{ NFS_MOUNT_NONLM, ",nolock", "" },
{ NFS_MOUNT_NOACL, ",noacl", "" },
+ { NFS_MOUNT_FSCACHE, ",fsc", "" },
{ 0, NULL, NULL }
};
const struct proc_nfs_info *nfs_infop;
diff --git a/fs/nfs/sysctl.c b/fs/nfs/sysctl.c
index 3ea50ac..251af9b 100644
--- a/fs/nfs/sysctl.c
+++ b/fs/nfs/sysctl.c
@@ -14,6 +14,7 @@ #include <linux/nfs_idmap.h>
#include <linux/nfs_fs.h>

#include "callback.h"
+#include "internal.h"

static const int nfs_set_port_min = 0;
static const int nfs_set_port_max = 65535;
@@ -50,6 +51,48 @@ #endif
.proc_handler = &proc_dointvec_jiffies,
.strategy = &sysctl_jiffies,
},
+#ifdef CONFIG_NFS_FSCACHE
+ {
+ .ctl_name = CTL_UNNUMBERED,
+ .procname = "fscache_from_error",
+ .data = &nfs_fscache_from_error,
+ .maxlen = sizeof(int),
+ .mode = 0644,
+ .proc_handler = &proc_dointvec,
+ },
+ {
+ .ctl_name = CTL_UNNUMBERED,
+ .procname = "fscache_to_error",
+ .data = &nfs_fscache_to_error,
+ .maxlen = sizeof(int),
+ .mode = 0644,
+ .proc_handler = &proc_dointvec,
+ },
+ {
+ .ctl_name = CTL_UNNUMBERED,
+ .procname = "fscache_uncache_page",
+ .data = &nfs_fscache_uncache_page,
+ .maxlen = sizeof(int),
+ .mode = 0644,
+ .proc_handler = &proc_dointvec,
+ },
+ {
+ .ctl_name = CTL_UNNUMBERED,
+ .procname = "fscache_to_pages",
+ .data = &nfs_fscache_to_pages,
+ .maxlen = sizeof(int),
+ .mode = 0644,
+ .proc_handler = &proc_dointvec_minmax,
+ },
+ {
+ .ctl_name = CTL_UNNUMBERED,
+ .procname = "fscache_from_pages",
+ .data = &nfs_fscache_from_pages,
+ .maxlen = sizeof(int),
+ .mode = 0644,
+ .proc_handler = &proc_dointvec,
+ },
+#endif
{ .ctl_name = 0 }
};

diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index 883dd4a..77d0d9d 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -64,6 +64,7 @@ #include <linux/smp_lock.h>

#include "delegation.h"
#include "iostat.h"
+#include "internal.h"

#define NFSDBG_FACILITY NFSDBG_PAGECACHE

@@ -157,6 +158,9 @@ static void nfs_grow_file(struct page *p
return;
nfs_inc_stats(inode, NFSIOS_EXTENDWRITE);
i_size_write(inode, end);
+#ifdef FSCACHE_WRITE_SUPPORT
+ nfs_set_fscsize(NFS_SERVER(inode), NFS_I(inode), end);
+#endif
}

/* We can set the PG_uptodate flag if we see that a write request
@@ -336,6 +340,9 @@ do_it:
err = -EBADF;
goto out;
}
+
+ nfs_writepage_to_fscache(inode, page);
+
lock_kernel();
if (!IS_SYNC(inode) && inode_referenced) {
err = nfs_writepage_async(ctx, inode, page, 0, offset);
@@ -419,7 +426,7 @@ static int nfs_inode_add_request(struct
if (nfs_have_delegation(inode, FMODE_WRITE))
nfsi->change_attr++;
}
- SetPagePrivate(req->wb_page);
+ SetPageNfsWriting(req->wb_page);
nfsi->npages++;
atomic_inc(&req->wb_count);
return 0;
@@ -436,7 +443,7 @@ static void nfs_inode_remove_request(str
BUG_ON (!NFS_WBACK_BUSY(req));

spin_lock(&nfsi->req_lock);
- ClearPagePrivate(req->wb_page);
+ ClearPageNfsWriting(req->wb_page);
radix_tree_delete(&nfsi->nfs_page_tree, req->wb_index);
nfsi->npages--;
if (!nfsi->npages) {
diff --git a/include/linux/nfs4_mount.h b/include/linux/nfs4_mount.h
index 26b4c83..15199cc 100644
--- a/include/linux/nfs4_mount.h
+++ b/include/linux/nfs4_mount.h
@@ -65,6 +65,7 @@ #define NFS4_MOUNT_INTR 0x0002 /* 1 */
#define NFS4_MOUNT_NOCTO 0x0010 /* 1 */
#define NFS4_MOUNT_NOAC 0x0020 /* 1 */
#define NFS4_MOUNT_STRICTLOCK 0x1000 /* 1 */
+#define NFS4_MOUNT_FSCACHE 0x4000 /* 1 */
#define NFS4_MOUNT_FLAGMASK 0xFFFF

#endif
diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h
index 45228c1..5ead2bf 100644
--- a/include/linux/nfs_fs.h
+++ b/include/linux/nfs_fs.h
@@ -182,6 +182,9 @@ #ifdef CONFIG_NFS_V4
int delegation_state;
struct rw_semaphore rwsem;
#endif /* CONFIG_NFS_V4*/
+#ifdef CONFIG_NFS_FSCACHE
+ struct fscache_cookie *fscache;
+#endif
struct inode vfs_inode;
};

@@ -582,6 +585,7 @@ #define NFSDBG_FILE 0x0040
#define NFSDBG_ROOT 0x0080
#define NFSDBG_CALLBACK 0x0100
#define NFSDBG_CLIENT 0x0200
+#define NFSDBG_FSCACHE 0x0400
#define NFSDBG_ALL 0xFFFF

#ifdef __KERNEL__
diff --git a/include/linux/nfs_fs_sb.h b/include/linux/nfs_fs_sb.h
index 7ccfc7e..c44be53 100644
--- a/include/linux/nfs_fs_sb.h
+++ b/include/linux/nfs_fs_sb.h
@@ -3,6 +3,7 @@ #define _NFS_FS_SB

#include <linux/list.h>
#include <linux/backing-dev.h>
+#include <linux/fscache.h>

struct nfs_iostats;

@@ -67,6 +68,10 @@ #ifdef CONFIG_NFS_V4
char cl_ipaddr[16];
unsigned char cl_id_uniquifier;
#endif
+
+#ifdef CONFIG_NFS_FSCACHE
+ struct fscache_cookie *fscache; /* client index cache cookie */
+#endif
};

/*
diff --git a/include/linux/nfs_mount.h b/include/linux/nfs_mount.h
index 659c754..278bb4e 100644
--- a/include/linux/nfs_mount.h
+++ b/include/linux/nfs_mount.h
@@ -61,6 +61,7 @@ #define NFS_MOUNT_BROKEN_SUID 0x0400 /*
#define NFS_MOUNT_NOACL 0x0800 /* 4 */
#define NFS_MOUNT_STRICTLOCK 0x1000 /* reserved for NFSv4 */
#define NFS_MOUNT_SECFLAVOUR 0x2000 /* 5 */
+#define NFS_MOUNT_FSCACHE 0x4000
#define NFS_MOUNT_FLAGMASK 0xFFFF

#endif

2006-11-14 20:11:23

by David Howells

Subject: [PATCH 15/19] CacheFiles: Get the SID under which the CacheFiles module should operate

Get the SID under which the CacheFiles module should operate so that the
SELinux security system can control the accesses it makes.

Signed-Off-By: David Howells <[email protected]>
---

include/linux/security.h | 20 ++++++++++++++++++++
security/dummy.c | 7 +++++++
security/selinux/hooks.c | 7 +++++++
3 files changed, 34 insertions(+), 0 deletions(-)

diff --git a/include/linux/security.h b/include/linux/security.h
index 5913ae7..8cfeefc 100644
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -1171,6 +1171,14 @@ #ifdef CONFIG_SECURITY
* owning security ID, and return the security ID as which the process was
* previously acting.
*
+ * @cachefiles_get_secid:
+ * Determine the security ID for the CacheFiles module to use when
+ * accessing the filesystem containing the cache.
+ * @secid contains the security ID under which the cachefiles daemon is
+ * running.
+ * @modsecid contains the pointer to where the security ID for the module
+ * is to be stored.
+ *
* This is the main security structure.
*/
struct security_operations {
@@ -1358,6 +1366,7 @@ struct security_operations {
u32 (*set_fscreate_secid)(u32 secid);
u32 (*act_as_secid)(u32 secid);
u32 (*act_as_self)(void);
+ int (*cachefiles_get_secid)(u32 secid, u32 *modsecid);

#ifdef CONFIG_SECURITY_NETWORK
int (*unix_stream_connect) (struct socket * sock,
@@ -2180,6 +2189,11 @@ static inline u32 security_act_as_self(v
return security_ops->act_as_self();
}

+static inline int security_cachefiles_get_secid(u32 secid, u32 *modsecid)
+{
+ return security_ops->cachefiles_get_secid(secid, modsecid);
+}
+
/* prototypes */
extern int security_init (void);
extern int register_security (struct security_operations *ops);
@@ -2885,6 +2899,12 @@ static inline u32 security_act_as_self(v
return 0;
}

+static inline int security_cachefiles_get_secid(u32 secid, u32 *modsecid)
+{
+ *modsecid = 0;
+ return 0;
+}
+
#endif /* CONFIG_SECURITY */

#ifdef CONFIG_SECURITY_NETWORK
diff --git a/security/dummy.c b/security/dummy.c
index 3401ea3..30096ec 100644
--- a/security/dummy.c
+++ b/security/dummy.c
@@ -952,6 +952,12 @@ static u32 dummy_act_as_self(void)
return 0;
}

+static int dummy_cachefiles_get_secid(u32 secid, u32 *modsecid)
+{
+ *modsecid = 0;
+ return 0;
+}
+
#ifdef CONFIG_KEYS
static inline int dummy_key_alloc(struct key *key, struct task_struct *ctx,
unsigned long flags)
@@ -1111,6 +1117,7 @@ void security_fixup_ops (struct security
set_to_dummy_if_null(ops, set_fscreate_secid);
set_to_dummy_if_null(ops, act_as_secid);
set_to_dummy_if_null(ops, act_as_self);
+ set_to_dummy_if_null(ops, cachefiles_get_secid);
#ifdef CONFIG_SECURITY_NETWORK
set_to_dummy_if_null(ops, unix_stream_connect);
set_to_dummy_if_null(ops, unix_may_send);
diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index ddac1bc..3a52698 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -4586,6 +4586,12 @@ static u32 selinux_act_as_self(void)
return oldactor_sid;
}

+static int selinux_cachefiles_get_secid(u32 secid, u32 *modsecid)
+{
+ return security_transition_sid(secid, SECINITSID_KERNEL,
+ SECCLASS_PROCESS, modsecid);
+}
+
#ifdef CONFIG_KEYS

static int selinux_key_alloc(struct key *k, struct task_struct *tsk,
@@ -4773,6 +4779,7 @@ static struct security_operations selinu
.set_fscreate_secid = selinux_set_fscreate_secid,
.act_as_secid = selinux_act_as_secid,
.act_as_self = selinux_act_as_self,
+ .cachefiles_get_secid = selinux_cachefiles_get_secid,

.unix_stream_connect = selinux_socket_unix_stream_connect,
.unix_may_send = selinux_socket_unix_may_send,

2006-11-14 20:10:05

by David Howells

Subject: [PATCH 16/19] CacheFiles: Deal with LSM when accessing the cache

Make the CacheFiles module deal with LSM/SELinux security when accessing the
cache.

This is the documentation added to:

Documentation/filesystems/caching/cachefiles.txt

on the subject:

==========================
SECURITY MODEL AND SELINUX
==========================

CacheFiles is implemented to deal properly with the LSM security features of
the Linux kernel and the SELinux facility.

One of the problems that CacheFiles faces is that it is generally acting on
behalf of a process, and running in that process's context, and that includes a
security context that is not appropriate for accessing the cache - either
because the files in the cache are inaccessible to that process, or because if
the process creates a file in the cache, that file may be inaccessible to other
processes.

CacheFiles handles this by temporarily changing the security context (fsuid,
fsgid, actor security ID and file creation ID) that the process acts as,
without changing the security context that applies when the process is the
target of an operation performed by some other process (so signalling and
suchlike still work correctly).
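
As a rough illustration of the override described above, here is a userspace
sketch of the save/override/restore pattern. It is not the kernel code: the
toy_* names and the struct layout are invented for this example, and the real
state lives in the task and its LSM security blob.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of a task's security state.  The field names are illustrative
 * and do not reflect the kernel's actual structures. */
struct toy_task {
	uint32_t sid;		/* identity used when the task is a target */
	uint32_t actor_sid;	/* identity used when the task acts on objects */
	uint32_t create_sid;	/* label given to files the task creates */
	unsigned fsuid, fsgid;
};

/* The part of the state that is overridden and must be put back. */
struct toy_saved {
	uint32_t actor_sid, create_sid;
	unsigned fsuid, fsgid;
};

/* Save the acting context, then switch to the cache module's context. */
static struct toy_saved toy_begin_secure(struct toy_task *t,
					 uint32_t access_sid,
					 uint32_t cache_sid)
{
	struct toy_saved old = {
		t->actor_sid, t->create_sid, t->fsuid, t->fsgid
	};

	t->actor_sid = access_sid;
	t->create_sid = cache_sid;
	t->fsuid = 0;
	t->fsgid = 0;
	return old;
}

/* Put back exactly what toy_begin_secure() saved. */
static void toy_end_secure(struct toy_task *t, struct toy_saved old)
{
	t->actor_sid = old.actor_sid;
	t->create_sid = old.create_sid;
	t->fsuid = old.fsuid;
	t->fsgid = old.fsgid;
}
```

Note that toy_task.sid is never written: anything that checks the task as a
target (a signal, say) keeps seeing the original identity for the whole
duration of the override.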


When the CacheFiles module is asked to bind to its cache, it:

(1) Finds the security label attached to the root cache directory and uses
that as the security label with which it will create files. By default,
this is:

cachefiles_var_t

(2) Finds the security label of the process which issued the bind request
(presumed to be the cachefilesd daemon), which by default will be:

cachefilesd_t

and asks LSM to supply a security ID as which it should act given the
daemon's label. By default, this will be:

cachefiles_kernel_t

SELinux transitions the daemon's security ID to the module's security ID
based on a rule of this form in the policy.

type_transition <daemon's-ID> kernel_t : process <module's-ID>;

For instance:

type_transition cachefilesd_t kernel_t : process cachefiles_kernel_t;
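
To make the lookup in step (2) concrete, the following userspace sketch models
a type_transition rule as a table entry and resolves the module's type from
the daemon's type. The toy_* names are invented for this example, and the
table holds only the default rule quoted above. Real SELinux policy also
defines what happens when no rule matches, which this toy does not model.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One "type_transition source target : class result;" rule. */
struct toy_transition {
	const char *source;	/* the daemon's type */
	const char *target;	/* kernel_t in the rule above */
	const char *tclass;	/* object class, "process" here */
	const char *result;	/* the module's type */
};

static const struct toy_transition toy_policy[] = {
	{ "cachefilesd_t", "kernel_t", "process", "cachefiles_kernel_t" },
};

/* Return the type the toy policy transitions to, or NULL if no rule
 * matches. */
static const char *toy_transition_type(const char *source,
				       const char *target,
				       const char *tclass)
{
	size_t i;

	for (i = 0; i < sizeof(toy_policy) / sizeof(toy_policy[0]); i++) {
		const struct toy_transition *r = &toy_policy[i];

		if (strcmp(r->source, source) == 0 &&
		    strcmp(r->target, target) == 0 &&
		    strcmp(r->tclass, tclass) == 0)
			return r->result;
	}
	return NULL;
}
```

Calling toy_transition_type("cachefilesd_t", "kernel_t", "process") yields
"cachefiles_kernel_t", mirroring the default policy.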


The module's security ID gives it permission to create, move and remove files
and directories in the cache, to find and access directories and files in the
cache, to set and access extended attributes on cache objects, and to read and
write files in the cache.

The daemon's security ID gives it only a very restricted set of permissions: it
may scan directories, stat files and erase files and directories. It may
not read or write files in the cache, and so it is precluded from accessing the
data cached therein; nor is it permitted to create new files in the cache.


There are policy source files available in:

http://people.redhat.com/~dhowells/fscache/cachefilesd-0.8.tar.bz2

and later versions. In that tarball, see the files:

cachefilesd.te
cachefilesd.fc
cachefilesd.if

They are built and installed directly by the RPM.

If a non-RPM based system is being used, then copy the above files to their own
directory and run:

make -f /usr/share/selinux/devel/Makefile
semodule -i cachefilesd.pp

You will need checkpolicy and selinux-policy-devel installed prior to the
build.


By default, the cache is located in /var/fscache, but if it is desirable that
it should be elsewhere, then either the above policy files must be altered, or
an auxiliary policy must be installed to label the alternate location of the
cache.

For instructions on how to add an auxiliary policy to enable the cache to be
located elsewhere when SELinux is in enforcing mode, please see:

/usr/share/doc/cachefilesd-*/move-cache.txt

This file is present when the cachefilesd RPM is installed; alternatively, the
document can be found in the sources.


Signed-Off-By: David Howells <[email protected]>
---

Documentation/filesystems/caching/cachefiles.txt | 97 +++++++++++++++++++
fs/cachefiles/Makefile | 4 +
fs/cachefiles/cf-bind.c | 19 ++++
fs/cachefiles/cf-daemon.c | 5 +
fs/cachefiles/cf-interface.c | 35 +++++++
fs/cachefiles/cf-namei.c | 25 -----
fs/cachefiles/cf-security.c | 110 ++++++++++++++++++++++
fs/cachefiles/internal.h | 37 +++++++
8 files changed, 304 insertions(+), 28 deletions(-)

diff --git a/Documentation/filesystems/caching/cachefiles.txt b/Documentation/filesystems/caching/cachefiles.txt
index af074c4..b502cff 100644
--- a/Documentation/filesystems/caching/cachefiles.txt
+++ b/Documentation/filesystems/caching/cachefiles.txt
@@ -18,6 +18,8 @@ Contents:

(*) Cache structure.

+ (*) Security model and SELinux.
+
========
OVERVIEW
========
@@ -296,3 +298,98 @@ or retire them.

Note that CacheFiles will erase from the cache any file it doesn't recognise or
any file of an incorrect type (such as a FIFO file or a device file).
+
+
+==========================
+SECURITY MODEL AND SELINUX
+==========================
+
+CacheFiles is implemented to deal properly with the LSM security features of
+the Linux kernel and the SELinux facility.
+
+One of the problems that CacheFiles faces is that it is generally acting on
+behalf of a process, and running in that process's context, and that includes a
+security context that is not appropriate for accessing the cache - either
+because the files in the cache are inaccessible to that process, or because if
+the process creates a file in the cache, that file may be inaccessible to other
+processes.
+
+The way CacheFiles works is to temporarily change the security context (fsuid,
+fsgid and actor security label) that the process acts as - without changing the
+security context of the process when it is the target of an operation performed
+by some other process (so signalling and suchlike still work correctly).
+
+
+When the CacheFiles module is asked to bind to its cache, it:
+
+ (1) Finds the security label attached to the root cache directory and uses
+ that as the security label with which it will create files. By default,
+ this is:
+
+ cachefiles_var_t
+
+ (2) Finds the security label of the process which issued the bind request
+ (presumed to be the cachefilesd daemon), which by default will be:
+
+ cachefilesd_t
+
+ and asks LSM to supply a security ID as which it should act given the
+ daemon's label. By default, this will be:
+
+ cachefiles_kernel_t
+
+ SELinux transitions the daemon's security ID to the module's security ID
+ based on a rule of this form in the policy.
+
+ type_transition <daemon's-ID> kernel_t : process <module's-ID>;
+
+ For instance:
+
+ type_transition cachefilesd_t kernel_t : process cachefiles_kernel_t;
+
+
+The module's security ID gives it permission to create, move and remove files
+and directories in the cache, to find and access directories and files in the
+cache, to set and access extended attributes on cache objects, and to read and
+write files in the cache.
+
+The daemon's security ID gives it only a very restricted set of permissions: it
+may scan directories, stat files and erase files and directories. It may
+not read or write files in the cache, and so it is precluded from accessing the
+data cached therein; nor is it permitted to create new files in the cache.
+
+
+There are policy source files available in:
+
+ http://people.redhat.com/~dhowells/fscache/cachefilesd-0.8.tar.bz2
+
+and later versions. In that tarball, see the files:
+
+ cachefilesd.te
+ cachefilesd.fc
+ cachefilesd.if
+
+They are built and installed directly by the RPM.
+
+If a non-RPM based system is being used, then copy the above files to their own
+directory and run:
+
+ make -f /usr/share/selinux/devel/Makefile
+ semodule -i cachefilesd.pp
+
+You will need checkpolicy and selinux-policy-devel installed prior to the
+build.
+
+
+By default, the cache is located in /var/fscache, but if it is desirable that
+it should be elsewhere, then either the above policy files must be altered, or
+an auxiliary policy must be installed to label the alternate location of the
+cache.
+
+For instructions on how to add an auxiliary policy to enable the cache to be
+located elsewhere when SELinux is in enforcing mode, please see:
+
+ /usr/share/doc/cachefilesd-*/move-cache.txt
+
+This file is present when the cachefilesd RPM is installed; alternatively, the
+document can be found in the sources.
diff --git a/fs/cachefiles/Makefile b/fs/cachefiles/Makefile
index 9109b75..08dabdb 100644
--- a/fs/cachefiles/Makefile
+++ b/fs/cachefiles/Makefile
@@ -2,7 +2,7 @@ #
# Makefile for caching in a mounted filesystem
#

-cachefiles-objs := \
+cachefiles-y := \
cf-bind.o \
cf-daemon.o \
cf-interface.o \
@@ -11,4 +11,6 @@ cachefiles-objs := \
cf-namei.o \
cf-xattr.o

+cachefiles-$(CONFIG_SECURITY) += cf-security.o
+
obj-$(CONFIG_CACHEFILES) := cachefiles.o
diff --git a/fs/cachefiles/cf-bind.c b/fs/cachefiles/cf-bind.c
index 13ee6be..0ac3a6b 100644
--- a/fs/cachefiles/cf-bind.c
+++ b/fs/cachefiles/cf-bind.c
@@ -89,10 +89,20 @@ static int cachefiles_daemon_add_cache(s
struct nameidata nd;
struct kstatfs stats;
struct dentry *graveyard, *cachedir, *root;
+ uid_t fsuid;
+ gid_t fsgid;
+ u32 fscreatesid;
int ret;

_enter("");

+ /* we want to work under the module's security ID */
+ ret = cachefiles_get_security_ID(cache);
+ if (ret < 0)
+ return ret;
+
+ cachefiles_begin_secure(cache, &fsuid, &fsgid, &fscreatesid);
+
/* allocate the root index object */
ret = -ENOMEM;

@@ -134,6 +144,12 @@ static int cachefiles_daemon_add_cache(s
if (root->d_sb->s_flags & MS_RDONLY)
goto error_unsupported;

+ /* determine the security context within which we access the cache from
+ * within the kernel */
+ ret = cachefiles_check_security(cache, root);
+ if (ret < 0)
+ goto error_unsupported;
+
/* get the cache size and blocksize */
ret = root->d_sb->s_op->statfs(root, &stats);
if (ret < 0)
@@ -224,7 +240,7 @@ static int cachefiles_daemon_add_cache(s

/* check how much space the cache has */
cachefiles_has_space(cache, 0, 0);
-
+ cachefiles_end_secure(cache, fsuid, fsgid, fscreatesid);
return 0;

error_add_cache:
@@ -239,6 +255,7 @@ error_unsupported:
error_open_root:
kmem_cache_free(cachefiles_object_jar, fsdef);
error_root_object:
+ cachefiles_end_secure(cache, fsuid, fsgid, fscreatesid);
kerror("Failed to register: %d", ret);
return ret;
}
diff --git a/fs/cachefiles/cf-daemon.c b/fs/cachefiles/cf-daemon.c
index ea6fe65..ae82685 100644
--- a/fs/cachefiles/cf-daemon.c
+++ b/fs/cachefiles/cf-daemon.c
@@ -517,6 +517,9 @@ static int cachefiles_daemon_cull(struct
{
struct dentry *dir;
struct file *dirfile;
+ uid_t fsuid;
+ gid_t fsgid;
+ u32 fscreatesid;
int dirfd, fput_needed, ret;

_enter(",%s", args);
@@ -559,7 +562,9 @@ static int cachefiles_daemon_cull(struct
if (!S_ISDIR(dir->d_inode->i_mode))
goto notdir;

+ cachefiles_begin_secure(cache, &fsuid, &fsgid, &fscreatesid);
ret = cachefiles_cull(cache, dir, args);
+ cachefiles_end_secure(cache, fsuid, fsgid, fscreatesid);

dput(dir);
_leave(" = %d", ret);
diff --git a/fs/cachefiles/cf-interface.c b/fs/cachefiles/cf-interface.c
index fd6eb90..f467058 100644
--- a/fs/cachefiles/cf-interface.c
+++ b/fs/cachefiles/cf-interface.c
@@ -33,8 +33,11 @@ static struct fscache_object *cachefiles
struct cachefiles_cache *cache;
struct cachefiles_xattr *auxdata;
unsigned keylen, auxlen;
+ uid_t fsuid;
+ gid_t fsgid;
void *buffer;
char *key;
+ u32 fscreatesid;
int ret;

ASSERT(_parent);
@@ -92,7 +95,9 @@ static struct fscache_object *cachefiles
auxdata->type = cookie->def->type;

/* look up the key, creating any missing bits */
+ cachefiles_begin_secure(cache, &fsuid, &fsgid, &fscreatesid);
ret = cachefiles_walk_to_object(parent, object, key, auxdata);
+ cachefiles_end_secure(cache, fsuid, fsgid, fscreatesid);
if (ret < 0)
goto lookup_failed;

@@ -176,13 +181,18 @@ static void cachefiles_update_object(str
{
struct cachefiles_object *object;
struct cachefiles_cache *cache;
+ uid_t fsuid;
+ gid_t fsgid;
+ u32 fscreatesid;

_enter("%p", _object);

object = container_of(_object, struct cachefiles_object, fscache);
cache = container_of(object->fscache.cache, struct cachefiles_cache, cache);

+ cachefiles_begin_secure(cache, &fsuid, &fsgid, &fscreatesid);
//cachefiles_tree_update_object(super, object);
+ cachefiles_end_secure(cache, fsuid, fsgid, fscreatesid);
}

/*
@@ -192,6 +202,9 @@ static void cachefiles_put_object(struct
{
struct cachefiles_object *object;
struct cachefiles_cache *cache;
+ uid_t fsuid;
+ gid_t fsgid;
+ u32 fscreatesid;

ASSERT(_object);

@@ -217,7 +230,9 @@ #endif
_object != cache->cache.fsdef
) {
_debug("- retire object %p", object);
+ cachefiles_begin_secure(cache, &fsuid, &fsgid, &fscreatesid);
cachefiles_delete_object(cache, object);
+ cachefiles_end_secure(cache, fsuid, fsgid, fscreatesid);
}

/* close the filesystem stuff attached to the object */
@@ -251,6 +266,9 @@ #endif
static void cachefiles_sync_cache(struct fscache_cache *_cache)
{
struct cachefiles_cache *cache;
+ uid_t fsuid;
+ gid_t fsgid;
+ u32 fscreatesid;
int ret;

_enter("%p", _cache);
@@ -259,7 +277,10 @@ static void cachefiles_sync_cache(struct

/* make sure all pages pinned by operations on behalf of the netfs are
* written to disc */
+ cachefiles_begin_secure(cache, &fsuid, &fsgid, &fscreatesid);
ret = fsync_super(cache->mnt->mnt_sb);
+ cachefiles_end_secure(cache, fsuid, fsgid, fscreatesid);
+
if (ret == -EIO)
cachefiles_io_error(cache,
"Attempt to sync backing fs superblock"
@@ -273,12 +294,18 @@ static void cachefiles_sync_cache(struct
static int cachefiles_set_i_size(struct fscache_object *_object, loff_t i_size)
{
struct cachefiles_object *object;
+ struct cachefiles_cache *cache;
struct iattr newattrs;
+ uid_t fsuid;
+ gid_t fsgid;
+ u32 fscreatesid;
int ret;

_enter("%p,%llu", _object, i_size);

object = container_of(_object, struct cachefiles_object, fscache);
+ cache = container_of(object->fscache.cache,
+ struct cachefiles_cache, cache);

if (i_size == object->i_size)
return 0;
@@ -291,9 +318,11 @@ static int cachefiles_set_i_size(struct
newattrs.ia_size = i_size;
newattrs.ia_valid = ATTR_SIZE;

+ cachefiles_begin_secure(cache, &fsuid, &fsgid, &fscreatesid);
mutex_lock(&object->backer->d_inode->i_mutex);
ret = notify_change(object->backer, &newattrs);
mutex_unlock(&object->backer->d_inode->i_mutex);
+ cachefiles_end_secure(cache, fsuid, fsgid, fscreatesid);

if (ret == -EIO) {
cachefiles_io_error_obj(object, "Size set failed");
@@ -683,7 +712,8 @@ static int cachefiles_read_or_alloc_page
int ret;

object = container_of(_object, struct cachefiles_object, fscache);
- cache = container_of(object->fscache.cache, struct cachefiles_cache, cache);
+ cache = container_of(object->fscache.cache,
+ struct cachefiles_cache, cache);

_enter("{%p},{%lx},,,", object, page->index);

@@ -994,7 +1024,8 @@ static int cachefiles_read_or_alloc_page
int ret, ret2, space;

object = container_of(_object, struct cachefiles_object, fscache);
- cache = container_of(object->fscache.cache, struct cachefiles_cache, cache);
+ cache = container_of(object->fscache.cache,
+ struct cachefiles_cache, cache);

_enter("{%p},,%d,,", object, *nr_pages);

diff --git a/fs/cachefiles/cf-namei.c b/fs/cachefiles/cf-namei.c
index 1bd5d27..80c9b66 100644
--- a/fs/cachefiles/cf-namei.c
+++ b/fs/cachefiles/cf-namei.c
@@ -18,6 +18,7 @@ #include <linux/quotaops.h>
#include <linux/xattr.h>
#include <linux/mount.h>
#include <linux/namei.h>
+#include <linux/security.h>
#include "internal.h"

/*
@@ -274,8 +275,6 @@ int cachefiles_walk_to_object(struct cac
struct cachefiles_cache *cache;
struct dentry *dir, *next = NULL, *new;
struct qstr name;
- uid_t fsuid;
- gid_t fsgid;
int ret;

_enter("{%p}", parent->dentry);
@@ -292,11 +291,6 @@ int cachefiles_walk_to_object(struct cac
return -ENOBUFS;
}

- fsuid = current->fsuid;
- fsgid = current->fsgid;
- current->fsuid = 0;
- current->fsgid = 0;
-
dir = dget(parent->dentry);

advance:
@@ -502,8 +496,6 @@ lookup_again:
}
}

- current->fsuid = fsuid;
- current->fsgid = fsgid;
object->new = 0;

_leave(" = 0 [%lu]", object->dentry->d_inode->i_ino);
@@ -546,9 +538,6 @@ error:
error_out2:
dput(dir);
error_out:
- current->fsuid = fsuid;
- current->fsgid = fsgid;
-
if (ret == -ENOSPC)
ret = -ENOBUFS;

@@ -565,8 +554,6 @@ struct dentry *cachefiles_get_directory(
{
struct dentry *subdir, *new;
struct qstr name;
- uid_t fsuid;
- gid_t fsgid;
int ret;

_enter("");
@@ -589,11 +576,6 @@ struct dentry *cachefiles_get_directory(
/* search the current directory for the element name */
_debug("lookup '%s' %x", name.name, name.hash);

- fsuid = current->fsuid;
- fsgid = current->fsgid;
- current->fsuid = 0;
- current->fsgid = 0;
-
mutex_lock(&dir->d_inode->i_mutex);

subdir = d_lookup(dir, &name);
@@ -640,9 +622,6 @@ struct dentry *cachefiles_get_directory(

mutex_unlock(&dir->d_inode->i_mutex);

- current->fsuid = fsuid;
- current->fsgid = fsgid;
-
/* we need to make sure the subdir is a directory */
ASSERT(subdir->d_inode);

@@ -691,8 +670,6 @@ nomem_d_alloc:
goto error_out;

error_out:
- current->fsuid = fsuid;
- current->fsgid = fsgid;
_leave(" = %d", ret);
return ERR_PTR(ret);
}
diff --git a/fs/cachefiles/cf-security.c b/fs/cachefiles/cf-security.c
new file mode 100644
index 0000000..4c5f052
--- /dev/null
+++ b/fs/cachefiles/cf-security.c
@@ -0,0 +1,110 @@
+/* CacheFiles security management
+ *
+ * Copyright (C) 2006 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells ([email protected])
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#include <linux/fs.h>
+#include "internal.h"
+
+/*
+ * determine the security context within which we access the cache from within
+ * the kernel
+ */
+int cachefiles_get_security_ID(struct cachefiles_cache *cache)
+{
+ char *seclabel;
+ u32 seclen, daemon_sid;
+ int ret;
+
+ _enter("");
+
+ cache->access_sid = 0;
+
+ /* ask the security policy to tell us what security ID we should be
+ * using to access the cache, given the security ID that our daemon is
+ * using */
+ security_task_getsecid(current, &daemon_sid);
+
+ ret = security_secid_to_secctx(daemon_sid, &seclabel, &seclen);
+ if (ret < 0)
+ goto error;
+ _debug("Cache Daemon SID: %x '%s'", daemon_sid, seclabel);
+ kfree(seclabel);
+
+ ret = security_cachefiles_get_secid(daemon_sid, &cache->access_sid);
+ if (ret < 0) {
+ printk(KERN_ERR "CacheFiles:"
+ " Security can't provide module SID: error %d",
+ ret);
+ goto error;
+ }
+
+ ret = security_secid_to_secctx(cache->access_sid, &seclabel, &seclen);
+ if (ret < 0)
+ goto error;
+ _debug("Cache Module SID: %x '%s'", cache->access_sid, seclabel);
+ kfree(seclabel);
+
+error:
+ if (ret == -EOPNOTSUPP)
+ ret = 0;
+ _leave(" = %d", ret);
+ return ret;
+}
+
+/*
+ * check the security details of the on-disk cache
+ */
+int cachefiles_check_security(struct cachefiles_cache *cache,
+ struct dentry *root)
+{
+ char *seclabel;
+ u32 seclen;
+ int ret;
+
+ _enter("");
+
+ /* use the cache root dir's security ID as the SID with which to create
+ * files */
+ cache->cache_sid = security_inode_get_secid(root->d_inode);
+
+ ret = security_secid_to_secctx(cache->cache_sid, &seclabel, &seclen);
+ if (ret < 0)
+ goto error;
+ _debug("Cache SID: %x '%s'", cache->cache_sid, seclabel);
+ kfree(seclabel);
+
+ /* check that we have permission to create files and directories with
+ * the security ID we've been given */
+ security_act_as_secid(cache->access_sid);
+
+ ret = security_inode_mkdir(root->d_inode, root, 0);
+ if (ret < 0) {
+ printk(KERN_ERR "CacheFiles:"
+ " Security denies permission to make dirs: error %d",
+ ret);
+ goto error2;
+ }
+
+ ret = security_inode_create(root->d_inode, root, 0);
+ if (ret < 0) {
+ printk(KERN_ERR "CacheFiles:"
+ " Security denies permission to create files: error %d",
+ ret);
+ goto error2;
+ }
+
+error2:
+ security_act_as_self();
+error:
+ if (ret == -EOPNOTSUPP)
+ ret = 0;
+ _leave(" = %d", ret);
+ return ret;
+}
diff --git a/fs/cachefiles/internal.h b/fs/cachefiles/internal.h
index 29c79a3..d56b443 100644
--- a/fs/cachefiles/internal.h
+++ b/fs/cachefiles/internal.h
@@ -30,6 +30,7 @@ #include <linux/fscache-cache.h>
#include <linux/timer.h>
#include <linux/wait.h>
#include <linux/workqueue.h>
+#include <linux/security.h>

struct cachefiles_cache;
struct cachefiles_object;
@@ -80,6 +81,8 @@ struct cachefiles_cache {
struct rb_root active_nodes; /* active nodes (can't be culled) */
rwlock_t active_lock; /* lock for active_nodes */
atomic_t gravecounter; /* graveyard uniquifier */
+ u32 access_sid; /* cache access SID */
+ u32 cache_sid; /* cache fs object SID */
unsigned frun_percent; /* when to stop culling (% files) */
unsigned fcull_percent; /* when to start culling (% files) */
unsigned fstop_percent; /* when to stop allocating (% files) */
@@ -181,6 +184,40 @@ extern int cachefiles_cull(struct cachef
char *filename);

/*
+ * cf-security.c
+ */
+#ifdef CONFIG_SECURITY
+extern int cachefiles_get_security_ID(struct cachefiles_cache *cache);
+extern int cachefiles_check_security(struct cachefiles_cache *cache,
+ struct dentry *root);
+#else
+#define cachefiles_get_security_ID(cache) (0)
+#define cachefiles_check_security(cache, root) (0)
+#endif
+
+static inline void cachefiles_begin_secure(struct cachefiles_cache *cache,
+ uid_t *fsuid, gid_t *fsgid,
+ u32 *fscreatesid)
+{
+ security_act_as_secid(cache->access_sid);
+ *fscreatesid = security_set_fscreate_secid(cache->cache_sid);
+ *fsuid = current->fsuid;
+ *fsgid = current->fsgid;
+ current->fsuid = 0;
+ current->fsgid = 0;
+}
+
+static inline void cachefiles_end_secure(struct cachefiles_cache *cache,
+ uid_t fsuid, gid_t fsgid,
+ u32 fscreatesid)
+{
+ current->fsuid = fsuid;
+ current->fsgid = fsgid;
+ security_set_fscreate_secid(fscreatesid);
+ security_act_as_self();
+}
+
+/*
* cf-xattr.c
*/
extern int cachefiles_check_object_type(struct cachefiles_object *object);

2006-11-14 20:11:23

by David Howells

Subject: [PATCH 08/19] CacheFiles: Add a function to write a single page of data to an inode

Add a function to write one single whole page of data to an inode at a
page-aligned location (thus permitting the function to be highly optimised).
The function uses the prepare_write() and commit_write() address_space
operations to bound the actual write.

This is used by CacheFiles to store the contents of netfs pages into their
backing file pages.
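
Because the write always covers one whole page at a page-aligned location, the
byte range involved follows directly from the page index. The sketch below
shows that arithmetic in userspace C. The toy_* names are invented for this
example, and a 4 KiB page size is an assumption (the kernel uses
PAGE_CACHE_SHIFT instead).

```c
#include <assert.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT 12	/* assumption: 4 KiB pages */

/* First byte offset covered by the page at @index. */
static uint64_t toy_page_start(uint64_t index)
{
	return index << TOY_PAGE_SHIFT;
}

/* Byte offset just past the end of the page at @index. */
static uint64_t toy_page_end(uint64_t index)
{
	return (index + 1) << TOY_PAGE_SHIFT;
}

/* Nonzero if writing the whole page at @index reaches beyond @isize, in
 * which case blocks instantiated by a failed write would need trimming
 * back to @isize. */
static int toy_page_extends_file(uint64_t index, uint64_t isize)
{
	return toy_page_end(index) > isize;
}
```

For page index 3, the write covers bytes 12288 up to but not including 16384,
so it extends a 16000-byte file but not a 16384-byte one.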

Signed-Off-By: David Howells <[email protected]>
---

include/linux/fs.h | 2 +
mm/filemap.c | 87 ++++++++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 89 insertions(+), 0 deletions(-)

diff --git a/include/linux/fs.h b/include/linux/fs.h
index 2fe6e3f..2bb027f 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1748,6 +1748,8 @@ extern ssize_t generic_file_direct_write
unsigned long *, loff_t, loff_t *, size_t, size_t);
extern ssize_t generic_file_buffered_write(struct kiocb *, const struct iovec *,
unsigned long, loff_t, loff_t *, size_t, ssize_t);
+extern int generic_file_buffered_write_one_kernel_page(struct address_space *,
+ pgoff_t, struct page *);
extern ssize_t do_sync_read(struct file *filp, char __user *buf, size_t len, loff_t *ppos);
extern ssize_t do_sync_write(struct file *filp, const char __user *buf, size_t len, loff_t *ppos);
extern ssize_t generic_file_sendfile(struct file *, loff_t *, size_t, read_actor_t, void *);
diff --git a/mm/filemap.c b/mm/filemap.c
index 1b73d3a..b9eb8b2 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2236,6 +2236,93 @@ zero_length_segment:
}
EXPORT_SYMBOL(generic_file_buffered_write);

+/**
+ * generic_file_buffered_write_one_kernel_page - Write a single page of data to
+ * an inode
+ * @mapping: The address space of the target inode
+ * @index: The target page in the target inode to fill
+ * @source: The data to write into the target page
+ *
+ * Write the data from the source page to the page in the nominated address
+ * space at the @index specified
+ *
+ * The @source page does not need to have any association with the file or the
+ * target page offset
+ */
+int
+generic_file_buffered_write_one_kernel_page(struct address_space *mapping,
+ pgoff_t index,
+ struct page *source)
+{
+ const struct address_space_operations *a_ops = mapping->a_ops;
+ struct pagevec lru_pvec;
+ struct page *page, *cached_page = NULL;
+ long status = 0;
+
+ pagevec_init(&lru_pvec, 0);
+
+ page = __grab_cache_page(mapping, index, &cached_page, &lru_pvec);
+ if (!page) {
+ BUG_ON(cached_page);
+ return -ENOMEM;
+ }
+
+ status = a_ops->prepare_write(NULL, page, 0, PAGE_CACHE_SIZE);
+ if (unlikely(status)) {
+ loff_t isize = i_size_read(mapping->host);
+
+ if (status != AOP_TRUNCATED_PAGE)
+ unlock_page(page);
+ page_cache_release(page);
+ if (status == AOP_TRUNCATED_PAGE)
+ goto sync;
+
+ /* prepare_write() may have instantiated a few blocks outside
+ * i_size. Trim these off again.
+ */
+ if (((loff_t)(index + 1) << PAGE_CACHE_SHIFT) > isize)
+ vmtruncate(mapping->host, isize);
+ goto sync;
+ }
+
+ copy_highpage(page, source);
+ flush_dcache_page(page);
+
+ status = a_ops->commit_write(NULL, page, 0, PAGE_CACHE_SIZE);
+ if (status == AOP_TRUNCATED_PAGE) {
+ page_cache_release(page);
+ goto sync;
+ }
+
+ if (status > 0)
+ status = 0;
+
+ unlock_page(page);
+ mark_page_accessed(page);
+ page_cache_release(page);
+ if (status < 0)
+ return status;
+
+ balance_dirty_pages_ratelimited(mapping);
+ cond_resched();
+
+sync:
+ if (cached_page)
+ page_cache_release(cached_page);
+
+ /* the caller must handle O_SYNC themselves, but we handle S_SYNC and
+ * MS_SYNCHRONOUS here */
+ if (unlikely(IS_SYNC(mapping->host)) && !a_ops->writepage)
+ status = generic_osync_inode(mapping->host, mapping,
+ OSYNC_METADATA | OSYNC_DATA);
+
+ /* the caller must handle O_DIRECT for themselves */
+
+ pagevec_lru_add(&lru_pvec);
+ return status;
+}
+EXPORT_SYMBOL(generic_file_buffered_write_one_kernel_page);
+
static ssize_t
__generic_file_aio_write_nolock(struct kiocb *iocb, const struct iovec *iov,
unsigned long nr_segs, loff_t *ppos)

2006-11-14 20:12:25

by David Howells

Subject: [PATCH 09/19] CacheFiles: Permit the page lock state to be monitored

Add a function to install a monitor on the page lock waitqueue for a particular
page, thus allowing the page being unlocked to be detected.

This is used by CacheFiles to detect read completion on a page in the backing
filesystem so that it can then copy the data to the waiting netfs page.
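
The monitoring idea can be sketched in userspace as a list of waiters, each
carrying a callback that runs when the page is unlocked. This is only a model:
the toy_* names are invented here, there is no locking, and waiters stay
queued after firing, whereas in the kernel the wake function decides whether
the waiter is removed.

```c
#include <assert.h>
#include <stddef.h>

struct toy_waiter {
	void (*func)(struct toy_waiter *w);	/* called on wake-up */
	struct toy_waiter *next;
	int fired;				/* times the callback ran */
};

struct toy_page {
	int locked;
	struct toy_waiter *waiters;		/* singly linked wait queue */
};

/* Queue an arbitrary waiter on the page. */
static void toy_add_page_wait_queue(struct toy_page *page,
				    struct toy_waiter *w)
{
	w->next = page->waiters;
	page->waiters = w;
}

/* Unlock the page and notify everything on its wait queue. */
static void toy_unlock_page(struct toy_page *page)
{
	struct toy_waiter *w;

	page->locked = 0;
	for (w = page->waiters; w; w = w->next)
		w->func(w);
}

/* Example monitor: record that the backing page's read completed. */
static void toy_read_done(struct toy_waiter *w)
{
	w->fired++;
}
```

A cache backend would use the callback to schedule the copy from the backing
page to the netfs page rather than to bump a counter.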

Signed-Off-By: David Howells <[email protected]>
---

include/linux/pagemap.h | 5 +++++
mm/filemap.c | 19 +++++++++++++++++++
2 files changed, 24 insertions(+), 0 deletions(-)

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 24fdc48..a86693c 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -200,6 +200,11 @@ static inline void wait_on_page_fs_misc(
extern void fastcall end_page_fs_misc(struct page *page);

/*
+ * Add an arbitrary waiter to a page's wait queue
+ */
+extern void add_page_wait_queue(struct page *page, wait_queue_t *waiter);
+
+/*
* Fault a userspace page into pagetables. Return non-zero on a fault.
*
* This assumes that two userspace pages are always sufficient. That's
diff --git a/mm/filemap.c b/mm/filemap.c
index b9eb8b2..4c9c1ac 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -517,6 +517,25 @@ void fastcall wait_on_page_bit(struct pa
EXPORT_SYMBOL(wait_on_page_bit);

/**
+ * add_page_wait_queue - Add an arbitrary waiter to a page's wait queue
+ * @page: Page defining the wait queue of interest
+ * @waiter: Waiter to add to the queue
+ *
+ * Add an arbitrary @waiter to the wait queue for the nominated @page.
+ */
+void add_page_wait_queue(struct page *page, wait_queue_t *waiter)
+{
+ wait_queue_head_t *q = page_waitqueue(page);
+ unsigned long flags;
+
+ spin_lock_irqsave(&q->lock, flags);
+ __add_wait_queue(q, waiter);
+ spin_unlock_irqrestore(&q->lock, flags);
+}
+
+EXPORT_SYMBOL_GPL(add_page_wait_queue);
+
+/**
* unlock_page - unlock a locked page
* @page: the page
*

2006-11-14 20:12:59

by David Howells

Subject: [PATCH 12/19] CacheFiles: Permit a process's create SID to be overridden

Make it possible for a process's file creation SID to be temporarily overridden
by CacheFiles so that files created in the cache have the right label attached.

Without this facility, files created in the cache will be given the current
file creation SID of whatever process happens to have invoked CacheFiles
indirectly by means of opening a netfs file at the time the cache file is
created.
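
The set hook added by this patch returns the value it replaces, which is what
lets a caller restore the previous creation SID afterwards. A minimal
userspace sketch of that idiom follows. The toy_* names are invented, and a
single global stands in for the per-task state.

```c
#include <assert.h>
#include <stdint.h>

/* Stands in for the per-task file creation SID. */
static uint32_t toy_create_sid;

/* Install a new creation SID and hand back the one it replaced. */
static uint32_t toy_set_fscreate_secid(uint32_t secid)
{
	uint32_t old = toy_create_sid;

	toy_create_sid = secid;
	return old;
}
```

Since each call yields the prior value, overrides nest cleanly as long as they
are undone in reverse order.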

Signed-Off-By: David Howells <[email protected]>
---

include/linux/security.h | 35 +++++++++++++++++++++++++++++++++++
security/dummy.c | 12 ++++++++++++
security/selinux/hooks.c | 18 ++++++++++++++++++
3 files changed, 65 insertions(+), 0 deletions(-)

diff --git a/include/linux/security.h b/include/linux/security.h
index b200b98..7955017 100644
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -1154,6 +1154,13 @@ #ifdef CONFIG_SECURITY
* @secdata contains the security context.
* @seclen contains the length of the security context.
*
+ * @get_fscreate_secid:
+ * Get the current file creation security ID.
+ *
+ * @set_fscreate_secid:
+ * Set the file creation security ID and return the previous one.
+ * @secid contains the security ID to set.
+ *
* This is the main security structure.
*/
struct security_operations {
@@ -1336,6 +1343,8 @@ struct security_operations {
int (*setprocattr)(struct task_struct *p, char *name, void *value, size_t size);
int (*secid_to_secctx)(u32 secid, char **secdata, u32 *seclen);
void (*release_secctx)(char *secdata, u32 seclen);
+ u32 (*get_fscreate_secid)(void);
+ u32 (*set_fscreate_secid)(u32 secid);

#ifdef CONFIG_SECURITY_NETWORK
int (*unix_stream_connect) (struct socket * sock,
@@ -2131,6 +2140,16 @@ static inline void security_release_secc
return security_ops->release_secctx(secdata, seclen);
}

+static inline u32 security_get_fscreate_secid(void)
+{
+ return security_ops->get_fscreate_secid();
+}
+
+static inline u32 security_set_fscreate_secid(u32 secid)
+{
+ return security_ops->set_fscreate_secid(secid);
+}
+
/* prototypes */
extern int security_init (void);
extern int register_security (struct security_operations *ops);
@@ -2797,6 +2816,11 @@ static inline void securityfs_remove(str
{
}

+static inline int security_to_secctx_secid(char *secdata, u32 seclen, u32 *secid)
+{
+ return -EOPNOTSUPP;
+}
+
static inline int security_secid_to_secctx(u32 secid, char **secdata, u32 *seclen)
{
return -EOPNOTSUPP;
@@ -2805,6 +2829,17 @@ static inline int security_secid_to_secc
static inline void security_release_secctx(char *secdata, u32 seclen)
{
}
+
+static inline u32 security_get_fscreate_secid(void)
+{
+ return 0;
+}
+
+static inline u32 security_set_fscreate_secid(u32 secid)
+{
+ return 0;
+}
+
#endif /* CONFIG_SECURITY */

#ifdef CONFIG_SECURITY_NETWORK
diff --git a/security/dummy.c b/security/dummy.c
index 43874c1..ee3c886 100644
--- a/security/dummy.c
+++ b/security/dummy.c
@@ -927,6 +927,16 @@ static void dummy_release_secctx(char *s
{
}

+static u32 dummy_get_fscreate_secid(void)
+{
+ return 0;
+}
+
+static u32 dummy_set_fscreate_secid(u32 secid)
+{
+ return 0;
+}
+
#ifdef CONFIG_KEYS
static inline int dummy_key_alloc(struct key *key, struct task_struct *ctx,
unsigned long flags)
@@ -1081,6 +1091,8 @@ void security_fixup_ops (struct security
set_to_dummy_if_null(ops, setprocattr);
set_to_dummy_if_null(ops, secid_to_secctx);
set_to_dummy_if_null(ops, release_secctx);
+ set_to_dummy_if_null(ops, get_fscreate_secid);
+ set_to_dummy_if_null(ops, set_fscreate_secid);
#ifdef CONFIG_SECURITY_NETWORK
set_to_dummy_if_null(ops, unix_stream_connect);
set_to_dummy_if_null(ops, unix_may_send);
diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index 8ab5679..7f5ec86 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -4529,6 +4529,22 @@ static void selinux_release_secctx(char
kfree(secdata);
}

+static u32 selinux_get_fscreate_secid(void)
+{
+ struct task_security_struct *tsec = current->security;
+
+ return tsec->create_sid;
+}
+
+static u32 selinux_set_fscreate_secid(u32 secid)
+{
+ struct task_security_struct *tsec = current->security;
+ u32 oldsid = tsec->create_sid;
+
+ tsec->create_sid = secid;
+ return oldsid;
+}
+
#ifdef CONFIG_KEYS

static int selinux_key_alloc(struct key *k, struct task_struct *tsk,
@@ -4711,6 +4727,8 @@ static struct security_operations selinu

.secid_to_secctx = selinux_secid_to_secctx,
.release_secctx = selinux_release_secctx,
+ .get_fscreate_secid = selinux_get_fscreate_secid,
+ .set_fscreate_secid = selinux_set_fscreate_secid,

.unix_stream_connect = selinux_socket_unix_stream_connect,
.unix_may_send = selinux_socket_unix_may_send,

2006-11-14 20:12:25

by David Howells

Subject: [PATCH 10/19] CacheFiles: Export things for CacheFiles

Export a number of functions for CacheFiles's use.

Signed-Off-By: David Howells <[email protected]>
---

fs/file_table.c | 1 +
fs/super.c | 2 ++
kernel/auditsc.c | 2 ++
3 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/fs/file_table.c b/fs/file_table.c
index 24f25a0..10dec73 100644
--- a/fs/file_table.c
+++ b/fs/file_table.c
@@ -235,6 +235,7 @@ struct file fastcall *fget_light(unsigne
return file;
}

+EXPORT_SYMBOL_GPL(fget_light);

void put_filp(struct file *file)
{
diff --git a/fs/super.c b/fs/super.c
index 47e554c..da8020d 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -251,6 +251,8 @@ int fsync_super(struct super_block *sb)
return sync_blockdev(sb->s_bdev);
}

+EXPORT_SYMBOL_GPL(fsync_super);
+
/**
* generic_shutdown_super - common helper for ->kill_sb()
* @sb: superblock to kill
diff --git a/kernel/auditsc.c b/kernel/auditsc.c
index 42f2f11..05908b9 100644
--- a/kernel/auditsc.c
+++ b/kernel/auditsc.c
@@ -1425,6 +1425,8 @@ #endif
audit_copy_inode(&context->names[idx], inode);
}

+EXPORT_SYMBOL_GPL(__audit_inode_child);
+
/**
* auditsc_get_stamp - get local copies of audit_context values
* @ctx: audit_context for the task

2006-11-14 20:11:52

by David Howells

Subject: [PATCH 06/19] FS-Cache: NFS: Only obtain cache cookies on file open, not on inode read

Make the NFS filesystem only obtain a cache cookie for a regular file when it's
actually opened rather than when the inode is fetched in. Directories and
special files aren't currently cached on NFS.

Normally, in a filesystem, an inode would be instantiated only when it's
actually going to be used, but in the case of NFS an inode is also created when
readdir lists a directory entry that refers to it.

Previously, this meant that ls -lR or find would attempt to load every regular
file inode in a tree into the cache, even though none of those files was
actually being opened. With this patch, none of them is loaded.
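
As a rough userspace model of the policy described above (a sketch only:
struct model_inode, model_init() and model_open() are hypothetical stand-ins
for the NFS inode and the nfs_fscache_*_fh_cookie() helpers, the cookie is
reduced to a flag, and the NFS_MOUNT_FSCACHE mount-flag check is omitted):

```c
#include <assert.h>
#include <stdbool.h>
#include <fcntl.h>

/* Hypothetical stand-in for the relevant bits of struct nfs_inode. */
struct model_inode {
	bool cacheable;		/* models the NFS_INO_CACHEABLE flag */
	bool has_cookie;	/* models nfsi->fscache != NULL */
};

/* Inode instantiation (e.g. by readdir): mark regular files as cacheable,
 * but do not obtain a cookie yet. */
static void model_init(struct model_inode *ino, bool is_regular)
{
	assert(ino);
	ino->cacheable = is_regular;
	ino->has_cookie = false;
}

/* File open: obtain the cookie for a read-only open; any other access mode
 * turns caching off for the inode. */
static void model_open(struct model_inode *ino, int f_flags)
{
	assert(ino);
	if (!ino->cacheable)
		return;

	if ((f_flags & O_ACCMODE) != O_RDONLY) {
		ino->cacheable = false;
		ino->has_cookie = false;
	} else {
		ino->has_cookie = true;
	}
}
```

In this model, walking a tree only ever calls model_init(), so no cookies are
obtained until something actually opens a file read-only.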

Signed-Off-By: David Howells <[email protected]>
---

fs/nfs/fscache.h | 41 ++++++++++++++++++++++++++++++++++++-----
fs/nfs/inode.c | 5 ++---
include/linux/nfs_fs.h | 2 ++
3 files changed, 40 insertions(+), 8 deletions(-)

diff --git a/fs/nfs/fscache.h b/fs/nfs/fscache.h
index 00a2c07..0be6ffe 100644
--- a/fs/nfs/fscache.h
+++ b/fs/nfs/fscache.h
@@ -90,14 +90,25 @@ static inline const char *nfs_server_fsc
/*
* get the per-filehandle cookie for an NFS inode
*/
-static inline void nfs_fscache_get_fh_cookie(struct inode *inode,
- int maycache)
+static inline void nfs_fscache_init_fh_cookie(struct inode *inode)
+{
+ NFS_I(inode)->fscache = NULL;
+ if (S_ISREG(inode->i_mode))
+ set_bit(NFS_INO_CACHEABLE, &NFS_I(inode)->flags);
+}
+
+/*
+ * get the per-filehandle cookie for an NFS inode
+ */
+static inline void nfs_fscache_enable_fh_cookie(struct inode *inode)
{
struct super_block *sb = inode->i_sb;
struct nfs_inode *nfsi = NFS_I(inode);

- nfsi->fscache = NULL;
- if (maycache && (NFS_SB(sb)->flags & NFS_MOUNT_FSCACHE)) {
+ if (nfsi->fscache || !NFS_CACHEABLE(inode))
+ return;
+
+ if ((NFS_SB(sb)->flags & NFS_MOUNT_FSCACHE)) {
nfsi->fscache = fscache_acquire_cookie(
NFS_SB(sb)->nfs_client->fscache,
&nfs_cache_fh_index_def,
@@ -179,6 +190,8 @@ static inline void nfs_fscache_zap_fh_co
*/
static inline void nfs_fscache_disable_fh_cookie(struct inode *inode)
{
+ clear_bit(NFS_INO_CACHEABLE, &NFS_I(inode)->flags);
+
if (NFS_I(inode)->fscache) {
dfprintk(FSCACHE,
"NFS: nfsi 0x%p turning cache off\n", NFS_I(inode));
@@ -194,6 +207,22 @@ static inline void nfs_fscache_disable_f
}

/*
+ * decide if we should enable or disable the FS cache for this inode
+ * - for now, only regular files that are open read-only will be able to use
+ * the cache
+ */
+static inline void nfs_fscache_set_fh_cookie(struct inode *inode,
+ struct file *filp)
+{
+ if (NFS_CACHEABLE(inode)) {
+ if ((filp->f_flags & O_ACCMODE) != O_RDONLY)
+ nfs_fscache_disable_fh_cookie(inode);
+ else
+ nfs_fscache_enable_fh_cookie(inode);
+ }
+}
+
+/*
* install the VM ops for mmap() of an NFS file so that we can hold up writes
* to pages on shared writable mappings until the store to the cache is
* complete
@@ -431,12 +460,14 @@ static inline void nfs4_fscache_get_clie
static inline void nfs_fscache_release_client_cookie(struct nfs_client *clp) {}
static inline const char *nfs_server_fscache_state(struct nfs_server *server) { return "no "; }

-static inline void nfs_fscache_get_fh_cookie(struct inode *inode, int aycache) {}
+static inline void nfs_fscache_init_fh_cookie(struct inode *inode) {}
+static inline void nfs_fscache_enable_fh_cookie(struct inode *inode) {}
static inline void nfs_fscache_set_size(struct inode *inode) {}
static inline void nfs_fscache_release_fh_cookie(struct inode *inode) {}
static inline void nfs_fscache_zap_fh_cookie(struct inode *inode) {}
static inline void nfs_fscache_renew_fh_cookie(struct inode *inode) {}
static inline void nfs_fscache_disable_fh_cookie(struct inode *inode) {}
+static inline void nfs_fscache_set_fh_cookie(struct inode *inode, struct file *filp) {}
static inline void nfs_fscache_install_vm_ops(struct inode *inode, struct vm_area_struct *vma) {}
static inline int nfs_fscache_release_page(struct page *page)
{
diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
index 56acba0..0d683eb 100644
--- a/fs/nfs/inode.c
+++ b/fs/nfs/inode.c
@@ -299,7 +299,7 @@ nfs_fhget(struct super_block *sb, struct
memset(nfsi->cookieverf, 0, sizeof(nfsi->cookieverf));
nfsi->access_cache = RB_ROOT;

- nfs_fscache_get_fh_cookie(inode, maycache);
+ nfs_fscache_init_fh_cookie(inode);

unlock_new_inode(inode);
} else
@@ -566,8 +566,7 @@ int nfs_open(struct inode *inode, struct
ctx->mode = filp->f_mode;
nfs_file_set_open_context(filp, ctx);
put_nfs_open_context(ctx);
- if ((filp->f_flags & O_ACCMODE) != O_RDONLY)
- nfs_fscache_disable_fh_cookie(inode);
+ nfs_fscache_set_fh_cookie(inode, filp);
return 0;
}

diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h
index 5ead2bf..b2e5e86 100644
--- a/include/linux/nfs_fs.h
+++ b/include/linux/nfs_fs.h
@@ -205,6 +205,7 @@ #define NFS_INO_REVALIDATING (0) /* rev
#define NFS_INO_ADVISE_RDPLUS (1) /* advise readdirplus */
#define NFS_INO_STALE (2) /* possible stale inode */
#define NFS_INO_ACL_LRU_SET (3) /* Inode is on the LRU list */
+#define NFS_INO_CACHEABLE (4) /* inode can be cached by FS-Cache */

static inline struct nfs_inode *NFS_I(struct inode *inode)
{
@@ -230,6 +231,7 @@ #define NFS_ATTRTIMEO_UPDATE(inode) (NFS

#define NFS_FLAGS(inode) (NFS_I(inode)->flags)
#define NFS_STALE(inode) (test_bit(NFS_INO_STALE, &NFS_FLAGS(inode)))
+#define NFS_CACHEABLE(inode) (test_bit(NFS_INO_CACHEABLE, &NFS_FLAGS(inode)))

#define NFS_FILEID(inode) (NFS_I(inode)->fileid)

2006-11-14 20:13:39

by David Howells

Subject: [PATCH 19/19] CacheFiles: Permit daemon to probe inuseness of a cache file

Permit the daemon to probe to see whether a cache file is in use by a netfs or
not.
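
The command takes the form "inuse <dirfd> <name>". The argument checks can be
modelled in userspace roughly as follows (a sketch only:
model_parse_inuse_args() is a hypothetical name, and the real handler goes on
to resolve the fd and look the name up in that directory):

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical userspace model of the argument checks for "<dirfd> <name>".
 * Returns 0 and fills in *dirfd and *name on success, or -1 if the arguments
 * are malformed (the kernel handler returns -EINVAL in that case).
 */
static int model_parse_inuse_args(char *args, unsigned long *dirfd,
				  char **name)
{
	assert(args && dirfd && name);

	*dirfd = strtoul(args, &args, 0);

	/* the fd must be followed by whitespace... */
	if (!isspace((unsigned char)*args))
		return -1;
	while (isspace((unsigned char)*args))
		args++;

	/* ...and then by a non-empty name containing no path separators */
	if (!*args || strchr(args, '/'))
		return -1;

	*name = args;
	return 0;
}
```

Rejecting '/' keeps the lookup confined to a single entry of the directory
that the fd refers to.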

Signed-Off-By: David Howells <[email protected]>
---

fs/cachefiles/cf-daemon.c | 73 +++++++++++++++++++
fs/cachefiles/cf-namei.c | 170 +++++++++++++++++++++++++++++++++++++++++++++
fs/cachefiles/internal.h | 3 +
3 files changed, 246 insertions(+), 0 deletions(-)

diff --git a/fs/cachefiles/cf-daemon.c b/fs/cachefiles/cf-daemon.c
index ae82685..ee07865 100644
--- a/fs/cachefiles/cf-daemon.c
+++ b/fs/cachefiles/cf-daemon.c
@@ -38,6 +38,7 @@ static int cachefiles_daemon_cull(struct
static int cachefiles_daemon_debug(struct cachefiles_cache *cache, char *args);
static int cachefiles_daemon_dir(struct cachefiles_cache *cache, char *args);
static int cachefiles_daemon_tag(struct cachefiles_cache *cache, char *args);
+static int cachefiles_daemon_inuse(struct cachefiles_cache *cache, char *args);

static unsigned long cachefiles_open;

@@ -66,6 +67,7 @@ static const struct cachefiles_daemon_cm
{ "frun", cachefiles_daemon_frun },
{ "fcull", cachefiles_daemon_fcull },
{ "fstop", cachefiles_daemon_fstop },
+ { "inuse", cachefiles_daemon_inuse },
{ "tag", cachefiles_daemon_tag },
{ "", NULL }
};
@@ -602,3 +604,74 @@ inval:
kerror("debug command requires mask");
return -EINVAL;
}
+
+/*
+ * find out whether an object is in use or not
+ * - command: "inuse <dirfd> <name>"
+ */
+static int cachefiles_daemon_inuse(struct cachefiles_cache *cache, char *args)
+{
+ struct dentry *dir;
+ struct file *dirfile;
+ uid_t fsuid;
+ gid_t fsgid;
+ u32 fscreatesid;
+ int dirfd, fput_needed, ret;
+
+ _enter(",%s", args);
+
+ dirfd = simple_strtoul(args, &args, 0);
+
+ if (!isspace(*args))
+ goto inval;
+
+ while (isspace(*args))
+ args++;
+
+ if (!*args)
+ goto inval;
+
+ if (strchr(args, '/'))
+ goto inval;
+
+ if (!test_bit(CACHEFILES_READY, &cache->flags)) {
+ kerror("inuse applied to unready cache");
+ return -EIO;
+ }
+
+ if (test_bit(CACHEFILES_DEAD, &cache->flags)) {
+ kerror("inuse applied to dead cache");
+ return -EIO;
+ }
+
+ /* extract the directory dentry from the fd */
+ dirfile = fget_light(dirfd, &fput_needed);
+ if (!dirfile) {
+ kerror("inuse dirfd not open");
+ return -EBADF;
+ }
+
+ dir = dget(dirfile->f_dentry);
+ fput_light(dirfile, fput_needed);
+ dirfile = NULL;
+
+ if (!S_ISDIR(dir->d_inode->i_mode))
+ goto notdir;
+
+ cachefiles_begin_secure(cache, &fsuid, &fsgid, &fscreatesid);
+ ret = cachefiles_check_in_use(cache, dir, args);
+ cachefiles_end_secure(cache, fsuid, fsgid, fscreatesid);
+
+ dput(dir);
+ _leave(" = %d", ret);
+ return ret;
+
+notdir:
+ dput(dir);
+ kerror("inuse command requires dirfd to be a directory");
+ return -ENOTDIR;
+
+inval:
+ kerror("inuse command requires dirfd and filename");
+ return -EINVAL;
+}
diff --git a/fs/cachefiles/cf-namei.c b/fs/cachefiles/cf-namei.c
index a3df94a..d0db9b3 100644
--- a/fs/cachefiles/cf-namei.c
+++ b/fs/cachefiles/cf-namei.c
@@ -524,6 +524,7 @@ nomem_d_alloc:
return ERR_PTR(-ENOMEM);
}

+#if 0
/*
* cull an object if it's not in use
* - called only by cache manager daemon
@@ -631,3 +632,172 @@ choose_error:
_leave(" = %d", ret);
return ret;
}
+#endif
+
+/*
+ * find out if an object is in use or not
+ * - if it finds the object and it's not in use:
+ * - returns a pointer to the object and a reference on it
+ * - returns with the directory locked
+ */
+static struct dentry *cachefiles_check_active(struct cachefiles_cache *cache,
+ struct dentry *dir,
+ char *filename)
+{
+ struct cachefiles_object *object;
+ struct rb_node *_n;
+ struct dentry *victim;
+ int ret;
+
+ _enter(",%*.*s/,%s",
+ dir->d_name.len, dir->d_name.len, dir->d_name.name, filename);
+
+ /* look up the victim */
+ mutex_lock(&dir->d_inode->i_mutex);
+
+ victim = lookup_one_len(filename, dir, strlen(filename));
+ if (IS_ERR(victim))
+ goto lookup_error;
+
+ _debug("victim -> %p %s",
+ victim, victim->d_inode ? "positive" : "negative");
+
+ /* if the object is no longer there then we probably retired the object
+ * at the netfs's request whilst the cull was in progress
+ */
+ if (!victim->d_inode) {
+ mutex_unlock(&dir->d_inode->i_mutex);
+ dput(victim);
+ _leave(" = -ENOENT [absent]");
+ return ERR_PTR(-ENOENT);
+ }
+
+ /* check to see if we're using this object */
+ read_lock(&cache->active_lock);
+
+ _n = cache->active_nodes.rb_node;
+
+ while (_n) {
+ object = rb_entry(_n, struct cachefiles_object, active_node);
+
+ if (object->dentry > victim)
+ _n = _n->rb_left;
+ else if (object->dentry < victim)
+ _n = _n->rb_right;
+ else
+ goto object_in_use;
+ }
+
+ read_unlock(&cache->active_lock);
+
+ _leave(" = %p", victim);
+ return victim;
+
+object_in_use:
+ read_unlock(&cache->active_lock);
+ mutex_unlock(&dir->d_inode->i_mutex);
+ dput(victim);
+ _leave(" = -EBUSY [in use]");
+ return ERR_PTR(-EBUSY);
+
+lookup_error:
+ mutex_unlock(&dir->d_inode->i_mutex);
+ ret = PTR_ERR(victim);
+ if (ret == -ENOENT) {
+ /* file or dir now absent - probably retired by netfs */
+ _leave(" = -ESTALE [absent]");
+ return ERR_PTR(-ESTALE);
+ }
+
+ if (ret == -EIO) {
+ cachefiles_io_error(cache, "Lookup failed");
+ } else if (ret != -ENOMEM) {
+ kerror("Internal error: %d", ret);
+ ret = -EIO;
+ }
+
+ _leave(" = %d", ret);
+ return ERR_PTR(ret);
+}
+
+/*
+ * cull an object if it's not in use
+ * - called only by cache manager daemon
+ */
+int cachefiles_cull(struct cachefiles_cache *cache, struct dentry *dir,
+ char *filename)
+{
+ struct dentry *victim;
+ int ret;
+
+ _enter(",%*.*s/,%s",
+ dir->d_name.len, dir->d_name.len, dir->d_name.name, filename);
+
+ victim = cachefiles_check_active(cache, dir, filename);
+ if (IS_ERR(victim))
+ return PTR_ERR(victim);
+
+ _debug("victim -> %p %s",
+ victim, victim->d_inode ? "positive" : "negative");
+
+ /* okay... the victim is not being used so we can cull it
+ * - start by marking it as stale
+ */
+ _debug("victim is cullable");
+
+ ret = cachefiles_remove_object_xattr(cache, victim);
+ if (ret < 0)
+ goto error_unlock;
+
+ /* actually remove the victim (drops the dir mutex) */
+ _debug("bury");
+
+ ret = cachefiles_bury_object(cache, dir, victim);
+ if (ret < 0)
+ goto error;
+
+ dput(victim);
+ _leave(" = 0");
+ return 0;
+
+error_unlock:
+ mutex_unlock(&dir->d_inode->i_mutex);
+error:
+ dput(victim);
+ if (ret == -ENOENT) {
+ /* file or dir now absent - probably retired by netfs */
+ _leave(" = -ESTALE [absent]");
+ return -ESTALE;
+ }
+
+ if (ret != -ENOMEM) {
+ kerror("Internal error: %d", ret);
+ ret = -EIO;
+ }
+
+ _leave(" = %d", ret);
+ return ret;
+}
+
+/*
+ * find out if an object is in use or not
+ * - called only by cache manager daemon
+ * - returns -EBUSY or 0 to indicate whether an object is in use or not
+ */
+int cachefiles_check_in_use(struct cachefiles_cache *cache, struct dentry *dir,
+ char *filename)
+{
+ struct dentry *victim;
+
+ _enter(",%*.*s/,%s",
+ dir->d_name.len, dir->d_name.len, dir->d_name.name, filename);
+
+ victim = cachefiles_check_active(cache, dir, filename);
+ if (IS_ERR(victim))
+ return PTR_ERR(victim);
+
+ mutex_unlock(&dir->d_inode->i_mutex);
+ dput(victim);
+ _leave(" = 0");
+ return 0;
+}
diff --git a/fs/cachefiles/internal.h b/fs/cachefiles/internal.h
index d56b443..1b7ada2 100644
--- a/fs/cachefiles/internal.h
+++ b/fs/cachefiles/internal.h
@@ -183,6 +183,9 @@ extern struct dentry *cachefiles_get_dir
extern int cachefiles_cull(struct cachefiles_cache *cache, struct dentry *dir,
char *filename);

+extern int cachefiles_check_in_use(struct cachefiles_cache *cache,
+ struct dentry *dir, char *filename);
+
/*
* cf-security.c
*/

2006-11-14 20:14:06

by David Howells

Subject: [PATCH 13/19] CacheFiles: Add an act-as SID override in task_security_struct

Add an act-as SID to task_security_struct, analogous to fsuid/fsgid in
task_struct. This permits a task to perform operations as if it had the
overriding SID, without changing its own SID, which may still be needed to
control access to the process by ptrace, signals, /proc, etc.

This is useful for CacheFiles in that it allows CacheFiles to access the cache
files and directories using the cache's security context rather than the
security context of the process on whose behalf it is working, and in the
context of which it is running.
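
The intended save/override/restore pattern can be modelled in userspace
roughly as follows (a sketch only: struct model_tsec and the model_*()
functions are hypothetical stand-ins for task_security_struct and the
act_as_secid/act_as_self operations, which in the patch operate on current):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, reduced model of task_security_struct. */
struct model_tsec {
	uint32_t sid;		/* the task's own SID (ptrace, signals, /proc) */
	uint32_t actor_sid;	/* the SID the task acts as when accessing things */
};

/* Models act_as_secid(): install an override and return the previous
 * acting SID so that the caller can restore it afterwards. */
static uint32_t model_act_as_secid(struct model_tsec *tsec, uint32_t secid)
{
	uint32_t old;

	assert(tsec);
	old = tsec->actor_sid;
	tsec->actor_sid = secid;
	return old;
}

/* Models act_as_self(): drop any override and return the previous
 * acting SID. */
static uint32_t model_act_as_self(struct model_tsec *tsec)
{
	uint32_t old;

	assert(tsec);
	old = tsec->actor_sid;
	tsec->actor_sid = tsec->sid;
	return old;
}
```

A caller would save the value returned by model_act_as_secid(), perform its
accesses, and then pass the saved value back in to restore the previous state;
the task's own sid is never modified.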

Signed-Off-By: David Howells <[email protected]>
---

include/linux/security.h | 32 +++++++
security/dummy.c | 12 +++
security/selinux/exports.c | 2
security/selinux/hooks.c | 160 +++++++++++++++++++++++--------------
security/selinux/include/objsec.h | 1
security/selinux/selinuxfs.c | 2
security/selinux/xfrm.c | 6 +
7 files changed, 148 insertions(+), 67 deletions(-)

diff --git a/include/linux/security.h b/include/linux/security.h
index 7955017..63617e4 100644
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -1161,6 +1161,16 @@ #ifdef CONFIG_SECURITY
* Set the current FS security ID.
* @secid contains the security ID to set.
*
+ * @act_as_secid:
+ * Set the security ID as which to act, returning the security ID as which
+ * the process was previously acting.
+ * @secid contains the security ID to act as.
+ *
+ * @act_as_self:
+ * Reset the security ID as which to act to be the same as the process's
+ * owning security ID, and return the security ID as which the process was
+ * previously acting.
+ *
* This is the main security structure.
*/
struct security_operations {
@@ -1345,6 +1355,8 @@ struct security_operations {
void (*release_secctx)(char *secdata, u32 seclen);
u32 (*get_fscreate_secid)(void);
u32 (*set_fscreate_secid)(u32 secid);
+ u32 (*act_as_secid)(u32 secid);
+ u32 (*act_as_self)(void);

#ifdef CONFIG_SECURITY_NETWORK
int (*unix_stream_connect) (struct socket * sock,
@@ -2150,6 +2162,16 @@ static inline u32 security_set_fscreate_
return security_ops->set_fscreate_secid(secid);
}

+static inline u32 security_act_as_secid(u32 secid)
+{
+ return security_ops->act_as_secid(secid);
+}
+
+static inline u32 security_act_as_self(void)
+{
+ return security_ops->act_as_self();
+}
+
/* prototypes */
extern int security_init (void);
extern int register_security (struct security_operations *ops);
@@ -2840,6 +2862,16 @@ static inline u32 security_set_fscreate_
return 0;
}

+static inline u32 security_act_as_secid(u32 secid)
+{
+ return 0;
+}
+
+static inline u32 security_act_as_self(void)
+{
+ return 0;
+}
+
#endif /* CONFIG_SECURITY */

#ifdef CONFIG_SECURITY_NETWORK
diff --git a/security/dummy.c b/security/dummy.c
index ee3c886..f7b47a9 100644
--- a/security/dummy.c
+++ b/security/dummy.c
@@ -937,6 +937,16 @@ static u32 dummy_set_fscreate_secid(u32
return 0;
}

+static u32 dummy_act_as_secid(u32 secid)
+{
+ return 0;
+}
+
+static u32 dummy_act_as_self(void)
+{
+ return 0;
+}
+
#ifdef CONFIG_KEYS
static inline int dummy_key_alloc(struct key *key, struct task_struct *ctx,
unsigned long flags)
@@ -1093,6 +1103,8 @@ void security_fixup_ops (struct security
set_to_dummy_if_null(ops, release_secctx);
set_to_dummy_if_null(ops, get_fscreate_secid);
set_to_dummy_if_null(ops, set_fscreate_secid);
+ set_to_dummy_if_null(ops, act_as_secid);
+ set_to_dummy_if_null(ops, act_as_self);
#ifdef CONFIG_SECURITY_NETWORK
set_to_dummy_if_null(ops, unix_stream_connect);
set_to_dummy_if_null(ops, unix_may_send);
diff --git a/security/selinux/exports.c b/security/selinux/exports.c
index b6f9694..b559699 100644
--- a/security/selinux/exports.c
+++ b/security/selinux/exports.c
@@ -79,7 +79,7 @@ int selinux_relabel_packet_permission(u3
if (selinux_enabled) {
struct task_security_struct *tsec = current->security;

- return avc_has_perm(tsec->sid, sid, SECCLASS_PACKET,
+ return avc_has_perm(tsec->actor_sid, sid, SECCLASS_PACKET,
PACKET__RELABELTO, NULL);
}
return 0;
diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index 7f5ec86..09def09 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -162,7 +162,8 @@ static int task_alloc_security(struct ta
return -ENOMEM;

tsec->task = task;
- tsec->osid = tsec->sid = tsec->ptrace_sid = SECINITSID_UNLABELED;
+ tsec->osid = tsec->actor_sid = tsec->sid = tsec->ptrace_sid =
+ SECINITSID_UNLABELED;
task->security = tsec;

return 0;
@@ -190,7 +191,7 @@ static int inode_alloc_security(struct i
isec->inode = inode;
isec->sid = SECINITSID_UNLABELED;
isec->sclass = SECCLASS_FILE;
- isec->task_sid = tsec->sid;
+ isec->task_sid = tsec->actor_sid;
inode->i_security = isec;

return 0;
@@ -220,8 +221,8 @@ static int file_alloc_security(struct fi
return -ENOMEM;

fsec->file = file;
- fsec->sid = tsec->sid;
- fsec->fown_sid = tsec->sid;
+ fsec->sid = tsec->actor_sid;
+ fsec->fown_sid = tsec->actor_sid;
file->f_security = fsec;

return 0;
@@ -338,12 +339,12 @@ static int may_context_mount_sb_relabel(
{
int rc;

- rc = avc_has_perm(tsec->sid, sbsec->sid, SECCLASS_FILESYSTEM,
+ rc = avc_has_perm(tsec->actor_sid, sbsec->sid, SECCLASS_FILESYSTEM,
FILESYSTEM__RELABELFROM, NULL);
if (rc)
return rc;

- rc = avc_has_perm(tsec->sid, sid, SECCLASS_FILESYSTEM,
+ rc = avc_has_perm(tsec->actor_sid, sid, SECCLASS_FILESYSTEM,
FILESYSTEM__RELABELTO, NULL);
return rc;
}
@@ -353,7 +354,7 @@ static int may_context_mount_inode_relab
struct task_security_struct *tsec)
{
int rc;
- rc = avc_has_perm(tsec->sid, sbsec->sid, SECCLASS_FILESYSTEM,
+ rc = avc_has_perm(tsec->actor_sid, sbsec->sid, SECCLASS_FILESYSTEM,
FILESYSTEM__RELABELFROM, NULL);
if (rc)
return rc;
@@ -1030,7 +1031,7 @@ static int task_has_perm(struct task_str

tsec1 = tsk1->security;
tsec2 = tsk2->security;
- return avc_has_perm(tsec1->sid, tsec2->sid,
+ return avc_has_perm(tsec1->actor_sid, tsec2->sid,
SECCLASS_PROCESS, perms, NULL);
}

@@ -1047,7 +1048,7 @@ static int task_has_capability(struct ta
ad.tsk = tsk;
ad.u.cap = cap;

- return avc_has_perm(tsec->sid, tsec->sid,
+ return avc_has_perm(tsec->actor_sid, tsec->actor_sid,
SECCLASS_CAPABILITY, CAP_TO_MASK(cap), &ad);
}

@@ -1059,7 +1060,7 @@ static int task_has_system(struct task_s

tsec = tsk->security;

- return avc_has_perm(tsec->sid, SECINITSID_KERNEL,
+ return avc_has_perm(tsec->actor_sid, SECINITSID_KERNEL,
SECCLASS_SYSTEM, perms, NULL);
}

@@ -1084,7 +1085,8 @@ static int inode_has_perm(struct task_st
ad.u.fs.inode = inode;
}

- return avc_has_perm(tsec->sid, isec->sid, isec->sclass, perms, adp);
+ return avc_has_perm(tsec->actor_sid, isec->sid, isec->sclass, perms,
+ adp);
}

/* Same as inode_has_perm, but pass explicit audit data containing
@@ -1127,8 +1129,8 @@ static int file_has_perm(struct task_str
ad.u.fs.mnt = mnt;
ad.u.fs.dentry = dentry;

- if (tsec->sid != fsec->sid) {
- rc = avc_has_perm(tsec->sid, fsec->sid,
+ if (tsec->actor_sid != fsec->sid) {
+ rc = avc_has_perm(tsec->actor_sid, fsec->sid,
SECCLASS_FD,
FD__USE,
&ad);
@@ -1162,7 +1164,7 @@ static int may_create(struct inode *dir,
AVC_AUDIT_DATA_INIT(&ad, FS);
ad.u.fs.dentry = dentry;

- rc = avc_has_perm(tsec->sid, dsec->sid, SECCLASS_DIR,
+ rc = avc_has_perm(tsec->actor_sid, dsec->sid, SECCLASS_DIR,
DIR__ADD_NAME | DIR__SEARCH,
&ad);
if (rc)
@@ -1171,13 +1173,13 @@ static int may_create(struct inode *dir,
if (tsec->create_sid && sbsec->behavior != SECURITY_FS_USE_MNTPOINT) {
newsid = tsec->create_sid;
} else {
- rc = security_transition_sid(tsec->sid, dsec->sid, tclass,
- &newsid);
+ rc = security_transition_sid(tsec->actor_sid, dsec->sid,
+ tclass, &newsid);
if (rc)
return rc;
}

- rc = avc_has_perm(tsec->sid, newsid, tclass, FILE__CREATE, &ad);
+ rc = avc_has_perm(tsec->actor_sid, newsid, tclass, FILE__CREATE, &ad);
if (rc)
return rc;

@@ -1194,7 +1196,8 @@ static int may_create_key(u32 ksid,

tsec = ctx->security;

- return avc_has_perm(tsec->sid, ksid, SECCLASS_KEY, KEY__CREATE, NULL);
+ return avc_has_perm(tsec->actor_sid, ksid, SECCLASS_KEY, KEY__CREATE,
+ NULL);
}

#define MAY_LINK 0
@@ -1222,7 +1225,7 @@ static int may_link(struct inode *dir,

av = DIR__SEARCH;
av |= (kind ? DIR__REMOVE_NAME : DIR__ADD_NAME);
- rc = avc_has_perm(tsec->sid, dsec->sid, SECCLASS_DIR, av, &ad);
+ rc = avc_has_perm(tsec->actor_sid, dsec->sid, SECCLASS_DIR, av, &ad);
if (rc)
return rc;

@@ -1241,7 +1244,7 @@ static int may_link(struct inode *dir,
return 0;
}

- rc = avc_has_perm(tsec->sid, isec->sid, isec->sclass, av, &ad);
+ rc = avc_has_perm(tsec->actor_sid, isec->sid, isec->sclass, av, &ad);
return rc;
}

@@ -1266,16 +1269,16 @@ static inline int may_rename(struct inod
AVC_AUDIT_DATA_INIT(&ad, FS);

ad.u.fs.dentry = old_dentry;
- rc = avc_has_perm(tsec->sid, old_dsec->sid, SECCLASS_DIR,
+ rc = avc_has_perm(tsec->actor_sid, old_dsec->sid, SECCLASS_DIR,
DIR__REMOVE_NAME | DIR__SEARCH, &ad);
if (rc)
return rc;
- rc = avc_has_perm(tsec->sid, old_isec->sid,
+ rc = avc_has_perm(tsec->actor_sid, old_isec->sid,
old_isec->sclass, FILE__RENAME, &ad);
if (rc)
return rc;
if (old_is_dir && new_dir != old_dir) {
- rc = avc_has_perm(tsec->sid, old_isec->sid,
+ rc = avc_has_perm(tsec->actor_sid, old_isec->sid,
old_isec->sclass, DIR__REPARENT, &ad);
if (rc)
return rc;
@@ -1285,15 +1288,17 @@ static inline int may_rename(struct inod
av = DIR__ADD_NAME | DIR__SEARCH;
if (new_dentry->d_inode)
av |= DIR__REMOVE_NAME;
- rc = avc_has_perm(tsec->sid, new_dsec->sid, SECCLASS_DIR, av, &ad);
+ rc = avc_has_perm(tsec->actor_sid, new_dsec->sid, SECCLASS_DIR, av,
+ &ad);
if (rc)
return rc;
if (new_dentry->d_inode) {
new_isec = new_dentry->d_inode->i_security;
new_is_dir = S_ISDIR(new_dentry->d_inode->i_mode);
- rc = avc_has_perm(tsec->sid, new_isec->sid,
+ rc = avc_has_perm(tsec->actor_sid, new_isec->sid,
new_isec->sclass,
- (new_is_dir ? DIR__RMDIR : FILE__UNLINK), &ad);
+ (new_is_dir ? DIR__RMDIR : FILE__UNLINK),
+ &ad);
if (rc)
return rc;
}
@@ -1312,7 +1317,7 @@ static int superblock_has_perm(struct ta

tsec = tsk->security;
sbsec = sb->s_security;
- return avc_has_perm(tsec->sid, sbsec->sid, SECCLASS_FILESYSTEM,
+ return avc_has_perm(tsec->actor_sid, sbsec->sid, SECCLASS_FILESYSTEM,
perms, ad);
}

@@ -1376,7 +1381,7 @@ static int selinux_ptrace(struct task_st
rc = task_has_perm(parent, child, PROCESS__PTRACE);
/* Save the SID of the tracing process for later use in apply_creds. */
if (!(child->ptrace & PT_PTRACED) && !rc)
- csec->ptrace_sid = psec->sid;
+ csec->ptrace_sid = psec->actor_sid;
return rc;
}

@@ -1445,7 +1450,7 @@ static int selinux_sysctl(ctl_table *tab
/* The op values are "defined" in sysctl.c, thereby creating
* a bad coupling between this module and sysctl.c */
if(op == 001) {
- error = avc_has_perm(tsec->sid, tsid,
+ error = avc_has_perm(tsec->actor_sid, tsid,
SECCLASS_DIR, DIR__SEARCH, NULL);
} else {
av = 0;
@@ -1454,7 +1459,7 @@ static int selinux_sysctl(ctl_table *tab
if (op & 002)
av |= FILE__WRITE;
if (av)
- error = avc_has_perm(tsec->sid, tsid,
+ error = avc_has_perm(tsec->actor_sid, tsid,
SECCLASS_FILE, av, NULL);
}

@@ -1546,7 +1551,7 @@ static int selinux_vm_enough_memory(long

rc = secondary_ops->capable(current, CAP_SYS_ADMIN);
if (rc == 0)
- rc = avc_has_perm_noaudit(tsec->sid, tsec->sid,
+ rc = avc_has_perm_noaudit(tsec->actor_sid, tsec->actor_sid,
SECCLASS_CAPABILITY,
CAP_TO_MASK(CAP_SYS_ADMIN),
NULL);
@@ -1598,7 +1603,7 @@ static int selinux_bprm_set_security(str
isec = inode->i_security;

/* Default to the current task SID. */
- bsec->sid = tsec->sid;
+ bsec->sid = tsec->actor_sid;

/* Reset fs, key, and sock SIDs on execve. */
tsec->create_sid = 0;
@@ -1611,7 +1616,7 @@ static int selinux_bprm_set_security(str
tsec->exec_sid = 0;
} else {
/* Check for a default transition on this program. */
- rc = security_transition_sid(tsec->sid, isec->sid,
+ rc = security_transition_sid(tsec->actor_sid, isec->sid,
SECCLASS_PROCESS, &newsid);
if (rc)
return rc;
@@ -1622,16 +1627,16 @@ static int selinux_bprm_set_security(str
ad.u.fs.dentry = bprm->file->f_dentry;

if (bprm->file->f_vfsmnt->mnt_flags & MNT_NOSUID)
- newsid = tsec->sid;
+ newsid = tsec->actor_sid;

- if (tsec->sid == newsid) {
- rc = avc_has_perm(tsec->sid, isec->sid,
+ if (tsec->actor_sid == newsid) {
+ rc = avc_has_perm(tsec->actor_sid, isec->sid,
SECCLASS_FILE, FILE__EXECUTE_NO_TRANS, &ad);
if (rc)
return rc;
} else {
/* Check permissions for the transition. */
- rc = avc_has_perm(tsec->sid, newsid,
+ rc = avc_has_perm(tsec->actor_sid, newsid,
SECCLASS_PROCESS, PROCESS__TRANSITION, &ad);
if (rc)
return rc;
@@ -1810,6 +1815,8 @@ static void selinux_bprm_apply_creds(str
return;
}
}
+ if (tsec->actor_sid == tsec->sid)
+ tsec->actor_sid = sid;
tsec->sid = sid;
}
}
@@ -2084,7 +2091,7 @@ static int selinux_inode_init_security(s
if (tsec->create_sid && sbsec->behavior != SECURITY_FS_USE_MNTPOINT) {
newsid = tsec->create_sid;
} else {
- rc = security_transition_sid(tsec->sid, dsec->sid,
+ rc = security_transition_sid(tsec->actor_sid, dsec->sid,
inode_mode_to_security_class(inode->i_mode),
&newsid);
if (rc) {
@@ -2275,7 +2282,7 @@ static int selinux_inode_setxattr(struct
AVC_AUDIT_DATA_INIT(&ad,FS);
ad.u.fs.dentry = dentry;

- rc = avc_has_perm(tsec->sid, isec->sid, isec->sclass,
+ rc = avc_has_perm(tsec->actor_sid, isec->sid, isec->sclass,
FILE__RELABELFROM, &ad);
if (rc)
return rc;
@@ -2284,12 +2291,12 @@ static int selinux_inode_setxattr(struct
if (rc)
return rc;

- rc = avc_has_perm(tsec->sid, newsid, isec->sclass,
+ rc = avc_has_perm(tsec->actor_sid, newsid, isec->sclass,
FILE__RELABELTO, &ad);
if (rc)
return rc;

- rc = security_validate_transition(isec->sid, newsid, tsec->sid,
+ rc = security_validate_transition(isec->sid, newsid, tsec->actor_sid,
isec->sclass);
if (rc)
return rc;
@@ -2693,7 +2700,7 @@ static int selinux_task_alloc_security(s
tsec2 = tsk->security;

tsec2->osid = tsec1->osid;
- tsec2->sid = tsec1->sid;
+ tsec2->actor_sid = tsec2->sid = tsec1->sid;

/* Retain the exec, fs, key, and sock SIDs across fork */
tsec2->exec_sid = tsec1->exec_sid;
@@ -2872,6 +2879,8 @@ static void selinux_task_reparent_to_ini

tsec = p->security;
tsec->osid = tsec->sid;
+ if (tsec->actor_sid == tsec->sid)
+ tsec->actor_sid = SECINITSID_KERNEL;
tsec->sid = SECINITSID_KERNEL;
return;
}
@@ -3054,7 +3063,8 @@ static int socket_has_perm(struct task_s

AVC_AUDIT_DATA_INIT(&ad,NET);
ad.u.net.sk = sock->sk;
- err = avc_has_perm(tsec->sid, isec->sid, isec->sclass, perms, &ad);
+ err = avc_has_perm(tsec->actor_sid, isec->sid, isec->sclass, perms,
+ &ad);

out:
return err;
@@ -3071,8 +3081,8 @@ static int selinux_socket_create(int fam
goto out;

tsec = current->security;
- newsid = tsec->sockcreate_sid ? : tsec->sid;
- err = avc_has_perm(tsec->sid, newsid,
+ newsid = tsec->sockcreate_sid ? : tsec->actor_sid;
+ err = avc_has_perm(tsec->actor_sid, newsid,
socket_type_to_security_class(family, type,
protocol), SOCKET__CREATE, NULL);

@@ -3092,7 +3102,7 @@ static int selinux_socket_post_create(st
isec = SOCK_INODE(sock)->i_security;

tsec = current->security;
- newsid = tsec->sockcreate_sid ? : tsec->sid;
+ newsid = tsec->sockcreate_sid ? : tsec->actor_sid;
isec->sclass = socket_type_to_security_class(family, type, protocol);
isec->sid = kern ? SECINITSID_KERNEL : newsid;
isec->initialized = 1;
@@ -3904,7 +3914,7 @@ static int ipc_alloc_security(struct tas

isec->sclass = sclass;
isec->ipc_perm = perm;
- isec->sid = tsec->sid;
+ isec->sid = tsec->actor_sid;
perm->security = isec;

return 0;
@@ -3953,7 +3963,8 @@ static int ipc_has_perm(struct kern_ipc_
AVC_AUDIT_DATA_INIT(&ad, IPC);
ad.u.ipc_id = ipc_perms->key;

- return avc_has_perm(tsec->sid, isec->sid, isec->sclass, perms, &ad);
+ return avc_has_perm(tsec->actor_sid, isec->sid, isec->sclass, perms,
+ &ad);
}

static int selinux_msg_msg_alloc_security(struct msg_msg *msg)
@@ -3984,7 +3995,7 @@ static int selinux_msg_queue_alloc_secur
AVC_AUDIT_DATA_INIT(&ad, IPC);
ad.u.ipc_id = msq->q_perm.key;

- rc = avc_has_perm(tsec->sid, isec->sid, SECCLASS_MSGQ,
+ rc = avc_has_perm(tsec->actor_sid, isec->sid, SECCLASS_MSGQ,
MSGQ__CREATE, &ad);
if (rc) {
ipc_free_security(&msq->q_perm);
@@ -4010,7 +4021,7 @@ static int selinux_msg_queue_associate(s
AVC_AUDIT_DATA_INIT(&ad, IPC);
ad.u.ipc_id = msq->q_perm.key;

- return avc_has_perm(tsec->sid, isec->sid, SECCLASS_MSGQ,
+ return avc_has_perm(tsec->actor_sid, isec->sid, SECCLASS_MSGQ,
MSGQ__ASSOCIATE, &ad);
}

@@ -4062,7 +4073,7 @@ static int selinux_msg_queue_msgsnd(stru
* Compute new sid based on current process and
* message queue this message will be stored in
*/
- rc = security_transition_sid(tsec->sid,
+ rc = security_transition_sid(tsec->actor_sid,
isec->sid,
SECCLASS_MSG,
&msec->sid);
@@ -4074,11 +4085,11 @@ static int selinux_msg_queue_msgsnd(stru
ad.u.ipc_id = msq->q_perm.key;

/* Can this process write to the queue? */
- rc = avc_has_perm(tsec->sid, isec->sid, SECCLASS_MSGQ,
+ rc = avc_has_perm(tsec->actor_sid, isec->sid, SECCLASS_MSGQ,
MSGQ__WRITE, &ad);
if (!rc)
/* Can this process send the message */
- rc = avc_has_perm(tsec->sid, msec->sid,
+ rc = avc_has_perm(tsec->actor_sid, msec->sid,
SECCLASS_MSG, MSG__SEND, &ad);
if (!rc)
/* Can the message be put in the queue? */
@@ -4105,10 +4116,10 @@ static int selinux_msg_queue_msgrcv(stru
AVC_AUDIT_DATA_INIT(&ad, IPC);
ad.u.ipc_id = msq->q_perm.key;

- rc = avc_has_perm(tsec->sid, isec->sid,
+ rc = avc_has_perm(tsec->actor_sid, isec->sid,
SECCLASS_MSGQ, MSGQ__READ, &ad);
if (!rc)
- rc = avc_has_perm(tsec->sid, msec->sid,
+ rc = avc_has_perm(tsec->actor_sid, msec->sid,
SECCLASS_MSG, MSG__RECEIVE, &ad);
return rc;
}
@@ -4131,7 +4142,7 @@ static int selinux_shm_alloc_security(st
AVC_AUDIT_DATA_INIT(&ad, IPC);
ad.u.ipc_id = shp->shm_perm.key;

- rc = avc_has_perm(tsec->sid, isec->sid, SECCLASS_SHM,
+ rc = avc_has_perm(tsec->actor_sid, isec->sid, SECCLASS_SHM,
SHM__CREATE, &ad);
if (rc) {
ipc_free_security(&shp->shm_perm);
@@ -4157,7 +4168,7 @@ static int selinux_shm_associate(struct
AVC_AUDIT_DATA_INIT(&ad, IPC);
ad.u.ipc_id = shp->shm_perm.key;

- return avc_has_perm(tsec->sid, isec->sid, SECCLASS_SHM,
+ return avc_has_perm(tsec->actor_sid, isec->sid, SECCLASS_SHM,
SHM__ASSOCIATE, &ad);
}

@@ -4230,7 +4241,7 @@ static int selinux_sem_alloc_security(st
AVC_AUDIT_DATA_INIT(&ad, IPC);
ad.u.ipc_id = sma->sem_perm.key;

- rc = avc_has_perm(tsec->sid, isec->sid, SECCLASS_SEM,
+ rc = avc_has_perm(tsec->actor_sid, isec->sid, SECCLASS_SEM,
SEM__CREATE, &ad);
if (rc) {
ipc_free_security(&sma->sem_perm);
@@ -4256,7 +4267,7 @@ static int selinux_sem_associate(struct
AVC_AUDIT_DATA_INIT(&ad, IPC);
ad.u.ipc_id = sma->sem_perm.key;

- return avc_has_perm(tsec->sid, isec->sid, SECCLASS_SEM,
+ return avc_has_perm(tsec->actor_sid, isec->sid, SECCLASS_SEM,
SEM__ASSOCIATE, &ad);
}

@@ -4500,14 +4511,19 @@ static int selinux_setprocattr(struct ta
error = avc_has_perm_noaudit(tsec->ptrace_sid, sid,
SECCLASS_PROCESS,
PROCESS__PTRACE, &avd);
- if (!error)
+ if (!error) {
+ if (tsec->actor_sid == tsec->sid)
+ tsec->actor_sid = sid;
tsec->sid = sid;
+ }
task_unlock(p);
avc_audit(tsec->ptrace_sid, sid, SECCLASS_PROCESS,
PROCESS__PTRACE, &avd, error, NULL);
if (error)
return error;
} else {
+ if (tsec->actor_sid == tsec->sid)
+ tsec->actor_sid = sid;
tsec->sid = sid;
task_unlock(p);
}
@@ -4545,6 +4561,24 @@ static u32 selinux_set_fscreate_secid(u3
return oldsid;
}

+static u32 selinux_act_as_secid(u32 secid)
+{
+ struct task_security_struct *tsec = current->security;
+ u32 oldactor_sid = tsec->actor_sid;
+
+ tsec->actor_sid = secid;
+ return oldactor_sid;
+}
+
+static u32 selinux_act_as_self(void)
+{
+ struct task_security_struct *tsec = current->security;
+ u32 oldactor_sid = tsec->actor_sid;
+
+ tsec->actor_sid = tsec->sid;
+ return oldactor_sid;
+}
+
#ifdef CONFIG_KEYS

static int selinux_key_alloc(struct key *k, struct task_struct *tsk,
@@ -4594,7 +4628,7 @@ static int selinux_key_permission(key_re
if (perm == 0)
return 0;

- return avc_has_perm(tsec->sid, ksec->sid,
+ return avc_has_perm(tsec->actor_sid, ksec->sid,
SECCLASS_KEY, perm, NULL);
}

@@ -4729,6 +4763,8 @@ static struct security_operations selinu
.release_secctx = selinux_release_secctx,
.get_fscreate_secid = selinux_get_fscreate_secid,
.set_fscreate_secid = selinux_set_fscreate_secid,
+ .act_as_secid = selinux_act_as_secid,
+ .act_as_self = selinux_act_as_self,

.unix_stream_connect = selinux_socket_unix_stream_connect,
.unix_may_send = selinux_socket_unix_may_send,
@@ -4794,7 +4830,7 @@ static __init int selinux_init(void)
if (task_alloc_security(current))
panic("SELinux: Failed to initialize initial task.\n");
tsec = current->security;
- tsec->osid = tsec->sid = SECINITSID_KERNEL;
+ tsec->osid = tsec->actor_sid = tsec->sid = SECINITSID_KERNEL;

sel_inode_cache = kmem_cache_create("selinux_inode_security",
sizeof(struct inode_security_struct),
diff --git a/security/selinux/include/objsec.h b/security/selinux/include/objsec.h
index ef2267f..4e8da30 100644
--- a/security/selinux/include/objsec.h
+++ b/security/selinux/include/objsec.h
@@ -30,6 +30,7 @@ struct task_security_struct {
struct task_struct *task; /* back pointer to task object */
u32 osid; /* SID prior to last execve */
u32 sid; /* current SID */
+ u32 actor_sid; /* act-as SID (normally == sid) */
u32 exec_sid; /* exec SID */
u32 create_sid; /* fscreate SID */
u32 keycreate_sid; /* keycreate SID */
diff --git a/security/selinux/selinuxfs.c b/security/selinux/selinuxfs.c
index cd24441..c676b4b 100644
--- a/security/selinux/selinuxfs.c
+++ b/security/selinux/selinuxfs.c
@@ -79,7 +79,7 @@ static int task_has_security(struct task
if (!tsec)
return -EACCES;

- return avc_has_perm(tsec->sid, SECINITSID_SECURITY,
+ return avc_has_perm(tsec->actor_sid, SECINITSID_SECURITY,
SECCLASS_SECURITY, perms, NULL);
}

diff --git a/security/selinux/xfrm.c b/security/selinux/xfrm.c
index 675b995..4ea1f58 100644
--- a/security/selinux/xfrm.c
+++ b/security/selinux/xfrm.c
@@ -270,7 +270,7 @@ static int selinux_xfrm_sec_ctx_alloc(st
/*
* Does the subject have permission to set security context?
*/
- rc = avc_has_perm(tsec->sid, ctx->ctx_sid,
+ rc = avc_has_perm(tsec->actor_sid, ctx->ctx_sid,
SECCLASS_ASSOCIATION,
ASSOCIATION__SETCONTEXT, NULL);
if (rc)
@@ -387,7 +387,7 @@ int selinux_xfrm_policy_delete(struct xf
int rc = 0;

if (ctx)
- rc = avc_has_perm(tsec->sid, ctx->ctx_sid,
+ rc = avc_has_perm(tsec->actor_sid, ctx->ctx_sid,
SECCLASS_ASSOCIATION,
ASSOCIATION__SETCONTEXT, NULL);

@@ -497,7 +497,7 @@ int selinux_xfrm_state_delete(struct xfr
int rc = 0;

if (ctx)
- rc = avc_has_perm(tsec->sid, ctx->ctx_sid,
+ rc = avc_has_perm(tsec->actor_sid, ctx->ctx_sid,
SECCLASS_ASSOCIATION,
ASSOCIATION__SETCONTEXT, NULL);

2006-11-14 20:12:58

by David Howells

Subject: [PATCH 18/19] CacheFiles: Use VFS lookup services

Make CacheFiles use the VFS's lookup services for each step of path resolution
rather than doing the hashing, dcache lookup and calling the inode lookup op
itself.

This is possible now that CacheFiles can temporarily override the security
context set by SELinux.
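
The error handling that this conversion relies on can be sketched in
userspace. lookup_one_len() hands back either a dentry or an error encoded in
the pointer, so one IS_ERR() check covers what the open-coded d_lookup(),
d_alloc() and ->lookup() sequence needed several branches for. This is only a
mock under stated assumptions: the macros are re-implemented here for
illustration (the kernel's versions live in <linux/err.h>), and
struct mock_dentry, mock_lookup() and mock_walk() are hypothetical stand-ins,
not kernel interfaces.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Userspace re-implementation of pointer-encoded errors.  Assumption: the
 * top 4095 values of the address space are never valid pointers. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error)
{
	return (void *)(intptr_t)error;
}

static inline long PTR_ERR(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

static inline int IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical stand-in for a dentry; has_inode plays the role of
 * "d_inode != NULL" (positive versus negative dentry). */
struct mock_dentry {
	const char *name;
	int has_inode;
};

static struct mock_dentry existing = { "cache", 1 };
static struct mock_dentry negative = { NULL, 0 };

/* Hypothetical stand-in for lookup_one_len(): returns a positive dentry, a
 * negative dentry, or an error pointer.  The error choices are illustrative. */
static struct mock_dentry *mock_lookup(const char *name, size_t len)
{
	if (len == 0)
		return ERR_PTR(-EACCES);
	if (len > 255)
		return ERR_PTR(-ENAMETOOLONG);
	if (len == strlen(existing.name) &&
	    memcmp(name, existing.name, len) == 0)
		return &existing;
	return &negative;
}

/* Caller pattern mirroring the patch: check IS_ERR() once, then look at
 * whether the dentry is positive.  Returns 1, 0, or a negative errno. */
static int mock_walk(const char *name)
{
	struct mock_dentry *d;

	assert(name != NULL);
	d = mock_lookup(name, strlen(name));
	if (IS_ERR(d))
		return (int)PTR_ERR(d);
	return d->has_inode ? 1 : 0;
}
```

Calling mock_walk("cache") takes the positive path, mock_walk("missing") the
negative one, and mock_walk("") comes back with the errno carried in the
pointer.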

Signed-Off-By: David Howells <[email protected]>
---

fs/cachefiles/cf-namei.c | 224 +++++++---------------------------------------
1 files changed, 34 insertions(+), 190 deletions(-)

diff --git a/fs/cachefiles/cf-namei.c b/fs/cachefiles/cf-namei.c
index 5508fa2..a3df94a 100644
--- a/fs/cachefiles/cf-namei.c
+++ b/fs/cachefiles/cf-namei.c
@@ -70,8 +70,7 @@ static int cachefiles_bury_object(struct
struct dentry *dir,
struct dentry *rep)
{
- struct dentry *grave, *alt, *trap;
- struct qstr name;
+ struct dentry *grave, *trap;
char nbuffer[8 + 8 + 1];
int ret;

@@ -103,23 +102,6 @@ try_again:
(uint32_t) xtime.tv_sec,
(uint32_t) atomic_inc_return(&cache->gravecounter));

- name.name = nbuffer;
- name.len = strlen(name.name);
-
- /* hash the name */
- name.hash = full_name_hash(name.name, name.len);
-
- if (dir->d_op && dir->d_op->d_hash) {
- ret = dir->d_op->d_hash(dir, &name);
- if (ret < 0) {
- if (ret == -EIO)
- cachefiles_io_error(cache, "Hash failed");
-
- _leave(" = %d", ret);
- return ret;
- }
- }
-
/* do the multiway lock magic */
trap = lock_rename(cache->graveyard, dir);

@@ -150,38 +132,18 @@ try_again:
return -EIO;
}

- /* see if there's a dentry already there for this name */
- grave = d_lookup(cache->graveyard, &name);
- if (!grave) {
- _debug("not found");
+ grave = lookup_one_len(nbuffer, cache->graveyard, strlen(nbuffer));
+ if (IS_ERR(grave)) {
+ unlock_rename(cache->graveyard, dir);

- grave = d_alloc(cache->graveyard, &name);
- if (!grave) {
- unlock_rename(cache->graveyard, dir);
+ if (PTR_ERR(grave) == -ENOMEM) {
_leave(" = -ENOMEM");
return -ENOMEM;
}

- alt = cache->graveyard->d_inode->i_op->lookup(
- cache->graveyard->d_inode, grave, NULL);
- if (IS_ERR(alt)) {
- unlock_rename(cache->graveyard, dir);
- dput(grave);
-
- if (PTR_ERR(alt) == -ENOMEM) {
- _leave(" = -ENOMEM");
- return -ENOMEM;
- }
-
- cachefiles_io_error(cache, "Lookup error %ld",
- PTR_ERR(alt));
- return -EIO;
- }
-
- if (alt) {
- dput(grave);
- grave = alt;
- }
+ cachefiles_io_error(cache, "Lookup error %ld",
+ PTR_ERR(grave));
+ return -EIO;
}

if (grave->d_inode) {
@@ -253,9 +215,9 @@ int cachefiles_walk_to_object(struct cac
struct cachefiles_xattr *auxdata)
{
struct cachefiles_cache *cache;
- struct dentry *dir, *next = NULL, *new;
- struct qstr name;
- int ret;
+ struct dentry *dir, *next = NULL;
+ char *name;
+ int ret, nlen;

_enter("{%p}", parent->dentry);

@@ -275,69 +237,24 @@ int cachefiles_walk_to_object(struct cac

advance:
/* attempt to transit the first directory component */
- name.name = key;
+ name = key;
key = strchr(key, '/');
if (key) {
- name.len = key - (char *) name.name;
+ nlen = key - name;
*key++ = 0;
} else {
- name.len = strlen(name.name);
- }
-
- /* hash the name */
- name.hash = full_name_hash(name.name, name.len);
-
- if (dir->d_op && dir->d_op->d_hash) {
- ret = dir->d_op->d_hash(dir, &name);
- if (ret < 0) {
- cachefiles_io_error(cache, "Hash failed");
- goto error_out2;
- }
+ nlen = strlen(name);
}

lookup_again:
/* search the current directory for the element name */
- _debug("lookup '%s' %x", name.name, name.hash);
+ _debug("lookup '%s'", name);

mutex_lock(&dir->d_inode->i_mutex);

- next = d_lookup(dir, &name);
- if (!next) {
- _debug("not found");
-
- new = d_alloc(dir, &name);
- if (!new)
- goto nomem_d_alloc;
-
- ASSERT(dir->d_inode->i_op);
- ASSERT(dir->d_inode->i_op->lookup);
-
- next = dir->d_inode->i_op->lookup(dir->d_inode, new, NULL);
- if (IS_ERR(next))
- goto lookup_error;
-
- if (!next)
- next = new;
- else
- dput(new);
-
- if (next->d_inode) {
- ret = -EPERM;
- if (!next->d_inode->i_op ||
- !next->d_inode->i_op->setxattr ||
- !next->d_inode->i_op->getxattr ||
- !next->d_inode->i_op->removexattr)
- goto error;
-
- if (key && (!next->d_inode->i_op->lookup ||
- !next->d_inode->i_op->mkdir ||
- !next->d_inode->i_op->create ||
- !next->d_inode->i_op->rename ||
- !next->d_inode->i_op->rmdir ||
- !next->d_inode->i_op->unlink))
- goto error;
- }
- }
+ next = lookup_one_len(name, dir, nlen);
+ if (IS_ERR(next))
+ goto lookup_error;

_debug("next -> %p %s", next, next->d_inode ? "positive" : "negative");

@@ -496,15 +413,10 @@ delete_error:

lookup_error:
_debug("lookup error %ld", PTR_ERR(next));
- dput(new);
ret = PTR_ERR(next);
if (ret == -EIO)
cachefiles_io_error(cache, "Lookup failed");
next = NULL;
- goto error;
-
-nomem_d_alloc:
- ret = -ENOMEM;
error:
mutex_unlock(&dir->d_inode->i_mutex);
dput(next);
@@ -525,48 +437,19 @@ struct dentry *cachefiles_get_directory(
struct dentry *dir,
const char *dirname)
{
- struct dentry *subdir, *new;
- struct qstr name;
+ struct dentry *subdir;
int ret;

- _enter("");
-
- /* set up the name */
- name.name = dirname;
- name.len = strlen(dirname);
- name.hash = full_name_hash(name.name, name.len);
-
- if (dir->d_op && dir->d_op->d_hash) {
- ret = dir->d_op->d_hash(dir, &name);
- if (ret < 0) {
- if (ret == -EIO)
- kerror("Hash failed");
- _leave(" = %d", ret);
- return ERR_PTR(ret);
- }
- }
+ _enter(",,%s", dirname);

/* search the current directory for the element name */
- _debug("lookup '%s' %x", name.name, name.hash);
-
mutex_lock(&dir->d_inode->i_mutex);

- subdir = d_lookup(dir, &name);
- if (!subdir) {
- _debug("not found");
-
- new = d_alloc(dir, &name);
- if (!new)
+ subdir = lookup_one_len(dirname, dir, strlen(dirname));
+ if (IS_ERR(subdir)) {
+ if (PTR_ERR(subdir) == -ENOMEM)
goto nomem_d_alloc;
-
- subdir = dir->d_inode->i_op->lookup(dir->d_inode, new, NULL);
- if (IS_ERR(subdir))
- goto lookup_error;
-
- if (!subdir)
- subdir = new;
- else
- dput(new);
+ goto lookup_error;
}

_debug("subdir -> %p %s",
@@ -578,6 +461,8 @@ struct dentry *cachefiles_get_directory(
if (ret < 0)
goto mkdir_error;

+ _debug("attempt mkdir");
+
ret = vfs_mkdir(dir->d_inode, subdir, 0700);
if (ret < 0)
goto mkdir_error;
@@ -625,23 +510,18 @@ mkdir_error:
mutex_unlock(&dir->d_inode->i_mutex);
dput(subdir);
kerror("mkdir %s failed with error %d", dirname, ret);
- goto error_out;
+ return ERR_PTR(ret);

lookup_error:
mutex_unlock(&dir->d_inode->i_mutex);
- dput(new);
ret = PTR_ERR(subdir);
kerror("Lookup %s failed with error %d", dirname, ret);
- goto error_out;
+ return ERR_PTR(ret);

nomem_d_alloc:
mutex_unlock(&dir->d_inode->i_mutex);
- ret = -ENOMEM;
- goto error_out;
-
-error_out:
- _leave(" = %d", ret);
- return ERR_PTR(ret);
+ _leave(" = -ENOMEM");
+ return ERR_PTR(-ENOMEM);
}

/*
@@ -653,48 +533,18 @@ int cachefiles_cull(struct cachefiles_ca
{
struct cachefiles_object *object;
struct rb_node *_n;
- struct dentry *victim, *new;
- struct qstr name;
+ struct dentry *victim;
int ret;

_enter(",%*.*s/,%s",
dir->d_name.len, dir->d_name.len, dir->d_name.name, filename);

- /* set up the name */
- name.name = filename;
- name.len = strlen(filename);
- name.hash = full_name_hash(name.name, name.len);
-
- if (dir->d_op && dir->d_op->d_hash) {
- ret = dir->d_op->d_hash(dir, &name);
- if (ret < 0) {
- if (ret == -EIO)
- cachefiles_io_error(cache, "Hash failed");
- _leave(" = %d", ret);
- return ret;
- }
- }
-
/* look up the victim */
mutex_lock(&dir->d_inode->i_mutex);

- victim = d_lookup(dir, &name);
- if (!victim) {
- _debug("not found");
-
- new = d_alloc(dir, &name);
- if (!new)
- goto nomem_d_alloc;
-
- victim = dir->d_inode->i_op->lookup(dir->d_inode, new, NULL);
- if (IS_ERR(victim))
- goto lookup_error;
-
- if (!victim)
- victim = new;
- else
- dput(new);
- }
+ victim = lookup_one_len(filename, dir, strlen(filename));
+ if (IS_ERR(victim))
+ goto lookup_error;

_debug("victim -> %p %s",
victim, victim->d_inode ? "positive" : "negative");
@@ -755,14 +605,8 @@ object_in_use:
_leave(" = -EBUSY [in use]");
return -EBUSY;

-nomem_d_alloc:
- mutex_unlock(&dir->d_inode->i_mutex);
- _leave(" = -ENOMEM");
- return -ENOMEM;
-
lookup_error:
mutex_unlock(&dir->d_inode->i_mutex);
- dput(new);
ret = PTR_ERR(victim);
if (ret == -EIO)
cachefiles_io_error(cache, "Lookup failed");

2006-11-14 21:19:51

by James Morris

Subject: Re: [PATCH 12/19] CacheFiles: Permit a process's create SID to be overridden

On Tue, 14 Nov 2006, David Howells wrote:

> +static u32 selinux_set_fscreate_secid(u32 secid)
> +{
> + struct task_security_struct *tsec = current->security;
> + u32 oldsid = tsec->create_sid;
> +
> + tsec->create_sid = secid;
> + return oldsid;
> +}

The ability to set this needs to be mediated via MAC policy.

See selinux_setprocattr()



- James
--
James Morris
<[email protected]>

2006-11-14 21:27:07

by James Morris

Subject: Re: [PATCH 16/19] CacheFiles: Deal with LSM when accessing the cache

On Tue, 14 Nov 2006, David Howells wrote:

> @@ -80,6 +81,8 @@ struct cachefiles_cache {
> struct rb_root active_nodes; /* active nodes (can't be culled) */
> rwlock_t active_lock; /* lock for active_nodes */
> atomic_t gravecounter; /* graveyard uniquifier */
> + u32 access_sid; /* cache access SID */
> + u32 cache_sid; /* cache fs object SID */

Please uniformly name these security IDs 'secids' in the main kernel, to
avoid confusion with session IDs.



- James
--
James Morris
<[email protected]>

2006-11-15 10:13:22

by David Howells

Subject: Re: [PATCH 20/19] CacheFiles: Use secid not sid lest confusion arise with session IDs


Use "secid" not "sid" to refer to security IDs lest confusion arise with
session IDs. Also condense the saved security state into a single structure.
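
The effect of condensing the saved state can be sketched in userspace as
follows. This is a mock, not the module code: struct mock_task stands in for
the bits of "current" that get overridden, and mock_begin_secure() and
mock_end_secure() are hypothetical counterparts of cachefiles_begin_secure()
and cachefiles_end_secure(). The ordering of the saves and restores follows
the patch; everything else is assumed for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the overridable parts of a task. */
struct mock_task {
	unsigned fsuid, fsgid;
	uint32_t secid;           /* the task's own security ID */
	uint32_t actor_secid;     /* the ID the task currently acts as */
	uint32_t fscreate_secid;  /* label applied to files it creates */
};

/* Hypothetical stand-in for the two IDs held in struct cachefiles_cache. */
struct mock_cache {
	uint32_t access_secid;
	uint32_t cache_secid;
};

/* One structure carries everything that must be put back afterwards. */
struct mock_secctx {
	unsigned fsuid, fsgid;
	uint32_t fscreate_secid;
};

static void mock_begin_secure(struct mock_task *t,
			      const struct mock_cache *c,
			      struct mock_secctx *ctx)
{
	assert(t && c && ctx);
	t->actor_secid = c->access_secid;
	ctx->fscreate_secid = t->fscreate_secid;
	t->fscreate_secid = c->cache_secid;
	ctx->fsuid = t->fsuid;
	ctx->fsgid = t->fsgid;
	t->fsuid = 0;
	t->fsgid = 0;
}

static void mock_end_secure(struct mock_task *t,
			    const struct mock_secctx *ctx)
{
	assert(t && ctx);
	t->fsuid = ctx->fsuid;
	t->fsgid = ctx->fsgid;
	t->fscreate_secid = ctx->fscreate_secid;
	t->actor_secid = t->secid;	/* act as self again */
}
```

Callers then declare a single context variable and pass its address to both
helpers, which is the shape every call site in the diff below ends up with.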

Signed-Off-By: David Howells <[email protected]>
---

fs/cachefiles/cf-bind.c | 10 ++++------
fs/cachefiles/cf-daemon.c | 16 ++++++----------
fs/cachefiles/cf-interface.c | 40 +++++++++++++++-------------------------
fs/cachefiles/cf-security.c | 26 +++++++++++++-------------
fs/cachefiles/internal.h | 36 +++++++++++++++++++++++-------------
5 files changed, 61 insertions(+), 67 deletions(-)

diff --git a/fs/cachefiles/cf-bind.c b/fs/cachefiles/cf-bind.c
index 1d1fd14..3daf140 100644
--- a/fs/cachefiles/cf-bind.c
+++ b/fs/cachefiles/cf-bind.c
@@ -85,13 +85,11 @@ int cachefiles_daemon_bind(struct cachef
*/
static int cachefiles_daemon_add_cache(struct cachefiles_cache *cache)
{
+ struct cachefiles_secctx secctx;
struct cachefiles_object *fsdef;
struct nameidata nd;
struct kstatfs stats;
struct dentry *graveyard, *cachedir, *root;
- uid_t fsuid;
- gid_t fsgid;
- u32 fscreatesid;
int ret;

_enter("");
@@ -101,7 +99,7 @@ static int cachefiles_daemon_add_cache(s
if (ret < 0)
return ret;

- cachefiles_begin_secure(cache, &fsuid, &fsgid, &fscreatesid);
+ cachefiles_begin_secure(cache, &secctx);

/* allocate the root index object */
ret = -ENOMEM;
@@ -240,7 +238,7 @@ static int cachefiles_daemon_add_cache(s

/* check how much space the cache has */
cachefiles_has_space(cache, 0, 0);
- cachefiles_end_secure(cache, fsuid, fsgid, fscreatesid);
+ cachefiles_end_secure(cache, &secctx);
return 0;

error_add_cache:
@@ -255,7 +253,7 @@ error_unsupported:
error_open_root:
kmem_cache_free(cachefiles_object_jar, fsdef);
error_root_object:
- cachefiles_end_secure(cache, fsuid, fsgid, fscreatesid);
+ cachefiles_end_secure(cache, &secctx);
kerror("Failed to register: %d", ret);
return ret;
}
diff --git a/fs/cachefiles/cf-daemon.c b/fs/cachefiles/cf-daemon.c
index ee07865..86cf23b 100644
--- a/fs/cachefiles/cf-daemon.c
+++ b/fs/cachefiles/cf-daemon.c
@@ -517,11 +517,9 @@ static int cachefiles_daemon_tag(struct
*/
static int cachefiles_daemon_cull(struct cachefiles_cache *cache, char *args)
{
+ struct cachefiles_secctx secctx;
struct dentry *dir;
struct file *dirfile;
- uid_t fsuid;
- gid_t fsgid;
- u32 fscreatesid;
int dirfd, fput_needed, ret;

_enter(",%s", args);
@@ -564,9 +562,9 @@ static int cachefiles_daemon_cull(struct
if (!S_ISDIR(dir->d_inode->i_mode))
goto notdir;

- cachefiles_begin_secure(cache, &fsuid, &fsgid, &fscreatesid);
+ cachefiles_begin_secure(cache, &secctx);
ret = cachefiles_cull(cache, dir, args);
- cachefiles_end_secure(cache, fsuid, fsgid, fscreatesid);
+ cachefiles_end_secure(cache, &secctx);

dput(dir);
_leave(" = %d", ret);
@@ -611,11 +609,9 @@ inval:
*/
static int cachefiles_daemon_inuse(struct cachefiles_cache *cache, char *args)
{
+ struct cachefiles_secctx secctx;
struct dentry *dir;
struct file *dirfile;
- uid_t fsuid;
- gid_t fsgid;
- u32 fscreatesid;
int dirfd, fput_needed, ret;

_enter(",%s", args);
@@ -658,9 +654,9 @@ static int cachefiles_daemon_inuse(struc
if (!S_ISDIR(dir->d_inode->i_mode))
goto notdir;

- cachefiles_begin_secure(cache, &fsuid, &fsgid, &fscreatesid);
+ cachefiles_begin_secure(cache, &secctx);
ret = cachefiles_check_in_use(cache, dir, args);
- cachefiles_end_secure(cache, fsuid, fsgid, fscreatesid);
+ cachefiles_end_secure(cache, &secctx);

dput(dir);
_leave(" = %d", ret);
diff --git a/fs/cachefiles/cf-interface.c b/fs/cachefiles/cf-interface.c
index 7a3d085..e96e63a 100644
--- a/fs/cachefiles/cf-interface.c
+++ b/fs/cachefiles/cf-interface.c
@@ -29,15 +29,13 @@ static struct fscache_object *cachefiles
struct fscache_object *_parent,
struct fscache_cookie *cookie)
{
+ struct cachefiles_secctx secctx;
struct cachefiles_object *parent, *object;
struct cachefiles_cache *cache;
struct cachefiles_xattr *auxdata;
unsigned keylen, auxlen;
- uid_t fsuid;
- gid_t fsgid;
void *buffer;
char *key;
- u32 fscreatesid;
int ret;

ASSERT(_parent);
@@ -95,9 +93,9 @@ static struct fscache_object *cachefiles
auxdata->type = cookie->def->type;

/* look up the key, creating any missing bits */
- cachefiles_begin_secure(cache, &fsuid, &fsgid, &fscreatesid);
+ cachefiles_begin_secure(cache, &secctx);
ret = cachefiles_walk_to_object(parent, object, key, auxdata);
- cachefiles_end_secure(cache, fsuid, fsgid, fscreatesid);
+ cachefiles_end_secure(cache, &secctx);
if (ret < 0)
goto lookup_failed;

@@ -179,20 +177,18 @@ static void cachefiles_unlock_object(str
*/
static void cachefiles_update_object(struct fscache_object *_object)
{
+ struct cachefiles_secctx secctx;
struct cachefiles_object *object;
struct cachefiles_cache *cache;
- uid_t fsuid;
- gid_t fsgid;
- u32 fscreatesid;

_enter("%p", _object);

object = container_of(_object, struct cachefiles_object, fscache);
cache = container_of(object->fscache.cache, struct cachefiles_cache, cache);

- cachefiles_begin_secure(cache, &fsuid, &fsgid, &fscreatesid);
+ cachefiles_begin_secure(cache, &secctx);
//cachefiles_tree_update_object(super, object);
- cachefiles_end_secure(cache, fsuid, fsgid, fscreatesid);
+ cachefiles_end_secure(cache, &secctx);
}

/*
@@ -200,11 +196,9 @@ static void cachefiles_update_object(str
*/
static void cachefiles_put_object(struct fscache_object *_object)
{
+ struct cachefiles_secctx secctx;
struct cachefiles_object *object;
struct cachefiles_cache *cache;
- uid_t fsuid;
- gid_t fsgid;
- u32 fscreatesid;

ASSERT(_object);

@@ -230,9 +224,9 @@ #endif
_object != cache->cache.fsdef
) {
_debug("- retire object %p", object);
- cachefiles_begin_secure(cache, &fsuid, &fsgid, &fscreatesid);
+ cachefiles_begin_secure(cache, &secctx);
cachefiles_delete_object(cache, object);
- cachefiles_end_secure(cache, fsuid, fsgid, fscreatesid);
+ cachefiles_end_secure(cache, &secctx);
}

/* close the filesystem stuff attached to the object */
@@ -265,10 +259,8 @@ #endif
*/
static void cachefiles_sync_cache(struct fscache_cache *_cache)
{
+ struct cachefiles_secctx secctx;
struct cachefiles_cache *cache;
- uid_t fsuid;
- gid_t fsgid;
- u32 fscreatesid;
int ret;

_enter("%p", _cache);
@@ -277,9 +269,9 @@ static void cachefiles_sync_cache(struct

/* make sure all pages pinned by operations on behalf of the netfs are
* written to disc */
- cachefiles_begin_secure(cache, &fsuid, &fsgid, &fscreatesid);
+ cachefiles_begin_secure(cache, &secctx);
ret = fsync_super(cache->mnt->mnt_sb);
- cachefiles_end_secure(cache, fsuid, fsgid, fscreatesid);
+ cachefiles_end_secure(cache, &secctx);

if (ret == -EIO)
cachefiles_io_error(cache,
@@ -293,12 +285,10 @@ static void cachefiles_sync_cache(struct
*/
static int cachefiles_set_i_size(struct fscache_object *_object, loff_t i_size)
{
+ struct cachefiles_secctx secctx;
struct cachefiles_object *object;
struct cachefiles_cache *cache;
struct iattr newattrs;
- uid_t fsuid;
- gid_t fsgid;
- u32 fscreatesid;
int ret;

_enter("%p,%llu", _object, i_size);
@@ -318,11 +308,11 @@ static int cachefiles_set_i_size(struct
newattrs.ia_size = i_size;
newattrs.ia_valid = ATTR_SIZE;

- cachefiles_begin_secure(cache, &fsuid, &fsgid, &fscreatesid);
+ cachefiles_begin_secure(cache, &secctx);
mutex_lock(&object->backer->d_inode->i_mutex);
ret = notify_change(object->backer, &newattrs);
mutex_unlock(&object->backer->d_inode->i_mutex);
- cachefiles_end_secure(cache, fsuid, fsgid, fscreatesid);
+ cachefiles_end_secure(cache, &secctx);

if (ret == -EIO) {
cachefiles_io_error_obj(object, "Size set failed");
diff --git a/fs/cachefiles/cf-security.c b/fs/cachefiles/cf-security.c
index d7c1473..c142172 100644
--- a/fs/cachefiles/cf-security.c
+++ b/fs/cachefiles/cf-security.c
@@ -19,36 +19,36 @@ #include "internal.h"
int cachefiles_get_security_ID(struct cachefiles_cache *cache)
{
char *seclabel;
- u32 seclen, daemon_sid;
+ u32 seclen, daemon_secid;
int ret;

_enter("");

- cache->access_sid = 0;
+ cache->access_secid = 0;

/* ask the security policy to tell us what security ID we should be
* using to access the cache, given the security ID that our daemon is
* using */
- security_task_getsecid(current, &daemon_sid);
+ security_task_getsecid(current, &daemon_secid);

- ret = security_secid_to_secctx(daemon_sid, &seclabel, &seclen);
+ ret = security_secid_to_secctx(daemon_secid, &seclabel, &seclen);
if (ret < 0)
goto error;
- _debug("Cache Daemon SID: %x '%s'", daemon_sid, seclabel);
+ _debug("Cache Daemon SecID: %x '%s'", daemon_secid, seclabel);
kfree(seclabel);

- ret = security_cachefiles_get_secid(daemon_sid, &cache->access_sid);
+ ret = security_cachefiles_get_secid(daemon_secid, &cache->access_secid);
if (ret < 0) {
printk(KERN_ERR "CacheFiles:"
- " Security can't provide module SID: error %d",
+ " Security can't provide module SecID: error %d",
ret);
goto error;
}

- ret = security_secid_to_secctx(cache->access_sid, &seclabel, &seclen);
+ ret = security_secid_to_secctx(cache->access_secid, &seclabel, &seclen);
if (ret < 0)
goto error;
- _debug("Cache Module SID: %x '%s'", cache->access_sid, seclabel);
+ _debug("Cache Module SecID: %x '%s'", cache->access_secid, seclabel);
kfree(seclabel);

error:
@@ -71,14 +71,14 @@ int cachefiles_check_security(struct cac

_enter("");

- /* use the cache root dir's security ID as the SID with which to create
+ /* use the cache root dir's security ID as the SECID with which to create
* files */
- cache->cache_sid = security_inode_get_secid(root->d_inode);
+ cache->cache_secid = security_inode_get_secid(root->d_inode);

- ret = security_secid_to_secctx(cache->cache_sid, &seclabel, &seclen);
+ ret = security_secid_to_secctx(cache->cache_secid, &seclabel, &seclen);
if (ret < 0)
goto error;
- _debug("Cache SID: %x '%s'", cache->cache_sid, seclabel);
+ _debug("Cache SecID: %x '%s'", cache->cache_secid, seclabel);
kfree(seclabel);

/* check that we have permission to create files and directories with
diff --git a/fs/cachefiles/internal.h b/fs/cachefiles/internal.h
index 1b7ada2..90590de 100644
--- a/fs/cachefiles/internal.h
+++ b/fs/cachefiles/internal.h
@@ -81,8 +81,8 @@ struct cachefiles_cache {
struct rb_root active_nodes; /* active nodes (can't be culled) */
rwlock_t active_lock; /* lock for active_nodes */
atomic_t gravecounter; /* graveyard uniquifier */
- u32 access_sid; /* cache access SID */
- u32 cache_sid; /* cache fs object SID */
+ u32 access_secid; /* cache access security ID */
+ u32 cache_secid; /* cache fs object security ID */
unsigned frun_percent; /* when to stop culling (% files) */
unsigned fcull_percent; /* when to start culling (% files) */
unsigned fstop_percent; /* when to stop allocating (% files) */
@@ -198,26 +198,36 @@ #define cachefiles_get_security_ID(cache
#define cachefiles_check_security(cache, root) (0)
#endif

+struct cachefiles_secctx {
+ uid_t fsuid; /* save for current->fsuid */
+ gid_t fsgid; /* save for current->fsgid */
+#ifdef CONFIG_SECURITY
+ u32 fscreate_secid; /* save for current fscreate security ID */
+#endif
+};
+
static inline void cachefiles_begin_secure(struct cachefiles_cache *cache,
- uid_t *fsuid, gid_t *fsgid,
- u32 *fscreatesid)
+ struct cachefiles_secctx *ctx)
{
- security_act_as_secid(cache->access_sid);
- *fscreatesid = security_set_fscreate_secid(cache->cache_sid);
- *fsuid = current->fsuid;
- *fsgid = current->fsgid;
+#ifdef CONFIG_SECURITY
+ security_act_as_secid(cache->access_secid);
+ ctx->fscreate_secid = security_set_fscreate_secid(cache->cache_secid);
+#endif
+ ctx->fsuid = current->fsuid;
+ ctx->fsgid = current->fsgid;
current->fsuid = 0;
current->fsgid = 0;
}

static inline void cachefiles_end_secure(struct cachefiles_cache *cache,
- uid_t fsuid, gid_t fsgid,
- u32 fscreatesid)
+ const struct cachefiles_secctx *ctx)
{
- current->fsuid = fsuid;
- current->fsgid = fsgid;
- security_set_fscreate_secid(fscreatesid);
+ current->fsuid = ctx->fsuid;
+ current->fsgid = ctx->fsgid;
+#ifdef CONFIG_SECURITY
+ security_set_fscreate_secid(ctx->fscreate_secid);
security_act_as_self();
+#endif
}

/*

2006-11-15 11:23:36

by Steve Dickson

Subject: Re: [PATCH 06/19] FS-Cache: NFS: Only obtain cache cookies on file open, not on inode read



David Howells wrote:
> diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h
> index 5ead2bf..b2e5e86 100644
> --- a/include/linux/nfs_fs.h
> +++ b/include/linux/nfs_fs.h
> @@ -205,6 +205,7 @@ #define NFS_INO_REVALIDATING (0) /* rev
> #define NFS_INO_ADVISE_RDPLUS (1) /* advise readdirplus */
> #define NFS_INO_STALE (2) /* possible stale inode */
> #define NFS_INO_ACL_LRU_SET (3) /* Inode is on the LRU list */
> +#define NFS_INO_CACHEABLE (4) /* inode can be cached by FS-Cache */
>
> static inline struct nfs_inode *NFS_I(struct inode *inode)
> {
> @@ -230,6 +231,7 @@ #define NFS_ATTRTIMEO_UPDATE(inode) (NFS
>
> #define NFS_FLAGS(inode) (NFS_I(inode)->flags)
> #define NFS_STALE(inode) (test_bit(NFS_INO_STALE, &NFS_FLAGS(inode)))
> +#define NFS_CACHEABLE(inode) (test_bit(NFS_INO_CACHEABLE, &NFS_FLAGS(inode)))
A small nit..

To stay with the coding style NFS uses, could you please change
these definitions to:

+#define NFS_INO_FSCACHE (4) /* inode can be cached by FS-Cache */
and
+#define NFS_FSCACHE(inode) (test_bit(NFS_INO_FSCACHE, &NFS_FLAGS(inode)))


steved.

2006-11-15 12:28:49

by David Howells

Subject: Re: [PATCH 12/19] CacheFiles: Permit a process's create SID to be overridden

James Morris <[email protected]> wrote:

> > +static u32 selinux_set_fscreate_secid(u32 secid)
> ...
> The ability to set this needs to be mediated via MAC policy.

There could be a problem with that... Is it possible for there to be a race? I
have to call the function twice per cache op: once to set the file creation
security ID and once to restore it back to what it was.

However, what happens if I can't restore the original security ID (perhaps the
rules changed between the two invocations)? I can't let the task continue as
it's now running with the wrong security...

David
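
The concern above can be made concrete with a small userspace mock. Assume a
hypothetical mediated setter, mock_set_fscreate(), that consults a policy
verdict before changing the ID; the policy_allows flag, struct mock_task and
mock_cache_op() are all invented for illustration and do not correspond to
any LSM interface. If the policy changes between the override and the
restore, the restore is refused and the task keeps the cache's creation ID.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: the current MAC policy verdict for "may set fscreate". */
static int policy_allows = 1;

struct mock_task {
	uint32_t fscreate_secid;
};

/* Hypothetical mediated setter: refuses when the policy denies the change.
 * On success the previous ID is stored through @old if it is non-NULL. */
static int mock_set_fscreate(struct mock_task *t, uint32_t secid,
			     uint32_t *old)
{
	assert(t != NULL);
	if (!policy_allows)
		return -EACCES;
	if (old)
		*old = t->fscreate_secid;
	t->fscreate_secid = secid;
	return 0;
}

/* One cache operation: override, do the work, restore.  @reload_midway
 * simulates a policy reload between the two setter calls. */
static int mock_cache_op(struct mock_task *t, uint32_t cache_secid,
			 int reload_midway)
{
	uint32_t saved;
	int ret;

	ret = mock_set_fscreate(t, cache_secid, &saved);
	if (ret < 0)
		return ret;

	if (reload_midway)
		policy_allows = 0;

	/* If this fails, the task is left with the cache's creation ID. */
	return mock_set_fscreate(t, saved, NULL);
}
```

With reload_midway set, mock_cache_op() returns -EACCES and the task's
fscreate_secid is still the cache's value, which is the state described above
as running with the wrong security.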

2006-11-15 12:39:06

by Steve Dickson

Subject: Re: [PATCH 05/19] NFS: Use local caching

David Howells wrote:
> The attached patch makes it possible for the NFS filesystem to make use of the
> network filesystem local caching service (FS-Cache).
>
> To be able to use this, an updated mount program is required. This can be
> obtained from:
>
> http://people.redhat.com/steved/cachefs/util-linux/
>
> To mount an NFS filesystem to use caching, add an "fsc" option to the mount:
>
> mount warthog:/ /a -o fsc
Note: the NFS mounting code has recently moved from util-linux
into nfs-utils, but the functionality is off by default (hopefully that
will change soon). In Fedora Core 6 we've decided to go ahead and
turn on the mount code, which in turn allowed us to add
the '-o fsc' mounting flag. So with FC6, there is no need
to download a modified util-linux.

> +static inline void nfs_fscache_get_fh_cookie(struct inode *inode, int aycache) {}
> +static inline void nfs_fscache_release_fh_cookie(struct inode *inode) {}
> +static inline void nfs_fscache_zap_fh_cookie(struct inode *inode) {}
> +static inline void nfs_fscache_renew_fh_cookie(struct inode *inode) {}
> +static inline void nfs_fscache_disable_fh_cookie(struct inode *inode) {}

To create a cleaner and more scalable "cookie" interface into NFS,
I suggest that we remove the type of cookie from the name of the
cookie routines (meaning remove the _fh_ from the names) and bury
that type of information in the actual cookie routines. The last
thing the NFS code should care about is the type of cookie it needs
to use.. those decisions should exist in the cookie routines, not
in the mainline code... imho...

So the resulting routines would look like:

static inline void nfs_fscache_get_cookie(struct inode *inode) {}
static inline void nfs_fscache_release_cookie(struct inode *inode) {}
static inline void nfs_fscache_zap_cookie(struct inode *inode) {}
static inline void nfs_fscache_renew_cookie(struct inode *inode) {}
static inline void nfs_fscache_disable_cookie(struct inode *inode) {}

Then, instead of just having a fscache_cookie hang off the NFS inode,
have a pointer to an nfs_fscache_cookie structure:

struct nfs_fscache_cookie {
int type; /* the type cookie: FILE, READDIR, XATTR, etc */
ulong flags; /* Don't all interfaces need flags? :-) */
void *cookie; /* the actual cookie */
};

An interface like this would keep all the ugly cookie processing
far away from the mainline NFS code, and would make the interface
into NFS much cleaner, simpler and more scalable. Finally, all changes
(e.g. adding another cookie type) would be isolated from the mainline
code and confined to a couple of files.

Comments?

steved.
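
The wrapper suggested above might look roughly like this in a userspace
sketch. It only illustrates the idea of keeping the type switch inside the
cookie routines: the enum values, struct mock_inode, the released[] counters
and mock_nfs_fscache_release_cookie() are hypothetical names, and the per-type
work is reduced to a counter so the dispatch is visible.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cookie types, following the FILE, READDIR, XATTR examples. */
enum mock_cookie_type {
	MOCK_COOKIE_FILE,
	MOCK_COOKIE_READDIR,
	MOCK_COOKIE_XATTR,
	MOCK_COOKIE_NR
};

/* Mirrors the proposed structure: type, flags and the actual cookie. */
struct mock_nfs_fscache_cookie {
	int type;
	unsigned long flags;
	void *cookie;
};

/* Hypothetical stand-in for the NFS inode, which only holds a pointer. */
struct mock_inode {
	struct mock_nfs_fscache_cookie *fscache;
};

/* Per-type release counters, standing in for the real per-type work. */
static int released[MOCK_COOKIE_NR];

/* The caller never mentions the cookie type; the routine decides. */
static void mock_nfs_fscache_release_cookie(struct mock_inode *inode)
{
	struct mock_nfs_fscache_cookie *c;

	assert(inode != NULL);
	c = inode->fscache;
	if (!c)
		return;		/* nothing attached, nothing to do */

	switch (c->type) {
	case MOCK_COOKIE_FILE:
	case MOCK_COOKIE_READDIR:
	case MOCK_COOKIE_XATTR:
		released[c->type]++;
		break;
	default:
		break;		/* unknown type: drop it without accounting */
	}

	c->cookie = NULL;
	inode->fscache = NULL;
}
```

Adding a further cookie type would then touch the enum and the switch, and
leave the callers alone.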

2006-11-15 13:19:33

by David Howells

Subject: [PATCH 21/19] CacheFiles: Set the file creation security ID whilst binding the cache


Set the file creation security ID whilst binding a cache. Currently, though the
cachefiles_daemon_add_cache() function calls cachefiles_begin_secure(), at
that point the file creation security ID is not known.

Signed-Off-By: David Howells <[email protected]>
---

fs/cachefiles/cf-bind.c | 10 ++++++----
fs/cachefiles/cf-security.c | 4 ++--
fs/cachefiles/internal.h | 25 ++++++++++++++++++++++---
3 files changed, 30 insertions(+), 9 deletions(-)

diff --git a/fs/cachefiles/cf-bind.c b/fs/cachefiles/cf-bind.c
index 3daf140..0c055a9 100644
--- a/fs/cachefiles/cf-bind.c
+++ b/fs/cachefiles/cf-bind.c
@@ -99,7 +99,7 @@ static int cachefiles_daemon_add_cache(s
if (ret < 0)
return ret;

- cachefiles_begin_secure(cache, &secctx);
+ cachefiles_begin_secure_nofs(cache, &secctx);

/* allocate the root index object */
ret = -ENOMEM;
@@ -142,12 +142,14 @@ static int cachefiles_daemon_add_cache(s
if (root->d_sb->s_flags & MS_RDONLY)
goto error_unsupported;

- /* determine the security context within which we access the cache from
- * within the kernel */
- ret = cachefiles_check_security(cache, root);
+ /* determine the security of the on-disk cache as this governs the
+ * security ID of files we create */
+ ret = cachefiles_determine_cache_secid(cache, root);
if (ret < 0)
goto error_unsupported;

+ cachefiles_set_fscreate_secid(cache);
+
/* get the cache size and blocksize */
ret = vfs_statfs(root, &stats);
if (ret < 0)
diff --git a/fs/cachefiles/cf-security.c b/fs/cachefiles/cf-security.c
index c142172..e070bb3 100644
--- a/fs/cachefiles/cf-security.c
+++ b/fs/cachefiles/cf-security.c
@@ -62,8 +62,8 @@ error:
* check the security details of the on-disk cache
* - must be called with security imposed
*/
-int cachefiles_check_security(struct cachefiles_cache *cache,
- struct dentry *root)
+int cachefiles_determine_cache_secid(struct cachefiles_cache *cache,
+ struct dentry *root)
{
char *seclabel;
u32 seclen;
diff --git a/fs/cachefiles/internal.h b/fs/cachefiles/internal.h
index 90590de..4715de5 100644
--- a/fs/cachefiles/internal.h
+++ b/fs/cachefiles/internal.h
@@ -191,11 +191,17 @@ extern int cachefiles_check_in_use(struc
*/
#ifdef CONFIG_SECURITY
extern int cachefiles_get_security_ID(struct cachefiles_cache *cache);
-extern int cachefiles_check_security(struct cachefiles_cache *cache,
- struct dentry *root);
+extern int cachefiles_determine_cache_secid(struct cachefiles_cache *cache,
+ struct dentry *root);
+static inline
+void cachefiles_set_fscreate_secid(struct cachefiles_cache *cache)
+{
+ security_set_fscreate_secid(cache->cache_secid);
+}
#else
#define cachefiles_get_security_ID(cache) (0)
-#define cachefiles_check_security(cache, root) (0)
+#define cachefiles_determine_cache_secid(cache, root) (0)
+#define cachefiles_set_fscreate_secid(cache) do {} while(0)
#endif

struct cachefiles_secctx {
@@ -206,6 +212,19 @@ #ifdef CONFIG_SECURITY
#endif
};

+static inline void cachefiles_begin_secure_nofs(struct cachefiles_cache *cache,
+ struct cachefiles_secctx *ctx)
+{
+#ifdef CONFIG_SECURITY
+ security_act_as_secid(cache->access_secid);
+ ctx->fscreate_secid = security_get_fscreate_secid();
+#endif
+ ctx->fsuid = current->fsuid;
+ ctx->fsgid = current->fsgid;
+ current->fsuid = 0;
+ current->fsgid = 0;
+}
+
static inline void cachefiles_begin_secure(struct cachefiles_cache *cache,
struct cachefiles_secctx *ctx)
{

2006-11-15 13:25:48

by David Howells

Subject: [PATCH 22/19] FS-Cache: NFS: Rename NFS_INO_CACHEABLE


Rename NFS_INO_CACHEABLE and NFS_CACHEABLE to be NFS_INO_FSCACHE and
NFS_FSCACHE.

Signed-Off-By: David Howells <[email protected]>
---

fs/nfs/fscache.h | 8 ++++----
include/linux/nfs_fs.h | 4 ++--
2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/fs/nfs/fscache.h b/fs/nfs/fscache.h
index 0be6ffe..b82b896 100644
--- a/fs/nfs/fscache.h
+++ b/fs/nfs/fscache.h
@@ -94,7 +94,7 @@ static inline void nfs_fscache_init_fh_c
{
NFS_I(inode)->fscache = NULL;
if (S_ISREG(inode->i_mode))
- set_bit(NFS_INO_CACHEABLE, &NFS_I(inode)->flags);
+ set_bit(NFS_INO_FSCACHE, &NFS_I(inode)->flags);
}

/*
@@ -105,7 +105,7 @@ static inline void nfs_fscache_enable_fh
struct super_block *sb = inode->i_sb;
struct nfs_inode *nfsi = NFS_I(inode);

- if (nfsi->fscache || !NFS_CACHEABLE(inode))
+ if (nfsi->fscache || !NFS_FSCACHE(inode))
return;

if ((NFS_SB(sb)->flags & NFS_MOUNT_FSCACHE)) {
@@ -190,7 +190,7 @@ static inline void nfs_fscache_zap_fh_co
*/
static inline void nfs_fscache_disable_fh_cookie(struct inode *inode)
{
- clear_bit(NFS_INO_CACHEABLE, &NFS_I(inode)->flags);
+ clear_bit(NFS_INO_FSCACHE, &NFS_I(inode)->flags);

if (NFS_I(inode)->fscache) {
dfprintk(FSCACHE,
@@ -214,7 +214,7 @@ static inline void nfs_fscache_disable_f
static inline void nfs_fscache_set_fh_cookie(struct inode *inode,
struct file *filp)
{
- if (NFS_CACHEABLE(inode)) {
+ if (NFS_FSCACHE(inode)) {
if ((filp->f_flags & O_ACCMODE) != O_RDONLY)
nfs_fscache_disable_fh_cookie(inode);
else
diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h
index b2e5e86..59e433f 100644
--- a/include/linux/nfs_fs.h
+++ b/include/linux/nfs_fs.h
@@ -205,7 +205,7 @@ #define NFS_INO_REVALIDATING (0) /* rev
#define NFS_INO_ADVISE_RDPLUS (1) /* advise readdirplus */
#define NFS_INO_STALE (2) /* possible stale inode */
#define NFS_INO_ACL_LRU_SET (3) /* Inode is on the LRU list */
-#define NFS_INO_CACHEABLE (4) /* inode can be cached by FS-Cache */
+#define NFS_INO_FSCACHE (4) /* inode can be cached by FS-Cache */

static inline struct nfs_inode *NFS_I(struct inode *inode)
{
@@ -231,7 +231,7 @@ #define NFS_ATTRTIMEO_UPDATE(inode) (NFS

#define NFS_FLAGS(inode) (NFS_I(inode)->flags)
#define NFS_STALE(inode) (test_bit(NFS_INO_STALE, &NFS_FLAGS(inode)))
-#define NFS_CACHEABLE(inode) (test_bit(NFS_INO_CACHEABLE, &NFS_FLAGS(inode)))
+#define NFS_FSCACHE(inode) (test_bit(NFS_INO_FSCACHE, &NFS_FLAGS(inode)))

#define NFS_FILEID(inode) (NFS_I(inode)->fileid)

2006-11-15 13:53:13

by David Howells

Subject: Re: [PATCH 12/19] CacheFiles: Permit a process's create SID to be overridden

James Morris <[email protected]> wrote:

> The ability to set this needs to be mediated via MAC policy.

Something like this, you mean?
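
As a rough illustration of the pattern (a userspace model, not the kernel
code in the patch below): the setter checks a permission first, returns an
error code instead of the old ID, and hands the old ID back through an
optional pointer. struct task_sec_model, its may_set_fscreate field and both
function names are assumptions invented for this sketch; the real check in
the patch goes through SELinux policy.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for per-task security state; not the kernel's structure. */
struct task_sec_model {
	uint32_t create_sid;	/* current file creation security ID */
	int may_set_fscreate;	/* models the policy decision (1 = allowed) */
};

/*
 * Set the file creation security ID.
 * Returns 0 on success or -EACCES if the modelled policy denies it.
 * On success the previous ID is stored in *oldsecid if that is non-NULL.
 * On failure nothing is modified.
 */
static int model_set_fscreate_secid(struct task_sec_model *tsec,
				    uint32_t secid, uint32_t *oldsecid)
{
	assert(tsec != NULL);
	if (!tsec->may_set_fscreate)
		return -EACCES;
	if (oldsecid != NULL)
		*oldsecid = tsec->create_sid;
	tsec->create_sid = secid;
	return 0;
}

/*
 * Caller side: override the ID, note which label a newly created object
 * would pick up, then restore. If the override is refused, return the
 * error without touching anything, since there is nothing to restore.
 */
static int model_create_labelled(struct task_sec_model *tsec,
				 uint32_t secid, uint32_t *label_out)
{
	uint32_t saved;
	int ret;

	ret = model_set_fscreate_secid(tsec, secid, &saved);
	if (ret < 0)
		return ret;

	*label_out = tsec->create_sid;	/* the "creation" sees the override */

	/* restoring can fail too, so its result is passed back */
	return model_set_fscreate_secid(tsec, saved, NULL);
}
```

The point of the int return is that every caller now has to decide what to
do when the override is refused, rather than carrying on with the wrong
label.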

David
---

security/selinux/hooks.c | 12 +++-
security/dummy.c | 2 -
include/linux/security.h | 9 ++-
fs/cachefiles/cf-bind.c | 3 +
fs/cachefiles/cf-daemon.c | 16 +++--
fs/cachefiles/cf-interface.c | 40 ++++++++-----
fs/cachefiles/cf-security.c | 36 +++++++++++
fs/cachefiles/cf-security.h | 134 ++++++++++++++++++++++++++++++++++++++++++
fs/cachefiles/internal.h | 61 +------------------
9 files changed, 225 insertions(+), 88 deletions(-)

diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index 3a52698..5bfae9b 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -4559,13 +4559,19 @@ static u32 selinux_get_fscreate_secid(vo
return tsec->create_sid;
}

-static u32 selinux_set_fscreate_secid(u32 secid)
+static int selinux_set_fscreate_secid(u32 secid, u32 *oldsecid)
{
struct task_security_struct *tsec = current->security;
- u32 oldsid = tsec->create_sid;
+ int error;
+
+ error = task_has_perm(current, current, PROCESS__SETFSCREATE);
+ if (error < 0)
+ return error;

+ if (oldsecid)
+ *oldsecid = tsec->create_sid;
tsec->create_sid = secid;
- return oldsid;
+ return 0;
}

static u32 selinux_act_as_secid(u32 secid)
diff --git a/security/dummy.c b/security/dummy.c
index 30096ec..471b369 100644
--- a/security/dummy.c
+++ b/security/dummy.c
@@ -937,7 +937,7 @@ static u32 dummy_get_fscreate_secid(void
return 0;
}

-static u32 dummy_set_fscreate_secid(u32 secid)
+static int dummy_set_fscreate_secid(u32 secid, u32 *oldsecid)
{
return 0;
}
diff --git a/include/linux/security.h b/include/linux/security.h
index 8cfeefc..7aa86a3 100644
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -1160,6 +1160,7 @@ #ifdef CONFIG_SECURITY
* @set_fscreate_secid:
* Set the current FS security ID.
* @secid contains the security ID to set.
+ * @oldsecid points to where the old security ID will be placed (or NULL).
*
* @act_as_secid:
* Set the security ID as which to act, returning the security ID as which
@@ -1363,7 +1364,7 @@ struct security_operations {
int (*secid_to_secctx)(u32 secid, char **secdata, u32 *seclen);
void (*release_secctx)(char *secdata, u32 seclen);
u32 (*get_fscreate_secid)(void);
- u32 (*set_fscreate_secid)(u32 secid);
+ int (*set_fscreate_secid)(u32 secid, u32 *oldsecid);
u32 (*act_as_secid)(u32 secid);
u32 (*act_as_self)(void);
int (*cachefiles_get_secid)(u32 secid, u32 *modsecid);
@@ -2174,9 +2175,9 @@ static inline u32 security_get_fscreate_
return security_ops->get_fscreate_secid();
}

-static inline u32 security_set_fscreate_secid(u32 secid)
+static inline int security_set_fscreate_secid(u32 secid, u32 *oldsecid)
{
- return security_ops->set_fscreate_secid(secid);
+ return security_ops->set_fscreate_secid(secid, oldsecid);
}

static inline u32 security_act_as_secid(u32 secid)
@@ -2884,7 +2885,7 @@ static inline u32 security_get_fscreate_
return 0;
}

-static inline u32 security_set_fscreate_secid(u32 secid)
+static inline int security_set_fscreate_secid(u32 secid, u32 *oldsecid)
{
return 0;
}
diff --git a/fs/cachefiles/cf-bind.c b/fs/cachefiles/cf-bind.c
index 0c055a9..c8e68a4 100644
--- a/fs/cachefiles/cf-bind.c
+++ b/fs/cachefiles/cf-bind.c
@@ -226,6 +226,8 @@ static int cachefiles_daemon_add_cache(s
MAJOR(fsdef->dentry->d_sb->s_dev),
MINOR(fsdef->dentry->d_sb->s_dev));

+ set_bit(CACHEFILES_FSCACHE_INITED, &cache->flags);
+
ret = fscache_add_cache(&cache->cache, &fsdef->fscache, cache->tag);
if (ret < 0)
goto error_add_cache;
@@ -273,6 +275,7 @@ void cachefiles_daemon_unbind(struct cac
cache->cache.identifier);

fscache_withdraw_cache(&cache->cache);
+ clear_bit(CACHEFILES_FSCACHE_INITED, &cache->flags);
}

if (cache->cache.fsdef)
diff --git a/fs/cachefiles/cf-daemon.c b/fs/cachefiles/cf-daemon.c
index 86cf23b..a1888ee 100644
--- a/fs/cachefiles/cf-daemon.c
+++ b/fs/cachefiles/cf-daemon.c
@@ -562,9 +562,11 @@ static int cachefiles_daemon_cull(struct
if (!S_ISDIR(dir->d_inode->i_mode))
goto notdir;

- cachefiles_begin_secure(cache, &secctx);
- ret = cachefiles_cull(cache, dir, args);
- cachefiles_end_secure(cache, &secctx);
+ ret = cachefiles_begin_secure(cache, &secctx);
+ if (ret == 0) {
+ ret = cachefiles_cull(cache, dir, args);
+ cachefiles_end_secure(cache, &secctx);
+ }

dput(dir);
_leave(" = %d", ret);
@@ -654,9 +656,11 @@ static int cachefiles_daemon_inuse(struc
if (!S_ISDIR(dir->d_inode->i_mode))
goto notdir;

- cachefiles_begin_secure(cache, &secctx);
- ret = cachefiles_check_in_use(cache, dir, args);
- cachefiles_end_secure(cache, &secctx);
+ ret = cachefiles_begin_secure(cache, &secctx);
+ if (ret == 0) {
+ ret = cachefiles_check_in_use(cache, dir, args);
+ cachefiles_end_secure(cache, &secctx);
+ }

dput(dir);
_leave(" = %d", ret);
diff --git a/fs/cachefiles/cf-interface.c b/fs/cachefiles/cf-interface.c
index e96e63a..6ce14f5 100644
--- a/fs/cachefiles/cf-interface.c
+++ b/fs/cachefiles/cf-interface.c
@@ -93,7 +93,9 @@ static struct fscache_object *cachefiles
auxdata->type = cookie->def->type;

/* look up the key, creating any missing bits */
- cachefiles_begin_secure(cache, &secctx);
+ ret = cachefiles_begin_secure(cache, &secctx);
+ if (ret < 0)
+ goto lookup_failed;
ret = cachefiles_walk_to_object(parent, object, key, auxdata);
cachefiles_end_secure(cache, &secctx);
if (ret < 0)
@@ -186,9 +188,10 @@ static void cachefiles_update_object(str
object = container_of(_object, struct cachefiles_object, fscache);
cache = container_of(object->fscache.cache, struct cachefiles_cache, cache);

- cachefiles_begin_secure(cache, &secctx);
- //cachefiles_tree_update_object(super, object);
- cachefiles_end_secure(cache, &secctx);
+ if (cachefiles_begin_secure(cache, &secctx) == 0) {
+ //cachefiles_tree_update_object(super, object);
+ cachefiles_end_secure(cache, &secctx);
+ }
}

/*
@@ -199,6 +202,7 @@ static void cachefiles_put_object(struct
struct cachefiles_secctx secctx;
struct cachefiles_object *object;
struct cachefiles_cache *cache;
+ int ret;

ASSERT(_object);

@@ -224,9 +228,11 @@ #endif
_object != cache->cache.fsdef
) {
_debug("- retire object %p", object);
- cachefiles_begin_secure(cache, &secctx);
- cachefiles_delete_object(cache, object);
- cachefiles_end_secure(cache, &secctx);
+ ret = cachefiles_begin_secure(cache, &secctx);
+ if (ret == 0) {
+ cachefiles_delete_object(cache, object);
+ cachefiles_end_secure(cache, &secctx);
+ }
}

/* close the filesystem stuff attached to the object */
@@ -269,9 +275,11 @@ static void cachefiles_sync_cache(struct

/* make sure all pages pinned by operations on behalf of the netfs are
* written to disc */
- cachefiles_begin_secure(cache, &secctx);
- ret = fsync_super(cache->mnt->mnt_sb);
- cachefiles_end_secure(cache, &secctx);
+ ret = cachefiles_begin_secure(cache, &secctx);
+ if (ret == 0) {
+ ret = fsync_super(cache->mnt->mnt_sb);
+ cachefiles_end_secure(cache, &secctx);
+ }

if (ret == -EIO)
cachefiles_io_error(cache,
@@ -308,11 +316,13 @@ static int cachefiles_set_i_size(struct
newattrs.ia_size = i_size;
newattrs.ia_valid = ATTR_SIZE;

- cachefiles_begin_secure(cache, &secctx);
- mutex_lock(&object->backer->d_inode->i_mutex);
- ret = notify_change(object->backer, &newattrs);
- mutex_unlock(&object->backer->d_inode->i_mutex);
- cachefiles_end_secure(cache, &secctx);
+ ret = cachefiles_begin_secure(cache, &secctx);
+ if (ret == 0) {
+ mutex_lock(&object->backer->d_inode->i_mutex);
+ ret = notify_change(object->backer, &newattrs);
+ mutex_unlock(&object->backer->d_inode->i_mutex);
+ cachefiles_end_secure(cache, &secctx);
+ }

if (ret == -EIO) {
cachefiles_io_error_obj(object, "Size set failed");
diff --git a/fs/cachefiles/cf-security.c b/fs/cachefiles/cf-security.c
index e070bb3..6d294da 100644
--- a/fs/cachefiles/cf-security.c
+++ b/fs/cachefiles/cf-security.c
@@ -105,3 +105,39 @@ error:
_leave(" = %d", ret);
return ret;
}
+
+/*
+ * deal with failure to change the file creation security ID
+ */
+int cachefiles_begin_secure_failed(struct cachefiles_cache *cache, int ret)
+{
+ kerror("Unable to enter secure region in process %d (error %d)",
+ current->pid, ret);
+
+ if (test_bit(CACHEFILES_FSCACHE_INITED, &cache->flags))
+ fscache_io_error(&cache->cache);
+ set_bit(CACHEFILES_DEAD, &cache->flags);
+ security_act_as_self();
+ return ret;
+}
+
+/*
+ * deal with failure to restore the file creation security ID
+ */
+void cachefiles_end_secure_failed(struct cachefiles_cache *cache,
+ const struct cachefiles_secctx *ctx,
+ int ret)
+{
+ printk(KERN_ERR "CacheFiles: ERROR:"
+ " Failed to restore file creation security ID %x on process %d"
+ " (error %d)\n",
+ ctx->fscreate_secid, current->pid, ret);
+ printk(KERN_ERR "CacheFiles: Killing process %d\n", current->pid);
+
+ if (test_bit(CACHEFILES_FSCACHE_INITED, &cache->flags))
+ fscache_io_error(&cache->cache);
+ set_bit(CACHEFILES_DEAD, &cache->flags);
+
+ /* this process cannot be allowed to continue */
+ force_sig(SIGKILL, current);
+}
diff --git a/fs/cachefiles/cf-security.h b/fs/cachefiles/cf-security.h
new file mode 100644
index 0000000..a1dd74c
--- /dev/null
+++ b/fs/cachefiles/cf-security.h
@@ -0,0 +1,134 @@
+/* LSM security context manipulation
+ *
+ * Copyright (C) 2006 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells ([email protected])
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+/*
+ * saved security context
+ */
+struct cachefiles_secctx {
+ uid_t fsuid; /* save for current->fsuid */
+ gid_t fsgid; /* save for current->fsgid */
+#ifdef CONFIG_SECURITY
+ u32 fscreate_secid; /* save for current fscreate security ID */
+#endif
+};
+
+#ifndef CONFIG_SECURITY
+#define cachefiles_get_security_ID(cache) (0)
+#define cachefiles_determine_cache_secid(cache, root) (0)
+#define cachefiles_set_fscreate_secid(cache) do {} while(0)
+
+/*
+ * attempt to enter the cachefiles security context
+ */
+static inline int cachefiles_begin_secure(struct cachefiles_cache *cache,
+ struct cachefiles_secctx *ctx)
+{
+ ctx->fsuid = current->fsuid;
+ ctx->fsgid = current->fsgid;
+ current->fsuid = 0;
+ current->fsgid = 0;
+ return 0;
+}
+
+#define cachefiles_begin_secure_nofs(cache, ctx) \
+ cachefiles_begin_secure(cache, ctx)
+
+/*
+ * attempt to leave the cachefiles security context
+ */
+static inline void cachefiles_end_secure(struct cachefiles_cache *cache,
+ const struct cachefiles_secctx *ctx)
+{
+ current->fsuid = ctx->fsuid;
+ current->fsgid = ctx->fsgid;
+}
+
+#else /* !CONFIG_SECURITY */
+
+extern int cachefiles_get_security_ID(struct cachefiles_cache *cache);
+extern int cachefiles_determine_cache_secid(struct cachefiles_cache *cache,
+ struct dentry *root);
+extern int cachefiles_begin_secure_failed(struct cachefiles_cache *cache,
+ int ret);
+extern void cachefiles_end_secure_failed(struct cachefiles_cache *cache,
+ const struct cachefiles_secctx *ctx,
+ int ret);
+
+/*
+ * attempt to set the file creation security ID
+ */
+static inline int cachefiles_set_fscreate_secid(struct cachefiles_cache *cache)
+{
+ int ret;
+
+ ret = security_set_fscreate_secid(cache->cache_secid, NULL);
+ if (unlikely(ret < 0))
+ return cachefiles_begin_secure_failed(cache, ret);
+ return 0;
+}
+
+/*
+ * enter the cachefiles security context without changing the file creation
+ * security ID
+ */
+static inline void cachefiles_begin_secure_nofs(struct cachefiles_cache *cache,
+ struct cachefiles_secctx *ctx)
+{
+ security_act_as_secid(cache->access_secid);
+ ctx->fscreate_secid = security_get_fscreate_secid();
+ ctx->fsuid = current->fsuid;
+ ctx->fsgid = current->fsgid;
+ current->fsuid = 0;
+ current->fsgid = 0;
+}
+
+/*
+ * attempt to enter the cachefiles security context
+ */
+static inline int cachefiles_begin_secure(struct cachefiles_cache *cache,
+ struct cachefiles_secctx *ctx)
+{
+ int ret;
+
+ security_act_as_secid(cache->access_secid);
+ ret = security_set_fscreate_secid(cache->cache_secid,
+ &ctx->fscreate_secid);
+ if (unlikely(ret < 0))
+ return cachefiles_begin_secure_failed(cache, ret);
+
+ ctx->fsuid = current->fsuid;
+ ctx->fsgid = current->fsgid;
+ current->fsuid = 0;
+ current->fsgid = 0;
+ return 0;
+}
+
+/*
+ * attempt to leave the cachefiles security context
+ */
+static inline void cachefiles_end_secure(struct cachefiles_cache *cache,
+ const struct cachefiles_secctx *ctx)
+{
+ int ret;
+
+ current->fsuid = ctx->fsuid;
+ current->fsgid = ctx->fsgid;
+
+ /* restoring the file creation security ID might fail, but there's
+ * nothing we can do about it if it does */
+ ret = security_set_fscreate_secid(ctx->fscreate_secid, NULL);
+ if (unlikely(ret < 0))
+ cachefiles_end_secure_failed(cache, ctx, ret);
+
+ security_act_as_self();
+}
+
+#endif /* !CONFIG_SECURITY */
diff --git a/fs/cachefiles/internal.h b/fs/cachefiles/internal.h
index 4715de5..16727fc 100644
--- a/fs/cachefiles/internal.h
+++ b/fs/cachefiles/internal.h
@@ -102,6 +102,7 @@ #define CACHEFILES_READY 0 /* T if cach
#define CACHEFILES_DEAD 1 /* T if cache dead */
#define CACHEFILES_CULLING 2 /* T if cull engaged */
#define CACHEFILES_STATE_CHANGED 3 /* T if state changed (poll trigger) */
+#define CACHEFILES_FSCACHE_INITED 4 /* T if fscache_init_cache() has been called */
char *rootdirname; /* name of cache root directory */
char *tag; /* cache binding tag */
};
@@ -189,65 +190,7 @@ extern int cachefiles_check_in_use(struc
/*
* cf-security.c
*/
-#ifdef CONFIG_SECURITY
-extern int cachefiles_get_security_ID(struct cachefiles_cache *cache);
-extern int cachefiles_determine_cache_secid(struct cachefiles_cache *cache,
- struct dentry *root);
-static inline
-void cachefiles_set_fscreate_secid(struct cachefiles_cache *cache)
-{
- security_set_fscreate_secid(cache->cache_secid);
-}
-#else
-#define cachefiles_get_security_ID(cache) (0)
-#define cachefiles_determine_cache_secid(cache, root) (0)
-#define cachefiles_set_fscreate_secid(cache) do {} while(0)
-#endif
-
-struct cachefiles_secctx {
- uid_t fsuid; /* save for current->fsuid */
- gid_t fsgid; /* save for current->fsgid */
-#ifdef CONFIG_SECURITY
- u32 fscreate_secid; /* save for current fscreate security ID */
-#endif
-};
-
-static inline void cachefiles_begin_secure_nofs(struct cachefiles_cache *cache,
- struct cachefiles_secctx *ctx)
-{
-#ifdef CONFIG_SECURITY
- security_act_as_secid(cache->access_secid);
- ctx->fscreate_secid = security_get_fscreate_secid();
-#endif
- ctx->fsuid = current->fsuid;
- ctx->fsgid = current->fsgid;
- current->fsuid = 0;
- current->fsgid = 0;
-}
-
-static inline void cachefiles_begin_secure(struct cachefiles_cache *cache,
- struct cachefiles_secctx *ctx)
-{
-#ifdef CONFIG_SECURITY
- security_act_as_secid(cache->access_secid);
- ctx->fscreate_secid = security_set_fscreate_secid(cache->cache_secid);
-#endif
- ctx->fsuid = current->fsuid;
- ctx->fsgid = current->fsgid;
- current->fsuid = 0;
- current->fsgid = 0;
-}
-
-static inline void cachefiles_end_secure(struct cachefiles_cache *cache,
- const struct cachefiles_secctx *ctx)
-{
- current->fsuid = ctx->fsuid;
- current->fsgid = ctx->fsgid;
-#ifdef CONFIG_SECURITY
- security_set_fscreate_secid(ctx->fscreate_secid);
- security_act_as_self();
-#endif
-}
+#include "cf-security.h"

/*
* cf-xattr.c

2006-11-15 15:10:26

by Trond Myklebust

Subject: Re: [PATCH 05/19] NFS: Use local caching

On Tue, 2006-11-14 at 20:06 +0000, David Howells wrote:
> The attached patch makes it possible for the NFS filesystem to make use of the
> network filesystem local caching service (FS-Cache).
>
> To be able to use this, an updated mount program is required. This can be
> obtained from:
>
> http://people.redhat.com/steved/cachefs/util-linux/
>
> To mount an NFS filesystem to use caching, add an "fsc" option to the mount:
>
> mount warthog:/ /a -o fsc
>
> Signed-Off-By: David Howells <[email protected]>
> ---
>
> fs/Kconfig | 8 +
> fs/nfs/Makefile | 1
> fs/nfs/client.c | 11 +
> fs/nfs/file.c | 49 ++++-
> fs/nfs/fscache.c | 347 ++++++++++++++++++++++++++++++++
> fs/nfs/fscache.h | 471 ++++++++++++++++++++++++++++++++++++++++++++
> fs/nfs/inode.c | 21 ++
> fs/nfs/internal.h | 32 +++
> fs/nfs/pagelist.c | 3
> fs/nfs/read.c | 30 +++
> fs/nfs/super.c | 1
> fs/nfs/sysctl.c | 43 ++++
> fs/nfs/write.c | 11 +
> include/linux/nfs4_mount.h | 1
> include/linux/nfs_fs.h | 4
> include/linux/nfs_fs_sb.h | 5
> include/linux/nfs_mount.h | 1
> 17 files changed, 1029 insertions(+), 10 deletions(-)
>
> diff --git a/fs/Kconfig b/fs/Kconfig
> index aa6fad1..04bfc27 100644
> --- a/fs/Kconfig
> +++ b/fs/Kconfig
> @@ -1648,6 +1648,14 @@ config NFS_V4
>
> If unsure, say N.
>
> +config NFS_FSCACHE
> + bool "Provide NFS client caching support (EXPERIMENTAL)"
> + depends on EXPERIMENTAL
> + depends on NFS_FS=m && FSCACHE || NFS_FS=y && FSCACHE=y
> + help
> + Say Y here if you want NFS data to be cached locally on disc through
> + the general filesystem cache manager
> +
> config NFS_DIRECTIO
> bool "Allow direct I/O on NFS files"
> depends on NFS_FS
> diff --git a/fs/nfs/Makefile b/fs/nfs/Makefile
> index f4580b4..2af6f22 100644
> --- a/fs/nfs/Makefile
> +++ b/fs/nfs/Makefile
> @@ -16,4 +16,5 @@ nfs-$(CONFIG_NFS_V4) += nfs4proc.o nfs4x
> nfs4namespace.o
> nfs-$(CONFIG_NFS_DIRECTIO) += direct.o
> nfs-$(CONFIG_SYSCTL) += sysctl.o
> +nfs-$(CONFIG_NFS_FSCACHE) += fscache.o
> nfs-objs := $(nfs-y)
> diff --git a/fs/nfs/client.c b/fs/nfs/client.c
> index 5fea638..6e19b28 100644
> --- a/fs/nfs/client.c
> +++ b/fs/nfs/client.c
> @@ -149,6 +149,8 @@ #ifdef CONFIG_NFS_V4
> clp->cl_state = 1 << NFS4CLNT_LEASE_EXPIRED;
> #endif
>
> + nfs_fscache_get_client_cookie(clp);
> +
> return clp;
>
> error_3:
> @@ -192,6 +194,8 @@ static void nfs_free_client(struct nfs_c
>
> nfs4_shutdown_client(clp);
>
> + nfs_fscache_release_client_cookie(clp);
> +
> /* -EIO all pending I/O */
> if (!IS_ERR(clp->cl_rpcclient))
> rpc_shutdown_client(clp->cl_rpcclient);
> @@ -1368,7 +1372,7 @@ static int nfs_volume_list_show(struct s
>
> /* display header on line 1 */
> if (v == SEQ_START_TOKEN) {
> - seq_puts(m, "NV SERVER PORT DEV FSID\n");
> + seq_puts(m, "NV SERVER PORT DEV FSID FSC\n");
> return 0;
> }
> /* display one transport per line on subsequent lines */
> @@ -1382,12 +1386,13 @@ static int nfs_volume_list_show(struct s
> (unsigned long long) server->fsid.major,
> (unsigned long long) server->fsid.minor);
>
> - seq_printf(m, "v%d %02x%02x%02x%02x %4hx %-7s %-17s\n",
> + seq_printf(m, "v%d %02x%02x%02x%02x %4hx %-7s %-17s %s\n",
> clp->cl_nfsversion,
> NIPQUAD(clp->cl_addr.sin_addr),
> ntohs(clp->cl_addr.sin_port),
> dev,
> - fsid);
> + fsid,
> + nfs_server_fscache_state(server));
>
> return 0;
> }
> diff --git a/fs/nfs/file.c b/fs/nfs/file.c
> index cc93865..9da03ec 100644
> --- a/fs/nfs/file.c
> +++ b/fs/nfs/file.c
> @@ -27,12 +27,14 @@ #include <linux/mm.h>
> #include <linux/slab.h>
> #include <linux/pagemap.h>
> #include <linux/smp_lock.h>
> +#include <linux/buffer_head.h>
>
> #include <asm/uaccess.h>
> #include <asm/system.h>
>
> #include "delegation.h"
> #include "iostat.h"
> +#include "internal.h"
>
> #define NFSDBG_FACILITY NFSDBG_FILE
>
> @@ -253,6 +255,10 @@ nfs_file_mmap(struct file * file, struct
> status = nfs_revalidate_mapping(inode, file->f_mapping);
> if (!status)
> status = generic_file_mmap(file, vma);
> +
> + if (status == 0)
> + nfs_fscache_install_vm_ops(inode, vma);
> +
> return status;
> }
>
> @@ -305,6 +311,12 @@ static int nfs_commit_write(struct file
> return status;
> }
>
> +/*
> + * partially or wholly invalidate a page
> + * - release the private state associated with a page if undergoing complete
> + * page invalidation
> + * - caller holds page lock
> + */
> static void nfs_invalidate_page(struct page *page, unsigned long offset)
> {
> struct inode *inode = page->mapping->host;
> @@ -312,19 +324,47 @@ static void nfs_invalidate_page(struct p
> /* Cancel any unstarted writes on this page */
> if (offset == 0)
> nfs_sync_inode_wait(inode, page->index, 1, FLUSH_INVALIDATE);
> +
> + nfs_fscache_invalidate_page(page, inode, offset);
> +
> + /* we can do this here as the bits are only set with the page lock
> + * held, and our caller is holding that */
> + if (!page->private)
> + ClearPagePrivate(page);
> }
>
> +/*
> + * release the private state associated with a page, if the page isn't busy
> + * - caller holds page lock
> + * - return true (may release) or false (may not)
> + */
> static int nfs_release_page(struct page *page, gfp_t gfp)
> {
> - if (gfp & __GFP_FS)
> - return !nfs_wb_page(page->mapping->host, page);
> - else
> + if ((gfp & __GFP_FS) == 0) {
> /*
> * Avoid deadlock on nfs_wait_on_request().
> */
> return 0;
> + }
> +
> + if (nfs_wb_page(page->mapping->host, page) < 0)
> + return 0;
> +
> + if (nfs_fscache_release_page(page) < 0)
> + return 0;

Why is fscache being given a vote on whether or not the NFS page can be
removed from the mapping? If the file has changed on the server, so that
we have to invalidate the mapping, then I don't care about the fact that
fscache is busy: the page has to go.

> + /* PG_private may have been set due to either caching or writing */
> + BUG_ON(page->private != 0);
> + ClearPagePrivate(page);
> +
> + return 1;
> }
>
> +/*
> + * Since we use page->private for our own nefarious purposes when using
> + * fscache, we have to override extra address space ops to prevent fs/buffer.c
> + * from getting confused, even though we may not have asked its opinion
> + */
> const struct address_space_operations nfs_file_aops = {
> .readpage = nfs_readpage,
> .readpages = nfs_readpages,
> @@ -338,6 +378,9 @@ const struct address_space_operations nf
> #ifdef CONFIG_NFS_DIRECTIO
> .direct_IO = nfs_direct_IO,
> #endif
> +#ifdef CONFIG_NFS_FSCACHE
> + .sync_page = block_sync_page,
> +#endif
> };
>
> static ssize_t nfs_file_write(struct kiocb *iocb, const struct iovec *iov,
> diff --git a/fs/nfs/fscache.c b/fs/nfs/fscache.c
> new file mode 100644
> index 0000000..81286f6
> --- /dev/null
> +++ b/fs/nfs/fscache.c
> @@ -0,0 +1,347 @@
> +/* fscache.c: NFS filesystem cache interface
> + *
> + * Copyright (C) 2006 Red Hat, Inc. All Rights Reserved.
> + * Written by David Howells ([email protected])
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License
> + * as published by the Free Software Foundation; either version
> + * 2 of the License, or (at your option) any later version.
> + */
> +
> +
> +#include <linux/init.h>
> +#include <linux/kernel.h>
> +#include <linux/mm.h>
> +#include <linux/nfs_fs.h>
> +#include <linux/nfs_fs_sb.h>
> +#include <linux/in6.h>
> +
> +#include "internal.h"
> +
> +/*
> + * Sysctl variables
> + */
> +atomic_t nfs_fscache_to_pages;
> +atomic_t nfs_fscache_from_pages;
> +atomic_t nfs_fscache_uncache_page;
> +int nfs_fscache_from_error;
> +int nfs_fscache_to_error;
> +
> +#define NFSDBG_FACILITY NFSDBG_FSCACHE
> +
> +/* the auxiliary data in the cache (used for coherency management) */
> +struct nfs_fh_auxdata {
> + struct timespec i_mtime;
> + struct timespec i_ctime;
> + loff_t i_size;
> +};

You are missing the NFSv4 change attribute, which is supposed to
override mtime/ctime/size concerns in NFSv4.
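
To make the point concrete, here is a small userspace sketch of auxiliary
data that carries a change attribute and of a validity check that consults
only that attribute for NFSv4. It is an assumption about how this could
look, not code from the patch: struct aux_time, struct nfs_fh_auxdata_model,
the change_attr field and aux_still_valid() are all hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-in for a timestamp. */
struct aux_time {
	int64_t sec;
	int64_t nsec;
};

/* Model of the coherency data stored alongside a cached file. */
struct nfs_fh_auxdata_model {
	struct aux_time mtime;
	struct aux_time ctime;
	int64_t size;
	uint64_t change_attr;	/* hypothetical: NFSv4 change attribute */
};

static int aux_time_equal(const struct aux_time *a, const struct aux_time *b)
{
	return a->sec == b->sec && a->nsec == b->nsec;
}

/*
 * Return 1 if the cached data may still be used, 0 if it must be discarded.
 * For NFSv4 only the change attribute is compared; for older versions the
 * mtime, ctime and size are compared instead.
 */
static int aux_still_valid(int nfsversion,
			   const struct nfs_fh_auxdata_model *cached,
			   const struct nfs_fh_auxdata_model *fresh)
{
	assert(cached != NULL && fresh != NULL);

	if (nfsversion == 4)
		return cached->change_attr == fresh->change_attr;

	return aux_time_equal(&cached->mtime, &fresh->mtime) &&
	       aux_time_equal(&cached->ctime, &fresh->ctime) &&
	       cached->size == fresh->size;
}
```
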

> +static struct fscache_netfs_operations nfs_cache_ops = {
> +};
> +
> +struct fscache_netfs nfs_cache_netfs = {
> + .name = "nfs",
> + .version = 0,
> + .ops = &nfs_cache_ops,
> +};
> +
> +static const uint8_t nfs_cache_ipv6_wrapper_for_ipv4[12] = {
> + [0 ... 9] = 0x00,
> + [10 ... 11] = 0xff
> +};
> +
> +struct nfs_server_key {
> + uint16_t nfsversion;
> + uint16_t port;
> + union {
> + struct {
> + uint8_t ipv6wrapper[12];
> + struct in_addr addr;
> + } ipv4_addr;
> + struct in6_addr ipv6_addr;
> + };
> +};
>
> +static uint16_t nfs_server_get_key(const void *cookie_netfs_data,
> + void *buffer, uint16_t bufmax)
> +{
> + const struct nfs_client *clp = cookie_netfs_data;
> + struct nfs_server_key *key = buffer;
> + uint16_t len = 0;
> +
> + key->nfsversion = clp->cl_nfsversion;
> +
> + switch (clp->cl_addr.sin_family) {
> + case AF_INET:
> + key->port = clp->cl_addr.sin_port;
> +
> + memcpy(&key->ipv4_addr.ipv6wrapper,
> + &nfs_cache_ipv6_wrapper_for_ipv4,
> + sizeof(key->ipv4_addr.ipv6wrapper));
> + memcpy(&key->ipv4_addr.addr,
> + &clp->cl_addr.sin_addr,
> + sizeof(key->ipv4_addr.addr));
> + len = sizeof(struct nfs_server_key);
> + break;
> +
> + case AF_INET6:
> + key->port = clp->cl_addr.sin_port;
> +
> + memcpy(&key->ipv6_addr,
> + &clp->cl_addr.sin_addr,
> + sizeof(key->ipv6_addr));
> + len = sizeof(struct nfs_server_key);
> + break;
> +
> + default:
> + len = 0;
> + printk(KERN_WARNING "NFS: Unknown network family '%d'\n",
> + clp->cl_addr.sin_family);
> + break;
> + }
> +
> + return len;
> +}
> +
> +/*
> + * the root index for the filesystem is defined by nfsd IP address and ports
> + */
> +struct fscache_cookie_def nfs_cache_server_index_def = {
> + .name = "NFS.servers",
> + .type = FSCACHE_COOKIE_TYPE_INDEX,
> + .get_key = nfs_server_get_key,
> +};
> +
> +static uint16_t nfs_fh_get_key(const void *cookie_netfs_data,
> + void *buffer, uint16_t bufmax)
> +{
> + const struct nfs_inode *nfsi = cookie_netfs_data;
> + uint16_t nsize;
> +
> + /* set the file handle */
> + nsize = nfsi->fh.size;
> + memcpy(buffer, nfsi->fh.data, nsize);
> + return nsize;
> +}
> +
> +/*
> + * indication of pages that now have cache metadata retained
> + * - this function should mark the specified pages as now being cached
> + */
> +static void nfs_fh_mark_pages_cached(void *cookie_netfs_data,
> + struct address_space *mapping,
> + struct pagevec *cached_pvec)
> +{
> + struct nfs_inode *nfsi = cookie_netfs_data;
> + unsigned long loop;
> +
> + dprintk("NFS: nfs_fh_mark_pages_cached: nfs_inode 0x%p pages %ld\n",
> + nfsi, cached_pvec->nr);
> +
> + BUG_ON(!nfsi->fscache);
> +
> + for (loop = 0; loop < cached_pvec->nr; loop++)
> + SetPageNfsCached(cached_pvec->pages[loop]);
> +}
> +
> +/*
> + * get an extra reference on a read context
> + * - this function can be absent if the completion function doesn't
> + * require a context
> + */
> +static void nfs_fh_get_context(void *cookie_netfs_data, void *context)
> +{
> + get_nfs_open_context(context);
> +}
> +
> +/*
> + * release an extra reference on a read context
> + * - this function can be absent if the completion function doesn't
> + * require a context
> + */
> +static void nfs_fh_put_context(void *cookie_netfs_data, void *context)
> +{
> + if (context)
> + put_nfs_open_context(context);
> +}
> +
> +/*
> + * indication the cookie is no longer cached
> + * - this function is called when the backing store currently caching a cookie
> + * is removed
> + * - the netfs should use this to clean up any markers indicating cached pages
> + * - this is mandatory for any object that may have data
> + */
> +static void nfs_fh_now_uncached(void *cookie_netfs_data)
> +{
> + struct nfs_inode *nfsi = cookie_netfs_data;
> + struct pagevec pvec;
> + pgoff_t first;
> + int loop, nr_pages;
> +
> + pagevec_init(&pvec, 0);
> + first = 0;
> +
> + dprintk("NFS: nfs_fh_now_uncached: nfs_inode 0x%p\n", nfsi);
> +
> + for (;;) {
> + /* grab a bunch of pages to clean */
> + nr_pages = pagevec_lookup(&pvec,
> + nfsi->vfs_inode.i_mapping,
> + first,
> + PAGEVEC_SIZE - pagevec_count(&pvec));
> + if (!nr_pages)
> + break;
> +
> + for (loop = 0; loop < nr_pages; loop++)
> + ClearPageNfsCached(pvec.pages[loop]);
> +
> + first = pvec.pages[nr_pages - 1]->index + 1;
> +
> + pvec.nr = nr_pages;
> + pagevec_release(&pvec);
> + cond_resched();
> + }
> +}
> +
> +/*
> + * get certain file attributes from the netfs data
> + * - this function can be absent for an index
> + * - not permitted to return an error
> + * - the netfs data from the cookie being used as the source is
> + * presented
> + */
> +static void nfs_fh_get_attr(const void *cookie_netfs_data, uint64_t *size)
> +{
> + const struct nfs_inode *nfsi = cookie_netfs_data;
> +
> + *size = nfsi->vfs_inode.i_size;
> +}
> +
> +/*
> + * get the auxiliary data from netfs data
> + * - this function can be absent if the index carries no state data
> + * - should store the auxiliary data in the buffer
> + * - should return the amount of data stored
> + * - not permitted to return an error
> + * - the netfs data from the cookie being used as the source is
> + * presented
> + */
> +static uint16_t nfs_fh_get_aux(const void *cookie_netfs_data,
> + void *buffer, uint16_t bufmax)
> +{
> + struct nfs_fh_auxdata auxdata;
> + const struct nfs_inode *nfsi = cookie_netfs_data;
> +
> + auxdata.i_size = nfsi->vfs_inode.i_size;
> + auxdata.i_mtime = nfsi->vfs_inode.i_mtime;
> + auxdata.i_ctime = nfsi->vfs_inode.i_ctime;
> +
> + if (bufmax > sizeof(auxdata))
> + bufmax = sizeof(auxdata);
> +
> + memcpy(buffer, &auxdata, bufmax);
> + return bufmax;
> +}
> +
> +/*
> + * consult the netfs about the state of an object
> + * - this function can be absent if the index carries no state data
> + * - the netfs data from the cookie being used as the target is
> + * presented, as is the auxiliary data
> + */
> +static fscache_checkaux_t nfs_fh_check_aux(void *cookie_netfs_data,
> + const void *data, uint16_t datalen)
> +{
> + struct nfs_fh_auxdata auxdata;
> + struct nfs_inode *nfsi = cookie_netfs_data;
> +
> + if (datalen > sizeof(auxdata))
> + return FSCACHE_CHECKAUX_OBSOLETE;
> +
> + auxdata.i_size = nfsi->vfs_inode.i_size;
> + auxdata.i_mtime = nfsi->vfs_inode.i_mtime;
> + auxdata.i_ctime = nfsi->vfs_inode.i_ctime;
> +
> + if (memcmp(data, &auxdata, datalen) != 0)
> + return FSCACHE_CHECKAUX_OBSOLETE;
> +
> + return FSCACHE_CHECKAUX_OKAY;
> +}
> +
> +/*
> + * the primary index for each server is simply made up of a series of NFS file
> + * handles
> + */
> +struct fscache_cookie_def nfs_cache_fh_index_def = {
> + .name = "NFS.fh",
> + .type = FSCACHE_COOKIE_TYPE_DATAFILE,
> + .get_key = nfs_fh_get_key,
> + .get_attr = nfs_fh_get_attr,
> + .get_aux = nfs_fh_get_aux,
> + .check_aux = nfs_fh_check_aux,
> + .get_context = nfs_fh_get_context,
> + .put_context = nfs_fh_put_context,
> + .mark_pages_cached = nfs_fh_mark_pages_cached,
> + .now_uncached = nfs_fh_now_uncached,
> +};
> +
> +static int nfs_file_page_mkwrite(struct vm_area_struct *vma, struct page *page)
> +{
> + wait_on_page_fs_misc(page);
> + return 0;
> +}
> +
> +struct vm_operations_struct nfs_fs_vm_operations = {
> + .nopage = filemap_nopage,
> + .populate = filemap_populate,
> + .page_mkwrite = nfs_file_page_mkwrite,
> +};
> +
> +/*
> + * handle completion of a page being stored in the cache
> + */
> +void nfs_readpage_to_fscache_complete(struct page *page, void *data, int error)
> +{
> + dfprintk(FSCACHE,
> + "NFS: readpage_to_fscache_complete (p:%p(i:%lx f:%lx)/%d)\n",
> + page, page->index, page->flags, error);
> +
> + end_page_fs_misc(page);
> +}
> +
> +/*
> + * handle completion of a page being read from the cache
> + * - called in process (keventd) context
> + */
> +void nfs_readpage_from_fscache_complete(struct page *page,
> + void *context,
> + int error)
> +{
> + dfprintk(FSCACHE,
> + "NFS: readpage_from_fscache_complete (0x%p/0x%p/%d)\n",
> + page, context, error);
> +
> + /* if the read from the cache fails, fall back to reading the page from
> + * the server; unlock the page only if that read cannot be started */
> + if (!error) {
> + SetPageUptodate(page);
> + unlock_page(page);
> + } else {
> + error = nfs_readpage_async(context, page->mapping->host, page);
> + if (error)
> + unlock_page(page);
> + }
> +}
> +
> +/*
> + * handle completion of a page being written to the cache
> + * - really need to synchronise the end of writeback, probably using a page
> + * flag, but for the moment we disable caching on writable files
> + */
> +void nfs_writepage_to_fscache_complete(struct page *page,
> + void *data,
> + int error)
> +{
> +}
> diff --git a/fs/nfs/fscache.h b/fs/nfs/fscache.h
> new file mode 100644
> index 0000000..00a2c07
> --- /dev/null
> +++ b/fs/nfs/fscache.h
> @@ -0,0 +1,471 @@
> +/* fscache.h: NFS filesystem cache interface definitions
> + *
> + * Copyright (C) 2006 Red Hat, Inc. All Rights Reserved.
> + * Written by David Howells ([email protected])
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License
> + * as published by the Free Software Foundation; either version
> + * 2 of the License, or (at your option) any later version.
> + */
> +
> +#ifndef _NFS_FSCACHE_H
> +#define _NFS_FSCACHE_H
> +
> +#include <linux/nfs_fs.h>
> +#include <linux/nfs_mount.h>
> +#include <linux/nfs4_mount.h>
> +
> +#ifdef CONFIG_NFS_FSCACHE
> +#include <linux/fscache.h>
> +
> +extern struct fscache_netfs nfs_cache_netfs;
> +extern struct fscache_cookie_def nfs_cache_server_index_def;
> +extern struct fscache_cookie_def nfs_cache_fh_index_def;
> +extern struct vm_operations_struct nfs_fs_vm_operations;
> +
> +extern void nfs_invalidatepage(struct page *, unsigned long);
> +extern int nfs_releasepage(struct page *, gfp_t);
> +
> +extern atomic_t nfs_fscache_to_pages;
> +extern atomic_t nfs_fscache_from_pages;
> +extern atomic_t nfs_fscache_uncache_page;
> +extern int nfs_fscache_from_error;
> +extern int nfs_fscache_to_error;
> +
> +/*
> + * register NFS for caching
> + */
> +static inline int nfs_fscache_register(void)
> +{
> + return fscache_register_netfs(&nfs_cache_netfs);
> +}
> +
> +/*
> + * unregister NFS for caching
> + */
> +static inline void nfs_fscache_unregister(void)
> +{
> + fscache_unregister_netfs(&nfs_cache_netfs);
> +}
> +
> +/*
> + * get the per-client index cookie for an NFS client if the appropriate mount
> + * flag was set
> + * - we always try and get an index cookie for the client, but get filehandle
> + * cookies on a per-superblock basis, depending on the mount flags
> + */
> +static inline void nfs_fscache_get_client_cookie(struct nfs_client *clp)
> +{
> + /* create a cache index for looking up filehandles */
> + clp->fscache = fscache_acquire_cookie(nfs_cache_netfs.primary_index,
> + &nfs_cache_server_index_def,
> + clp);
> + dfprintk(FSCACHE,"NFS: get client cookie (0x%p/0x%p)\n",
> + clp, clp->fscache);
> +}
> +
> +/*
> + * dispose of a per-client cookie
> + */
> +static inline void nfs_fscache_release_client_cookie(struct nfs_client *clp)
> +{
> + dfprintk(FSCACHE,"NFS: releasing client cookie (0x%p/0x%p)\n",
> + clp, clp->fscache);
> +
> + fscache_relinquish_cookie(clp->fscache, 0);
> + clp->fscache = NULL;
> +}
> +
> +/*
> + * indicate the client caching state as readable text
> + */
> +static inline const char *nfs_server_fscache_state(struct nfs_server *server)
> +{
> + if (server->nfs_client->fscache && (server->flags & NFS_MOUNT_FSCACHE))
> + return "yes";
> + return "no ";
> +}
> +
> +/*
> + * get the per-filehandle cookie for an NFS inode
> + */
> +static inline void nfs_fscache_get_fh_cookie(struct inode *inode,
> + int maycache)
> +{
> + struct super_block *sb = inode->i_sb;
> + struct nfs_inode *nfsi = NFS_I(inode);
> +
> + nfsi->fscache = NULL;
> + if (maycache && (NFS_SB(sb)->flags & NFS_MOUNT_FSCACHE)) {
> + nfsi->fscache = fscache_acquire_cookie(
> + NFS_SB(sb)->nfs_client->fscache,
> + &nfs_cache_fh_index_def,
> + nfsi);
> +
> + fscache_set_i_size(nfsi->fscache, nfsi->vfs_inode.i_size);
> +
> + dfprintk(FSCACHE, "NFS: get FH cookie (0x%p/0x%p/0x%p)\n",
> + sb, nfsi, nfsi->fscache);
> + }
> +}
> +
> +/*
> + * change the filesize associated with a per-filehandle cookie
> + */
> +static inline void nfs_fscache_set_size(struct inode *inode)
> +{
> + fscache_set_i_size(NFS_I(inode)->fscache, inode->i_size);
> +}
> +
> +/*
> + * replace a per-filehandle cookie due to revalidation detecting a file having
> + * changed on the server
> + */
> +static inline void nfs_fscache_renew_fh_cookie(struct inode *inode)
> +{
> + struct nfs_inode *nfsi = NFS_I(inode);
> + struct nfs_server *server = NFS_SERVER(inode);
> + struct fscache_cookie *old = nfsi->fscache;
> +
> + if (nfsi->fscache) {
> + /* retire the current fscache cache and get a new one */
> + fscache_relinquish_cookie(nfsi->fscache, 1);
> +
> + nfsi->fscache = fscache_acquire_cookie(
> + server->nfs_client->fscache,
> + &nfs_cache_fh_index_def,
> + nfsi);
> + fscache_set_i_size(nfsi->fscache, nfsi->vfs_inode.i_size);
> +
> + dfprintk(FSCACHE,
> + "NFS: revalidation new cookie (0x%p/0x%p/0x%p/0x%p)\n",
> + server, nfsi, old, nfsi->fscache);
> + }
> +}
> +
> +/*
> + * release a per-filehandle cookie
> + */
> +static inline void nfs_fscache_release_fh_cookie(struct inode *inode)
> +{
> + struct nfs_inode *nfsi = NFS_I(inode);
> +
> + dfprintk(FSCACHE, "NFS: clear cookie (0x%p/0x%p)\n",
> + nfsi, nfsi->fscache);
> +
> + fscache_relinquish_cookie(nfsi->fscache, 0);
> + nfsi->fscache = NULL;
> +}
> +
> +/*
> + * retire a per-filehandle cookie, destroying the data attached to it
> + */
> +static inline void nfs_fscache_zap_fh_cookie(struct inode *inode)
> +{
> + struct nfs_inode *nfsi = NFS_I(inode);
> +
> + dfprintk(FSCACHE,"NFS: zapping cookie (0x%p/0x%p)\n",
> + nfsi, nfsi->fscache);
> +
> + fscache_relinquish_cookie(nfsi->fscache, 1);
> + nfsi->fscache = NULL;
> +}
> +
> +/*
> + * turn off the cache with regard to a filehandle cookie if opened for writing,
> + * invalidating all the pages in the page cache relating to the associated
> + * inode to clear the per-page caching
> + */
> +static inline void nfs_fscache_disable_fh_cookie(struct inode *inode)
> +{
> + if (NFS_I(inode)->fscache) {
> + dfprintk(FSCACHE,
> + "NFS: nfsi 0x%p turning cache off\n", NFS_I(inode));
> +
> + /* Need to invalidate any mapped pages that were read in before
> + * turning off the cache.
> + */
> + if (inode->i_mapping && inode->i_mapping->nrpages)
> + invalidate_inode_pages2(inode->i_mapping);
> +
> + nfs_fscache_zap_fh_cookie(inode);
> + }
> +}
> +
> +/*
> + * install the VM ops for mmap() of an NFS file so that we can hold up writes
> + * to pages on shared writable mappings until the store to the cache is
> + * complete
> + */
> +static inline void nfs_fscache_install_vm_ops(struct inode *inode,
> + struct vm_area_struct *vma)
> +{
> + if (NFS_I(inode)->fscache)
> + vma->vm_ops = &nfs_fs_vm_operations;
> +}
> +
> +/*
> + * release the caching state associated with a page, if the page isn't busy
> + * interacting with the cache
> + */
> +static inline int nfs_fscache_release_page(struct page *page)
> +{
> + if (PageFsMisc(page))
> + return -EBUSY;
> +
> + if (PageNfsCached(page)) {
> + struct nfs_inode *nfsi = NFS_I(page->mapping->host);
> +
> + BUG_ON(!nfsi->fscache);
> +
> + dfprintk(FSCACHE, "NFS: fscache releasepage (0x%p/0x%p/0x%p)\n",
> + nfsi->fscache, page, nfsi);
> +
> + fscache_uncache_page(nfsi->fscache, page);
> + atomic_inc(&nfs_fscache_uncache_page);
> + ClearPageNfsCached(page);
> + }
> +
> + return 0;
> +}
> +
> +/*
> + * release the caching state associated with a page if undergoing complete page
> + * invalidation
> + */
> +static inline void nfs_fscache_invalidate_page(struct page *page,
> + struct inode *inode,
> + unsigned long offset)
> +{
> + struct nfs_inode *nfsi = NFS_I(page->mapping->host);
> +
> + if (PageNfsCached(page)) {
> + BUG_ON(!nfsi->fscache);
> +
> + dfprintk(FSCACHE,
> + "NFS: fscache invalidatepage (0x%p/0x%p/0x%p)\n",
> + nfsi->fscache, page, nfsi);
> +
> + wait_on_page_fs_misc(page);
> +
> + if (offset == 0) {
> + BUG_ON(!PageLocked(page));
> + if (!PageWriteback(page)) {
> + fscache_uncache_page(nfsi->fscache, page);
> + atomic_inc(&nfs_fscache_uncache_page);
> + ClearPageNfsCached(page);
> + }
> + }
> + }
> +}
> +
> +/*
> + * store a newly fetched page in fscache
> + */
> +extern void nfs_readpage_to_fscache_complete(struct page *, void *, int);
> +
> +static inline void nfs_readpage_to_fscache(struct inode *inode,
> + struct page *page,
> + int sync)
> +{
> + int ret;
> +
> + if (PageNfsCached(page)) {
> + dfprintk(FSCACHE,
> + "NFS: "
> + "readpage_to_fscache(fsc:%p/p:%p(i:%lx f:%lx)/%d)\n",
> + NFS_I(inode)->fscache, page, page->index, page->flags,
> + sync);
> +
> + if (TestSetPageFsMisc(page))
> + BUG();
> +
> + ret = fscache_write_page(NFS_I(inode)->fscache, page,
> + nfs_readpage_to_fscache_complete,
> + NULL, GFP_KERNEL);
> + dfprintk(FSCACHE,
> + "NFS: "
> + "readpage_to_fscache: p:%p(i:%lu f:%lx) ret %d\n",
> + page, page->index, page->flags, ret);
> +
> + if (ret != 0) {
> + fscache_uncache_page(NFS_I(inode)->fscache, page);
> + atomic_inc(&nfs_fscache_uncache_page);
> + ClearPageNfsCached(page);
> + end_page_fs_misc(page);
> + nfs_fscache_to_error = ret;
> + } else {
> + atomic_inc(&nfs_fscache_to_pages);
> + }
> + }
> +}
> +
> +/*
> + * retrieve a page from fscache
> + */
> +extern void nfs_readpage_from_fscache_complete(struct page *, void *, int);
> +
> +static inline
> +int nfs_readpage_from_fscache(struct nfs_open_context *ctx,
> + struct inode *inode,
> + struct page *page)
> +{
> + int ret;
> +
> + if (!NFS_I(inode)->fscache)
> + return 1;
> +
> + dfprintk(FSCACHE,
> + "NFS: readpage_from_fscache(fsc:%p/p:%p(i:%lx f:%lx)/0x%p)\n",
> + NFS_I(inode)->fscache, page, page->index, page->flags, inode);
> +
> + ret = fscache_read_or_alloc_page(NFS_I(inode)->fscache,
> + page,
> + nfs_readpage_from_fscache_complete,
> + ctx,
> + GFP_KERNEL);
> +
> + switch (ret) {
> + case 0: /* read BIO submitted (page in fscache) */
> + dfprintk(FSCACHE,
> + "NFS: readpage_from_fscache: BIO submitted\n");
> + atomic_inc(&nfs_fscache_from_pages);
> + return ret;
> +
> + case -ENOBUFS: /* inode not in cache */
> + case -ENODATA: /* page not in cache */
> + dfprintk(FSCACHE,
> + "NFS: readpage_from_fscache error %d\n", ret);
> + return 1;
> +
> + default:
> + dfprintk(FSCACHE, "NFS: readpage_from_fscache %d\n", ret);
> + nfs_fscache_from_error = ret;
> + }
> + return ret;
> +}
> +
> +/*
> + * retrieve a set of pages from fscache
> + */
> +static inline int nfs_readpages_from_fscache(struct nfs_open_context *ctx,
> + struct inode *inode,
> + struct address_space *mapping,
> + struct list_head *pages,
> + unsigned *nr_pages)
> +{
> + int ret, npages = *nr_pages;
> +
> + if (!NFS_I(inode)->fscache)
> + return 1;
> +
> + dfprintk(FSCACHE,
> + "NFS: nfs_getpages_from_fscache (0x%p/%u/0x%p)\n",
> + NFS_I(inode)->fscache, *nr_pages, inode);
> +
> + ret = fscache_read_or_alloc_pages(NFS_I(inode)->fscache,
> + mapping, pages, nr_pages,
> + nfs_readpage_from_fscache_complete,
> + ctx,
> + mapping_gfp_mask(mapping));
> +
> +
> + switch (ret) {
> + case 0: /* read BIO submitted (page in fscache) */
> + BUG_ON(!list_empty(pages));
> + BUG_ON(*nr_pages != 0);
> + dfprintk(FSCACHE,
> + "NFS: nfs_getpages_from_fscache: BIO submitted\n");
> +
> + atomic_add(npages, &nfs_fscache_from_pages);
> + return ret;
> +
> + case -ENOBUFS: /* inode not in cache */
> + case -ENODATA: /* page not in cache */
> + dfprintk(FSCACHE,
> + "NFS: nfs_getpages_from_fscache: no page: %d\n", ret);
> + return 1;
> +
> + default:
> + dfprintk(FSCACHE,
> + "NFS: nfs_getpages_from_fscache: ret %d\n", ret);
> + nfs_fscache_from_error = ret;
> + }
> +
> + return ret;
> +}
> +
> +/*
> + * store an updated page in fscache
> + */
> +extern void nfs_writepage_to_fscache_complete(struct page *page, void *data, int error);
> +
> +static inline void nfs_writepage_to_fscache(struct inode *inode,
> + struct page *page)
> +{
> + int error;
> +
> + if (PageNfsCached(page) && NFS_I(inode)->fscache) {
> + dfprintk(FSCACHE,
> + "NFS: writepage_to_fscache (0x%p/0x%p/0x%p)\n",
> + NFS_I(inode)->fscache, page, inode);
> +
> + error = fscache_write_page(NFS_I(inode)->fscache, page,
> + nfs_writepage_to_fscache_complete,
> + NULL, GFP_KERNEL);
> + if (error != 0) {
> + dfprintk(FSCACHE,
> + "NFS: fscache_write_page error %d\n",
> + error);
> + fscache_uncache_page(NFS_I(inode)->fscache, page);
> + }
> + }
> +}
> +
> +#else /* CONFIG_NFS_FSCACHE */
> +static inline int nfs_fscache_register(void) { return 0; }
> +static inline void nfs_fscache_unregister(void) {}
> +static inline void nfs_fscache_get_client_cookie(struct nfs_client *clp) {}
> +static inline void nfs4_fscache_get_client_cookie(struct nfs_client *clp) {}
> +static inline void nfs_fscache_release_client_cookie(struct nfs_client *clp) {}
> +static inline const char *nfs_server_fscache_state(struct nfs_server *server) { return "no "; }
> +
> +static inline void nfs_fscache_get_fh_cookie(struct inode *inode, int maycache) {}
> +static inline void nfs_fscache_set_size(struct inode *inode) {}
> +static inline void nfs_fscache_release_fh_cookie(struct inode *inode) {}
> +static inline void nfs_fscache_zap_fh_cookie(struct inode *inode) {}
> +static inline void nfs_fscache_renew_fh_cookie(struct inode *inode) {}
> +static inline void nfs_fscache_disable_fh_cookie(struct inode *inode) {}
> +static inline void nfs_fscache_install_vm_ops(struct inode *inode, struct vm_area_struct *vma) {}
> +static inline int nfs_fscache_release_page(struct page *page)
> +{
> + return 1; /* True: may release page */
> +}
> +static inline void nfs_fscache_invalidate_page(struct page *page,
> + struct inode *inode,
> + unsigned long offset)
> +{
> +}
> +static inline void nfs_readpage_to_fscache(struct inode *inode, struct page *page, int sync) {}
> +static inline int nfs_readpage_from_fscache(struct nfs_open_context *ctx,
> + struct inode *inode, struct page *page)
> +{
> + return -ENOBUFS;
> +}
> +static inline int nfs_readpages_from_fscache(struct nfs_open_context *ctx,
> + struct inode *inode,
> + struct address_space *mapping,
> + struct list_head *pages,
> + unsigned *nr_pages)
> +{
> + return -ENOBUFS;
> +}
> +
> +static inline void nfs_writepage_to_fscache(struct inode *inode, struct page *page)
> +{
> + BUG_ON(PageNfsCached(page));
> +}
> +
> +#endif /* CONFIG_NFS_FSCACHE */
> +#endif /* _NFS_FSCACHE_H */
> diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
> index 08cc4c5..56acba0 100644
> --- a/fs/nfs/inode.c
> +++ b/fs/nfs/inode.c
> @@ -84,6 +84,7 @@ void nfs_clear_inode(struct inode *inode
> BUG_ON(atomic_read(&NFS_I(inode)->data_updates) != 0);
> nfs_zap_acl_cache(inode);
> nfs_access_zap_cache(inode);
> + nfs_fscache_release_fh_cookie(inode);
> }

What about nfs4_clear_inode?

> /**
> @@ -129,6 +130,8 @@ void nfs_zap_caches(struct inode *inode)
> spin_lock(&inode->i_lock);
> nfs_zap_caches_locked(inode);
> spin_unlock(&inode->i_lock);
> +
> + nfs_fscache_zap_fh_cookie(inode);

The cache will be zapped upon the next revalidation anyway, and the
whole point of nfs_zap_caches is to allow fast invalidation in contexts
where we cannot sleep. nfs_fscache_zap_fh_cookie calls
fscache_relinquish_cookie(), which sleeps, grabs rw_semaphores, etc.
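
One way to respect that constraint is to have the fast path only mark the
cache as stale and leave the sleeping work to the next revalidation. The
following is a minimal userspace sketch of that split, not kernel code and
not taken from the patch: struct model_inode, model_zap_caches(),
model_revalidate() and model_relinquish() are all hypothetical stand-ins,
and the lock is modelled as a plain flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace model only: every name here is hypothetical. */
struct model_inode {
	bool lock_held;    /* stands in for inode->i_lock being held */
	bool cache_stale;  /* "retire the cookie later" marker */
	int cookie;        /* stands in for nfsi->fscache; 0 means none */
	int relinquished;  /* number of times the slow path really ran */
};

/* Stand-in for a call that may sleep; it must never run under the lock. */
static void model_relinquish(struct model_inode *ino)
{
	assert(!ino->lock_held);
	ino->cookie = 0;
	ino->relinquished++;
}

/* Fast path: safe where sleeping is not allowed, it only sets a flag. */
static void model_zap_caches(struct model_inode *ino)
{
	ino->lock_held = true;
	ino->cache_stale = true;
	ino->lock_held = false;
}

/* Slow path: process context, so the sleeping call is allowed here. */
static void model_revalidate(struct model_inode *ino)
{
	bool stale;

	ino->lock_held = true;
	stale = ino->cache_stale;
	ino->cache_stale = false;
	ino->lock_held = false;

	if (stale && ino->cookie)
		model_relinquish(ino);
}
```

Calling model_zap_caches() leaves the cookie untouched; the first
model_revalidate() afterwards retires it, and later calls do nothing.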

> }
> void nfs_zap_mapping(struct inode *inode, struct address_space *mapping)
> @@ -216,6 +219,7 @@ nfs_fhget(struct super_block *sb, struct
> };
> struct inode *inode = ERR_PTR(-ENOENT);
> unsigned long hash;
> + int maycache = 1;
>
> if ((fattr->valid & NFS_ATTR_FATTR) == 0)
> goto out_no_inode;
> @@ -264,6 +268,7 @@ nfs_fhget(struct super_block *sb, struct
> else
> inode->i_op = &nfs_mountpoint_inode_operations;
> inode->i_fop = NULL;
> + maycache = 0;
> }
> } else if (S_ISLNK(inode->i_mode))
> inode->i_op = &nfs_symlink_inode_operations;
> @@ -294,6 +299,8 @@ nfs_fhget(struct super_block *sb, struct
> memset(nfsi->cookieverf, 0, sizeof(nfsi->cookieverf));
> nfsi->access_cache = RB_ROOT;
>
> + nfs_fscache_get_fh_cookie(inode, maycache);
> +
> unlock_new_inode(inode);
> } else
> nfs_refresh_inode(inode, fattr);
> @@ -376,6 +383,7 @@ void nfs_setattr_update_inode(struct ino
> if ((attr->ia_valid & ATTR_SIZE) != 0) {
> nfs_inc_stats(inode, NFSIOS_SETATTRTRUNC);
> inode->i_size = attr->ia_size;
> + nfs_fscache_set_size(inode);

Why? Isn't this supposed to be a read-only inode?

> vmtruncate(inode, attr->ia_size);
> }
> }
> @@ -558,6 +566,8 @@ int nfs_open(struct inode *inode, struct
> ctx->mode = filp->f_mode;
> nfs_file_set_open_context(filp, ctx);
> put_nfs_open_context(ctx);
> + if ((filp->f_flags & O_ACCMODE) != O_RDONLY)
> + nfs_fscache_disable_fh_cookie(inode);
> return 0;
> }
>
> @@ -704,6 +714,8 @@ int nfs_revalidate_mapping(struct inode
> spin_unlock(&inode->i_lock);
>
> nfs_inc_stats(inode, NFSIOS_DATAINVALIDATE);
> + nfs_fscache_renew_fh_cookie(inode);
> +
> dfprintk(PAGECACHE, "NFS: (%s/%Ld) data cache invalidated\n",
> inode->i_sb->s_id,
> (long long)NFS_FILEID(inode));
> @@ -942,11 +954,13 @@ static int nfs_update_inode(struct inode
> if (data_stable) {
> inode->i_size = new_isize;
> invalid |= NFS_INO_INVALID_DATA;
> + nfs_fscache_set_size(inode);
> }
> invalid |= NFS_INO_INVALID_ATTR;
> } else if (new_isize > cur_isize) {
> inode->i_size = new_isize;
> invalid |= NFS_INO_INVALID_ATTR|NFS_INO_INVALID_DATA;
> + nfs_fscache_set_size(inode);

Doesn't nfs_fscache_set_size try to grab rw_semaphores? This function is
_always_ called with the inode->i_lock spinlock held.

> }
> nfsi->cache_change_attribute = jiffies;
> dprintk("NFS: isize change on server for file %s/%ld\n",
> @@ -1158,6 +1172,10 @@ static int __init init_nfs_fs(void)
> {
> int err;
>
> + err = nfs_fscache_register();
> + if (err < 0)
> + goto out6;
> +
> err = nfs_fs_proc_init();
> if (err)
> goto out5;
> @@ -1204,6 +1222,8 @@ out3:
> out4:
> nfs_fs_proc_exit();
> out5:
> + nfs_fscache_unregister();
> +out6:
> return err;
> }
>
> @@ -1214,6 +1234,7 @@ static void __exit exit_nfs_fs(void)
> nfs_destroy_readpagecache();
> nfs_destroy_inodecache();
> nfs_destroy_nfspagecache();
> + nfs_fscache_unregister();
> #ifdef CONFIG_PROC_FS
> rpc_proc_unregister("nfs");
> #endif
> diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
> index d205466..51b82d1 100644
> --- a/fs/nfs/internal.h
> +++ b/fs/nfs/internal.h
> @@ -4,6 +4,30 @@
>
> #include <linux/mount.h>
>
> +#define NFS_PAGE_WRITING 0
> +#define NFS_PAGE_CACHED 1
> +
> +#define PageNfsBit(bit, page) test_bit(bit, &(page)->private)
> +
> +#define SetPageNfsBit(bit, page) \
> +do { \
> + SetPagePrivate((page)); \
> + set_bit(bit, &(page)->private); \
> +} while(0)
> +
> +#define ClearPageNfsBit(bit, page) \
> +do { \
> + clear_bit(bit, &(page)->private); \
> +} while(0)
> +
> +#define PageNfsWriting(page) PageNfsBit(NFS_PAGE_WRITING, (page))
> +#define SetPageNfsWriting(page) SetPageNfsBit(NFS_PAGE_WRITING, (page))
> +#define ClearPageNfsWriting(page) ClearPageNfsBit(NFS_PAGE_WRITING, (page))
> +
> +#define PageNfsCached(page) PageNfsBit(NFS_PAGE_CACHED, (page))
> +#define SetPageNfsCached(page) SetPageNfsBit(NFS_PAGE_CACHED, (page))
> +#define ClearPageNfsCached(page) ClearPageNfsBit(NFS_PAGE_CACHED, (page))
> +
> struct nfs_string;
> struct nfs_mount_data;
> struct nfs4_mount_data;
> @@ -27,6 +51,11 @@ struct nfs_clone_mount {
> rpc_authflavor_t authflavor;
> };
>
> +/*
> + * include filesystem caching stuff here
> + */
> +#include "fscache.h"
> +
> /* client.c */
> extern struct rpc_program nfs_program;
>
> @@ -153,6 +182,9 @@ extern int nfs4_path_walk(struct nfs_ser
> const char *path);
> #endif
>
> +/* read.c */
> +extern int nfs_readpage_async(struct nfs_open_context *, struct inode *, struct page *);
> +
> /*
> * Determine the device name as a string
> */
> diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c
> index 829af32..a40c052 100644
> --- a/fs/nfs/pagelist.c
> +++ b/fs/nfs/pagelist.c
> @@ -17,6 +17,7 @@ #include <linux/nfs4.h>
> #include <linux/nfs_page.h>
> #include <linux/nfs_fs.h>
> #include <linux/nfs_mount.h>
> +#include "internal.h"
>
> #define NFS_PARANOIA 1
>
> @@ -84,7 +85,7 @@ nfs_create_request(struct nfs_open_conte
> atomic_set(&req->wb_complete, 0);
> req->wb_index = page->index;
> page_cache_get(page);
> - BUG_ON(PagePrivate(page));
> + BUG_ON(PageNfsWriting(page));
> BUG_ON(!PageLocked(page));
> BUG_ON(page->mapping->host != inode);
> req->wb_offset = offset;
> diff --git a/fs/nfs/read.c b/fs/nfs/read.c
> index c2e49c3..d8e4b3b 100644
> --- a/fs/nfs/read.c
> +++ b/fs/nfs/read.c
> @@ -26,11 +26,13 @@ #include <linux/pagemap.h>
> #include <linux/sunrpc/clnt.h>
> #include <linux/nfs_fs.h>
> #include <linux/nfs_page.h>
> +#include <linux/nfs_mount.h>
> #include <linux/smp_lock.h>
>
> #include <asm/system.h>
>
> #include "iostat.h"
> +#include "internal.h"
>
> #define NFSDBG_FACILITY NFSDBG_PAGECACHE
>
> @@ -211,13 +213,18 @@ static int nfs_readpage_sync(struct nfs_
> }
> result = 0;
>
> + nfs_readpage_to_fscache(inode, page, 1);
> + unlock_page(page);
> +
> + return result;
> +
> io_error:
> unlock_page(page);
> nfs_readdata_free(rdata);
> return result;
> }
>
> -static int nfs_readpage_async(struct nfs_open_context *ctx, struct inode *inode,
> +int nfs_readpage_async(struct nfs_open_context *ctx, struct inode *inode,
> struct page *page)
> {
> LIST_HEAD(one_request);
> @@ -242,6 +249,11 @@ static int nfs_readpage_async(struct nfs
>
> static void nfs_readpage_release(struct nfs_page *req)
> {
> + struct inode *d_inode = req->wb_context->dentry->d_inode;
> +
> + if (PageUptodate(req->wb_page))
> + nfs_readpage_to_fscache(d_inode, req->wb_page, 0);
> +

This will usually be called from an rpciod context, so it should not be
grabbing semaphores, allocating memory, etc.
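
A common shape for this kind of fix is to let the completion path merely
note which pages want storing and have a separate worker, running in
process context, do the blocking part. The sketch below is a userspace
model under that assumption; struct model_store_queue, model_read_done(),
model_worker_drain() and model_store_page() are hypothetical names and do
not correspond to anything in the patch or in FS-Cache.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_QUEUE_MAX 8

/* Userspace model only: every name here is hypothetical. */
struct model_store_queue {
	unsigned long pages[MODEL_QUEUE_MAX]; /* page indices awaiting a cache write */
	size_t count;
	size_t dropped; /* pages skipped because the queue was full */
};

/* Running total of stored page indices, kept so the effect is observable. */
static unsigned long model_stored_sum;

/* Stand-in for the blocking store into the cache. */
static void model_store_page(unsigned long index)
{
	model_stored_sum += index;
}

/* Completion path (models rpciod): fixed storage, no allocation, no sleeping. */
static void model_read_done(struct model_store_queue *q, unsigned long index)
{
	assert(q != NULL);
	if (q->count < MODEL_QUEUE_MAX)
		q->pages[q->count++] = index;
	else
		q->dropped++; /* caching is best-effort, so dropping is harmless */
}

/* Worker path (models process context): free to block while storing. */
static size_t model_worker_drain(struct model_store_queue *q,
				 void (*store)(unsigned long index))
{
	size_t stored = 0;

	assert(q != NULL);
	while (q->count > 0) {
		store(q->pages[--q->count]);
		stored++;
	}
	return stored;
}
```

The fixed-size array is a deliberate choice in this model: it keeps the
completion path free of allocation, at the cost of occasionally skipping a
page that could have been cached.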

> unlock_page(req->wb_page);
>
> dprintk("NFS: read done (%s/%Ld %d@%Ld)\n",
> @@ -633,6 +645,10 @@ int nfs_readpage(struct file *file, stru
> ctx = get_nfs_open_context((struct nfs_open_context *)
> file->private_data);
> if (!IS_SYNC(inode)) {
> + error = nfs_readpage_from_fscache(ctx, inode, page);
> + if (error == 0)
> + goto out;
> +
> error = nfs_readpage_async(ctx, inode, page);
> goto out;
> }
> @@ -663,6 +679,7 @@ readpage_async_filler(void *data, struct
> unsigned int len;
>
> nfs_wb_page(inode, page);
> +
> len = nfs_page_length(inode, page);
> if (len == 0)
> return nfs_return_empty_page(page);
> @@ -705,6 +722,17 @@ int nfs_readpages(struct file *filp, str
> } else
> desc.ctx = get_nfs_open_context((struct nfs_open_context *)
> filp->private_data);
> +
> + /* attempt to read as many of the pages as possible from the cache
> + * - this returns -ENOBUFS immediately if the cookie is negative
> + */
> + ret = nfs_readpages_from_fscache(desc.ctx, inode, mapping,
> + pages, &nr_pages);
> + if (ret == 0) {
> + put_nfs_open_context(desc.ctx);
> + return ret; /* all read */
> + }
> +
> ret = read_cache_pages(mapping, pages, readpage_async_filler, &desc);
> if (!list_empty(&head)) {
> int err = nfs_pagein_list(&head, server->rpages);
> diff --git a/fs/nfs/super.c b/fs/nfs/super.c
> index 28108c8..59b0c33 100644
> --- a/fs/nfs/super.c
> +++ b/fs/nfs/super.c
> @@ -290,6 +290,7 @@ static void nfs_show_mount_options(struc
> { NFS_MOUNT_NOAC, ",noac", "" },
> { NFS_MOUNT_NONLM, ",nolock", "" },
> { NFS_MOUNT_NOACL, ",noacl", "" },
> + { NFS_MOUNT_FSCACHE, ",fsc", "" },
> { 0, NULL, NULL }
> };
> const struct proc_nfs_info *nfs_infop;
> diff --git a/fs/nfs/sysctl.c b/fs/nfs/sysctl.c
> index 3ea50ac..251af9b 100644
> --- a/fs/nfs/sysctl.c
> +++ b/fs/nfs/sysctl.c
> @@ -14,6 +14,7 @@ #include <linux/nfs_idmap.h>
> #include <linux/nfs_fs.h>
>
> #include "callback.h"
> +#include "internal.h"
>
> static const int nfs_set_port_min = 0;
> static const int nfs_set_port_max = 65535;
> @@ -50,6 +51,48 @@ #endif
> .proc_handler = &proc_dointvec_jiffies,
> .strategy = &sysctl_jiffies,
> },
> +#ifdef CONFIG_NFS_FSCACHE
> + {
> + .ctl_name = CTL_UNNUMBERED,
> + .procname = "fscache_from_error",
> + .data = &nfs_fscache_from_error,
> + .maxlen = sizeof(int),
> + .mode = 0644,
> + .proc_handler = &proc_dointvec,
> + },
> + {
> + .ctl_name = CTL_UNNUMBERED,
> + .procname = "fscache_to_error",
> + .data = &nfs_fscache_to_error,
> + .maxlen = sizeof(int),
> + .mode = 0644,
> + .proc_handler = &proc_dointvec,
> + },
> + {
> + .ctl_name = CTL_UNNUMBERED,
> + .procname = "fscache_uncache_page",
> + .data = &nfs_fscache_uncache_page,
> + .maxlen = sizeof(int),
> + .mode = 0644,
> + .proc_handler = &proc_dointvec,
> + },
> + {
> + .ctl_name = CTL_UNNUMBERED,
> + .procname = "fscache_to_pages",
> + .data = &nfs_fscache_to_pages,
> + .maxlen = sizeof(int),
> + .mode = 0644,
> + .proc_handler = &proc_dointvec_minmax,
> + },
> + {
> + .ctl_name = CTL_UNNUMBERED,
> + .procname = "fscache_from_pages",
> + .data = &nfs_fscache_from_pages,
> + .maxlen = sizeof(int),
> + .mode = 0644,
> + .proc_handler = &proc_dointvec,
> + },
> +#endif
> { .ctl_name = 0 }
> };
>
> diff --git a/fs/nfs/write.c b/fs/nfs/write.c
> index 883dd4a..77d0d9d 100644
> --- a/fs/nfs/write.c
> +++ b/fs/nfs/write.c
> @@ -64,6 +64,7 @@ #include <linux/smp_lock.h>
>
> #include "delegation.h"
> #include "iostat.h"
> +#include "internal.h"
>
> #define NFSDBG_FACILITY NFSDBG_PAGECACHE
>
> @@ -157,6 +158,9 @@ static void nfs_grow_file(struct page *p
> return;
> nfs_inc_stats(inode, NFSIOS_EXTENDWRITE);
> i_size_write(inode, end);
> +#ifdef FSCACHE_WRITE_SUPPORT
> + nfs_set_fscsize(NFS_SERVER(inode), NFS_I(inode), end);
> +#endif
> }
>
> /* We can set the PG_uptodate flag if we see that a write request
> @@ -336,6 +340,9 @@ do_it:
> err = -EBADF;
> goto out;
> }
> +
> + nfs_writepage_to_fscache(inode, page);
> +

Why are we doing this, if the cache is turned off whenever the file is
open for writes?

> lock_kernel();
> if (!IS_SYNC(inode) && inode_referenced) {
> err = nfs_writepage_async(ctx, inode, page, 0, offset);
> @@ -419,7 +426,7 @@ static int nfs_inode_add_request(struct
> if (nfs_have_delegation(inode, FMODE_WRITE))
> nfsi->change_attr++;
> }
> - SetPagePrivate(req->wb_page);
> + SetPageNfsWriting(req->wb_page);
> nfsi->npages++;
> atomic_inc(&req->wb_count);
> return 0;
> @@ -436,7 +443,7 @@ static void nfs_inode_remove_request(str
> BUG_ON (!NFS_WBACK_BUSY(req));
>
> spin_lock(&nfsi->req_lock);
> - ClearPagePrivate(req->wb_page);
> + ClearPageNfsWriting(req->wb_page);
> radix_tree_delete(&nfsi->nfs_page_tree, req->wb_index);
> nfsi->npages--;
> if (!nfsi->npages) {
> diff --git a/include/linux/nfs4_mount.h b/include/linux/nfs4_mount.h
> index 26b4c83..15199cc 100644
> --- a/include/linux/nfs4_mount.h
> +++ b/include/linux/nfs4_mount.h
> @@ -65,6 +65,7 @@ #define NFS4_MOUNT_INTR 0x0002 /* 1 */
> #define NFS4_MOUNT_NOCTO 0x0010 /* 1 */
> #define NFS4_MOUNT_NOAC 0x0020 /* 1 */
> #define NFS4_MOUNT_STRICTLOCK 0x1000 /* 1 */
> +#define NFS4_MOUNT_FSCACHE 0x4000 /* 1 */
> #define NFS4_MOUNT_FLAGMASK 0xFFFF
>
> #endif
> diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h
> index 45228c1..5ead2bf 100644
> --- a/include/linux/nfs_fs.h
> +++ b/include/linux/nfs_fs.h
> @@ -182,6 +182,9 @@ #ifdef CONFIG_NFS_V4
> int delegation_state;
> struct rw_semaphore rwsem;
> #endif /* CONFIG_NFS_V4*/
> +#ifdef CONFIG_NFS_FSCACHE
> + struct fscache_cookie *fscache;
> +#endif
> struct inode vfs_inode;
> };
>
> @@ -582,6 +585,7 @@ #define NFSDBG_FILE 0x0040
> #define NFSDBG_ROOT 0x0080
> #define NFSDBG_CALLBACK 0x0100
> #define NFSDBG_CLIENT 0x0200
> +#define NFSDBG_FSCACHE 0x0400
> #define NFSDBG_ALL 0xFFFF
>
> #ifdef __KERNEL__
> diff --git a/include/linux/nfs_fs_sb.h b/include/linux/nfs_fs_sb.h
> index 7ccfc7e..c44be53 100644
> --- a/include/linux/nfs_fs_sb.h
> +++ b/include/linux/nfs_fs_sb.h
> @@ -3,6 +3,7 @@ #define _NFS_FS_SB
>
> #include <linux/list.h>
> #include <linux/backing-dev.h>
> +#include <linux/fscache.h>
>
> struct nfs_iostats;
>
> @@ -67,6 +68,10 @@ #ifdef CONFIG_NFS_V4
> char cl_ipaddr[16];
> unsigned char cl_id_uniquifier;
> #endif
> +
> +#ifdef CONFIG_NFS_FSCACHE
> + struct fscache_cookie *fscache; /* client index cache cookie */
> +#endif
> };
>
> /*
> diff --git a/include/linux/nfs_mount.h b/include/linux/nfs_mount.h
> index 659c754..278bb4e 100644
> --- a/include/linux/nfs_mount.h
> +++ b/include/linux/nfs_mount.h
> @@ -61,6 +61,7 @@ #define NFS_MOUNT_BROKEN_SUID 0x0400 /*
> #define NFS_MOUNT_NOACL 0x0800 /* 4 */
> #define NFS_MOUNT_STRICTLOCK 0x1000 /* reserved for NFSv4 */
> #define NFS_MOUNT_SECFLAVOUR 0x2000 /* 5 */
> +#define NFS_MOUNT_FSCACHE 0x4000
> #define NFS_MOUNT_FLAGMASK 0xFFFF
>
> #endif

2006-11-15 15:52:23

by Christoph Hellwig

Subject: Re: [PATCH 19/19] CacheFiles: Permit daemon to probe inuseness of a cache file

On Tue, Nov 14, 2006 at 08:07:02PM +0000, David Howells wrote:
> Permit the daemon to probe to see whether a cache file is in use by a netfs or
> not.
>
> Signed-Off-By: David Howells <[email protected]>
> ---
>
> fs/cachefiles/cf-daemon.c | 73 +++++++++++++++++++
> fs/cachefiles/cf-namei.c | 170 +++++++++++++++++++++++++++++++++++++++++++++
> fs/cachefiles/internal.h | 3 +
> 3 files changed, 246 insertions(+), 0 deletions(-)
>
> diff --git a/fs/cachefiles/cf-daemon.c b/fs/cachefiles/cf-daemon.c
> index ae82685..ee07865 100644
> --- a/fs/cachefiles/cf-daemon.c
> +++ b/fs/cachefiles/cf-daemon.c
> @@ -38,6 +38,7 @@ static int cachefiles_daemon_cull(struct
> static int cachefiles_daemon_debug(struct cachefiles_cache *cache, char *args);
> static int cachefiles_daemon_dir(struct cachefiles_cache *cache, char *args);
> static int cachefiles_daemon_tag(struct cachefiles_cache *cache, char *args);
> +static int cachefiles_daemon_inuse(struct cachefiles_cache *cache, char *args);
>
> static unsigned long cachefiles_open;
>
> @@ -66,6 +67,7 @@ static const struct cachefiles_daemon_cm
> { "frun", cachefiles_daemon_frun },
> { "fcull", cachefiles_daemon_fcull },
> { "fstop", cachefiles_daemon_fstop },
> + { "inuse", cachefiles_daemon_inuse },
> { "tag", cachefiles_daemon_tag },
> { "", NULL }
> };
> @@ -602,3 +604,74 @@ inval:
> kerror("debug command requires mask");
> return -EINVAL;
> }
> +
> +/*
> + * find out whether an object is in use or not
> + * - command: "inuse <dirfd> <name>"
> + */
> +static int cachefiles_daemon_inuse(struct cachefiles_cache *cache, char *args)
> +{
> + struct dentry *dir;
> + struct file *dirfile;
> + uid_t fsuid;
> + gid_t fsgid;
> + u32 fscreatesid;
> + int dirfd, fput_needed, ret;
> +
> + _enter(",%s", args);
> +
> + dirfd = simple_strtoul(args, &args, 0);
> +
> + if (!isspace(*args))
> + goto inval;
> +
> + while (isspace(*args))
> + args++;
> +
> + if (!*args)
> + goto inval;
> +
> + if (strchr(args, '/'))
> + goto inval;
> +
> + if (!test_bit(CACHEFILES_READY, &cache->flags)) {
> + kerror("inuse applied to unready cache");
> + return -EIO;
> + }
> +
> + if (test_bit(CACHEFILES_DEAD, &cache->flags)) {
> + kerror("inuse applied to dead cache");
> + return -EIO;
> + }
> +
> + /* extract the directory dentry from the fd */
> + dirfile = fget_light(dirfd, &fput_needed);

Once again a very strong NACK for anything that gets a fd argument as
text from userspace. Also a very strong NACK for use of fget/fget_light
from non-core code and exports for either of them.

2006-11-15 16:02:54

by David Howells

Subject: Re: [PATCH 05/19] NFS: Use local caching

Trond Myklebust <[email protected]> wrote:

> Why is fscache being given a vote on whether or not the NFS page can be
> removed from the mapping? If the file has changed on the server, so that
> we have to invalidate the mapping, then I don't care about the fact that
> fscache is busy: the page has to go.

This is releasepage() not invalidatepage(). It is conditional.

At this point you can't get rid of the page if FS-Cache is still using it
because FS-Cache will call the netfs callback on the page when it has finished.

You also can't cancel the I/O because it may involve a BIO which itself can't
be cancelled.

You may not be able to sleep to wait for FS-Cache to finish because gfp might
not include __GFP_WAIT.

The whole point is to find out whether a page is releasable and recycle it if
it is, not to force it to be released; for that, invalidatepage() exists.

In my opinion, it is better to tell the VM that this page is not currently
available, and let it get on with trying to find one that is rather than
holding up the page allocator until the page becomes available.

> You are missing the NFSv4 change attribute. The latter is supposed to
> override mtime/ctime/size concerns in NFSv4.

Is that stored in the inode? I don't recall offhand. It's easy enough to add
if it is.

> > @@ -84,6 +84,7 @@ void nfs_clear_inode(struct inode *inode
> ...
> What about nfs4_clear_inode?

It calls nfs_clear_inode()...

> > + nfs_fscache_zap_fh_cookie(inode);
>
> The cache will be zapped upon the next revalidation anyway, and the
> whole point of nfs_zap_caches is to allow fast invalidation in contexts
> where we cannot sleep. nfs_fscache_zap_fh_cookie calls
> fscache_relinquish_cookie(), which sleeps, grabs rw_semaphores, etc.

Okay... It sounds like I should be able to drop that call there.

Perhaps you should add a comment to that function to note this...

> > @@ -376,6 +383,7 @@ void nfs_setattr_update_inode(struct ino
> > if ((attr->ia_valid & ATTR_SIZE) != 0) {
> > nfs_inc_stats(inode, NFSIOS_SETATTRTRUNC);
> > inode->i_size = attr->ia_size;
> > + nfs_fscache_set_size(inode);
>
> Why? Isn't this supposed to be a read-only inode?

I suppose. This is a holdover from when I supported R/W inodes too.

> > @@ -942,11 +954,13 @@ static int nfs_update_inode(struct inode
> > ...
> > + nfs_fscache_set_size(inode);
>
> Doesn't nfs_fscache_set_size try to grab rw_semaphores? This function is
> _always_ called with the inode->i_lock spinlock held.

Hmmm... I wonder if I need to do this in nfs_update_inode() at all. Won't the
pages and the cache object attached to an inode be discarded anyway if the file
attributes returned by the server change?

When can an inode be left with its data attached when modified on the server?
Is this an NFSv4 thing?

> > static void nfs_readpage_release(struct nfs_page *req)
> > {
> > + struct inode *d_inode = req->wb_context->dentry->d_inode;
> > +
> > + if (PageUptodate(req->wb_page))
> > + nfs_readpage_to_fscache(d_inode, req->wb_page, 0);
> > +
>
> Will usually be called from an rpciod context. Should therefore not be
> grabbing semaphores, doing memory allocation etc.

Is it possible to make an NFS kernel thread that can have completed nfs_page
structs queued for writing to the cache?

> > +
> > + nfs_writepage_to_fscache(inode, page);
> > +
>
> Why are we doing this, if the cache is turned off whenever the file is
> open for writes?

Good point again; for the moment, this can be discarded - though we could do it
for NFS4 under some circumstances, I believe.

David

2006-11-15 16:12:36

by David Howells

Subject: Re: [PATCH 19/19] CacheFiles: Permit daemon to probe inuseness of a cache file

Christoph Hellwig <[email protected]> wrote:

> Once again a very strong NACK for anything that gets a fd argument as
> text from userspace.

Why? Would you rather I passed it in a struct to an ioctl?

> Also a very strong NACK for use of fget/fget_light from non-core code and
> exports for either of them.

Why?

I could possibly pass the pathname as text, except that (a) the path length
may exceed PAGE_SIZE and MAXPATHLEN, and (b) doing a full path lookup() is a
complete waste of time as the daemon already has the directory open on a file
descriptor, and so effectively has a bookmark to the location that I can use,
if I can but get hold of it.
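
For reference, the text command under discussion has the form "inuse <dirfd> <name>". The following is only a userspace sketch of the argument checks in the quoted cachefiles_daemon_inuse(), not the kernel code itself: parse_inuse_args() is a hypothetical name, strtoul() stands in for simple_strtoul(), and the "no digits" check is an extra assumption that the patch does not make.

```c
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/*
 * Userspace model of the argument parsing for "inuse <dirfd> <name>".
 * <name> must be a single path component, so any '/' is rejected.
 * Returns 0 and fills *dirfd and *name on success, -EINVAL otherwise.
 * parse_inuse_args() is a hypothetical helper, not part of the patch.
 */
static int parse_inuse_args(const char *args, int *dirfd, const char **name)
{
	char *end;
	unsigned long fd;

	fd = strtoul(args, &end, 0);
	if (end == args)	/* no number at all (stricter than the patch) */
		return -EINVAL;

	/* the fd must be followed by at least one whitespace character */
	if (!isspace((unsigned char)*end))
		return -EINVAL;
	while (isspace((unsigned char)*end))
		end++;

	/* a name must be present and must not contain a path separator */
	if (!*end)
		return -EINVAL;
	if (strchr(end, '/'))
		return -EINVAL;

	*dirfd = (int)fd;
	*name = end;
	return 0;
}
```

The point of the '/' check is that the name is only ever looked up relative to the directory the daemon already has open, which is what makes the fd usable as a bookmark.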

David

2006-11-15 16:19:08

by James Morris

Subject: Re: [PATCH 12/19] CacheFiles: Permit a process's create SID to be overridden

On Wed, 15 Nov 2006, David Howells wrote:

> James Morris <[email protected]> wrote:
>
> > > +static u32 selinux_set_fscreate_secid(u32 secid)
> > ...
> > The ability to set this needs to be mediated via MAC policy.
>
> There could be a problem with that... Is it possible for there to be a race?

Well, the value can be changed at any time, so you could be using a
temporary fscreate value, or your new value could be overwritten
immediately by writing to /proc/$$/attr/fscreate

I think we need to add a separate field for this purpose, which can only
be written to via the in-kernel API and overrides fscreate.

> I have to call the function twice per cache op: once to set the file
> creation security ID and once to restore it back to what it was.
>
> However, what happens if I can't restore the original security ID (perhaps the
> rules changed between the two invocations)? I can't let the task continue as
> it's now running with the wrong security...

Kill the task?



- James
--
James Morris
<[email protected]>

2006-11-15 16:22:59

by James Morris

Subject: Re: [PATCH 12/19] CacheFiles: Permit a process's create SID to be overridden

On Wed, 15 Nov 2006, David Howells wrote:

> James Morris <[email protected]> wrote:
>
> > The ability to set this needs to be mediated via MAC policy.
>
> Something like this, you mean?

Yes, although perhaps writing to tsec->kern_create_sid or similar, which
then overrides tsec->create_sid if set. Also need
/proc/pid/attr/kern_fscreate as a read only node.
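
The override being suggested here can be modelled in a few lines. This is only a userspace sketch under assumptions: struct model_tsec and model_effective_create_sid() are hypothetical names, kern_create_sid is the field name floated above rather than anything that exists, and 0 is assumed to mean "not set", as it does for the existing create_sid.

```c
#include <stdint.h>

/* Hypothetical, cut-down stand-in for task_security_struct. */
struct model_tsec {
	uint32_t create_sid;		/* set by the task via /proc/<pid>/attr/fscreate */
	uint32_t kern_create_sid;	/* set only through an in-kernel API; 0 = unset */
};

/*
 * Pick the SID to label a newly created file with.  The kernel-only
 * override wins whenever it is set; otherwise the task's own fscreate
 * value is used.  A result of 0 means "no explicit SID", in which case
 * the caller would fall back to the policy-computed default.
 */
static uint32_t model_effective_create_sid(const struct model_tsec *tsec)
{
	return tsec->kern_create_sid ? tsec->kern_create_sid
				     : tsec->create_sid;
}
```

With this shape a write to the fscreate attribute while the override is set changes create_sid but not the SID actually used, which is the property being asked for.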


> + error = task_has_perm(current, current, PROCESS__SETFSCREATE);

I wonder if we also need 'relabelto' and 'relabelfrom' permissions, to
control which labels are being used.



- James
--
James Morris
<[email protected]>

2006-11-15 16:25:58

by David Howells

Subject: Re: [PATCH 12/19] CacheFiles: Permit a process's create SID to be overridden

James Morris <[email protected]> wrote:

> Well, the value can be changed at any time, so you could be using a
> temporary fscreate value, or your new value could be overwritten
> immediately by writing to /proc/$$/attr/fscreate

Ah. Hmmm. By whom? In selinux_setprocattr():

if (current != p) {
/* SELinux only allows a process to change its own
security attributes. */
return -EACCES;
}

But current is busy inside the cache and can't do this.

> I think we need to add a separate field for this purpose, which can only
> be written to via the in-kernel API and overrides fscreate.

So, like my acts-as security ID patch?

Would it still need to be controlled by MAC policy in that case? Doing so is
a bit of a pain as it means I have a whole bunch of extra failures I still
need to check for, and the race in which the rules might change is still a
possibility I have to deal with.

David

2006-11-15 16:45:19

by David Howells

Subject: [PATCH 23/19] FS-Cache: NFS: Don't invoke FS-Cache from nfs_zap_caches()


Don't invoke FS-Cache from nfs_zap_caches() as that is supposed to be quick
whereas FS-Cache may get semaphores and other sleepy things.

As it happens, the cache will be zapped upon the next revalidation anyway, and
so this is probably unnecessary.

Signed-Off-By: David Howells <[email protected]>
---

fs/nfs/inode.c | 2 --
1 files changed, 0 insertions(+), 2 deletions(-)

diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
index 0d683eb..25376a5 100644
--- a/fs/nfs/inode.c
+++ b/fs/nfs/inode.c
@@ -130,8 +130,6 @@ void nfs_zap_caches(struct inode *inode)
spin_lock(&inode->i_lock);
nfs_zap_caches_locked(inode);
spin_unlock(&inode->i_lock);
-
- nfs_fscache_zap_fh_cookie(inode);
}

void nfs_zap_mapping(struct inode *inode, struct address_space *mapping)

2006-11-15 16:53:13

by Trond Myklebust

Subject: Re: [PATCH 05/19] NFS: Use local caching

On Wed, 2006-11-15 at 16:00 +0000, David Howells wrote:
> Trond Myklebust <[email protected]> wrote:
>
> > Why is fscache being given a vote on whether or not the NFS page can be
> > removed from the mapping? If the file has changed on the server, so that
> > we have to invalidate the mapping, then I don't care about the fact that
> > fscache is busy: the page has to go.
>
> This is releasepage() not invalidatepage(). It is conditional.

...and invalidate_complete_page2() calls try_to_release_page() which
again calls releasepage(). Success or failure of the latter should
therefore not depend on the internal fscache state.

> > > @@ -942,11 +954,13 @@ static int nfs_update_inode(struct inode
> > > ...
> > > + nfs_fscache_set_size(inode);
> >
> > Doesn't nfs_fscache_set_size try to grab rw_semaphores? This function is
> > _always_ called with the inode->i_lock spinlock held.
>
> Hmmm... I wonder if I need to do this in nfs_update_inode() at all. Won't the
> pages and the cache object attached to an inode be discarded anyway if the file
> attributes returned by the server change?

In the case of a read-only file, yes. That is not true of a read/write
file.

> > > static void nfs_readpage_release(struct nfs_page *req)
> > > {
> > > + struct inode *d_inode = req->wb_context->dentry->d_inode;
> > > +
> > > + if (PageUptodate(req->wb_page))
> > > + nfs_readpage_to_fscache(d_inode, req->wb_page, 0);
> > > +
> >
> > Will usually be called from an rpciod context. Should therefore not be
> > grabbing semaphores, doing memory allocation etc.
>
> Is it possible to make an NFS kernel thread that can have completed nfs_page
> structs queued for writing to the cache?

Why should we add extra context switches for the non-fscache case? Just
move the call to nfs_readpage_to_fscache into its own kernel thread.

Trond

2006-11-15 16:54:08

by David Howells

Subject: [PATCH 24/19] FS-Cache: NFS: Remove old support for R/W caching


Remove old support for caching of files that are opened for writing. This is
not currently supported, and so the bits that enabled it are currently useless.

Signed-Off-By: David Howells <[email protected]>
---

fs/nfs/fscache.c | 11 -----------
fs/nfs/fscache.h | 32 --------------------------------
fs/nfs/inode.c | 1 -
fs/nfs/write.c | 3 ---
4 files changed, 0 insertions(+), 47 deletions(-)

diff --git a/fs/nfs/fscache.c b/fs/nfs/fscache.c
index 81286f6..6bdd1f2 100644
--- a/fs/nfs/fscache.c
+++ b/fs/nfs/fscache.c
@@ -334,14 +334,3 @@ void nfs_readpage_from_fscache_complete(
unlock_page(page);
}
}
-
-/*
- * handle completion of a page being read from the cache
- * - really need to synchronise the end of writeback, probably using a page
- * flag, but for the moment we disable caching on writable files
- */
-void nfs_writepage_to_fscache_complete(struct page *page,
- void *data,
- int error)
-{
-}
diff --git a/fs/nfs/fscache.h b/fs/nfs/fscache.h
index b82b896..92c2dbf 100644
--- a/fs/nfs/fscache.h
+++ b/fs/nfs/fscache.h
@@ -425,33 +425,6 @@ static inline int nfs_readpages_from_fsc
return ret;
}

-/*
- * store an updated page in fscache
- */
-extern void nfs_writepage_to_fscache_complete(struct page *page, void *data, int error);
-
-static inline void nfs_writepage_to_fscache(struct inode *inode,
- struct page *page)
-{
- int error;
-
- if (PageNfsCached(page) && NFS_I(inode)->fscache) {
- dfprintk(FSCACHE,
- "NFS: writepage_to_fscache (0x%p/0x%p/0x%p)\n",
- NFS_I(inode)->fscache, page, inode);
-
- error = fscache_write_page(NFS_I(inode)->fscache, page,
- nfs_writepage_to_fscache_complete,
- NULL, GFP_KERNEL);
- if (error != 0) {
- dfprintk(FSCACHE,
- "NFS: fscache_write_page error %d\n",
- error);
- fscache_uncache_page(NFS_I(inode)->fscache, page);
- }
- }
-}
-
#else /* CONFIG_NFS_FSCACHE */
static inline int nfs_fscache_register(void) { return 0; }
static inline void nfs_fscache_unregister(void) {}
@@ -493,10 +466,5 @@ static inline int nfs_readpages_from_fsc
return -ENOBUFS;
}

-static inline void nfs_writepage_to_fscache(struct inode *inode, struct page *page)
-{
- BUG_ON(PageNfsCached(page));
-}
-
#endif /* CONFIG_NFS_FSCACHE */
#endif /* _NFS_FSCACHE_H */
diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
index 25376a5..3409448 100644
--- a/fs/nfs/inode.c
+++ b/fs/nfs/inode.c
@@ -381,7 +381,6 @@ void nfs_setattr_update_inode(struct ino
if ((attr->ia_valid & ATTR_SIZE) != 0) {
nfs_inc_stats(inode, NFSIOS_SETATTRTRUNC);
inode->i_size = attr->ia_size;
- nfs_fscache_set_size(inode);
vmtruncate(inode, attr->ia_size);
}
}
diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index 77d0d9d..a2e0570 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -340,9 +340,6 @@ do_it:
err = -EBADF;
goto out;
}
-
- nfs_writepage_to_fscache(inode, page);
-
lock_kernel();
if (!IS_SYNC(inode) && inode_referenced) {
err = nfs_writepage_async(ctx, inode, page, 0, offset);

2006-11-15 17:10:23

by David Howells

Subject: Re: [PATCH 05/19] NFS: Use local caching

Trond Myklebust <[email protected]> wrote:

> > This is releasepage() not invalidatepage(). It is conditional.
>
> ...and invalidate_complete_page2() calls try_to_release_page() which
> again calls releasepage(). Success or failure of the latter should
> therefore not depend on the internal fscache state.

Okay... invalidate_complete_page2() passes __GFP_WAIT through. I'll make
nfs_fscache_release_page() check for that, and if it's set, it'll wait for
FS-Cache to finish with the page before returning true.
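
As a rough userspace model of that decision (a sketch only: MODEL_GFP_WAIT, struct model_page and the model_* functions are hypothetical stand-ins for __GFP_WAIT, struct page and the real helpers, and the "wait" simply pretends the cache I/O has completed):

```c
#include <errno.h>
#include <stdbool.h>

#define MODEL_GFP_WAIT 0x10u	/* hypothetical stand-in for __GFP_WAIT */

struct model_page {
	bool cache_busy;	/* stands in for the "FS-Cache is using it" page flag */
	int waits;		/* how many times the caller had to block */
};

/* Stand-in for sleeping until the cache has finished with the page. */
static void model_wait_on_cache(struct model_page *page)
{
	page->waits++;
	page->cache_busy = false;
}

/*
 * Returns 0 if the page may be released, -EBUSY if the cache still holds
 * it and the caller did not permit sleeping.
 */
static int model_release_page(struct model_page *page, unsigned int gfp)
{
	if (page->cache_busy) {
		if (!(gfp & MODEL_GFP_WAIT))
			return -EBUSY;	/* caller cannot sleep: report "not now" */
		model_wait_on_cache(page);
	}
	return 0;
}
```

The conditional case is unchanged for callers that cannot sleep; only callers that pass the wait flag get the stronger guarantee that the release eventually succeeds.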

This sounds like invalidate_inode_pages2() is doing the wrong thing. After
all, releasepage() _is_ conditional. It sounds like it should be calling
invalidatepage() instead. Either that or NFS should be calling something
else entirely.

> > Hmmm... I wonder if I need to do this in nfs_update_inode() at all.
> > Won't the pages and the cache object attached to an inode be discarded
> > anyway if the file attributes returned by the server change?
>
> In the case of a read-only file, yes. That is not true of a read/write
> file.

So I can assume that, as we're only caching read-only files, I don't need to
invoke FS-Cache here.

> > > > static void nfs_readpage_release(struct nfs_page *req)
> > > > {
> > > > + struct inode *d_inode = req->wb_context->dentry->d_inode;
> > > > +
> > > > + if (PageUptodate(req->wb_page))
> > > > + nfs_readpage_to_fscache(d_inode, req->wb_page, 0);
> > > > +
> > >
> > > Will usually be called from an rpciod context. Should therefore not be
> > > grabbing semaphores, doing memory allocation etc.
> >
> > Is it possible to make an NFS kernel thread that can have completed nfs_page
> > structs queued for writing to the cache?
>
> Why should we add extra context switches for the non-fscache case? Just
> move the call to nfs_readpage_to_fscache into its own kernel thread.

Sorry, I meant can I make use of the nfs_page struct that was handed to
nfs_readpage_release() by queuing it for an auxiliary thread to call
nfs_readpage_to_fscache() on? I didn't intend to call nfs_readpage_release()
in another thread.

After all, nfs_readpage_release() would ordinarily just clear and release it.
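
To make the idea concrete, here is a single-threaded userspace sketch of the hand-off. Everything in it is hypothetical (struct model_req, struct model_queue and the model_* functions are not NFS code), it has no locking, and the "worker" is just a function called later rather than a real kernel thread. The completion side only links the request onto a list, so it neither sleeps nor allocates; the cache write and the final clear happen in the worker.

```c
#include <stddef.h>

struct model_page {
	int uptodate;	/* stands in for PageUptodate() */
	int cached;	/* set once the page has been "written to the cache" */
};

/* Stand-in for struct nfs_page, with the queue link embedded in it. */
struct model_req {
	struct model_page *page;
	struct model_req *next;
};

struct model_queue {
	struct model_req *head;
	struct model_req **tail;
};

static void model_queue_init(struct model_queue *q)
{
	q->head = NULL;
	q->tail = &q->head;
}

/* Completion side: append the request and return immediately. */
static void model_readpage_release(struct model_queue *q, struct model_req *req)
{
	req->next = NULL;
	*q->tail = req;
	q->tail = &req->next;
}

/* Worker side: may sleep in a real system.  Returns requests processed. */
static int model_cache_worker(struct model_queue *q)
{
	struct model_req *req;
	int done = 0;

	while ((req = q->head) != NULL) {
		q->head = req->next;
		if (!q->head)
			q->tail = &q->head;

		if (req->page->uptodate)
			req->page->cached = 1;	/* the deferred cache write */
		req->page = NULL;		/* the deferred clear of the request */
		done++;
	}
	return done;
}
```

Note that the request keeps its page pointer until the worker has run, so whatever normally clears the request on release would have to be deferred as well.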

David

2006-11-15 17:24:53

by David Howells

Subject: [PATCH 25/19] FS-Cache: NFS: Wait in releasepage() if FS-Cache is busy and __GFP_WAIT is set


Make NFS wait in its releasepage() op if FS-Cache is busy with a page and
__GFP_WAIT was supplied in the gfp parameter rather than returning false to the
VM.

Signed-Off-By: David Howells <[email protected]>
---

fs/nfs/file.c | 2 +-
fs/nfs/fscache.h | 9 ++++++---
2 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/fs/nfs/file.c b/fs/nfs/file.c
index 9da03ec..6ac3ac7 100644
--- a/fs/nfs/file.c
+++ b/fs/nfs/file.c
@@ -350,7 +350,7 @@ static int nfs_release_page(struct page
if (nfs_wb_page(page->mapping->host, page) < 0)
return 0;

- if (nfs_fscache_release_page(page) < 0)
+ if (nfs_fscache_release_page(page, gfp) < 0)
return 0;

/* PG_private may have been set due to either caching or writing */
diff --git a/fs/nfs/fscache.h b/fs/nfs/fscache.h
index 92c2dbf..c363421 100644
--- a/fs/nfs/fscache.h
+++ b/fs/nfs/fscache.h
@@ -238,10 +238,13 @@ static inline void nfs_fscache_install_v
* release the caching state associated with a page, if the page isn't busy
* interacting with the cache
*/
-static inline int nfs_fscache_release_page(struct page *page)
+static inline int nfs_fscache_release_page(struct page *page, gfp_t gfp)
{
- if (PageFsMisc(page))
- return -EBUSY;
+ if (PageFsMisc(page)) {
+ if (!(gfp & __GFP_WAIT))
+ return -EBUSY;
+ wait_on_page_fs_misc(page);
+ }

if (PageNfsCached(page)) {
struct nfs_inode *nfsi = NFS_I(page->mapping->host);

2006-11-15 17:52:08

by Karl MacMillan

Subject: Re: [PATCH 12/19] CacheFiles: Permit a process's create SID to be overridden

David Howells wrote:
> James Morris <[email protected]> wrote:
>
>> Well, the value can be changed at any time, so you could be using a
>> temporary fscreate value, or your new value could be overwritten
>> immediately by writing to /proc/$$/attr/fscreate
>
> Ah. Hmmm. By whom? In selinux_setprocattr():
>
> if (current != p) {
> /* SELinux only allows a process to change its own
> security attributes. */
> return -EACCES;
> }
>
> But current is busy inside the cache and can't do this.
>
>> I think we need to add a separate field for this purpose, which can only
>> be written to via the in-kernel API and overrides fscreate.
>
> So, like my acts-as security ID patch?
>
> Would it still need to be controlled by MAC policy in that case?

Yes - if we are going to perform some MAC checks for this kernel process
we need to have all checks performed.

> Doing so is
> a bit of a pain as it means I have a whole bunch of extra failures I still
> need to check for,

This is true for going this route in general rather than simply
bypassing MAC. I don't think halfway makes any sense.

> and the race in which the rules might change is still a
> possibility I have to deal with.
>

I don't think this is a race, it is revocation of access. If you check
the access at every operation and correctly deal with access failures,
then this shouldn't be a problem. Yes it is a pain, but that is how
SELinux is supposed to work.

Karl

2006-11-15 17:53:55

by Trond Myklebust

Subject: Re: [PATCH 05/19] NFS: Use local caching

On Wed, 2006-11-15 at 17:07 +0000, David Howells wrote:
> Trond Myklebust <[email protected]> wrote:
>
> > > This is releasepage() not invalidatepage(). It is conditional.
> >
> > ...and invalidate_complete_page2() calls try_to_release_page() which
> > again calls releasepage(). Success or failure of the latter should
> > therefore not depend on the internal fscache state.
>
> Okay... invalidate_complete_page2() passes __GFP_WAIT through. I'll make
> nfs_fscache_release_page() check for that, and if it's set, it'll wait for
> FS-Cache to finish with the page before returning true.
>
> This sounds like invalidate_inode_pages2() is doing the wrong thing. After
> all, releasepage() _is_ conditional. It sounds like it should be calling
> invalidatepage() instead. Either that or NFS should be calling something
> else entirely.

No! invalidate_inode_pages2() is supposed to be non-destructive w.r.t.
dirty pages. That is why it calls try_to_release_page(). If you want a
destructive truncate, then you should be calling truncate_inode_pages().

> > > Hmmm... I wonder if I need to do this in nfs_update_inode() at all.
> > > Won't the pages and the cache object attached to an inode be discarded
> > > anyway if the file attributes returned by the server change?
> >
> > In the case of a read-only file, yes. That is not true of a read/write
> > file.
>
> So I can assume that, as we're only caching read-only files, I don't need to
> invoke FS-Cache here.

I would assume so.

> > > > > static void nfs_readpage_release(struct nfs_page *req)
> > > > > {
> > > > > + struct inode *d_inode = req->wb_context->dentry->d_inode;
> > > > > +
> > > > > + if (PageUptodate(req->wb_page))
> > > > > + nfs_readpage_to_fscache(d_inode, req->wb_page, 0);
> > > > > +
> > > >
> > > > Will usually be called from an rpciod context. Should therefore not be
> > > > grabbing semaphores, doing memory allocation etc.
> > >
> > > Is it possible to make an NFS kernel thread that can have completed nfs_page
> > > structs queued for writing to the cache?
> >
> > Why should we add extra context switches for the non-fscache case? Just
> > move the call to nfs_readpage_to_fscache into its own kernel thread.
>
> Sorry, I meant can I make use of the nfs_page struct that was handed to
> nfs_readpage_release() by queuing it for an auxiliary thread to call
> nfs_readpage_to_fscache() on? I didn't intend to call nfs_readpage_release()
> in another thread.

If you do, then you will either have to get rid of the call to
nfs_clear_request(), or track the struct page yourself.

At this point, I think that removing the call to nfs_clear_request is
safe: there should be nothing that checks page_count() in the mm layer
that we can race with AFAICS.

Cheers,
Trond

2006-11-15 17:54:46

by Karl MacMillan

Subject: Re: [PATCH 12/19] CacheFiles: Permit a process's create SID to be overridden

James Morris wrote:
> On Wed, 15 Nov 2006, David Howells wrote:
>
>> James Morris <[email protected]> wrote:
>>
>>> The ability to set this needs to be mediated via MAC policy.
>> Something like this, you mean?
>
> Yes, although perhaps writing to tsec->kern_create_sid or similar, which
> then overrides tsec->create_sid if set. Also need
> /proc/pid/attr/kern_fscreate as a read only node.
>
>
>> + error = task_has_perm(current, current, PROCESS__SETFSCREATE);
>
> I wonder if we also need 'relabelto' and 'relabelfrom' permissions, to
> control which labels are being used.
>

No - assuming the existing checks are called, the controls on
file/dir/etc creation should be sufficient to control which labels are
used. Setting fscreate is not a relabel operation nor does it result in
a relabel operation as the sid is only used for creation.

Karl


2006-11-15 18:24:33

by David Howells

Subject: Re: [PATCH 12/19] CacheFiles: Permit a process's create SID to be overridden

Karl MacMillan <[email protected]> wrote:

> > and the race in which the rules might change is still a
> > possibility I have to deal with.
>
> I don't think this is a race, it is revocation of access. If you check the
> access at every operation and correctly deal with access failures, then this
> shouldn't be a problem. Yes it is a pain, but that is how SELinux is supposed
> to work.

Yes, but what is the correct method of dealing with a failure? All I can think
of is to SIGKILL the process.

David

2006-11-15 19:11:37

by David Howells

Subject: Re: [PATCH 12/19] CacheFiles: Permit a process's create SID to be overridden

David Howells <[email protected]> wrote:

> > I think we need to add a separate field for this purpose, which can only
> > be written to via the in-kernel API and overrides fscreate.
>
> So, like my acts-as security ID patch?

How about this then?

I haven't removed the old fscreate overriding patch yet, nor have I put in the
error handling in CacheFiles.

And whilst selinux_fscreate_as_secid() does perform a MAC check, I think that
PROCESS__SETFSCREATE is probably the wrong thing to use. I think there should
be a PROCESS__SETFSCREATEAS or similar. I assume that doing that would require
the userspace policy compiler to be modified.

David
---

include/linux/security.h | 35 ++++++++++++++++++++++++++++++++
security/dummy.c | 14 +++++++++++++
security/selinux/hooks.c | 40 +++++++++++++++++++++++++++++++------
security/selinux/include/objsec.h | 1 +
fs/cachefiles/internal.h | 7 +++---
5 files changed, 87 insertions(+), 10 deletions(-)

diff --git a/include/linux/security.h b/include/linux/security.h
index 8cfeefc..33a20f9 100644
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -1171,6 +1171,17 @@ #ifdef CONFIG_SECURITY
* owning security ID, and return the security ID as which the process was
* previously acting.
*
+ * @fscreate_as_secid:
+ * Set the security ID as which to create files, returning the security ID
+ * as which the process was previously creating files.
+ * @secid contains the security ID to act as.
+ * @oldsecid points to where the old security ID will be placed (or NULL).
+ *
+ * @fscreate_as_self:
+ * Reset the security ID as which to create files to be the same as the
+ * process's own creation security ID, and return the security ID as which
+ * the process was previously creating files.
+ *
* @cachefiles_get_secid:
* Determine the security ID for the CacheFiles module to use when
* accessing the filesystem containing the cache.
@@ -1366,6 +1377,8 @@ struct security_operations {
u32 (*set_fscreate_secid)(u32 secid);
u32 (*act_as_secid)(u32 secid);
u32 (*act_as_self)(void);
+ int (*fscreate_as_secid)(u32 secid, u32 *oldsecid);
+ u32 (*fscreate_as_self)(void);
int (*cachefiles_get_secid)(u32 secid, u32 *modsecid);

#ifdef CONFIG_SECURITY_NETWORK
@@ -2189,6 +2202,16 @@ static inline u32 security_act_as_self(v
return security_ops->act_as_self();
}

+static inline int security_fscreate_as_secid(u32 secid, u32 *oldsecid)
+{
+ return security_ops->fscreate_as_secid(secid, oldsecid);
+}
+
+static inline u32 security_fscreate_as_self(void)
+{
+ return security_ops->fscreate_as_self();
+}
+
static inline int security_cachefiles_get_secid(u32 secid, u32 *modsecid)
{
return security_ops->cachefiles_get_secid(secid, modsecid);
@@ -2899,6 +2922,18 @@ static inline u32 security_act_as_self(v
return 0;
}

+static inline int security_fscreate_as_secid(u32 secid, u32 *oldsecid)
+{
+ if (oldsecid)
+ *oldsecid = 0;
+ return 0;
+}
+
+static inline u32 security_fscreate_as_self(void)
+{
+ return 0;
+}
+
static inline int security_cachefiles_get_secid(u32 secid, u32 *modsecid)
{
*modsecid = 0;
diff --git a/security/dummy.c b/security/dummy.c
index 30096ec..b31bd4c 100644
--- a/security/dummy.c
+++ b/security/dummy.c
@@ -952,6 +952,18 @@ static u32 dummy_act_as_self(void)
return 0;
}

+static int dummy_fscreate_as_secid(u32 secid, u32 *oldsecid)
+{
+ if (oldsecid)
+ *oldsecid = 0;
+ return 0;
+}
+
+static u32 dummy_fscreate_as_self(void)
+{
+ return 0;
+}
+
static int dummy_cachefiles_get_secid(u32 secid, u32 *modsecid)
{
*modsecid = 0;
@@ -1117,6 +1129,8 @@ void security_fixup_ops (struct security
set_to_dummy_if_null(ops, set_fscreate_secid);
set_to_dummy_if_null(ops, act_as_secid);
set_to_dummy_if_null(ops, act_as_self);
+ set_to_dummy_if_null(ops, fscreate_as_secid);
+ set_to_dummy_if_null(ops, fscreate_as_self);
set_to_dummy_if_null(ops, cachefiles_get_secid);
#ifdef CONFIG_SECURITY_NETWORK
set_to_dummy_if_null(ops, unix_stream_connect);
diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index 3a52698..c9388e3 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -1170,8 +1170,9 @@ static int may_create(struct inode *dir,
if (rc)
return rc;

- if (tsec->create_sid && sbsec->behavior != SECURITY_FS_USE_MNTPOINT) {
- newsid = tsec->create_sid;
+ if (tsec->create_as_sid &&
+ sbsec->behavior != SECURITY_FS_USE_MNTPOINT) {
+ newsid = tsec->create_as_sid;
} else {
rc = security_transition_sid(tsec->actor_sid, dsec->sid,
tclass, &newsid);
@@ -1606,7 +1607,7 @@ static int selinux_bprm_set_security(str
bsec->sid = tsec->actor_sid;

/* Reset fs, key, and sock SIDs on execve. */
- tsec->create_sid = 0;
+ tsec->create_as_sid = tsec->create_sid = 0;
tsec->keycreate_sid = 0;
tsec->sockcreate_sid = 0;

@@ -2088,8 +2089,9 @@ static int selinux_inode_init_security(s
dsec = dir->i_security;
sbsec = dir->i_sb->s_security;

- if (tsec->create_sid && sbsec->behavior != SECURITY_FS_USE_MNTPOINT) {
- newsid = tsec->create_sid;
+ if (tsec->create_as_sid &&
+ sbsec->behavior != SECURITY_FS_USE_MNTPOINT) {
+ newsid = tsec->create_as_sid;
} else {
rc = security_transition_sid(tsec->actor_sid, dsec->sid,
inode_mode_to_security_class(inode->i_mode),
@@ -2711,7 +2713,7 @@ static int selinux_task_alloc_security(s

/* Retain the exec, fs, key, and sock SIDs across fork */
tsec2->exec_sid = tsec1->exec_sid;
- tsec2->create_sid = tsec1->create_sid;
+ tsec2->create_as_sid = tsec2->create_sid = tsec1->create_sid;
tsec2->keycreate_sid = tsec1->keycreate_sid;
tsec2->sockcreate_sid = tsec1->sockcreate_sid;

@@ -4586,6 +4588,30 @@ static u32 selinux_act_as_self(void)
return oldactor_sid;
}

+static int selinux_fscreate_as_secid(u32 secid, u32 *oldsecid)
+{
+ struct task_security_struct *tsec = current->security;
+ int error;
+
+ error = task_has_perm(current, current, PROCESS__SETFSCREATE);
+ if (error < 0)
+ return error;
+
+ if (oldsecid)
+ *oldsecid = tsec->create_as_sid;
+ tsec->create_as_sid = secid;
+ return 0;
+}
+
+static u32 selinux_fscreate_as_self(void)
+{
+ struct task_security_struct *tsec = current->security;
+ u32 oldcreate_sid = tsec->create_as_sid;
+
+ tsec->create_as_sid = tsec->create_sid;
+ return oldcreate_sid;
+}
+
static int selinux_cachefiles_get_secid(u32 secid, u32 *modsecid)
{
return security_transition_sid(secid, SECINITSID_KERNEL,
@@ -4779,6 +4805,8 @@ static struct security_operations selinu
.set_fscreate_secid = selinux_set_fscreate_secid,
.act_as_secid = selinux_act_as_secid,
.act_as_self = selinux_act_as_self,
+ .fscreate_as_secid = selinux_fscreate_as_secid,
+ .fscreate_as_self = selinux_fscreate_as_self,
.cachefiles_get_secid = selinux_cachefiles_get_secid,

.unix_stream_connect = selinux_socket_unix_stream_connect,
diff --git a/security/selinux/include/objsec.h b/security/selinux/include/objsec.h
index 4e8da30..70a6f00 100644
--- a/security/selinux/include/objsec.h
+++ b/security/selinux/include/objsec.h
@@ -33,6 +33,7 @@ struct task_security_struct {
u32 actor_sid; /* act-as SID (normally == sid) */
u32 exec_sid; /* exec SID */
u32 create_sid; /* fscreate SID */
+ u32 create_as_sid; /* fscreate-as SID (normally == create_sid) */
u32 keycreate_sid; /* keycreate SID */
u32 sockcreate_sid; /* fscreate SID */
u32 ptrace_sid; /* SID of ptrace parent */
diff --git a/fs/cachefiles/internal.h b/fs/cachefiles/internal.h
index 4715de5..bd4529d 100644
--- a/fs/cachefiles/internal.h
+++ b/fs/cachefiles/internal.h
@@ -196,7 +196,7 @@ extern int cachefiles_determine_cache_se
static inline
void cachefiles_set_fscreate_secid(struct cachefiles_cache *cache)
{
- security_set_fscreate_secid(cache->cache_secid);
+ security_fscreate_as_secid(cache->cache_secid, NULL);
}
#else
#define cachefiles_get_security_ID(cache) (0)
@@ -217,7 +217,6 @@ static inline void cachefiles_begin_secu
{
#ifdef CONFIG_SECURITY
security_act_as_secid(cache->access_secid);
- ctx->fscreate_secid = security_get_fscreate_secid();
#endif
ctx->fsuid = current->fsuid;
ctx->fsgid = current->fsgid;
@@ -230,7 +229,7 @@ static inline void cachefiles_begin_secu
{
#ifdef CONFIG_SECURITY
security_act_as_secid(cache->access_secid);
- ctx->fscreate_secid = security_set_fscreate_secid(cache->cache_secid);
+ security_fscreate_as_secid(cache->cache_secid, NULL);
#endif
ctx->fsuid = current->fsuid;
ctx->fsgid = current->fsgid;
@@ -244,7 +243,7 @@ static inline void cachefiles_end_secure
current->fsuid = ctx->fsuid;
current->fsgid = ctx->fsgid;
#ifdef CONFIG_SECURITY
- security_set_fscreate_secid(ctx->fscreate_secid);
+ security_fscreate_as_self();
security_act_as_self();
#endif
}
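
The override/restore behaviour this patch gives create_as_sid can be modelled in userspace. What follows is only a minimal sketch under stated assumptions: struct toy_tsec and the toy_* helpers are hypothetical stand-ins, the PROCESS__SETFSCREATE permission check and the SECURITY_FS_USE_MNTPOINT test are left out, and nothing here is kernel code.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t u32;

/* Hypothetical stand-in for the two task_security_struct fields involved. */
struct toy_tsec {
	u32 create_sid;    /* fscreate SID, as set by the task itself */
	u32 create_as_sid; /* fscreate-as SID (normally == create_sid) */
};

/* Models selinux_fscreate_as_secid(), minus the permission check:
 * install an override and optionally report the previous value. */
static int toy_fscreate_as_secid(struct toy_tsec *tsec, u32 secid,
				 u32 *oldsecid)
{
	if (oldsecid)
		*oldsecid = tsec->create_as_sid;
	tsec->create_as_sid = secid;
	return 0;
}

/* Models selinux_fscreate_as_self(): drop the override by copying the
 * task's own create_sid back, and return the value that was in force. */
static u32 toy_fscreate_as_self(struct toy_tsec *tsec)
{
	u32 old = tsec->create_as_sid;

	tsec->create_as_sid = tsec->create_sid;
	return old;
}

/* Models the choice in may_create(): a non-zero create_as_sid wins,
 * otherwise the SID computed by the transition rules is used. */
static u32 toy_new_file_sid(const struct toy_tsec *tsec, u32 transition_sid)
{
	return tsec->create_as_sid ? tsec->create_as_sid : transition_sid;
}
```

Because the override lives in a separate field, restoring it never needs a saved copy: create_sid is untouched throughout, which is why the patch can drop ctx->fscreate_secid from the begin/end helpers.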

2006-11-15 19:14:10

by David Howells

Subject: Re: [PATCH 12/19] CacheFiles: Permit a process's create SID to be overridden

David Howells <[email protected]> wrote:

> I haven't removed the old fscreate overriding patch yet, not have I put in the
> error handling in CacheFiles.

That should read "... nor have I...".

David

2006-11-17 10:03:56

by David Howells

Subject: [PATCH 26/19] CacheFiles: Don't include linux/proc_fs.h


Don't include linux/proc_fs.h anymore as we no longer use procfs.

Signed-Off-By: David Howells <[email protected]>
---

fs/cachefiles/cf-bind.c | 1 -
1 files changed, 0 insertions(+), 1 deletions(-)

diff --git a/fs/cachefiles/cf-bind.c b/fs/cachefiles/cf-bind.c
index 0c055a9..2c22d35 100644
--- a/fs/cachefiles/cf-bind.c
+++ b/fs/cachefiles/cf-bind.c
@@ -20,7 +20,6 @@ #include <linux/namei.h>
#include <linux/mount.h>
#include <linux/namespace.h>
#include <linux/statfs.h>
-#include <linux/proc_fs.h>
#include <linux/ctype.h>
#include "internal.h"

2006-11-20 18:45:22

by Stephen Smalley

Subject: Re: [PATCH 12/19] CacheFiles: Permit a process's create SID to be overridden

On Tue, 2006-11-14 at 16:19 -0500, James Morris wrote:
> On Tue, 14 Nov 2006, David Howells wrote:
>
> > +static u32 selinux_set_fscreate_secid(u32 secid)
> > +{
> > + struct task_security_struct *tsec = current->security;
> > + u32 oldsid = tsec->create_sid;
> > +
> > + tsec->create_sid = secid;
> > + return oldsid;
> > +}
>
> The ability to set this needs to be mediated via MAC policy.
>
> See selinux_setprocattr()

That's different - selinux_set_fscreate_secid() is for internal use by a
kernel module that wishes to temporarily assume a particular fscreate
SID, whereas selinux_setprocattr() handles userspace writes
to /proc/self/attr nodes. Imposing a permission check here makes no
sense.

--
Stephen Smalley
National Security Agency

2006-11-20 18:53:43

by Stephen Smalley

Subject: Re: [PATCH 12/19] CacheFiles: Permit a process's create SID to be overridden

On Wed, 2006-11-15 at 16:23 +0000, David Howells wrote:
> James Morris <[email protected]> wrote:
>
> > Well, the value can be changed at any time, so you could be using a
> > temporary fscreate value, or your new value could be overwritten
> > immediately by writing to /proc/$$/attr/fscreate
>
> Ah. Hmmm. By whom? In selinux_setprocattr():
>
> if (current != p) {
> /* SELinux only allows a process to change its own
> security attributes. */
> return -EACCES;
> }
>
> But current is busy inside the cache and can't do this.

Correct; this is no different than modifying ->fsuid temporarily.

> > I think we need to add a separate field for this purpose, which can only
> > be written to via the in-kernel API and overrides fscreate.
>
> So, like my acts-as security ID patch?
>
> Would it still need to be controlled by MAC policy in that case? Doing so is
> a bit of a pain as it means I have a whole bunch of extra failures I still
> need to check for, and the race in which the rules might change is still a
> possibility I have to deal with.

I don't see any value added by introducing yet another field for the
create SID (unlike the actor SID, where we need to distinguish it for
certain checks where the task is the target rather than the actor), so I
don't advocate this approach.

I still suspect that the task flag approach would have been rather
simpler...

--
Stephen Smalley
National Security Agency

2006-11-20 19:57:04

by Karl MacMillan

Subject: Re: [PATCH 12/19] CacheFiles: Permit a process's create SID to be overridden

Stephen Smalley wrote:
> On Tue, 2006-11-14 at 16:19 -0500, James Morris wrote:
>> On Tue, 14 Nov 2006, David Howells wrote:
>>
>>> +static u32 selinux_set_fscreate_secid(u32 secid)
>>> +{
>>> + struct task_security_struct *tsec = current->security;
>>> + u32 oldsid = tsec->create_sid;
>>> +
>>> + tsec->create_sid = secid;
>>> + return oldsid;
>>> +}
>> The ability to set this needs to be mediated via MAC policy.
>>
>> See selinux_setprocattr()
>
> That's different - selinux_set_fscreate_secid() is for internal use by a
> kernel module that wishes to temporarily assume a particular fscreate
> SID, whereas selinux_setprocattr() handles userspace writes
> to /proc/self/attr nodes. Imposing a permission check here makes no
> sense.
>

Since that discussion last week I have been thinking about this and I
have to say I agree with Steve. This should be a kernel-only mechanism
for impersonating another SID - controlling the setting of process
attributes shouldn't be restricted as this will only lead to
inconsistencies in those attributes.

Karl

2006-11-20 22:29:29

by James Morris

Subject: Re: [PATCH 12/19] CacheFiles: Permit a process's create SID to be overridden

On Mon, 20 Nov 2006, Stephen Smalley wrote:

> > The ability to set this needs to be mediated via MAC policy.
> >
> > See selinux_setprocattr()
>
> That's different - selinux_set_fscreate_secid() is for internal use by a
> kernel module that wishes to temporarily assume a particular fscreate
> SID, whereas selinux_setprocattr() handles userspace writes
> to /proc/self/attr nodes. Imposing a permission check here makes no
> sense.

Well, the hook is exported generally to the kernel, so we need to
ensure that it is documented with a big warning. The name of the hook
should perhaps make it more obvious, like set_internal_ or so.



- James
--
James Morris
<[email protected]>

2006-11-23 13:14:05

by David Howells

Subject: [PATCH 27/19] FS-Cache: Apply the PG_checked -> PG_fs_misc conversion to Ext4


Apply the PG_checked -> PG_fs_misc conversion to Ext4 [patch 02/19].

Signed-Off-By: David Howells <[email protected]>
---

fs/ext4/inode.c | 10 +++++-----
1 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 0a60ec5..4846bb9 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -1530,12 +1530,12 @@ static int ext4_journalled_writepage(str
goto no_write;
}

- if (!page_has_buffers(page) || PageChecked(page)) {
+ if (!page_has_buffers(page) || PageFsMisc(page)) {
/*
* It's mmapped pagecache. Add buffers and journal it. There
* doesn't seem much point in redirtying the page here.
*/
- ClearPageChecked(page);
+ ClearPageFsMisc(page);
ret = block_prepare_write(page, 0, PAGE_CACHE_SIZE,
ext4_get_block);
if (ret != 0) {
@@ -1592,7 +1592,7 @@ static void ext4_invalidatepage(struct p
* If it's a full truncate we just forget about the pending dirtying
*/
if (offset == 0)
- ClearPageChecked(page);
+ ClearPageFsMisc(page);

jbd2_journal_invalidatepage(journal, page, offset);
}
@@ -1601,7 +1601,7 @@ static int ext4_releasepage(struct page
{
journal_t *journal = EXT4_JOURNAL(page->mapping->host);

- WARN_ON(PageChecked(page));
+ WARN_ON(PageFsMisc(page));
if (!page_has_buffers(page))
return 0;
return jbd2_journal_try_to_free_buffers(journal, page, wait);
@@ -1697,7 +1697,7 @@ out:
*/
static int ext4_journalled_set_page_dirty(struct page *page)
{
- SetPageChecked(page);
+ SetPageFsMisc(page);
return __set_page_dirty_nobuffers(page);
}

2006-11-23 13:20:22

by David Howells

Subject: [PATCH 28/19] FS-Cache: NFS: Handle caching being disabled correctly


Make the prototypes of the cache-disabled stubs consistent with the
cache-enabled functions.

Signed-Off-By: David Howells <[email protected]>
---

fs/nfs/fscache.h | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/fs/nfs/fscache.h b/fs/nfs/fscache.h
index c363421..4e42bc9 100644
--- a/fs/nfs/fscache.h
+++ b/fs/nfs/fscache.h
@@ -445,7 +445,7 @@ static inline void nfs_fscache_renew_fh_
static inline void nfs_fscache_disable_fh_cookie(struct inode *inode) {}
static inline void nfs_fscache_set_fh_cookie(struct inode *inode, struct file *filp) {}
static inline void nfs_fscache_install_vm_ops(struct inode *inode, struct vm_area_struct *vma) {}
-static inline int nfs_fscache_release_page(struct page *page)
+static inline int nfs_fscache_release_page(struct page *page, gfp_t gfp)
{
return 1; /* True: may release page */
}
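
The reason the stub must match is that callers are compiled unchanged in both configurations. A minimal sketch of that arrangement follows; TOY_NFS_FSCACHE, the toy struct page, the gfp_t typedef, the in_cache field and the body of the enabled variant are all assumptions made so the example builds outside the kernel, and only the disabled stub mirrors the patch.

```c
#include <assert.h>

/* Hypothetical stand-ins so the sketch compiles in userspace. */
typedef unsigned int gfp_t;
struct page {
	int in_cache; /* assumed flag: page is still held by the cache */
};

#ifdef TOY_NFS_FSCACHE
/* Cache-enabled variant (hypothetical body): refuse to release a page
 * that the cache still holds. */
static inline int nfs_fscache_release_page(struct page *page, gfp_t gfp)
{
	(void)gfp;
	return !page->in_cache;
}
#else
/* Cache-disabled stub: same prototype as above, so callers need no #ifdef. */
static inline int nfs_fscache_release_page(struct page *page, gfp_t gfp)
{
	(void)page;
	(void)gfp;
	return 1; /* True: may release page */
}
#endif

/* A caller that builds under either configuration because both variants
 * take (struct page *, gfp_t). */
static int toy_release_page(struct page *page, gfp_t gfp)
{
	return nfs_fscache_release_page(page, gfp);
}
```

With the old one-argument stub, toy_release_page() would only have compiled when caching was enabled.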

2006-11-23 20:16:49

by David Howells

Subject: [PATCH 29/19] CacheFiles: Remove old obsolete cull function

From: David Howells <[email protected]>

Remove the old cachefiles_cull() function that was obsolete and #if'd out.

Signed-Off-By: David Howells <[email protected]>
---

fs/cachefiles/cf-namei.c | 110 ----------------------------------------------
1 files changed, 0 insertions(+), 110 deletions(-)

diff --git a/fs/cachefiles/cf-namei.c b/fs/cachefiles/cf-namei.c
index d0db9b3..9e6dd9f 100644
--- a/fs/cachefiles/cf-namei.c
+++ b/fs/cachefiles/cf-namei.c
@@ -524,116 +524,6 @@ nomem_d_alloc:
return ERR_PTR(-ENOMEM);
}

-#if 0
-/*
- * cull an object if it's not in use
- * - called only by cache manager daemon
- */
-int cachefiles_cull(struct cachefiles_cache *cache, struct dentry *dir,
- char *filename)
-{
- struct cachefiles_object *object;
- struct rb_node *_n;
- struct dentry *victim;
- int ret;
-
- _enter(",%*.*s/,%s",
- dir->d_name.len, dir->d_name.len, dir->d_name.name, filename);
-
- /* look up the victim */
- mutex_lock(&dir->d_inode->i_mutex);
-
- victim = lookup_one_len(filename, dir, strlen(filename));
- if (IS_ERR(victim))
- goto lookup_error;
-
- _debug("victim -> %p %s",
- victim, victim->d_inode ? "positive" : "negative");
-
- /* if the object is no longer there then we probably retired the object
- * at the netfs's request whilst the cull was in progress
- */
- if (!victim->d_inode) {
- mutex_unlock(&dir->d_inode->i_mutex);
- dput(victim);
- _leave(" = -ENOENT [absent]");
- return -ENOENT;
- }
-
- /* check to see if we're using this object */
- read_lock(&cache->active_lock);
-
- _n = cache->active_nodes.rb_node;
-
- while (_n) {
- object = rb_entry(_n, struct cachefiles_object, active_node);
-
- if (object->dentry > victim)
- _n = _n->rb_left;
- else if (object->dentry < victim)
- _n = _n->rb_right;
- else
- goto object_in_use;
- }
-
- read_unlock(&cache->active_lock);
-
- /* okay... the victim is not being used so we can cull it
- * - start by marking it as stale
- */
- _debug("victim is cullable");
-
- ret = cachefiles_remove_object_xattr(cache, victim);
- if (ret < 0)
- goto error_unlock;
-
- /* actually remove the victim (drops the dir mutex) */
- _debug("bury");
-
- ret = cachefiles_bury_object(cache, dir, victim);
- if (ret < 0)
- goto error;
-
- dput(victim);
- _leave(" = 0");
- return 0;
-
-
-object_in_use:
- read_unlock(&cache->active_lock);
- mutex_unlock(&dir->d_inode->i_mutex);
- dput(victim);
- _leave(" = -EBUSY [in use]");
- return -EBUSY;
-
-lookup_error:
- mutex_unlock(&dir->d_inode->i_mutex);
- ret = PTR_ERR(victim);
- if (ret == -EIO)
- cachefiles_io_error(cache, "Lookup failed");
- goto choose_error;
-
-error_unlock:
- mutex_unlock(&dir->d_inode->i_mutex);
-error:
- dput(victim);
-choose_error:
- if (ret == -ENOENT) {
- /* file or dir now absent - probably retired by netfs */
- _leave(" = -ESTALE [absent]");
- return -ESTALE;
- }
-
- if (ret != -ENOMEM) {
- kerror("Internal error: %d", ret);
- ret = -EIO;
- }
-
- _leave(" = %d", ret);
- return ret;
-}
-#endif
-
/*
* find out if an object is in use or not
* - if finds object and it's not in use:

2006-11-29 16:48:11

by David Howells

Subject: [PATCH 30/19] CacheFiles: Fix the allocate_page() op


Fix cachefiles_allocate_page() to mark the specified page as being retained if
it returns successfully.

Also fix the header comment on that function (it doesn't read data from the
disk).

Signed-Off-By: David Howells <[email protected]>
---

fs/cachefiles/cf-interface.c | 19 +++++++++++++++++--
1 files changed, 17 insertions(+), 2 deletions(-)

diff --git a/fs/cachefiles/cf-interface.c b/fs/cachefiles/cf-interface.c
index e96e63a..a08831b 100644
--- a/fs/cachefiles/cf-interface.c
+++ b/fs/cachefiles/cf-interface.c
@@ -1108,7 +1108,7 @@ static int cachefiles_read_or_alloc_page
}

/*
- * read a page from the cache or allocate a block in which to store it
+ * allocate a block in the cache in which to store a page
* - cache withdrawal is prevented by the caller
* - returns -EINTR if interrupted
* - returns -ENOMEM if ran out of memory
@@ -1124,6 +1124,9 @@ static int cachefiles_allocate_page(stru
{
struct cachefiles_object *object;
struct cachefiles_cache *cache;
+ struct fscache_cookie *cookie;
+ struct pagevec pagevec;
+ int ret;

object = container_of(_object, struct cachefiles_object, fscache);
cache = container_of(object->fscache.cache,
@@ -1131,7 +1134,19 @@ static int cachefiles_allocate_page(stru

_enter("%p,{%lx},,,", object, page->index);

- return cachefiles_has_space(cache, 0, 1);
+ ret = cachefiles_has_space(cache, 0, 1);
+ if (ret == 0) {
+ pagevec_init(&pagevec, 0);
+ pagevec_add(&pagevec, page);
+ cookie = object->fscache.cookie;
+ cookie->def->mark_pages_cached(cookie->netfs_data,
+ page->mapping, &pagevec);
+ } else {
+ ret = -ENOBUFS;
+ }
+
+ _leave(" = %d", ret);
+ return ret;
}

/*
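
The control flow that the fixed cachefiles_allocate_page() follows (check for space, mark the page on success, otherwise report -ENOBUFS) can be sketched in userspace. The toy_* types and helpers below are hypothetical stand-ins for the cache, the page and the netfs's mark_pages_cached() callback; the space check in particular is an assumption and does not reflect how cachefiles_has_space() actually decides.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-ins for the kernel structures. */
struct toy_page {
	unsigned long index;
	int marked_cached; /* set once the netfs has been told about the page */
};

struct toy_cache {
	long free_blocks;
};

/* Assumed space check: 0 if enough blocks are free, non-zero otherwise. */
static int toy_has_space(const struct toy_cache *cache, long nblocks)
{
	return cache->free_blocks >= nblocks ? 0 : -1;
}

/* Stand-in for the netfs's mark_pages_cached() callback. */
static void toy_mark_page_cached(struct toy_page *page)
{
	page->marked_cached = 1;
}

/* Mirrors the shape of the fixed function: the page is marked only when
 * space is available, and any failure is reported as -ENOBUFS. */
static int toy_allocate_page(struct toy_cache *cache, struct toy_page *page)
{
	int ret = toy_has_space(cache, 1);

	if (ret == 0)
		toy_mark_page_cached(page);
	else
		ret = -ENOBUFS;
	return ret;
}
```

Before the fix the function returned the result of the space check directly and never marked the page, so a successful return left the netfs unaware that the page was being retained.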