The following is a series of patches related to Unionfs. The main change
here is that unionfs now has its own ->writepages method.
These patches were tested (where appropriate) on Linus's 2.6.24 latest code
(as of v2.6.23-6623-g55b70a0), as well as the backports to
2.6.{23,22,21,20,19,18,9} on ext2/3/4, xfs, reiserfs, nfs2/3/4, jffs2,
ramfs, tmpfs, cramfs, and squashfs (where available). See
http://unionfs.filesystems.org/ to download backported unionfs code.
Please pull from the 'master' branch of
git://git.kernel.org/pub/scm/linux/kernel/git/ezk/unionfs.git
to receive the following:
Andrew Morton (2):
Unionfs: security convert lsm into a static interface fix
Unionfs: slab api remove useless ctor parameter and reorder parameters
Erez Zadok (6):
Unionfs: support lower filesystems without writeback capability
Unionfs: don't printk trivial message upon normal rename-copyup
Unionfs: don't bother validating dentry if it has no lower branches
Unionfs: convert a printk to pr_debug in release
Unionfs: remove for_writepages nfs workaround
Unionfs: remove obsolete #define and comment
Jeff Layton (1):
Unionfs: fix unionfs_setattr to handle ATTR_KILL_S*ID
fs/unionfs/debug.c | 4 +++
fs/unionfs/dentry.c | 6 ++--
fs/unionfs/inode.c | 7 +++++
fs/unionfs/mmap.c | 62 +++++++++++++++++------------------------------
fs/unionfs/rename.c | 8 +++---
fs/unionfs/super.c | 4 +--
fs/unionfs/union.h | 1
include/linux/union_fs.h | 3 --
security/security.c | 2 +
9 files changed, 47 insertions(+), 50 deletions(-)
---
Erez Zadok
[email protected]
Signed-off-by: Erez Zadok <[email protected]>
---
fs/unionfs/rename.c | 8 +++++---
1 files changed, 5 insertions(+), 3 deletions(-)
diff --git a/fs/unionfs/rename.c b/fs/unionfs/rename.c
index 91d41d4..1ab474f 100644
--- a/fs/unionfs/rename.c
+++ b/fs/unionfs/rename.c
@@ -40,10 +40,12 @@ static int __unionfs_rename(struct inode *old_dir, struct dentry *old_dentry,
new_dentry, new_dentry->d_name.name,
bindex);
if (IS_ERR(lower_new_dentry)) {
- printk(KERN_ERR "unionfs: error creating directory "
- "tree for rename, bindex = %d, err = %ld\n",
- bindex, PTR_ERR(lower_new_dentry));
err = PTR_ERR(lower_new_dentry);
+ if (err == -EROFS)
+ goto out;
+ printk(KERN_ERR "unionfs: error creating directory "
+ "tree for rename, bindex=%d err=%d\n",
+ bindex, err);
goto out;
}
}
--
1.5.2.2
Signed-off-by: Erez Zadok <[email protected]>
---
fs/unionfs/debug.c | 4 ++++
1 files changed, 4 insertions(+), 0 deletions(-)
diff --git a/fs/unionfs/debug.c b/fs/unionfs/debug.c
index 68692d7..894bf7c 100644
--- a/fs/unionfs/debug.c
+++ b/fs/unionfs/debug.c
@@ -132,6 +132,9 @@ void __unionfs_check_dentry(const struct dentry *dentry,
inode = dentry->d_inode;
dstart = dbstart(dentry);
dend = dbend(dentry);
+ /* don't check dentry/mnt if no lower branches */
+ if (dstart < 0 && dend < 0)
+ goto check_inode;
BUG_ON(dstart > dend);
if (unlikely((dstart == -1 && dend != -1) ||
@@ -212,6 +215,7 @@ void __unionfs_check_dentry(const struct dentry *dentry,
}
}
+check_inode:
/* for inodes now */
if (!inode)
return;
--
1.5.2.2
This is mostly an informational message, not an error.
Signed-off-by: Erez Zadok <[email protected]>
---
fs/unionfs/dentry.c | 6 +++---
1 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/fs/unionfs/dentry.c b/fs/unionfs/dentry.c
index 6bab9d6..a3d7b6e 100644
--- a/fs/unionfs/dentry.c
+++ b/fs/unionfs/dentry.c
@@ -455,9 +455,9 @@ static void unionfs_d_release(struct dentry *dentry)
goto out;
} else if (dbstart(dentry) < 0) {
/* this is due to a failed lookup */
- printk(KERN_ERR "unionfs: dentry without lower "
- "dentries: %.*s\n",
- dentry->d_name.len, dentry->d_name.name);
+ pr_debug("unionfs: dentry without lower "
+ "dentries: %.*s\n",
+ dentry->d_name.len, dentry->d_name.name);
goto out_free;
}
--
1.5.2.2
From: Andrew Morton <[email protected]>
fs/unionfs/super.c: In function 'unionfs_init_inode_cache':
fs/unionfs/super.c:874: warning: passing argument 5 of 'kmem_cache_create' from incompatible pointer type
Cc: Christoph Lameter <[email protected]>
Cc: Josef 'Jeff' Sipek <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Erez Zadok <[email protected]>
---
fs/unionfs/super.c | 4 ++--
1 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/fs/unionfs/super.c b/fs/unionfs/super.c
index 515689d..7d28045 100644
--- a/fs/unionfs/super.c
+++ b/fs/unionfs/super.c
@@ -863,9 +863,9 @@ static void unionfs_destroy_inode(struct inode *inode)
}
/* unionfs inode cache constructor */
-static void init_once(void *v, struct kmem_cache *cachep, unsigned long flags)
+static void init_once(struct kmem_cache *cachep, void *obj)
{
- struct unionfs_inode_info *i = v;
+ struct unionfs_inode_info *i = obj;
inode_init_once(&i->vfs_inode);
}
--
1.5.2.2
Signed-off-by: Erez Zadok <[email protected]>
---
include/linux/union_fs.h | 3 ---
1 files changed, 0 insertions(+), 3 deletions(-)
diff --git a/include/linux/union_fs.h b/include/linux/union_fs.h
index 7f8dcc3..d29318f 100644
--- a/include/linux/union_fs.h
+++ b/include/linux/union_fs.h
@@ -20,8 +20,5 @@
# define UNIONFS_IOCTL_INCGEN _IOR(0x15, 11, int)
# define UNIONFS_IOCTL_QUERYFILE _IOR(0x15, 15, int)
-/* We don't support normal remount, but unionctl uses it. */
-# define UNIONFS_REMOUNT_MAGIC 0x4a5a4380
-
#endif /* _LINUX_UNIONFS_H */
--
1.5.2.2
Implement unionfs_writepages. As per
mm/filemap.c:__filemap_fdatawrite_range(), don't call our writepage if the
lower mapping has BDI_CAP_NO_WRITEBACK capability set.
Signed-off-by: Pekka J Enberg <[email protected]>
Signed-off-by: Erez Zadok <[email protected]>
---
fs/unionfs/mmap.c | 23 +++++++++++++++++++++++
fs/unionfs/union.h | 1 +
2 files changed, 24 insertions(+), 0 deletions(-)
diff --git a/fs/unionfs/mmap.c b/fs/unionfs/mmap.c
index 6440282..b43557e 100644
--- a/fs/unionfs/mmap.c
+++ b/fs/unionfs/mmap.c
@@ -144,6 +144,28 @@ out:
return err;
}
+static int unionfs_writepages(struct address_space *mapping,
+ struct writeback_control *wbc)
+{
+ int err = 0;
+ struct inode *lower_inode;
+ struct inode *inode;
+
+ inode = mapping->host;
+ lower_inode = unionfs_lower_inode(inode);
+ BUG_ON(!lower_inode);
+
+ if (!mapping_cap_writeback_dirty(lower_inode->i_mapping))
+ goto out;
+
+ /* Note: generic_writepages may return AOP_WRITEPAGE_ACTIVATE */
+ err = generic_writepages(mapping, wbc);
+ if (err == 0)
+ unionfs_copy_attr_times(inode);
+out:
+ return err;
+}
+
/*
* readpage is called from generic_page_read and the fault handler.
* If your file system uses generic_page_read for the read op, it
@@ -374,6 +396,7 @@ out:
struct address_space_operations unionfs_aops = {
.writepage = unionfs_writepage,
+ .writepages = unionfs_writepages,
.readpage = unionfs_readpage,
.prepare_write = unionfs_prepare_write,
.commit_write = unionfs_commit_write,
diff --git a/fs/unionfs/union.h b/fs/unionfs/union.h
index 8eb2ee4..6333488 100644
--- a/fs/unionfs/union.h
+++ b/fs/unionfs/union.h
@@ -45,6 +45,7 @@
#include <linux/log2.h>
#include <linux/poison.h>
#include <linux/mman.h>
+#include <linux/backing-dev.h>
#include <asm/system.h>
--
1.5.2.2
This is no longer necessary since struct writeback_control no longer has a
fs_private field which lower file systems (esp. nfs) use. Plus, unionfs now
defines its own ->writepages method.
Signed-off-by: Erez Zadok <[email protected]>
---
fs/unionfs/mmap.c | 39 ---------------------------------------
1 files changed, 0 insertions(+), 39 deletions(-)
diff --git a/fs/unionfs/mmap.c b/fs/unionfs/mmap.c
index b43557e..bed11c3 100644
--- a/fs/unionfs/mmap.c
+++ b/fs/unionfs/mmap.c
@@ -19,39 +19,6 @@
#include "union.h"
-/*
- * Unionfs doesn't implement ->writepages, which is OK with the VFS and
- * keeps our code simpler and smaller. Nevertheless, somehow, our own
- * ->writepage must be called so we can sync the upper pages with the lower
- * pages: otherwise data changed at the upper layer won't get written to the
- * lower layer.
- *
- * Some lower file systems (e.g., NFS) expect the VFS to call its writepages
- * only, which in turn will call generic_writepages and invoke each of the
- * lower file system's ->writepage. NFS in particular uses the
- * wbc->fs_private field in its nfs_writepage, which is set in its
- * nfs_writepages. So if we don't call the lower nfs_writepages first, then
- * NFS's nfs_writepage will dereference a NULL wbc->fs_private and cause an
- * OOPS. If, however, we implement a unionfs_writepages and then we do call
- * the lower nfs_writepages, then we "lose control" over the pages we're
- * trying to write to the lower file system: we won't be writing our own
- * new/modified data from the upper pages to the lower pages, and any
- * mmap-based changes are lost.
- *
- * This is a fundamental cache-coherency problem in Linux. The kernel isn't
- * able to support such stacking abstractions cleanly. One possible clean
- * way would be that a lower file system's ->writepage method have some sort
- * of a callback to validate if any upper pages for the same file+offset
- * exist and have newer content in them.
- *
- * This whole NULL ptr dereference is triggered at the lower file system
- * (NFS) because the wbc->for_writepages is set to 1. Therefore, to avoid
- * this NULL pointer dereference, we set this flag to 0 and restore it upon
- * exit. This probably means that we're slightly less efficient in writing
- * pages out, doing them one at a time, but at least we avoid the oops until
- * such day as Linux can better support address_space_ops in a stackable
- * fashion.
- */
static int unionfs_writepage(struct page *page, struct writeback_control *wbc)
{
int err = -EIO;
@@ -59,7 +26,6 @@ static int unionfs_writepage(struct page *page, struct writeback_control *wbc)
struct inode *lower_inode;
struct page *lower_page;
char *kaddr, *lower_kaddr;
- int saved_for_writepages = wbc->for_writepages;
inode = page->mapping->host;
lower_inode = unionfs_lower_inode(inode);
@@ -101,14 +67,9 @@ static int unionfs_writepage(struct page *page, struct writeback_control *wbc)
BUG_ON(!lower_inode->i_mapping->a_ops->writepage);
- /* workaround for some lower file systems: see big comment on top */
- if (wbc->for_writepages && !wbc->fs_private)
- wbc->for_writepages = 0;
-
/* call lower writepage (expects locked page) */
clear_page_dirty_for_io(lower_page); /* emulate VFS behavior */
err = lower_inode->i_mapping->a_ops->writepage(lower_page, wbc);
- wbc->for_writepages = saved_for_writepages; /* restore value */
/* b/c find_lock_page locked it and ->writepage unlocks on success */
if (err)
--
1.5.2.2
From: Jeff Layton <[email protected]>
Don't allow unionfs_setattr to trip the BUG() in notify_change. Clear
ATTR_MODE if the either ATTR_KILL_S*ID is set. This also allows the
lower filesystem to interpret these bits in its own way.
Signed-off-by: Jeff Layton <[email protected]>
Cc: Josef 'Jeff' Sipek <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Erez Zadok <[email protected]>
---
fs/unionfs/inode.c | 7 +++++++
1 files changed, 7 insertions(+), 0 deletions(-)
diff --git a/fs/unionfs/inode.c b/fs/unionfs/inode.c
index 4e59ace..6ca52f4 100644
--- a/fs/unionfs/inode.c
+++ b/fs/unionfs/inode.c
@@ -1048,6 +1048,13 @@ static int unionfs_setattr(struct dentry *dentry, struct iattr *ia)
bend = dbend(dentry);
inode = dentry->d_inode;
+ /*
+ * mode change is for clearing setuid/setgid. Allow lower filesystem
+ * to reinterpret it in its own way.
+ */
+ if (ia->ia_valid & (ATTR_KILL_SUID | ATTR_KILL_SGID))
+ ia->ia_valid &= ~ATTR_MODE;
+
for (bindex = bstart; (bindex <= bend) || (bindex == bstart);
bindex++) {
lower_dentry = unionfs_lower_dentry_idx(dentry, bindex);
--
1.5.2.2
From: Andrew Morton <[email protected]>
ERROR: "security_inode_permission" [fs/unionfs/unionfs.ko] undefined!
ERROR: "security_file_ioctl" [fs/unionfs/unionfs.ko] undefined!
Need these back.
Cc: "Serge E. Hallyn" <[email protected]>
Cc: Arjan van de Ven <[email protected]>
Cc: Chris Wright <[email protected]>
Cc: James Morris <[email protected]>
Cc: Stephen Smalley <[email protected]>
Cc: Josef 'Jeff' Sipek <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Erez Zadok <[email protected]>
---
security/security.c | 2 ++
1 files changed, 2 insertions(+), 0 deletions(-)
diff --git a/security/security.c b/security/security.c
index 0e1f1f1..95a6733 100644
--- a/security/security.c
+++ b/security/security.c
@@ -409,6 +409,7 @@ int security_inode_permission(struct inode *inode, int mask, struct nameidata *n
return 0;
return security_ops->inode_permission(inode, mask, nd);
}
+EXPORT_SYMBOL(security_inode_permission);
int security_inode_setattr(struct dentry *dentry, struct iattr *attr)
{
@@ -518,6 +519,7 @@ int security_file_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
{
return security_ops->file_ioctl(file, cmd, arg);
}
+EXPORT_SYMBOL(security_file_ioctl);
int security_file_mmap(struct file *file, unsigned long reqprot,
unsigned long prot, unsigned long flags,
--
1.5.2.2
On Sun, Oct 21, 2007 at 07:51:14PM -0400, Erez Zadok wrote:
> From: Andrew Morton <[email protected]>
>
> ERROR: "security_inode_permission" [fs/unionfs/unionfs.ko] undefined!
> ERROR: "security_file_ioctl" [fs/unionfs/unionfs.ko] undefined!
>
> Need these back.
These should never used by modules. You'll need to use and/or introduce
vfs_ helpers that do proper security checks before calling the methods.
In message <[email protected]>, Christoph Hellwig writes:
> On Sun, Oct 21, 2007 at 07:51:14PM -0400, Erez Zadok wrote:
> > From: Andrew Morton <[email protected]>
> >
> > ERROR: "security_inode_permission" [fs/unionfs/unionfs.ko] undefined!
> > ERROR: "security_file_ioctl" [fs/unionfs/unionfs.ko] undefined!
> >
> > Need these back.
>
> These should never used by modules.
Why? Are you concerned that the security policy may change after a module
is loaded? My understanding of the security code is that it should handle
this, even if people call security_*() functions directly. When I look at
the security_* functions in security.c, to me they very much smell like
global wrappers that others can call, b/c they refer to private/global ops
vectors that one should not be referencing directly. For example:
int security_file_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
{
return security_ops->file_ioctl(file, cmd, arg);
}
> You'll need to use and/or introduce
> vfs_ helpers that do proper security checks before calling the methods.
I can probably get rid of having unionfs call security_inode_permission, by
calling permission() myself and carefully post-process its return code
(unionfs needs to "ignore" EROFS initially, to allow copyup to take place).
But security_file_ioctl doesn't have any existing helper I can call. I can
introduce a trivial vfs_security_file_ioctl wrapper to security_file_ioctl,
but what about the already existing *19* exported security_* functions in
security/security.c? Do you want to see simple wrappers for all of them?
It seems redundant to add a one-line wrapper around an already one-line
function around security_ops->XXX. Plus, some of the existing exported
security_* functions are file-system related, others are networking, etc. So
we'll need wrappers whose names are prefixed appropriately: vfs_*, net_*,
etc.
Thanks,
Erez.
On Mon, Oct 22, 2007 at 08:48:04PM -0400, Erez Zadok wrote:
> Why? Are you concerned that the security policy may change after a module
> is loaded?
No, it's a matter of proper layering. We generally don't want modules
like stackabke filesystems to call directly into methods but rather use
proper highlevel VFS helpers to isolate them from details and possible
changes. The move to out of line security_ helpers just put this on the
radard.
> I can probably get rid of having unionfs call security_inode_permission, by
> calling permission() myself and carefully post-process its return code
> (unionfs needs to "ignore" EROFS initially, to allow copyup to take place).
Sounds fine.
> But security_file_ioctl doesn't have any existing helper I can call. I can
> introduce a trivial vfs_security_file_ioctl wrapper to security_file_ioctl,
> but what about the already existing *19* exported security_* functions in
> security/security.c? Do you want to see simple wrappers for all of them?
> It seems redundant to add a one-line wrapper around an already one-line
> function around security_ops->XXX. Plus, some of the existing exported
> security_* functions are file-system related, others are networking, etc. So
> we'll need wrappers whose names are prefixed appropriately: vfs_*, net_*,
> etc.
The fix for security_file_ioctl is probably to either not do it at all
or move it the call to security_file_ioctl into vfs_ioctl and get it by
using that helper. I suspect most other security_ exports should be
avoided similarly.
I also suspect the whole issue of where and how-many times to call LSM
methods for stackable filesystems is a huge can of worms and it might make
sense to talk to the LSM folks about it.
In message <[email protected]>, Christoph Hellwig writes:
> On Mon, Oct 22, 2007 at 08:48:04PM -0400, Erez Zadok wrote:
> > Why? Are you concerned that the security policy may change after a module
> > is loaded?
>
> No, it's a matter of proper layering. We generally don't want modules
> like stackabke filesystems to call directly into methods but rather use
> proper highlevel VFS helpers to isolate them from details and possible
> changes. The move to out of line security_ helpers just put this on the
> radard.
OK, I'll be shortly posting a couple of patches to fs/ioctl.c.
> > I can probably get rid of having unionfs call security_inode_permission,
> > by calling permission() myself and carefully post-process its return
> > code (unionfs needs to "ignore" EROFS initially, to allow copyup to take
> > place).
>
> Sounds fine.
I was able to test this idea and it works fine. Now unionfs calls
permission(), post-processes the return value, and I don't need my own
modified version of permission() in unionfs. This saved me ~50 LoC and
reduced stack pressure a little.
> > But security_file_ioctl doesn't have any existing helper I can call. I
> > can introduce a trivial vfs_security_file_ioctl wrapper to
> > security_file_ioctl, but what about the already existing *19* exported
> > security_* functions in security/security.c? Do you want to see simple
> > wrappers for all of them? It seems redundant to add a one-line wrapper
> > around an already one-line function around security_ops->XXX. Plus,
> > some of the existing exported security_* functions are file-system
> > related, others are networking, etc. So we'll need wrappers whose names
> > are prefixed appropriately: vfs_*, net_*, etc.
>
> The fix for security_file_ioctl is probably to either not do it at all
> or move it the call to security_file_ioctl into vfs_ioctl and get it by
> using that helper. I suspect most other security_ exports should be
> avoided similarly.
Christoph, I looked more closely at that and the selinux code. Only
sys_ioctl calls security_file_ioctl. And security_file_ioctl performs all
sorts of checks that mostly have to do with the currently running task or
the open file. The running task is still the same, whether filesystem
stacking is involved or not. Also, the unionfs-level struct file is
logically the same file at the lower level: they refer to the same object,
just at two layers. I can't see any reason why unionfs_ioctl should have to
call security_file_ioctl(lower_file) the way sys_ioctl does: that check is
already done well before the file system's ->ioctl is invoked. I also don't
see how it would ever be possible that sys_ioctl will succeed in its call to
security_file_ioctl(upper_file), but unionfs will fail the same security
check on the lower file.
So I commented out unionfs's call to security_file_ioctl(lower_file) and
tested it on a bunch of things, including an selinux-enabled livecd.
Everything seemed to work just fine, so I'll be sending some patches to that
effect, and we can drop the -mm patch which exports security_file_ioctl().
BTW, ecryptfs doesn't call security_file_ioctl().
Cheers,
Erez.