2007-12-23 14:44:29

by Tetsuo Handa

Subject: [PATCH][RFC] Simple tamper-proof device filesystem.

Hello.

Thank you for joining the discussion of my previous posting
(starting from http://lkml.org/lkml/2007/12/16/23 ).

The previous posting was a feasibility test to find out
whether this kind of trivial filesystem is acceptable for mainline.

It now seems that there is a small chance of acceptance,
so I have rebased the patch onto the -mm tree.

Regards.
----------
Subject: Simple tamper-proof device filesystem.

The goal of this filesystem is to guarantee that
"applications using well-known device locations under /dev
get the device they want" (e.g. an application that accesses /dev/null can
always get a character special device with major=1 and minor=3).

Does this idea sound silly? It does, if you assume that root can do whatever
he/she wants to do. But this filesystem makes sense when used with
access control mechanisms like MAC (mandatory access control).
I want to use this filesystem in the case where a process with root privilege
has been hijacked but its behavior is still restricted by MAC.

Why not use FUSE?

Because /dev has to be available throughout the lifetime of the kernel.
It is not acceptable for /dev to stop working because the userspace FUSE
daemon was killed by SIGKILL or by the OOM killer.

Why not use SELinux?

Because SELinux doesn't guarantee the pairing of a filename and its attributes.
As far as I know, no MAC implementation enforces that pairing.
I guess there are two reasons:

Filename and attribute pairs are conventionally considered
constant and reliable.

Describing this attribute enforcement in the MAC's policy
would complicate the policy syntax.

I want to add the functionality that the MACs are missing.
Instead of adding it to each MAC separately,
I propose to add it as groundwork that can be combined with any MAC.

Why not drop CAP_MKNOD?

Dropping CAP_MKNOD is not enough to emulate this filesystem because
a process can still use rename()/unlink() to break the pairing of a filename
and its attributes (e.g. mv /dev/sda1 /dev/sda1.tmp; mv /dev/sda2 /dev/sda1;
mv /dev/sda1.tmp /dev/sda2 or unlink /dev/null; touch /dev/null ).
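
The rename() sequence above can be sketched in userspace C as follows. This
is only an illustration on ordinary files, not part of the patch; the helper
names swap_names() and inode_of() are hypothetical. It shows that three
rename(2) calls, none of which needs CAP_MKNOD, are enough to make two names
point at each other's inode.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical helper: inode number behind `path`, or 0 on failure. */
static ino_t inode_of(const char *path)
{
	struct stat st;

	return stat(path, &st) == 0 ? st.st_ino : 0;
}

/*
 * Hypothetical helper: exchange the names `a` and `b` through the
 * temporary name `tmp`, using nothing but rename(2).
 * Returns 0 on success, -1 on the first failure.
 */
static int swap_names(const char *a, const char *b, const char *tmp)
{
	assert(a != NULL && b != NULL && tmp != NULL);
	if (rename(a, tmp) != 0)
		return -1;
	if (rename(b, a) != 0)
		return -1;
	return rename(tmp, b);
}
```

After a successful swap_names(a, b, tmp), inode_of(a) returns what
inode_of(b) returned before the call, and vice versa, which is exactly the
situation the filesystem is meant to prevent for device nodes.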

This time, I'm implementing this filesystem as an extension to ramfs
(fs/ramfs/inode.c) because all it does beyond what ramfs does is
check the filename and its attributes.

Signed-off-by: Tetsuo Handa <[email protected]>
---
fs/ramfs/inode.c | 101 ++++-
fs/ramfs/syaoran.h | 1066 +++++++++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 1160 insertions(+), 7 deletions(-)

--- linux-2.6-mm.orig/fs/ramfs/inode.c
+++ linux-2.6-mm/fs/ramfs/inode.c
@@ -35,6 +35,7 @@
#include <linux/sched.h>
#include <asm/uaccess.h>
#include "internal.h"
+#include "syaoran.h"

/* some random number */
#define RAMFS_MAGIC 0x858458f6
@@ -49,7 +50,8 @@ static struct backing_dev_info ramfs_bac
BDI_CAP_READ_MAP | BDI_CAP_WRITE_MAP | BDI_CAP_EXEC_MAP,
};

-struct inode *ramfs_get_inode(struct super_block *sb, int mode, dev_t dev)
+struct inode *__ramfs_get_inode(struct super_block *sb, int mode, dev_t dev,
+ const int mac)
{
struct inode * inode = new_inode(sb);

@@ -65,10 +67,19 @@ struct inode *ramfs_get_inode(struct sup
switch (mode & S_IFMT) {
default:
init_special_inode(inode, mode, dev);
+ if (mac) {
+ if (S_ISBLK(mode))
+ inode->i_fop = &wrapped_def_blk_fops;
+ else if (S_ISCHR(mode))
+ inode->i_fop = &wrapped_def_chr_fops;
+ inode->i_op = &syaoran_file_inode_operations;
+ }
break;
case S_IFREG:
inode->i_op = &ramfs_file_inode_operations;
inode->i_fop = &ramfs_file_operations;
+ if (mac)
+ inode->i_op = &syaoran_file_inode_operations;
break;
case S_IFDIR:
inode->i_op = &ramfs_dir_inode_operations;
@@ -79,12 +90,19 @@ struct inode *ramfs_get_inode(struct sup
break;
case S_IFLNK:
inode->i_op = &page_symlink_inode_operations;
+ if (mac)
+ inode->i_op = &syaoran_symlink_inode_operations;
break;
}
}
return inode;
}

+struct inode *ramfs_get_inode(struct super_block *sb, int mode, dev_t dev)
+{
+ return __ramfs_get_inode(sb, mode, dev, 0);
+}
+
/*
* File creation. Allocate an inode, and we're done..
*/
@@ -92,9 +110,17 @@ struct inode *ramfs_get_inode(struct sup
static int
ramfs_mknod(struct inode *dir, struct dentry *dentry, int mode, dev_t dev)
{
- struct inode * inode = ramfs_get_inode(dir->i_sb, mode, dev);
+ struct inode *inode;
int error = -ENOSPC;

+ /*** SYAORAN start. ***/
+ if (dir->i_sb->s_op == &syaoran_ops) {
+ if (syaoran_may_create_node(dentry, mode, dev) < 0)
+ return -EPERM;
+ inode = syaoran_get_inode(dir->i_sb, mode, dev);
+ /*** SYAORAN end. ***/
+ } else
+ inode = ramfs_get_inode(dir->i_sb, mode, dev);
if (inode) {
if (dir->i_mode & S_ISGID) {
inode->i_gid = dir->i_gid;
@@ -127,7 +153,14 @@ static int ramfs_symlink(struct inode *
struct inode *inode;
int error = -ENOSPC;

- inode = ramfs_get_inode(dir->i_sb, S_IFLNK|S_IRWXUGO, 0);
+ /*** SYAORAN start. ***/
+ if (dir->i_sb->s_op == &syaoran_ops) {
+ if (syaoran_may_create_node(dentry, S_IFLNK, 0) < 0)
+ return -EPERM;
+ inode = syaoran_get_inode(dir->i_sb, S_IFLNK|S_IRWXUGO, 0);
+ /*** SYAORAN end. ***/
+ } else
+ inode = ramfs_get_inode(dir->i_sb, S_IFLNK|S_IRWXUGO, 0);
if (inode) {
int l = strlen(symname)+1;
error = page_symlink(inode, symname, l);
@@ -146,13 +179,14 @@ static int ramfs_symlink(struct inode *
static const struct inode_operations ramfs_dir_inode_operations = {
.create = ramfs_create,
.lookup = simple_lookup,
- .link = simple_link,
- .unlink = simple_unlink,
+ .link = ramfs_link,
+ .unlink = ramfs_unlink,
.symlink = ramfs_symlink,
.mkdir = ramfs_mkdir,
- .rmdir = simple_rmdir,
+ .rmdir = ramfs_rmdir,
.mknod = ramfs_mknod,
- .rename = simple_rename,
+ .rename = ramfs_rename,
+ .setattr = ramfs_setattr,
};

static const struct super_operations ramfs_ops = {
@@ -184,6 +218,35 @@ static int ramfs_fill_super(struct super
return 0;
}

+static int syaoran_fill_super(struct super_block *sb, void *data, int silent)
+{
+ struct inode *inode;
+ struct dentry *root;
+ int error;
+
+ sb->s_maxbytes = MAX_LFS_FILESIZE;
+ sb->s_blocksize = PAGE_CACHE_SIZE;
+ sb->s_blocksize_bits = PAGE_CACHE_SHIFT;
+ sb->s_magic = SYAORAN_MAGIC;
+ sb->s_op = &syaoran_ops;
+ sb->s_time_gran = 1;
+ error = syaoran_initialize(sb, data);
+ if (error < 0)
+ return error;
+ inode = syaoran_get_inode(sb, S_IFDIR | 0755, 0);
+ if (!inode)
+ return -ENOMEM;
+
+ root = d_alloc_root(inode);
+ if (!root) {
+ iput(inode);
+ return -ENOMEM;
+ }
+ sb->s_root = root;
+ syaoran_make_initial_nodes(sb);
+ return 0;
+}
+
int ramfs_get_sb(struct file_system_type *fs_type,
int flags, const char *dev_name, void *data, struct vfsmount *mnt)
{
@@ -197,6 +260,13 @@ static int rootfs_get_sb(struct file_sys
mnt);
}

+static int syaoran_get_sb(struct file_system_type *fs_type, int flags,
+ const char *dev_name, void *data,
+ struct vfsmount *mnt)
+{
+ return get_sb_nodev(fs_type, flags, data, syaoran_fill_super, mnt);
+}
+
static struct file_system_type ramfs_fs_type = {
.name = "ramfs",
.get_sb = ramfs_get_sb,
@@ -207,6 +277,11 @@ static struct file_system_type rootfs_fs
.get_sb = rootfs_get_sb,
.kill_sb = kill_litter_super,
};
+static struct file_system_type syaoran_fs_type = {
+ .name = "syaoran",
+ .get_sb = syaoran_get_sb,
+ .kill_sb = kill_litter_super,
+};

static int __init init_ramfs_fs(void)
{
@@ -237,3 +312,15 @@ int __init init_rootfs(void)
}

MODULE_LICENSE("GPL");
+
+static int __init init_syaoran_fs(void)
+{
+ return register_filesystem(&syaoran_fs_type);
+}
+
+static void __exit exit_syaoran_fs(void)
+{
+ unregister_filesystem(&syaoran_fs_type);
+}
+module_init(init_syaoran_fs);
+module_exit(exit_syaoran_fs);
--- /dev/null
+++ linux-2.6-mm/fs/ramfs/syaoran.h
@@ -0,0 +1,1066 @@
+/*
+ * fs/ramfs/syaoran.h
+ *
+ * Implementation of the Tamper-Proof Device Filesystem.
+ *
+ * Copyright (C) 2005-2007 NTT DATA CORPORATION
+ *
+ * Version: 1.5.3-pre 2007/12/23
+ */
+
+#include <linux/namei.h>
+#include <linux/mm.h>
+#include <linux/quotaops.h>
+
+#define list_for_each_cookie(pos, cookie, head) \
+ for ((cookie) || ((cookie) = (head)), pos = (cookie)->next; \
+ prefetch(pos->next), pos != (head) || ((cookie) = NULL); \
+ (cookie) = pos, pos = pos->next)
+
+/* The following constants are used to restrict operations.*/
+
+#define MAY_CREATE 1 /* This file is allowed to be mknod()ed. */
+#define MAY_DELETE 2 /* This file is allowed to be unlink()ed. */
+#define MAY_CHMOD 4 /* This file is allowed to be chmod()ed. */
+#define MAY_CHOWN 8 /* This file is allowed to be chown()ed. */
+#define DEVICE_USED 16 /* This block or character device file is used. */
+#define NO_CREATE_AT_MOUNT 32 /* Don't create this file at mount(). */
+
+/* Filesystem magic number: the ASCII bytes of "/dev". */
+#define SYAORAN_MAGIC 0x2F646576 /* = '/dev' */
+
+static struct inode_operations syaoran_file_inode_operations;
+static struct inode_operations syaoran_symlink_inode_operations;
+
+static void syaoran_put_super(struct super_block *sb);
+static int syaoran_initialize(struct super_block *sb, void *data);
+static void syaoran_make_initial_nodes(struct super_block *sb);
+static int syaoran_may_create_node(struct dentry *dentry, int mode, int dev);
+static int syaoran_may_modify_node(struct dentry *dentry, unsigned int flags);
+static int syaoran_create_tracelog(struct super_block *sb,
+ const char *filename);
+static struct inode *__ramfs_get_inode(struct super_block *sb, int mode,
+ dev_t dev, int mac);
+
+static struct inode *syaoran_get_inode(struct super_block *sb, int mode,
+ dev_t dev)
+{
+ return __ramfs_get_inode(sb, mode, dev, 1);
+}
+
+static struct super_operations syaoran_ops = {
+ .statfs = simple_statfs,
+ .drop_inode = generic_delete_inode,
+ .put_super = syaoran_put_super,
+};
+
+/* Wraps blkdev_open() to trace open operation for block devices. */
+static int (*org_blkdev_open) (struct inode *inode, struct file *filp);
+static struct file_operations wrapped_def_blk_fops;
+
+static int wrapped_blkdev_open(struct inode *inode, struct file *filp)
+{
+ int error = org_blkdev_open(inode, filp);
+ if (error != -ENXIO)
+ syaoran_may_modify_node(filp->f_dentry, DEVICE_USED);
+ return error;
+}
+
+/* Wraps chrdev_open() to trace open operation for character devices. */
+static int (*org_chrdev_open) (struct inode *inode, struct file *filp);
+static struct file_operations wrapped_def_chr_fops;
+
+static int wrapped_chrdev_open(struct inode *inode, struct file *filp)
+{
+ int error = org_chrdev_open(inode, filp);
+ if (error != -ENXIO)
+ syaoran_may_modify_node(filp->f_dentry, DEVICE_USED);
+ return error;
+}
+
+struct dev_entry {
+ struct list_head list;
+ /* Binary form of pathname under mount point. Never NULL. */
+ char *name;
+ /*
+ * Mode and permissions. setuid/setgid/sticky bits are not supported.
+ */
+ mode_t mode;
+ uid_t uid;
+ gid_t gid;
+ dev_t kdev;
+ /*
+ * Binary form of initial contents for the symlink.
+ * NULL if not symlink.
+ */
+ char *symlink_data;
+ /* File access control flags. */
+ unsigned int flags;
+ /* Text form of pathname under mount point. Never NULL. */
+ const char *printable_name;
+ /*
+ * Text form of initial contents for the symlink.
+ * NULL if not symlink.
+ */
+ const char *printable_symlink_data;
+};
+
+struct syaoran_sb_info {
+ struct list_head list;
+ bool initialize_done; /* False if initialization is in progress. */
+ bool is_permissive_mode; /* True if permissive mode. */
+};
+
+static void syaoran_put_super(struct super_block *sb)
+{
+ struct syaoran_sb_info *info;
+ struct dev_entry *entry;
+ struct dev_entry *tmp;
+ if (!sb)
+ return;
+ info = (struct syaoran_sb_info *) sb->s_fs_info;
+ if (!info)
+ return;
+ list_for_each_entry_safe(entry, tmp, &info->list, list) {
+ kfree(entry->name);
+ kfree(entry->symlink_data);
+ kfree(entry->printable_name);
+ kfree(entry->printable_symlink_data);
+ list_del(&entry->list);
+ /* printk("Entry removed.\n"); */
+ kfree(entry);
+ }
+ kfree(info);
+ sb->s_fs_info = NULL;
+ printk(KERN_DEBUG "%s: Unused memory freed.\n", __FUNCTION__);
+}
+
+/* Get absolute pathname from mount point. */
+static int get_local_absolute_path(struct dentry *dentry, char *buffer,
+ int buflen)
+{
+ char *start = buffer;
+ char *end = buffer + buflen;
+ int namelen;
+
+ if (buflen < 256)
+ goto out;
+
+ *--end = '\0';
+ buflen--;
+ for (;;) {
+ struct dentry *parent;
+ if (IS_ROOT(dentry))
+ break;
+ parent = dentry->d_parent;
+ namelen = dentry->d_name.len;
+ buflen -= namelen + 1;
+ if (buflen < 0)
+ goto out;
+ end -= namelen;
+ memcpy(end, dentry->d_name.name, namelen);
+ *--end = '/';
+ dentry = parent;
+ }
+ if (*end == '/') {
+ buflen++;
+ end++;
+ }
+ namelen = dentry->d_name.len;
+ buflen -= namelen;
+ if (buflen < 0)
+ goto out;
+ end -= namelen;
+ memcpy(end, dentry->d_name.name, namelen);
+ memmove(start, end, strlen(end) + 1);
+ return 0;
+out:
+ return -ENOMEM;
+}
+
+/* Get absolute pathname of the given dentry from mount point. */
+static int local_realpath_from_dentry(struct dentry *dentry, char *newname,
+ int newname_len)
+{
+ int error;
+ struct dentry *d_dentry;
+ if (!dentry || !newname || newname_len <= 0)
+ return -EINVAL;
+ d_dentry = dget(dentry);
+ /***** CRITICAL SECTION START *****/
+ spin_lock(&dcache_lock);
+ error = get_local_absolute_path(d_dentry, newname, newname_len);
+ spin_unlock(&dcache_lock);
+ /***** CRITICAL SECTION END *****/
+ dput(d_dentry);
+ return error;
+}
+
+static int syaoran_check_flags(struct syaoran_sb_info *info,
+ struct dentry *dentry, int mode, int dev,
+ unsigned int flags)
+{
+ int error = -EPERM;
+ struct dev_entry *entry;
+ /*
+ * Since local_realpath_from_dentry() holds dcache_lock,
+ * allocating buffer using kmalloc() won't help improving concurrency.
+ * Therefore, I use static buffer here.
+ */
+ static char filename[PAGE_SIZE];
+ static DEFINE_SPINLOCK(lock);
+ spin_lock(&lock);
+ memset(filename, 0, sizeof(filename));
+ if (local_realpath_from_dentry(dentry, filename, sizeof(filename) - 1))
+ goto out;
+ list_for_each_entry(entry, &info->list, list) {
+ if ((mode & S_IFMT) != (entry->mode & S_IFMT))
+ continue;
+ if ((S_ISBLK(mode) || S_ISCHR(mode)) && dev != entry->kdev)
+ continue;
+ if (strcmp(entry->name, filename + 1))
+ continue;
+ if (info->is_permissive_mode) {
+ entry->flags |= flags;
+ error = 0;
+ } else {
+ if ((entry->flags & flags) == flags)
+ error = 0;
+ }
+ break;
+ }
+out:
+ if (error && strlen(filename) < (sizeof(filename) / 4) - 16) {
+ const char *name;
+ const uid_t uid = current->fsuid;
+ const gid_t gid = current->fsgid;
+ const mode_t perm = mode & 0777;
+ flags &= ~DEVICE_USED;
+ {
+ char *end = filename + sizeof(filename) - 1;
+ const char *cp = strchr(filename, '\0') - 1;
+ while (cp > filename) {
+ const unsigned char c = *cp--;
+ if (c == '\\') {
+ *--end = '\\';
+ *--end = '\\';
+ } else if (c > ' ' && c < 127) {
+ *--end = c;
+ } else {
+ *--end = (c & 7) + '0';
+ *--end = ((c >> 3) & 7) + '0';
+ *--end = (c >> 6) + '0';
+ *--end = '\\';
+ }
+ }
+ name = end;
+ }
+ switch (mode & S_IFMT) {
+ case S_IFCHR:
+ printk(KERN_DEBUG
+ "SYAORAN-ERROR: %s %3o %3u %3u %2u %c %3u %3u\n",
+ name, perm, uid, gid, flags, 'c',
+ MAJOR(dev), MINOR(dev));
+ break;
+ case S_IFBLK:
+ printk(KERN_DEBUG
+ "SYAORAN-ERROR: %s %3o %3u %3u %2u %c %3u %3u\n",
+ name, perm, uid, gid, flags, 'b',
+ MAJOR(dev), MINOR(dev));
+ break;
+ case S_IFIFO:
+ printk(KERN_DEBUG
+ "SYAORAN-ERROR: %s %3o %3u %3u %2u %c\n",
+ name, perm, uid, gid, flags, 'p');
+ break;
+ case S_IFSOCK:
+ printk(KERN_DEBUG
+ "SYAORAN-ERROR: %s %3o %3u %3u %2u %c\n",
+ name, perm, uid, gid, flags, 's');
+ break;
+ case S_IFDIR:
+ printk(KERN_DEBUG
+ "SYAORAN-ERROR: %s %3o %3u %3u %2u %c\n",
+ name, perm, uid, gid, flags, 'd');
+ break;
+ case S_IFLNK:
+ printk(KERN_DEBUG
+ "SYAORAN-ERROR: %s %3o %3u %3u %2u %c %s\n",
+ name, perm, uid, gid, flags, 'l', "unknown");
+ break;
+ case S_IFREG:
+ printk(KERN_DEBUG
+ "SYAORAN-ERROR: %s %3o %3u %3u %2u %c\n",
+ name, perm, uid, gid, flags, 'f');
+ break;
+ }
+ }
+ spin_unlock(&lock);
+ return error;
+}
+
+/* Check whether the given dentry is allowed to mknod. */
+static int syaoran_may_create_node(struct dentry *dentry, int mode, int dev)
+{
+ struct syaoran_sb_info *info =
+ (struct syaoran_sb_info *) dentry->d_sb->s_fs_info;
+ if (!info) {
+ printk(KERN_DEBUG "%s: dentry->d_sb->s_fs_info == NULL\n",
+ __FUNCTION__);
+ return -EPERM;
+ }
+ if (!info->initialize_done)
+ return 0;
+ return syaoran_check_flags(info, dentry, mode, dev, MAY_CREATE);
+}
+
+/* Check whether the given dentry is allowed to chmod/chown/unlink. */
+static int syaoran_may_modify_node(struct dentry *dentry, unsigned int flags)
+{
+ struct syaoran_sb_info *info =
+ (struct syaoran_sb_info *) dentry->d_sb->s_fs_info;
+ if (!info) {
+ printk(KERN_DEBUG "%s: dentry->d_sb->s_fs_info == NULL\n",
+ __FUNCTION__);
+ return -EPERM;
+ }
+ if (flags == DEVICE_USED && !info->is_permissive_mode)
+ return 0;
+ if (!dentry->d_inode)
+ return -ENOENT;
+ return syaoran_check_flags(info, dentry, dentry->d_inode->i_mode,
+ dentry->d_inode->i_rdev, flags);
+}
+
+static int ramfs_link(struct dentry *old_dentry, struct inode *dir,
+ struct dentry *dentry)
+{
+ struct inode *inode;
+ if (dir->i_sb->s_op != &syaoran_ops)
+ goto ok;
+ /*** SYAORAN start. ***/
+ inode = old_dentry->d_inode;
+ if (!inode ||
+ syaoran_may_create_node(dentry, inode->i_mode, inode->i_rdev))
+ return -EPERM;
+ /*** SYAORAN end. ***/
+ok:
+ return simple_link(old_dentry, dir, dentry);
+}
+
+static int ramfs_unlink(struct inode *dir, struct dentry *dentry)
+{
+ if (dir->i_sb->s_op != &syaoran_ops)
+ goto ok;
+ /*** SYAORAN start. ***/
+ if (syaoran_may_modify_node(dentry, MAY_DELETE))
+ return -EPERM;
+ /*** SYAORAN end. ***/
+ok:
+ return simple_unlink(dir, dentry);
+}
+
+static int ramfs_rename(struct inode *old_dir, struct dentry *old_dentry,
+ struct inode *new_dir, struct dentry *new_dentry)
+{
+ struct inode *inode;
+ if (old_dir->i_sb->s_op != &syaoran_ops)
+ goto ok;
+ /*** SYAORAN start. ***/
+ inode = old_dentry->d_inode;
+ if (!inode || syaoran_may_modify_node(old_dentry, MAY_DELETE) ||
+ syaoran_may_create_node(new_dentry, inode->i_mode, inode->i_rdev))
+ return -EPERM;
+ /*** SYAORAN end. ***/
+ok:
+ return simple_rename(old_dir, old_dentry, new_dir, new_dentry);
+}
+
+static int ramfs_rmdir(struct inode *dir, struct dentry *dentry)
+{
+ if (dir->i_sb->s_op != &syaoran_ops)
+ goto ok;
+ /*** SYAORAN start. ***/
+ if (syaoran_may_modify_node(dentry, MAY_DELETE))
+ return -EPERM;
+ /*** SYAORAN end. ***/
+ok:
+ return simple_rmdir(dir, dentry);
+}
+
+/*
+ * Original ramfs doesn't set ramfs_dir_inode_operations.setattr field.
+ * Now I'm setting the field to share ramfs/rootfs/SYAORAN code.
+ * Side effect is that the checking order of notify_change() has changed from
+ * inode_change_ok() -> security_inode_setattr() ->
+ * DQUOT_TRANSFER() -> inode_setattr()
+ * to
+ * security_inode_setattr() -> inode_change_ok() ->
+ * DQUOT_TRANSFER() -> inode_setattr()
+ *
+ * Is this change problematic? If problematic, I'll stop sharing the field.
+ */
+static int ramfs_setattr(struct dentry *dentry, struct iattr *attr)
+{
+ unsigned int ia_valid = attr->ia_valid; /* also read on the non-SYAORAN path */
+ unsigned int flags = 0;
+ struct inode *inode = dentry->d_inode;
+ int error = inode_change_ok(inode, attr);
+ if (inode->i_sb->s_op != &syaoran_ops)
+ goto ok;
+ /*** SYAORAN start. ***/
+ ia_valid = attr->ia_valid;
+ if (ia_valid & (ATTR_UID | ATTR_GID))
+ flags |= MAY_CHOWN;
+ if (ia_valid & ATTR_MODE)
+ flags |= MAY_CHMOD;
+ if (syaoran_may_modify_node(dentry, flags))
+ return -EPERM;
+ /*** SYAORAN end. ***/
+ok:
+ if (!error) {
+ if ((ia_valid & ATTR_UID && attr->ia_uid != inode->i_uid) ||
+ (ia_valid & ATTR_GID && attr->ia_gid != inode->i_gid))
+ error = DQUOT_TRANSFER(inode, attr) ? -EDQUOT : 0;
+ if (!error)
+ error = inode_setattr(inode, attr);
+ }
+ return error;
+}
+
+static struct inode_operations syaoran_file_inode_operations = {
+ .getattr = simple_getattr,
+ .setattr = ramfs_setattr,
+};
+
+static struct inode_operations syaoran_symlink_inode_operations = {
+ .readlink = generic_readlink,
+ .follow_link = page_follow_link_light,
+ .put_link = page_put_link,
+ .setattr = ramfs_setattr,
+};
+
+/*
+ * The following code is used for processing the policy file and
+ * creating initial nodes.
+ */
+
+/* lookup_create() without nameidata. Called only during initialization. */
+static struct dentry *lookup_create2(const char *name, struct dentry *base,
+ const bool is_dir)
+{
+ struct dentry *dentry;
+ const int len = name ? strlen(name) : 0;
+ mutex_lock(&base->d_inode->i_mutex);
+ dentry = lookup_one_len(name, base, len);
+ if (IS_ERR(dentry))
+ goto fail;
+ if (!is_dir && name[len] && !dentry->d_inode)
+ goto enoent;
+ return dentry;
+enoent:
+ dput(dentry);
+ dentry = ERR_PTR(-ENOENT);
+fail:
+ return dentry;
+}
+
+/* mkdir(). Called only during initialization. */
+static int fs_mkdir(const char *pathname, struct dentry *base, int mode,
+ uid_t user, gid_t group)
+{
+ struct dentry *dentry = lookup_create2(pathname, base, 1);
+ int error = PTR_ERR(dentry);
+ if (!IS_ERR(dentry)) {
+ error = vfs_mkdir(base->d_inode, dentry, mode);
+ if (!error) {
+ lock_kernel();
+ dentry->d_inode->i_uid = user;
+ dentry->d_inode->i_gid = group;
+ unlock_kernel();
+ }
+ dput(dentry);
+ }
+ mutex_unlock(&base->d_inode->i_mutex);
+ return error;
+}
+
+/* mknod(). Called only during initialization. */
+static int fs_mknod(const char *filename, struct dentry *base, int mode,
+ dev_t dev, uid_t user, gid_t group)
+{
+ struct dentry *dentry;
+ int error;
+ switch (mode & S_IFMT) {
+ case S_IFCHR:
+ case S_IFBLK:
+ case S_IFIFO:
+ case S_IFSOCK:
+ case S_IFREG:
+ break;
+ default:
+ return -EPERM;
+ }
+ dentry = lookup_create2(filename, base, 0);
+ error = PTR_ERR(dentry);
+ if (!IS_ERR(dentry)) {
+ error = vfs_mknod(base->d_inode, dentry, mode, dev);
+ if (!error) {
+ lock_kernel();
+ dentry->d_inode->i_uid = user;
+ dentry->d_inode->i_gid = group;
+ unlock_kernel();
+ }
+ dput(dentry);
+ }
+ mutex_unlock(&base->d_inode->i_mutex);
+ return error;
+}
+
+/* symlink(). Called only during initialization. */
+static int fs_symlink(const char *pathname, struct dentry *base,
+ char *oldname, int mode, uid_t user, gid_t group)
+{
+ struct dentry *dentry = lookup_create2(pathname, base, 0);
+ int error = PTR_ERR(dentry);
+ if (!IS_ERR(dentry)) {
+ error = vfs_symlink(base->d_inode, dentry, oldname, S_IALLUGO);
+ if (!error) {
+ lock_kernel();
+ dentry->d_inode->i_mode = mode;
+ dentry->d_inode->i_uid = user;
+ dentry->d_inode->i_gid = group;
+ unlock_kernel();
+ }
+ dput(dentry);
+ }
+ mutex_unlock(&base->d_inode->i_mutex);
+ return error;
+}
+
+/*
+ * Format string.
+ * Leading and trailing whitespace is removed.
+ * Runs of whitespace are collapsed into a single space.
+ */
+static void syaoran_normalize_line(unsigned char *buffer)
+{
+ unsigned char *sp = buffer;
+ unsigned char *dp = buffer;
+ bool first = 1;
+ while (*sp && (*sp <= ' ' || *sp >= 127))
+ sp++;
+ while (*sp) {
+ if (!first)
+ *dp++ = ' ';
+ first = 0;
+ while (*sp > ' ' && *sp < 127)
+ *dp++ = *sp++;
+ while (*sp && (*sp <= ' ' || *sp >= 127))
+ sp++;
+ }
+ *dp = '\0';
+}
+
+/* Convert text form of filename into binary form. */
+static void syaoran_unescape(char *filename)
+{
+ char *cp = filename;
+ char c, d, e;
+ if (!cp)
+ return;
+ while ((c = *filename++) != '\0') {
+ if (c != '\\') {
+ *cp++ = c;
+ continue;
+ }
+ if ((c = *filename++) == '\\') {
+ *cp++ = c;
+ continue;
+ }
+ if (c < '0' || c > '3')
+ break;
+ d = *filename++;
+ if (d < '0' || d > '7')
+ break;
+ e = *filename++;
+ if (e < '0' || e > '7')
+ break;
+ *(unsigned char *) cp++ = (unsigned char)
+ (((unsigned char) (c - '0') << 6) +
+ ((unsigned char) (d - '0') << 3) +
+ (unsigned char) (e - '0'));
+ }
+ *cp = '\0';
+}
+
+static inline char *strdup(const char *data)
+{
+ return kstrdup(data, GFP_KERNEL);
+}
+
+static int register_node_info(char *buffer, struct super_block *sb)
+{
+ enum {
+ ARG_FILENAME = 0,
+ ARG_PERMISSION = 1,
+ ARG_UID = 2,
+ ARG_GID = 3,
+ ARG_FLAGS = 4,
+ ARG_DEV_TYPE = 5,
+ ARG_SYMLINK_DATA = 6,
+ ARG_DEV_MAJOR = 6,
+ ARG_DEV_MINOR = 7,
+ MAX_ARG = 8
+ };
+ char *args[MAX_ARG];
+ int i;
+ int error = -EINVAL;
+ unsigned int perm, uid, gid, flags, major = 0, minor = 0;
+ struct syaoran_sb_info *info = (struct syaoran_sb_info *) sb->s_fs_info;
+ struct dev_entry *entry;
+ memset(args, 0, sizeof(args));
+ args[0] = buffer;
+ for (i = 1; i < MAX_ARG; i++) {
+ args[i] = strchr(args[i - 1] + 1, ' ');
+ if (!args[i])
+ break;
+ *args[i]++ = '\0';
+ }
+ /*
+ printk("<%s> <%s> <%s> <%s> <%s> <%s> <%s> <%s>\n",
+ args[0], args[1], args[2], args[3], args[4], args[5],
+ args[6], args[7]);
+ */
+ if (!args[ARG_FILENAME] || !args[ARG_PERMISSION] || !args[ARG_UID] ||
+ !args[ARG_GID] || !args[ARG_DEV_TYPE] || !args[ARG_FLAGS])
+ goto out;
+ if (sscanf(args[ARG_PERMISSION], "%o", &perm) != 1 || !(perm <= 0777)
+ || sscanf(args[ARG_UID], "%u", &uid) != 1
+ || sscanf(args[ARG_GID], "%u", &gid) != 1
+ || sscanf(args[ARG_FLAGS], "%u", &flags) != 1
+ || *(args[ARG_DEV_TYPE] + 1))
+ goto out;
+ switch (*args[ARG_DEV_TYPE]) {
+ case 'c':
+ perm |= S_IFCHR;
+ if (!args[ARG_DEV_MAJOR]
+ || sscanf(args[ARG_DEV_MAJOR], "%u", &major) != 1
+ || !args[ARG_DEV_MINOR]
+ || sscanf(args[ARG_DEV_MINOR], "%u", &minor) != 1)
+ goto out;
+ break;
+ case 'b':
+ perm |= S_IFBLK;
+ if (!args[ARG_DEV_MAJOR]
+ || sscanf(args[ARG_DEV_MAJOR], "%u", &major) != 1
+ || !args[ARG_DEV_MINOR]
+ || sscanf(args[ARG_DEV_MINOR], "%u", &minor) != 1)
+ goto out;
+ break;
+ case 'l':
+ perm |= S_IFLNK;
+ if (!args[ARG_SYMLINK_DATA])
+ goto out;
+ break;
+ case 'd':
+ perm |= S_IFDIR;
+ break;
+ case 's':
+ perm |= S_IFSOCK;
+ break;
+ case 'p':
+ perm |= S_IFIFO;
+ break;
+ case 'f':
+ perm |= S_IFREG;
+ break;
+ default:
+ goto out;
+ }
+ error = -ENOMEM;
+ entry = kzalloc(sizeof(*entry), GFP_KERNEL);
+ if (!entry)
+ goto out;
+ if (S_ISLNK(perm)) {
+ entry->printable_symlink_data = strdup(args[ARG_SYMLINK_DATA]);
+ if (!entry->printable_symlink_data)
+ goto out_freemem;
+ }
+ entry->printable_name = strdup(args[ARG_FILENAME]);
+ if (!entry->printable_name)
+ goto out_freemem;
+ if (S_ISLNK(perm)) {
+ entry->symlink_data = strdup(entry->printable_symlink_data);
+ if (!entry->symlink_data)
+ goto out_freemem;
+ syaoran_unescape(entry->symlink_data);
+ }
+ entry->name = strdup(entry->printable_name);
+ if (!entry->name)
+ goto out_freemem;
+ syaoran_unescape(entry->name);
+ /*
+ * Drop trailing '/', because get_local_absolute_path() doesn't append
+ * a trailing '/'.
+ */
+ i = strlen(entry->name);
+ if (i && entry->name[i - 1] == '/')
+ entry->name[i - 1] = '\0';
+ entry->mode = perm;
+ entry->uid = uid;
+ entry->gid = gid;
+ entry->kdev = S_ISCHR(perm) || S_ISBLK(perm) ? MKDEV(major, minor) : 0;
+ entry->flags = flags;
+ list_add_tail(&entry->list, &info->list);
+ /* printk("Entry added.\n"); */
+ error = 0;
+out:
+ return error;
+out_freemem:
+ kfree(entry->printable_symlink_data);
+ kfree(entry->printable_name);
+ kfree(entry->symlink_data);
+ kfree(entry);
+ goto out;
+}
+
+static int read_config_file(struct file *file, struct super_block *sb)
+{
+ char *buffer;
+ int error = -ENOMEM;
+ if (!file)
+ return -EINVAL;
+ buffer = kzalloc(PAGE_SIZE, GFP_KERNEL);
+ if (buffer) {
+ int len;
+ char *cp;
+ unsigned long offset = 0;
+ while ((len = kernel_read(file, offset, buffer, PAGE_SIZE)) > 0
+ && (cp = memchr(buffer, '\n', len)) != NULL) {
+ *cp = '\0';
+ offset += cp - buffer + 1;
+ syaoran_normalize_line(buffer);
+ if (register_node_info(buffer, sb) == -ENOMEM)
+ goto out;
+ }
+ error = 0;
+ }
+out:
+ kfree(buffer);
+ return error;
+}
+
+static void make_node(struct dev_entry *entry, struct dentry *root)
+{
+ struct dentry *base = dget(root);
+ char *filename = entry->name;
+ char *name = filename;
+ unsigned int c;
+ const mode_t perm = entry->mode;
+ const uid_t uid = entry->uid;
+ const gid_t gid = entry->gid;
+ goto start;
+ while ((c = *(unsigned char *) filename) != '\0') {
+ if (c == '/') {
+ struct dentry *new_base;
+ const int len = filename - name;
+ *filename = '\0';
+ mutex_lock(&base->d_inode->i_mutex);
+ new_base = lookup_one_len(name, base, len);
+ mutex_unlock(&base->d_inode->i_mutex);
+ dput(base);
+ *filename = '/';
+ filename++;
+ if (IS_ERR(new_base))
+ return;
+ if (!new_base->d_inode ||
+ !S_ISDIR(new_base->d_inode->i_mode)) {
+ dput(new_base);
+ return;
+ }
+ base = new_base;
+start:
+ name = filename;
+ } else {
+ filename++;
+ }
+ }
+ filename = (char *) name;
+ if (S_ISLNK(perm)) {
+ fs_symlink(filename, base, entry->symlink_data, perm, uid, gid);
+ } else if (S_ISDIR(perm)) {
+ fs_mkdir(filename, base, perm ^ S_IFDIR, uid, gid);
+ } else if (S_ISSOCK(perm) || S_ISFIFO(perm) || S_ISREG(perm)) {
+ fs_mknod(filename, base, perm, 0, uid, gid);
+ } else if (S_ISCHR(perm) || S_ISBLK(perm)) {
+ fs_mknod(filename, base, perm, entry->kdev, uid, gid);
+ }
+ dput(base);
+}
+
+/* Create files according to the policy file. */
+static void syaoran_make_initial_nodes(struct super_block *sb)
+{
+ struct syaoran_sb_info *info;
+ struct dev_entry *entry;
+ if (!sb)
+ return;
+ info = (struct syaoran_sb_info *) sb->s_fs_info;
+ if (!info)
+ return;
+ if (info->is_permissive_mode) {
+ syaoran_create_tracelog(sb, ".syaoran");
+ syaoran_create_tracelog(sb, ".syaoran_all");
+ }
+ list_for_each_entry(entry, &info->list, list) {
+ if ((entry->flags & NO_CREATE_AT_MOUNT) == 0)
+ make_node(entry, sb->s_root);
+ }
+ info->initialize_done = 1;
+}
+
+/* Read policy file. */
+static int syaoran_initialize(struct super_block *sb, void *data)
+{
+ int error = -EINVAL;
+ static bool first = 1;
+ if (first) {
+ first = 0;
+ printk(KERN_INFO "SYAORAN: 1.5.3-pre 2007/12/23\n");
+ }
+ {
+ struct inode *inode = new_inode(sb);
+ if (!inode)
+ return -ENOMEM;
+ /* Create /dev/ram0 to get the value of blkdev_open(). */
+ init_special_inode(inode, S_IFBLK | 0666, MKDEV(1, 0));
+ wrapped_def_blk_fops = *inode->i_fop;
+ iput(inode);
+ org_blkdev_open = wrapped_def_blk_fops.open;
+ wrapped_def_blk_fops.open = wrapped_blkdev_open;
+ }
+ {
+ struct inode *inode = new_inode(sb);
+ if (!inode)
+ return -ENOMEM;
+ /* Create /dev/null to get the value of chrdev_open(). */
+ init_special_inode(inode, S_IFCHR | 0666, MKDEV(1, 3));
+ wrapped_def_chr_fops = *inode->i_fop;
+ iput(inode);
+ org_chrdev_open = wrapped_def_chr_fops.open;
+ wrapped_def_chr_fops.open = wrapped_chrdev_open;
+ }
+ if (data) {
+ struct file *f;
+ char *filename = (char *) data;
+ bool is_permissive_mode = 0;
+ if (strncmp(filename, "accept=", 7) == 0) {
+ filename += 7;
+ is_permissive_mode = 1;
+ } else if (strncmp(filename, "enforce=", 8) == 0) {
+ filename += 8;
+ is_permissive_mode = 0;
+ } else {
+ printk(KERN_INFO
+ "SYAORAN: Missing 'accept=' or 'enforce='.\n");
+ return -EINVAL;
+ }
+ f = open_pathname(AT_FDCWD, filename, O_RDONLY, 0600);
+ if (!IS_ERR(f)) {
+ struct syaoran_sb_info *p;
+ if (!S_ISREG(f->f_dentry->d_inode->i_mode))
+ goto out;
+ p = kzalloc(sizeof(*p), GFP_KERNEL);
+ if (!p)
+ goto out;
+ p->is_permissive_mode = is_permissive_mode;
+ sb->s_fs_info = p;
+ INIT_LIST_HEAD(&((struct syaoran_sb_info *)
+ sb->s_fs_info)->list);
+ printk(KERN_INFO "SYAORAN: Reading '%s'\n", filename);
+ error = read_config_file(f, sb);
+out:
+ if (error)
+ printk(KERN_INFO "SYAORAN: Can't read '%s'\n",
+ filename);
+ filp_close(f, NULL);
+ } else {
+ printk(KERN_INFO "SYAORAN: Can't open '%s'\n",
+ filename);
+ }
+ } else {
+ printk(KERN_INFO "SYAORAN: Missing config-file path.\n");
+ }
+ return error;
+}
+
+/*
+ * The following structure and code are used for transferring data
+ * to interface files.
+ */
+
+struct syaoran_read_struct {
+ char *buf; /* Buffer for reading. */
+ int avail; /* Bytes available for reading. */
+ struct super_block *sb; /* The super_block of this partition. */
+ struct dev_entry *entry; /* The entry currently reading from. */
+ _Bool read_all; /* Dump all entries? */
+ struct list_head *pos; /* Current position. */
+};
+
+static void syaoran_read_table(struct syaoran_read_struct *head, char *buf,
+ int count)
+{
+ struct super_block *sb = head->sb;
+ struct syaoran_sb_info *info =
+ (struct syaoran_sb_info *) sb->s_fs_info;
+ struct list_head *pos;
+ const _Bool read_all = head->read_all;
+ if (!info)
+ return;
+ if (!head->pos)
+ return;
+ list_for_each_cookie(pos, head->pos, &info->list) {
+ struct dev_entry *entry =
+ list_entry(pos, struct dev_entry, list);
+ const unsigned int flags =
+ read_all ? entry->flags : entry->flags & ~DEVICE_USED;
+ const char *name = entry->printable_name;
+ const uid_t uid = entry->uid;
+ const gid_t gid = entry->gid;
+ const mode_t perm = entry->mode & 0777;
+ int len = 0;
+ switch (entry->mode & S_IFMT) {
+ case S_IFCHR:
+ if (!head->read_all && !(entry->flags & DEVICE_USED))
+ break;
+ len = snprintf(buf, count,
+ "%-20s %3o %3u %3u %2u %c %3u %3u\n",
+ name, perm, uid, gid, flags, 'c',
+ MAJOR(entry->kdev), MINOR(entry->kdev));
+ break;
+ case S_IFBLK:
+ if (!head->read_all && !(entry->flags & DEVICE_USED))
+ break;
+ len = snprintf(buf, count,
+ "%-20s %3o %3u %3u %2u %c %3u %3u\n",
+ name, perm, uid, gid, flags, 'b',
+ MAJOR(entry->kdev), MINOR(entry->kdev));
+ break;
+ case S_IFIFO:
+ len = snprintf(buf, count,
+ "%-20s %3o %3u %3u %2u %c\n",
+ name, perm, uid, gid, flags, 'p');
+ break;
+ case S_IFSOCK:
+ len = snprintf(buf, count,
+ "%-20s %3o %3u %3u %2u %c\n",
+ name, perm, uid, gid, flags, 's');
+ break;
+ case S_IFDIR:
+ len = snprintf(buf, count,
+ "%-20s %3o %3u %3u %2u %c\n",
+ name, perm, uid, gid, flags, 'd');
+ break;
+ case S_IFLNK:
+ len = snprintf(buf, count,
+ "%-20s %3o %3u %3u %2u %c %s\n",
+ name, perm, uid, gid, flags, 'l',
+ entry->printable_symlink_data);
+ break;
+ case S_IFREG:
+ len = snprintf(buf, count,
+ "%-20s %3o %3u %3u %2u %c\n",
+ name, perm, uid, gid, flags, 'f');
+ break;
+ }
+ if (len < 0 || count <= len)
+ break;
+ count -= len;
+ buf += len;
+ head->avail += len;
+ }
+}
+
+static int syaoran_trace_open(struct inode *inode, struct file *file)
+{
+ struct syaoran_read_struct *head =
+ kzalloc(sizeof(*head), GFP_KERNEL);
+ if (!head)
+ return -ENOMEM;
+ head->sb = inode->i_sb;
+ head->read_all =
+ (strcmp(file->f_dentry->d_name.name, ".syaoran_all") == 0);
+ head->pos = &((struct syaoran_sb_info *) head->sb->s_fs_info)->list;
+ head->buf = kzalloc(PAGE_SIZE * 2, GFP_KERNEL);
+ if (!head->buf) {
+ kfree(head);
+ return -ENOMEM;
+ }
+ file->private_data = head;
+ return 0;
+}
+
+static int syaoran_trace_release(struct inode *inode, struct file *file)
+{
+ struct syaoran_read_struct *head = file->private_data;
+ kfree(head->buf);
+ kfree(head);
+ file->private_data = NULL;
+ return 0;
+}
+
+static ssize_t syaoran_trace_read(struct file *file, char __user *buf,
+ size_t count, loff_t *ppos)
+{
+ struct syaoran_read_struct *head =
+ (struct syaoran_read_struct *) file->private_data;
+ int len = head->avail;
+ char *cp = head->buf;
+ if (!access_ok(VERIFY_WRITE, buf, count))
+ return -EFAULT;
+ syaoran_read_table(head, cp + len, PAGE_SIZE * 2 - len);
+ len = head->avail;
+ if (len > count)
+ len = count;
+ if (len > 0) {
+ if (copy_to_user(buf, cp, len))
+ return -EFAULT;
+ head->avail -= len;
+ memmove(cp, cp + len, head->avail);
+ }
+ return len;
+}
+
+static struct file_operations syaoran_trace_operations = {
+ .open = syaoran_trace_open,
+ .release = syaoran_trace_release,
+ .read = syaoran_trace_read,
+};
+
+/* Create interface files for reading status. */
+static int syaoran_create_tracelog(struct super_block *sb, const char *filename)
+{
+ struct inode *inode;
+ struct dentry *base = dget(sb->s_root);
+ struct dentry *dentry = lookup_create2(filename, base, 0);
+ int error = PTR_ERR(dentry);
+ if (IS_ERR(dentry))
+ goto out;
+ inode = syaoran_get_inode(sb, S_IFREG | 0400, 0);
+ if (!inode)
+ error = -ENOSPC;
+ else {
+ /* Override file operation. */
+ inode->i_fop = &syaoran_trace_operations;
+ d_instantiate(dentry, inode);
+ dget(dentry); /* Extra count - pin the dentry in core */
+ error = 0;
+ }
+ dput(dentry);
+out:
+ mutex_unlock(&base->d_inode->i_mutex);
+ dput(base);
+ return error;
+}
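
For readers who want to see the column layout that syaoran_read_table() emits without building a kernel, here is a minimal userspace sketch. It is not part of the patch: struct table_entry and format_entries() are hypothetical names, the symlink ('l') case is left out, and only the format strings and the "stop when a line no longer fits" test are taken from the code above.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical, simplified stand-in for struct dev_entry. */
struct table_entry {
	const char *name;
	unsigned int perm, uid, gid, flags;
	char type;			/* 'c', 'b', 'p', 's', 'd' or 'f' */
	unsigned int major, minor;	/* meaningful for 'c' and 'b' only */
};

/*
 * Append whole lines to buf until one no longer fits. Returns the number
 * of bytes written; a line that would be truncated is dropped, matching
 * the "len < 0 || count <= len" test in syaoran_read_table().
 */
static int format_entries(const struct table_entry *e, int n,
			  char *buf, int count)
{
	int total = 0;
	int i;

	assert(buf && count > 0);
	for (i = 0; i < n; i++) {
		int len;

		if (e[i].type == 'c' || e[i].type == 'b')
			len = snprintf(buf, (size_t)count,
				       "%-20s %3o %3u %3u %2u %c %3u %3u\n",
				       e[i].name, e[i].perm, e[i].uid,
				       e[i].gid, e[i].flags, e[i].type,
				       e[i].major, e[i].minor);
		else
			len = snprintf(buf, (size_t)count,
				       "%-20s %3o %3u %3u %2u %c\n",
				       e[i].name, e[i].perm, e[i].uid,
				       e[i].gid, e[i].flags, e[i].type);
		if (len < 0 || count <= len) {
			*buf = '\0';	/* discard the partial line */
			break;
		}
		buf += len;
		count -= len;
		total += len;
	}
	return total;
}
```

With an entry for /dev/null (name "null", mode 0666, char device 1:3) the sketch produces one line whose last three columns are "c", "1" and "3".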


2007-12-31 20:02:58

by Serge E. Hallyn

Subject: Re: [PATCH][RFC] Simple tamper-proof device filesystem.

Quoting Tetsuo Handa ([email protected]):
> Hello.
>
> Thank you for attending discussion for previous posting
> (starting from http://lkml.org/lkml/2007/12/16/23 ).
>
> The previous posting was for feasibility test to know
> whether this kind of trivial filesystem is acceptable for mainline.
>
> Now, it seems that there is a little chance for accepting.
> Therefore I rebased the patch using the -mm tree.
>
> Regards.
> ----------
> Subject: Simple tamper-proof device filesystem.
>
> The goal of this filesystem is to guarantee that
> "applications using well-known device locations under /dev
> get the device they want" (e.g. an application that accesses /dev/null can
> always get a character special device with major=1 and minor=3).
>
> This idea sounds silly? Indeed, if you think the root can do whatever
> he/she wants to do. But this filesystem makes sense when used with
> access control mechanisms like MAC (mandatory access control).
> I want to use this filesystem in case where a process with root privilege was
> hijacked but the behavior of the hijacked process is still restricted by MAC.
>
> Why not use FUSE?
>
> Because /dev has to be available through the lifetime of the kernel.
> It is not acceptable if /dev stops working due to SIGKILL or OOM-killer.
>
> Why not use SELinux?
>
> Because SELinux doesn't guarantee filename and its attribute.
> As far as I know, no MAC implementation can handle filename and its attribute.
> I guess this is because
>
> Filename and its attributes pairs are conventionally considered as
> constant and reliable.
>
> It makes the MAC's policy syntax complicated to describe this attribute
> enforcement information in MAC's policy.
>
> I want to add functionality that the MACs are missing.
> Instead of adding this functionality per MAC,
> I propose to add it as ground work, to be combined with any MAC.
>
> Why not drop CAP_MKNOD?
>
> Dropping CAP_MKNOD is not enough for emulating this filesystem because
> a process can still rename()/unlink() to break filename and its attributes
> handling (e.g. mv /dev/sda1 /dev/sda1.tmp; mv /dev/sda2 /dev/sda1;
> mv /dev/sda1.tmp /dev/sda2 or unlink /dev/null; touch /dev/null ).
>
> This time, I'm implementing this filesystem as an extension to tmpfs
> because all this filesystem does is check the filename and its attributes
> in addition to what tmpfs does.
>
> Signed-off-by: Tetsuo Handa <[email protected]>
> ---
> fs/ramfs/inode.c | 101 ++++-
> fs/ramfs/syaoran.h | 1066 +++++++++++++++++++++++++++++++++++++++++++++++++++++
> 2 files changed, 1160 insertions(+), 7 deletions(-)
>
> --- linux-2.6-mm.orig/fs/ramfs/inode.c
> +++ linux-2.6-mm/fs/ramfs/inode.c
> @@ -35,6 +35,7 @@
> #include <linux/sched.h>
> #include <asm/uaccess.h>
> #include "internal.h"
> +#include "syaoran.h"
>
> /* some random number */
> #define RAMFS_MAGIC 0x858458f6
> @@ -49,7 +50,8 @@ static struct backing_dev_info ramfs_bac
> BDI_CAP_READ_MAP | BDI_CAP_WRITE_MAP | BDI_CAP_EXEC_MAP,
> };
>
> -struct inode *ramfs_get_inode(struct super_block *sb, int mode, dev_t dev)
> +struct inode *__ramfs_get_inode(struct super_block *sb, int mode, dev_t dev,
> + const int mac)
> {
> struct inode * inode = new_inode(sb);
>
> @@ -65,10 +67,19 @@ struct inode *ramfs_get_inode(struct sup
> switch (mode & S_IFMT) {
> default:
> init_special_inode(inode, mode, dev);
> + if (mac) {
> + if (S_ISBLK(mode))
> + inode->i_fop = &wrapped_def_blk_fops;
> + else if (S_ISCHR(mode))
> + inode->i_fop = &wrapped_def_chr_fops;
> + inode->i_op = &syaoran_file_inode_operations;
> + }
> break;
> case S_IFREG:
> inode->i_op = &ramfs_file_inode_operations;
> inode->i_fop = &ramfs_file_operations;
> + if (mac)
> + inode->i_op = &syaoran_file_inode_operations;
> break;
> case S_IFDIR:
> inode->i_op = &ramfs_dir_inode_operations;
> @@ -79,12 +90,19 @@ struct inode *ramfs_get_inode(struct sup
> break;
> case S_IFLNK:
> inode->i_op = &page_symlink_inode_operations;
> + if (mac)
> + inode->i_op = &syaoran_symlink_inode_operations;
> break;
> }
> }
> return inode;
> }
>
> +struct inode *ramfs_get_inode(struct super_block *sb, int mode, dev_t dev)
> +{
> + return __ramfs_get_inode(sb, mode, dev, 0);
> +}
> +
> /*
> * File creation. Allocate an inode, and we're done..
> */
> @@ -92,9 +110,17 @@ struct inode *ramfs_get_inode(struct sup
> static int
> ramfs_mknod(struct inode *dir, struct dentry *dentry, int mode, dev_t dev)
> {
> - struct inode * inode = ramfs_get_inode(dir->i_sb, mode, dev);
> + struct inode *inode;
> int error = -ENOSPC;
>
> + /*** SYAORAN start. ***/
> + if (dir->i_sb->s_op == &syaoran_ops) {
> + if (syaoran_may_create_node(dentry, mode, dev) < 0)
> + return -EPERM;
> + inode = syaoran_get_inode(dir->i_sb, mode, dev);
> + /*** SYAORAN end. ***/
> + } else
> + inode = ramfs_get_inode(dir->i_sb, mode, dev);
> if (inode) {
> if (dir->i_mode & S_ISGID) {
> inode->i_gid = dir->i_gid;
> @@ -127,7 +153,14 @@ static int ramfs_symlink(struct inode *
> struct inode *inode;
> int error = -ENOSPC;
>
> - inode = ramfs_get_inode(dir->i_sb, S_IFLNK|S_IRWXUGO, 0);
> + /*** SYAORAN start. ***/
> + if (dir->i_sb->s_op == &syaoran_ops) {
> + if (syaoran_may_create_node(dentry, S_IFLNK, 0) < 0)
> + return -EPERM;
> + inode = syaoran_get_inode(dir->i_sb, S_IFLNK|S_IRWXUGO, 0);
> + /*** SYAORAN end. ***/
> + } else
> + inode = ramfs_get_inode(dir->i_sb, S_IFLNK|S_IRWXUGO, 0);
> if (inode) {
> int l = strlen(symname)+1;
> error = page_symlink(inode, symname, l);
> @@ -146,13 +179,14 @@ static int ramfs_symlink(struct inode *
> static const struct inode_operations ramfs_dir_inode_operations = {
> .create = ramfs_create,
> .lookup = simple_lookup,
> - .link = simple_link,
> - .unlink = simple_unlink,
> + .link = ramfs_link,
> + .unlink = ramfs_unlink,
> .symlink = ramfs_symlink,
> .mkdir = ramfs_mkdir,
> - .rmdir = simple_rmdir,
> + .rmdir = ramfs_rmdir,
> .mknod = ramfs_mknod,
> - .rename = simple_rename,
> + .rename = ramfs_rename,
> + .setattr = ramfs_setattr,
> };
>
> static const struct super_operations ramfs_ops = {
> @@ -184,6 +218,35 @@ static int ramfs_fill_super(struct super
> return 0;
> }
>
> +static int syaoran_fill_super(struct super_block *sb, void *data, int silent)
> +{
> + struct inode *inode;
> + struct dentry *root;
> + int error;
> +
> + sb->s_maxbytes = MAX_LFS_FILESIZE;
> + sb->s_blocksize = PAGE_CACHE_SIZE;
> + sb->s_blocksize_bits = PAGE_CACHE_SHIFT;
> + sb->s_magic = SYAORAN_MAGIC;
> + sb->s_op = &syaoran_ops;
> + sb->s_time_gran = 1;
> + error = syaoran_initialize(sb, data);
> + if (error < 0)
> + return error;
> + inode = syaoran_get_inode(sb, S_IFDIR | 0755, 0);
> + if (!inode)
> + return -ENOMEM;
> +
> + root = d_alloc_root(inode);
> + if (!root) {
> + iput(inode);
> + return -ENOMEM;
> + }
> + sb->s_root = root;
> + syaoran_make_initial_nodes(sb);
> + return 0;
> +}
> +
> int ramfs_get_sb(struct file_system_type *fs_type,
> int flags, const char *dev_name, void *data, struct vfsmount *mnt)
> {
> @@ -197,6 +260,13 @@ static int rootfs_get_sb(struct file_sys
> mnt);
> }
>
> +static int syaoran_get_sb(struct file_system_type *fs_type, int flags,
> + const char *dev_name, void *data,
> + struct vfsmount *mnt)
> +{
> + return get_sb_nodev(fs_type, flags, data, syaoran_fill_super, mnt);
> +}
> +
> static struct file_system_type ramfs_fs_type = {
> .name = "ramfs",
> .get_sb = ramfs_get_sb,
> @@ -207,6 +277,11 @@ static struct file_system_type rootfs_fs
> .get_sb = rootfs_get_sb,
> .kill_sb = kill_litter_super,
> };
> +static struct file_system_type syaoran_fs_type = {
> + .name = "syaoran",
> + .get_sb = syaoran_get_sb,
> + .kill_sb = kill_litter_super,
> +};
>
> static int __init init_ramfs_fs(void)
> {
> @@ -237,3 +312,15 @@ int __init init_rootfs(void)
> }
>
> MODULE_LICENSE("GPL");
> +
> +static int __init init_syaoran_fs(void)
> +{
> + return register_filesystem(&syaoran_fs_type);
> +}
> +
> +static void __exit exit_syaoran_fs(void)
> +{
> + unregister_filesystem(&syaoran_fs_type);
> +}
> +module_init(init_syaoran_fs);
> +module_exit(exit_syaoran_fs);
> --- /dev/null
> +++ linux-2.6-mm/fs/ramfs/syaoran.h
> @@ -0,0 +1,1066 @@
> +/*
> + * fs/ramfs/syaoran.h
> + *
> + * Implementation of the Tamper-Proof Device Filesystem.
> + *
> + * Copyright (C) 2005-2007 NTT DATA CORPORATION
> + *
> + * Version: 1.5.3-pre 2007/12/23
> + */
> +
> +#include <linux/namei.h>
> +#include <linux/mm.h>
> +#include <linux/quotaops.h>
> +
> +#define list_for_each_cookie(pos, cookie, head) \
> + for ((cookie) || ((cookie) = (head)), pos = (cookie)->next; \
> + prefetch(pos->next), pos != (head) || ((cookie) = NULL); \
> + (cookie) = pos, pos = pos->next)
> +
> +/* The following constants are used to restrict operations.*/
> +
> +#define MAY_CREATE 1 /* This file is allowed to be mknod()ed. */
> +#define MAY_DELETE 2 /* This file is allowed to be unlink()ed. */
> +#define MAY_CHMOD 4 /* This file is allowed to be chmod()ed. */
> +#define MAY_CHOWN 8 /* This file is allowed to be chown()ed. */
> +#define DEVICE_USED 16 /* This block or character device file is used. */
> +#define NO_CREATE_AT_MOUNT 32 /* Don't create this file at mount(). */
> +
> +/* some random number */
> +#define SYAORAN_MAGIC 0x2F646576 /* = '/dev' */
> +
> +static struct inode_operations syaoran_file_inode_operations;
> +static struct inode_operations syaoran_symlink_inode_operations;
> +
> +static void syaoran_put_super(struct super_block *sb);
> +static int syaoran_initialize(struct super_block *sb, void *data);
> +static void syaoran_make_initial_nodes(struct super_block *sb);
> +static int syaoran_may_create_node(struct dentry *dentry, int mode, int dev);
> +static int syaoran_may_modify_node(struct dentry *dentry, unsigned int flags);
> +static int syaoran_create_tracelog(struct super_block *sb,
> + const char *filename);
> +static struct inode *__ramfs_get_inode(struct super_block *sb, int mode,
> + dev_t dev, int mac);
> +
> +static struct inode *syaoran_get_inode(struct super_block *sb, int mode,
> + dev_t dev)
> +{
> + return __ramfs_get_inode(sb, mode, dev, 1);

To integrate this more nicely into tmpfs, at least define TMPFS_IS_MAC as 1
and TMPFS_NOT_MAC as 0 and pass those values instead of a bare 1 and 0.
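
A minimal sketch of that suggestion, written so it compiles outside the kernel. Only the two constant names come from the comment above; struct inode_ops_stub, the two stub tables and pick_file_ops() are hypothetical stand-ins for the inode_operations tables and the S_IFREG branch of __ramfs_get_inode().

```c
#include <assert.h>

/* Named values for the 'mac' argument, using the names suggested above. */
enum tmpfs_mac_mode {
	TMPFS_NOT_MAC = 0,	/* ordinary ramfs inode */
	TMPFS_IS_MAC  = 1,	/* inode that gets the SYAORAN checks */
};

/* Hypothetical stand-ins for the kernel's inode_operations tables. */
struct inode_ops_stub {
	const char *name;
};
static const struct inode_ops_stub ramfs_file_ops_stub = { "ramfs" };
static const struct inode_ops_stub syaoran_file_ops_stub = { "syaoran" };

/*
 * Mirrors the S_IFREG branch of __ramfs_get_inode(): start from the
 * ramfs table and override it when the caller asks for MAC behaviour.
 */
static const struct inode_ops_stub *pick_file_ops(enum tmpfs_mac_mode mac)
{
	const struct inode_ops_stub *ops = &ramfs_file_ops_stub;

	assert(mac == TMPFS_NOT_MAC || mac == TMPFS_IS_MAC);
	if (mac == TMPFS_IS_MAC)
		ops = &syaoran_file_ops_stub;
	return ops;
}
```

The two wrappers would then read __ramfs_get_inode(sb, mode, dev, TMPFS_NOT_MAC) and __ramfs_get_inode(sb, mode, dev, TMPFS_IS_MAC), which says what the last argument means at the call site.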

> +}
> +
> +static struct super_operations syaoran_ops = {
> + .statfs = simple_statfs,
> + .drop_inode = generic_delete_inode,
> + .put_super = syaoran_put_super,
> +};
> +
> +/* Wraps blkdev_open() to trace open operation for block devices. */
> +static int (*org_blkdev_open) (struct inode *inode, struct file *filp);
> +static struct file_operations wrapped_def_blk_fops;

Again, I should think you'd actually want to take blkdev_open() from
fs/block_dev.c and chrdev_open() from fs/char_dev.c. Surely your
method of grabbing it here is not acceptable for upstream code.

That's all I've got for now, though it would help if you'd break up some of
these functions, especially syaoran_initialize() with its set of {} blocks.
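
As one possible way to split it, the "accept=" / "enforce=" prefix handling could move into its own helper. The sketch below is userspace C and only illustrative: syaoran_parse_option() is a hypothetical name, and it returns -1 where the patch returns -EINVAL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: split "accept=<path>" or "enforce=<path>" into a
 * mode flag and a pointer to the path. Returns 0 on success, or -1 if
 * data is NULL or carries neither prefix.
 */
static int syaoran_parse_option(const char *data, bool *permissive,
				const char **path)
{
	static const char accept[] = "accept=";
	static const char enforce[] = "enforce=";

	assert(permissive && path);
	if (!data)
		return -1;
	if (strncmp(data, accept, sizeof(accept) - 1) == 0) {
		*permissive = true;
		*path = data + sizeof(accept) - 1;
		return 0;
	}
	if (strncmp(data, enforce, sizeof(enforce) - 1) == 0) {
		*permissive = false;
		*path = data + sizeof(enforce) - 1;
		return 0;
	}
	return -1;
}
```

syaoran_initialize() would then be left with three steps: set up the wrapped fops, parse the option, and open and read the config file.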

thanks,
-serge


> +
> +static int wrapped_blkdev_open(struct inode *inode, struct file *filp)
> +{
> + int error = org_blkdev_open(inode, filp);
> + if (error != -ENXIO)
> + syaoran_may_modify_node(filp->f_dentry, DEVICE_USED);
> + return error;
> +}
> +
> +/* Wraps chrdev_open() to trace open operation for character devices. */
> +static int (*org_chrdev_open) (struct inode *inode, struct file *filp);
> +static struct file_operations wrapped_def_chr_fops;
> +
> +static int wrapped_chrdev_open(struct inode *inode, struct file *filp)
> +{
> + int error = org_chrdev_open(inode, filp);
> + if (error != -ENXIO)
> + syaoran_may_modify_node(filp->f_dentry, DEVICE_USED);
> + return error;
> +}
> +
> +struct dev_entry {
> + struct list_head list;
> + /* Binary form of pathname under mount point. Never NULL. */
> + char *name;
> + /*
> + * Mode and permissions. setuid/setgid/sticky bits are not supported.
> + */
> + mode_t mode;
> + uid_t uid;
> + gid_t gid;
> + dev_t kdev;
> + /*
> + * Binary form of initial contents for the symlink.
> + * NULL if not symlink.
> + */
> + char *symlink_data;
> + /* File access control flags. */
> + unsigned int flags;
> + /* Text form of pathname under mount point. Never NULL. */
> + const char *printable_name;
> + /*
> + * Text form of initial contents for the symlink.
> + * NULL if not symlink.
> + */
> + const char *printable_symlink_data;
> +};
> +
> +struct syaoran_sb_info {
> + struct list_head list;
> + bool initialize_done; /* False if initialization is in progress. */
> + bool is_permissive_mode; /* True if permissive mode. */
> +};
> +
> +static void syaoran_put_super(struct super_block *sb)
> +{
> + struct syaoran_sb_info *info;
> + struct dev_entry *entry;
> + struct dev_entry *tmp;
> + if (!sb)
> + return;
> + info = (struct syaoran_sb_info *) sb->s_fs_info;
> + if (!info)
> + return;
> + list_for_each_entry_safe(entry, tmp, &info->list, list) {
> + kfree(entry->name);
> + kfree(entry->symlink_data);
> + kfree(entry->printable_name);
> + kfree(entry->printable_symlink_data);
> + list_del(&entry->list);
> + /* printk("Entry removed.\n"); */
> + kfree(entry);
> + }
> + kfree(info);
> + sb->s_fs_info = NULL;
> + printk(KERN_DEBUG "%s: Unused memory freed.\n", __FUNCTION__);
> +}
> +
> +/* Get absolute pathname from mount point. */
> +static int get_local_absolute_path(struct dentry *dentry, char *buffer,
> + int buflen)
> +{
> + char *start = buffer;
> + char *end = buffer + buflen;
> + int namelen;
> +
> + if (buflen < 256)
> + goto out;
> +
> + *--end = '\0';
> + buflen--;
> + for (;;) {
> + struct dentry *parent;
> + if (IS_ROOT(dentry))
> + break;
> + parent = dentry->d_parent;
> + namelen = dentry->d_name.len;
> + buflen -= namelen + 1;
> + if (buflen < 0)
> + goto out;
> + end -= namelen;
> + memcpy(end, dentry->d_name.name, namelen);
> + *--end = '/';
> + dentry = parent;
> + }
> + if (*end == '/') {
> + buflen++;
> + end++;
> + }
> + namelen = dentry->d_name.len;
> + buflen -= namelen;
> + if (buflen < 0)
> + goto out;
> + end -= namelen;
> + memcpy(end, dentry->d_name.name, namelen);
> + memmove(start, end, strlen(end) + 1);
> + return 0;
> +out:
> + return -ENOMEM;
> +}
> +
> +/* Get absolute pathname of the given dentry from mount point. */
> +static int local_realpath_from_dentry(struct dentry *dentry, char *newname,
> + int newname_len)
> +{
> + int error;
> + struct dentry *d_dentry;
> + if (!dentry || !newname || newname_len <= 0)
> + return -EINVAL;
> + d_dentry = dget(dentry);
> + /***** CRITICAL SECTION START *****/
> + spin_lock(&dcache_lock);
> + error = get_local_absolute_path(d_dentry, newname, newname_len);
> + spin_unlock(&dcache_lock);
> + /***** CRITICAL SECTION END *****/
> + dput(d_dentry);
> + return error;
> +}
> +
> +static int syaoran_check_flags(struct syaoran_sb_info *info,
> + struct dentry *dentry, int mode, int dev,
> + unsigned int flags)
> +{
> + int error = -EPERM;
> + struct dev_entry *entry;
> + /*
> + * Since local_realpath_from_dentry() holds dcache_lock,
> + * allocating buffer using kmalloc() won't help improving concurrency.
> + * Therefore, I use static buffer here.
> + */
> + static char filename[PAGE_SIZE];
> + static DEFINE_SPINLOCK(lock);
> + spin_lock(&lock);
> + memset(filename, 0, sizeof(filename));
> + if (local_realpath_from_dentry(dentry, filename, sizeof(filename) - 1))
> + goto out;
> + list_for_each_entry(entry, &info->list, list) {
> + if ((mode & S_IFMT) != (entry->mode & S_IFMT))
> + continue;
> + if ((S_ISBLK(mode) || S_ISCHR(mode)) && dev != entry->kdev)
> + continue;
> + if (strcmp(entry->name, filename + 1))
> + continue;
> + if (info->is_permissive_mode) {
> + entry->flags |= flags;
> + error = 0;
> + } else {
> + if ((entry->flags & flags) == flags)
> + error = 0;
> + }
> + break;
> + }
> +out:
> + if (error && strlen(filename) < (sizeof(filename) / 4) - 16) {
> + const char *name;
> + const uid_t uid = current->fsuid;
> + const gid_t gid = current->fsgid;
> + const mode_t perm = mode & 0777;
> + flags &= ~DEVICE_USED;
> + {
> + char *end = filename + sizeof(filename) - 1;
> + const char *cp = strchr(filename, '\0') - 1;
> + while (cp > filename) {
> + const unsigned char c = *cp--;
> + if (c == '\\') {
> + *--end = '\\';
> + *--end = '\\';
> + } else if (c > ' ' && c < 127) {
> + *--end = c;
> + } else {
> + *--end = (c & 7) + '0';
> + *--end = ((c >> 3) & 7) + '0';
> + *--end = (c >> 6) + '0';
> + *--end = '\\';
> + }
> + }
> + name = end;
> + }
> + switch (mode & S_IFMT) {
> + case S_IFCHR:
> + printk(KERN_DEBUG
> + "SYAORAN-ERROR: %s %3o %3u %3u %2u %c %3u %3u\n",
> + name, perm, uid, gid, flags, 'c',
> + MAJOR(dev), MINOR(dev));
> + break;
> + case S_IFBLK:
> + printk(KERN_DEBUG
> + "SYAORAN-ERROR: %s %3o %3u %3u %2u %c %3u %3u\n",
> + name, perm, uid, gid, flags, 'b',
> + MAJOR(dev), MINOR(dev));
> + break;
> + case S_IFIFO:
> + printk(KERN_DEBUG
> + "SYAORAN-ERROR: %s %3o %3u %3u %2u %c\n",
> + name, perm, uid, gid, flags, 'p');
> + break;
> + case S_IFSOCK:
> + printk(KERN_DEBUG
> + "SYAORAN-ERROR: %s %3o %3u %3u %2u %c\n",
> + name, perm, uid, gid, flags, 's');
> + break;
> + case S_IFDIR:
> + printk(KERN_DEBUG
> + "SYAORAN-ERROR: %s %3o %3u %3u %2u %c\n",
> + name, perm, uid, gid, flags, 'd');
> + break;
> + case S_IFLNK:
> + printk(KERN_DEBUG
> + "SYAORAN-ERROR: %s %3o %3u %3u %2u %c %s\n",
> + name, perm, uid, gid, flags, 'l', "unknown");
> + break;
> + case S_IFREG:
> + printk(KERN_DEBUG
> + "SYAORAN-ERROR: %s %3o %3u %3u %2u %c\n",
> + name, perm, uid, gid, flags, 'f');
> + break;
> + }
> + }
> + spin_unlock(&lock);
> + return error;
> +}
> +
> +/* Check whether the given dentry is allowed to mknod. */
> +static int syaoran_may_create_node(struct dentry *dentry, int mode, int dev)
> +{
> + struct syaoran_sb_info *info =
> + (struct syaoran_sb_info *) dentry->d_sb->s_fs_info;
> + if (!info) {
> + printk(KERN_DEBUG "%s: dentry->d_sb->s_fs_info == NULL\n",
> + __FUNCTION__);
> + return -EPERM;
> + }
> + if (!info->initialize_done)
> + return 0;
> + return syaoran_check_flags(info, dentry, mode, dev, MAY_CREATE);
> +}
> +
> +/* Check whether the given dentry is allowed to chmod/chown/unlink. */
> +static int syaoran_may_modify_node(struct dentry *dentry, unsigned int flags)
> +{
> + struct syaoran_sb_info *info =
> + (struct syaoran_sb_info *) dentry->d_sb->s_fs_info;
> + if (!info) {
> + printk(KERN_DEBUG "%s: dentry->d_sb->s_fs_info == NULL\n",
> + __FUNCTION__);
> + return -EPERM;
> + }
> + if (flags == DEVICE_USED && !info->is_permissive_mode)
> + return 0;
> + if (!dentry->d_inode)
> + return -ENOENT;
> + return syaoran_check_flags(info, dentry, dentry->d_inode->i_mode,
> + dentry->d_inode->i_rdev, flags);
> +}
> +
> +static int ramfs_link(struct dentry *old_dentry, struct inode *dir,
> + struct dentry *dentry)
> +{
> + struct inode *inode;
> + if (dir->i_sb->s_op != &syaoran_ops)
> + goto ok;
> + /*** SYAORAN start. ***/
> + inode = old_dentry->d_inode;
> + if (!inode ||
> + syaoran_may_create_node(dentry, inode->i_mode, inode->i_rdev))
> + return -EPERM;
> + /*** SYAORAN end. ***/
> +ok:
> + return simple_link(old_dentry, dir, dentry);
> +}
> +
> +static int ramfs_unlink(struct inode *dir, struct dentry *dentry)
> +{
> + if (dir->i_sb->s_op != &syaoran_ops)
> + goto ok;
> + /*** SYAORAN start. ***/
> + if (syaoran_may_modify_node(dentry, MAY_DELETE))
> + return -EPERM;
> + /*** SYAORAN end. ***/
> +ok:
> + return simple_unlink(dir, dentry);
> +}
> +
> +static int ramfs_rename(struct inode *old_dir, struct dentry *old_dentry,
> + struct inode *new_dir, struct dentry *new_dentry)
> +{
> + struct inode *inode;
> + if (old_dir->i_sb->s_op != &syaoran_ops)
> + goto ok;
> + /*** SYAORAN start. ***/
> + inode = old_dentry->d_inode;
> + if (!inode || syaoran_may_modify_node(old_dentry, MAY_DELETE) ||
> + syaoran_may_create_node(new_dentry, inode->i_mode, inode->i_rdev))
> + return -EPERM;
> + /*** SYAORAN end. ***/
> +ok:
> + return simple_rename(old_dir, old_dentry, new_dir, new_dentry);
> +}
> +
> +static int ramfs_rmdir(struct inode *dir, struct dentry *dentry)
> +{
> + if (dir->i_sb->s_op != &syaoran_ops)
> + goto ok;
> + /*** SYAORAN start. ***/
> + if (syaoran_may_modify_node(dentry, MAY_DELETE))
> + return -EPERM;
> + /*** SYAORAN end. ***/
> +ok:
> + return simple_rmdir(dir, dentry);
> +}
> +
> +/*
> + * Original tmpfs doesn't set ramfs_dir_inode_operations.setattr field.
> + * Now I'm setting the field to share tmpfs/rootfs/SYAORAN code.
> + * Side effect is that the checking order of notify_change() has changed from
> + * inode_change_ok() -> security_inode_setattr() ->
> + * DQUOT_TRANSFER() -> inode_setattr()
> + * to
> + * security_inode_setattr() -> inode_change_ok() ->
> + * DQUOT_TRANSFER() -> inode_setattr()
> + *
> + * Is this change problematic? If problematic, I'll stop sharing the field.
> + */
> +static int ramfs_setattr(struct dentry *dentry, struct iattr *attr)
> +{
> + unsigned int ia_valid = attr->ia_valid;
> + unsigned int flags = 0;
> + struct inode *inode = dentry->d_inode;
> + int error = inode_change_ok(inode, attr);
> + if (inode->i_sb->s_op != &syaoran_ops)
> + goto ok;
> + /*** SYAORAN start. ***/
> + ia_valid = attr->ia_valid;
> + if (ia_valid & (ATTR_UID | ATTR_GID))
> + flags |= MAY_CHOWN;
> + if (ia_valid & ATTR_MODE)
> + flags |= MAY_CHMOD;
> + if (syaoran_may_modify_node(dentry, flags))
> + return -EPERM;
> + /*** SYAORAN end. ***/
> +ok:
> + if (!error) {
> + if ((ia_valid & ATTR_UID && attr->ia_uid != inode->i_uid) ||
> + (ia_valid & ATTR_GID && attr->ia_gid != inode->i_gid))
> + error = DQUOT_TRANSFER(inode, attr) ? -EDQUOT : 0;
> + if (!error)
> + error = inode_setattr(inode, attr);
> + }
> + return error;
> +}
> +
> +static struct inode_operations syaoran_file_inode_operations = {
> + .getattr = simple_getattr,
> + .setattr = ramfs_setattr,
> +};
> +
> +static struct inode_operations syaoran_symlink_inode_operations = {
> + .readlink = generic_readlink,
> + .follow_link = page_follow_link_light,
> + .put_link = page_put_link,
> + .setattr = ramfs_setattr,
> +};
> +
> +/*
> + * The following code is used for processing the policy file and
> + * creating initial nodes.
> + */
> +
> +/* lookup_create() without nameidata. Called only during initialization. */
> +static struct dentry *lookup_create2(const char *name, struct dentry *base,
> + const bool is_dir)
> +{
> + struct dentry *dentry;
> + const int len = name ? strlen(name) : 0;
> + mutex_lock(&base->d_inode->i_mutex);
> + dentry = lookup_one_len(name, base, len);
> + if (IS_ERR(dentry))
> + goto fail;
> + if (!is_dir && name[len] && !dentry->d_inode)
> + goto enoent;
> + return dentry;
> +enoent:
> + dput(dentry);
> + dentry = ERR_PTR(-ENOENT);
> +fail:
> + return dentry;
> +}
> +
> +/* mkdir(). Called only during initialization. */
> +static int fs_mkdir(const char *pathname, struct dentry *base, int mode,
> + uid_t user, gid_t group)
> +{
> + struct dentry *dentry = lookup_create2(pathname, base, 1);
> + int error = PTR_ERR(dentry);
> + if (!IS_ERR(dentry)) {
> + error = vfs_mkdir(base->d_inode, dentry, mode);
> + if (!error) {
> + lock_kernel();
> + dentry->d_inode->i_uid = user;
> + dentry->d_inode->i_gid = group;
> + unlock_kernel();
> + }
> + dput(dentry);
> + }
> + mutex_unlock(&base->d_inode->i_mutex);
> + return error;
> +}
> +
> +/* mknod(). Called only during initialization. */
> +static int fs_mknod(const char *filename, struct dentry *base, int mode,
> + dev_t dev, uid_t user, gid_t group)
> +{
> + struct dentry *dentry;
> + int error;
> + switch (mode & S_IFMT) {
> + case S_IFCHR:
> + case S_IFBLK:
> + case S_IFIFO:
> + case S_IFSOCK:
> + case S_IFREG:
> + break;
> + default:
> + return -EPERM;
> + }
> + dentry = lookup_create2(filename, base, 0);
> + error = PTR_ERR(dentry);
> + if (!IS_ERR(dentry)) {
> + error = vfs_mknod(base->d_inode, dentry, mode, dev);
> + if (!error) {
> + lock_kernel();
> + dentry->d_inode->i_uid = user;
> + dentry->d_inode->i_gid = group;
> + unlock_kernel();
> + }
> + dput(dentry);
> + }
> + mutex_unlock(&base->d_inode->i_mutex);
> + return error;
> +}
> +
> +/* symlink(). Called only during initialization. */
> +static int fs_symlink(const char *pathname, struct dentry *base,
> + char *oldname, int mode, uid_t user, gid_t group)
> +{
> + struct dentry *dentry = lookup_create2(pathname, base, 0);
> + int error = PTR_ERR(dentry);
> + if (!IS_ERR(dentry)) {
> + error = vfs_symlink(base->d_inode, dentry, oldname, S_IALLUGO);
> + if (!error) {
> + lock_kernel();
> + dentry->d_inode->i_mode = mode;
> + dentry->d_inode->i_uid = user;
> + dentry->d_inode->i_gid = group;
> + unlock_kernel();
> + }
> + dput(dentry);
> + }
> + mutex_unlock(&base->d_inode->i_mutex);
> + return error;
> +}
> +
> +/*
> + * Format string.
> + * Leading and trailing whitespaces are removed.
> + * Multiple whitespaces are packed into single space.
> + */
> +static void syaoran_normalize_line(unsigned char *buffer)
> +{
> + unsigned char *sp = buffer;
> + unsigned char *dp = buffer;
> + bool first = 1;
> + while (*sp && (*sp <= ' ' || *sp >= 127))
> + sp++;
> + while (*sp) {
> + if (!first)
> + *dp++ = ' ';
> + first = 0;
> + while (*sp > ' ' && *sp < 127)
> + *dp++ = *sp++;
> + while (*sp && (*sp <= ' ' || *sp >= 127))
> + sp++;
> + }
> + *dp = '\0';
> +}
> +
> +/* Convert text form of filename into binary form. */
> +static void syaoran_unescape(char *filename)
> +{
> + char *cp = filename;
> + char c, d, e;
> + if (!cp)
> + return;
> + while ((c = *filename++) != '\0') {
> + if (c != '\\') {
> + *cp++ = c;
> + continue;
> + }
> + if ((c = *filename++) == '\\') {
> + *cp++ = c;
> + continue;
> + }
> + if (c < '0' || c > '3')
> + break;
> + d = *filename++;
> + if (d < '0' || d > '7')
> + break;
> + e = *filename++;
> + if (e < '0' || e > '7')
> + break;
> + *(unsigned char *) cp++ = (unsigned char)
> + (((unsigned char) (c - '0') << 6) +
> + ((unsigned char) (d - '0') << 3) +
> + (unsigned char) (e - '0'));
> + }
> + *cp = '\0';
> +}
> +
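
The text form used here (a doubled backslash, printable ASCII as-is, everything else as a three-digit octal "\ooo") can be exercised in userspace with a sketch like the following. escape_name(), unescape_name() and round_trips() are hypothetical names. Unlike syaoran_unescape(), which silently stops at a malformed sequence, this version reports it with -1. That is a deliberate deviation for illustration, not the patch's behaviour.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Escape src into dst: "\\" for a backslash, the byte itself for
 * printable non-space ASCII, "\ooo" (octal) for everything else.
 * Returns the escaped length, or -1 if dst (dstlen bytes) is too small.
 */
static int escape_name(const char *src, char *dst, size_t dstlen)
{
	size_t n = 0;

	for (; *src; src++) {
		const unsigned char c = (unsigned char)*src;

		if (n + 5 > dstlen)	/* worst case: 4 bytes plus NUL */
			return -1;
		if (c == '\\') {
			dst[n++] = '\\';
			dst[n++] = '\\';
		} else if (c > ' ' && c < 127) {
			dst[n++] = (char)c;
		} else {
			dst[n++] = '\\';
			dst[n++] = (char)('0' + (c >> 6));
			dst[n++] = (char)('0' + ((c >> 3) & 7));
			dst[n++] = (char)('0' + (c & 7));
		}
	}
	if (n + 1 > dstlen)
		return -1;
	dst[n] = '\0';
	return (int)n;
}

/* Unescape s in place. Returns 0, or -1 on a malformed sequence. */
static int unescape_name(char *s)
{
	const unsigned char *in = (const unsigned char *)s;
	unsigned char *out = (unsigned char *)s;

	while (*in) {
		unsigned char c = *in++;

		if (c != '\\') {
			*out++ = c;
			continue;
		}
		c = *in++;
		if (c == '\\') {
			*out++ = c;
			continue;
		}
		if (c < '0' || c > '3' || in[0] < '0' || in[0] > '7' ||
		    in[1] < '0' || in[1] > '7') {
			*out = '\0';
			return -1;
		}
		*out++ = (unsigned char)(((c - '0') << 6) |
					 ((in[0] - '0') << 3) |
					 (in[1] - '0'));
		in += 2;
	}
	*out = '\0';
	return 0;
}

/* Returns 1 if name survives an escape/unescape round trip. */
static int round_trips(const char *name)
{
	char buf[256];

	if (escape_name(name, buf, sizeof(buf)) < 0)
		return 0;
	if (unescape_name(buf) < 0)
		return 0;
	return strcmp(buf, name) == 0;
}
```

A name containing a space, for example, is written with \040 in the text form and compared against the dentry name in its binary form.
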
> +static inline char *strdup(const char *data)
> +{
> + return kstrdup(data, GFP_KERNEL);
> +}
> +
> +static int register_node_info(char *buffer, struct super_block *sb)
> +{
> + enum {
> + ARG_FILENAME = 0,
> + ARG_PERMISSION = 1,
> + ARG_UID = 2,
> + ARG_GID = 3,
> + ARG_FLAGS = 4,
> + ARG_DEV_TYPE = 5,
> + ARG_SYMLINK_DATA = 6,
> + ARG_DEV_MAJOR = 6,
> + ARG_DEV_MINOR = 7,
> + MAX_ARG = 8
> + };
> + char *args[MAX_ARG];
> + int i;
> + int error = -EINVAL;
> + unsigned int perm, uid, gid, flags, major = 0, minor = 0;
> + struct syaoran_sb_info *info = (struct syaoran_sb_info *) sb->s_fs_info;
> + struct dev_entry *entry;
> + memset(args, 0, sizeof(args));
> + args[0] = buffer;
> + for (i = 1; i < MAX_ARG; i++) {
> + args[i] = strchr(args[i - 1] + 1, ' ');
> + if (!args[i])
> + break;
> + *args[i]++ = '\0';
> + }
> + /*
> + printk("<%s> <%s> <%s> <%s> <%s> <%s> <%s> <%s>\n",
> + args[0], args[1], args[2], args[3], args[4], args[5],
> + args[6], args[7]);
> + */
> + if (!args[ARG_FILENAME] || !args[ARG_PERMISSION] || !args[ARG_UID] ||
> + !args[ARG_GID] || !args[ARG_DEV_TYPE] || !args[ARG_FLAGS])
> + goto out;
> + if (sscanf(args[ARG_PERMISSION], "%o", &perm) != 1 || !(perm <= 0777)
> + || sscanf(args[ARG_UID], "%u", &uid) != 1
> + || sscanf(args[ARG_GID], "%u", &gid) != 1
> + || sscanf(args[ARG_FLAGS], "%u", &flags) != 1
> + || *(args[ARG_DEV_TYPE] + 1))
> + goto out;
> + switch (*args[ARG_DEV_TYPE]) {
> + case 'c':
> + perm |= S_IFCHR;
> + if (!args[ARG_DEV_MAJOR]
> + || sscanf(args[ARG_DEV_MAJOR], "%u", &major) != 1
> + || !args[ARG_DEV_MINOR]
> + || sscanf(args[ARG_DEV_MINOR], "%u", &minor) != 1)
> + goto out;
> + break;
> + case 'b':
> + perm |= S_IFBLK;
> + if (!args[ARG_DEV_MAJOR]
> + || sscanf(args[ARG_DEV_MAJOR], "%u", &major) != 1
> + || !args[ARG_DEV_MINOR]
> + || sscanf(args[ARG_DEV_MINOR], "%u", &minor) != 1)
> + goto out;
> + break;
> + case 'l':
> + perm |= S_IFLNK;
> + if (!args[ARG_SYMLINK_DATA])
> + goto out;
> + break;
> + case 'd':
> + perm |= S_IFDIR;
> + break;
> + case 's':
> + perm |= S_IFSOCK;
> + break;
> + case 'p':
> + perm |= S_IFIFO;
> + break;
> + case 'f':
> + perm |= S_IFREG;
> + break;
> + default:
> + goto out;
> + }
> + error = -ENOMEM;
> + entry = kzalloc(sizeof(*entry), GFP_KERNEL);
> + if (!entry)
> + goto out;
> + if (S_ISLNK(perm)) {
> + entry->printable_symlink_data = strdup(args[ARG_SYMLINK_DATA]);
> + if (!entry->printable_symlink_data)
> + goto out_freemem;
> + }
> + entry->printable_name = strdup(args[ARG_FILENAME]);
> + if (!entry->printable_name)
> + goto out_freemem;
> + if (S_ISLNK(perm)) {
> + entry->symlink_data = strdup(entry->printable_symlink_data);
> + if (!entry->symlink_data)
> + goto out_freemem;
> + syaoran_unescape(entry->symlink_data);
> + }
> + entry->name = strdup(entry->printable_name);
> + if (!entry->name)
> + goto out_freemem;
> + syaoran_unescape(entry->name);
> + /*
> + * Drop trailing '/', for GetLocalAbsolutePath() doesn't append
> + * trailing '/'.
> + */
> + i = strlen(entry->name);
> + if (i && entry->name[i - 1] == '/')
> + entry->name[i - 1] = '\0';
> + entry->mode = perm;
> + entry->uid = uid;
> + entry->gid = gid;
> + entry->kdev = S_ISCHR(perm) || S_ISBLK(perm) ? MKDEV(major, minor) : 0;
> + entry->flags = flags;
> + list_add_tail(&entry->list, &info->list);
> + /* printk("Entry added.\n"); */
> + error = 0;
> +out:
> + return error;
> +out_freemem:
> + kfree(entry->printable_symlink_data);
> + kfree(entry->printable_name);
> + kfree(entry->symlink_data);
> + kfree(entry);
> + goto out;
> +}
> +
> +static int read_config_file(struct file *file, struct super_block *sb)
> +{
> + char *buffer;
> + int error = -ENOMEM;
> + if (!file)
> + return -EINVAL;
> + buffer = kzalloc(PAGE_SIZE, GFP_KERNEL);
> + if (buffer) {
> + int len;
> + char *cp;
> + unsigned long offset = 0;
> + while ((len = kernel_read(file, offset, buffer, PAGE_SIZE)) > 0
> + && (cp = memchr(buffer, '\n', len)) != NULL) {
> + *cp = '\0';
> + offset += cp - buffer + 1;
> + syaoran_normalize_line(buffer);
> + if (register_node_info(buffer, sb) == -ENOMEM)
> + goto out;
> + }
> + error = 0;
> + }
> +out:
> + kfree(buffer);
> + return error;
> +}
> +
> +static void make_node(struct dev_entry *entry, struct dentry *root)
> +{
> + struct dentry *base = dget(root);
> + char *filename = entry->name;
> + char *name = filename;
> + unsigned int c;
> + const mode_t perm = entry->mode;
> + const uid_t uid = entry->uid;
> + const gid_t gid = entry->gid;
> + goto start;
> + while ((c = *(unsigned char *) filename) != '\0') {
> + if (c == '/') {
> + struct dentry *new_base;
> + const int len = filename - name;
> + *filename = '\0';
> + mutex_lock(&base->d_inode->i_mutex);
> + new_base = lookup_one_len(name, base, len);
> + mutex_unlock(&base->d_inode->i_mutex);
> + dput(base);
> + *filename = '/';
> + filename++;
> + if (IS_ERR(new_base))
> + return;
> + if (!new_base->d_inode ||
> + !S_ISDIR(new_base->d_inode->i_mode)) {
> + dput(new_base);
> + return;
> + }
> + base = new_base;
> +start:
> + name = filename;
> + } else {
> + filename++;
> + }
> + }
> + filename = (char *) name;
> + if (S_ISLNK(perm)) {
> + fs_symlink(filename, base, entry->symlink_data, perm, uid, gid);
> + } else if (S_ISDIR(perm)) {
> + fs_mkdir(filename, base, perm ^ S_IFDIR, uid, gid);
> + } else if (S_ISSOCK(perm) || S_ISFIFO(perm) || S_ISREG(perm)) {
> + fs_mknod(filename, base, perm, 0, uid, gid);
> + } else if (S_ISCHR(perm) || S_ISBLK(perm)) {
> + fs_mknod(filename, base, perm, entry->kdev, uid, gid);
> + }
> + dput(base);
> +}
> +
> +/* Create files according to the policy file. */
> +static void syaoran_make_initial_nodes(struct super_block *sb)
> +{
> + struct syaoran_sb_info *info;
> + struct dev_entry *entry;
> + if (!sb)
> + return;
> + info = (struct syaoran_sb_info *) sb->s_fs_info;
> + if (!info)
> + return;
> + if (info->is_permissive_mode) {
> + syaoran_create_tracelog(sb, ".syaoran");
> + syaoran_create_tracelog(sb, ".syaoran_all");
> + }
> + list_for_each_entry(entry, &info->list, list) {
> + if ((entry->flags & NO_CREATE_AT_MOUNT) == 0)
> + make_node(entry, sb->s_root);
> + }
> + info->initialize_done = 1;
> +}
> +
> +/* Read policy file. */
> +static int syaoran_initialize(struct super_block *sb, void *data)
> +{
> + int error = -EINVAL;
> + static bool first = 1;
> + if (first) {
> + first = 0;
> + printk(KERN_INFO "SYAORAN: 1.5.3-pre 2007/12/23\n");
> + }
> + {
> + struct inode *inode = new_inode(sb);
> + if (!inode)
> + return -EINVAL;
> + /* Create /dev/ram0 to get the value of blkdev_open(). */
> + init_special_inode(inode, S_IFBLK | 0666, MKDEV(1, 0));
> + wrapped_def_blk_fops = *inode->i_fop;
> + iput(inode);
> + org_blkdev_open = wrapped_def_blk_fops.open;
> + wrapped_def_blk_fops.open = wrapped_blkdev_open;
> + }
> + {
> + struct inode *inode = new_inode(sb);
> + if (!inode)
> + return -EINVAL;
> + /* Create /dev/null to get the value of chrdev_open(). */
> + init_special_inode(inode, S_IFCHR | 0666, MKDEV(1, 3));
> + wrapped_def_chr_fops = *inode->i_fop;
> + iput(inode);
> + org_chrdev_open = wrapped_def_chr_fops.open;
> + wrapped_def_chr_fops.open = wrapped_chrdev_open;
> + }
> + if (data) {
> + struct file *f;
> + char *filename = (char *) data;
> + bool is_permissive_mode = 0;
> + if (strncmp(filename, "accept=", 7) == 0) {
> + filename += 7;
> + is_permissive_mode = 1;
> + } else if (strncmp(filename, "enforce=", 8) == 0) {
> + filename += 8;
> + is_permissive_mode = 0;
> + } else {
> + printk(KERN_INFO
> + "SYAORAN: Missing 'accept=' or 'enforce='.\n");
> + return -EINVAL;
> + }
> + f = open_pathname(AT_FDCWD, filename, O_RDONLY, 0600);
> + if (!IS_ERR(f)) {
> + struct syaoran_sb_info *p;
> + if (!S_ISREG(f->f_dentry->d_inode->i_mode))
> + goto out;
> + p = kzalloc(sizeof(*p), GFP_KERNEL);
> + if (!p)
> + goto out;
> + p->is_permissive_mode = is_permissive_mode;
> + sb->s_fs_info = p;
> + INIT_LIST_HEAD(&((struct syaoran_sb_info *)
> + sb->s_fs_info)->list);
> + printk(KERN_INFO "SYAORAN: Reading '%s'\n", filename);
> + error = read_config_file(f, sb);
> +out:
> + if (error)
> + printk(KERN_INFO "SYAORAN: Can't read '%s'\n",
> + filename);
> + filp_close(f, NULL);
> + } else {
> + printk(KERN_INFO "SYAORAN: Can't open '%s'\n",
> + filename);
> + }
> + } else {
> + printk(KERN_INFO "SYAORAN: Missing config-file path.\n");
> + }
> + return error;
> +}
> +
> +/*
> + * The following structure and code are used for transferring data
> + * to interface files.
> + */
> +
> +struct syaoran_read_struct {
> + char *buf; /* Buffer for reading. */
> + int avail; /* Bytes available for reading. */
> + struct super_block *sb; /* The super_block of this partition. */
> + struct dev_entry *entry; /* The entry currently reading from. */
> + _Bool read_all; /* Dump all entries? */
> + struct list_head *pos; /* Current position. */
> +};
> +
> +static void syaoran_read_table(struct syaoran_read_struct *head, char *buf,
> + int count)
> +{
> + struct super_block *sb = head->sb;
> + struct syaoran_sb_info *info =
> + (struct syaoran_sb_info *) sb->s_fs_info;
> + struct list_head *pos;
> + const _Bool read_all = head->read_all;
> + if (!info)
> + return;
> + if (!head->pos)
> + return;
> + list_for_each_cookie(pos, head->pos, &info->list) {
> + struct dev_entry *entry =
> + list_entry(pos, struct dev_entry, list);
> + const unsigned int flags =
> + read_all ? entry->flags : entry->flags & ~DEVICE_USED;
> + const char *name = entry->printable_name;
> + const uid_t uid = entry->uid;
> + const gid_t gid = entry->gid;
> + const mode_t perm = entry->mode & 0777;
> + int len = 0;
> + switch (entry->mode & S_IFMT) {
> + case S_IFCHR:
> + if (!head->read_all && !(entry->flags & DEVICE_USED))
> + break;
> + len = snprintf(buf, count,
> + "%-20s %3o %3u %3u %2u %c %3u %3u\n",
> + name, perm, uid, gid, flags, 'c',
> + MAJOR(entry->kdev), MINOR(entry->kdev));
> + break;
> + case S_IFBLK:
> + if (!head->read_all && !(entry->flags & DEVICE_USED))
> + break;
> + len = snprintf(buf, count,
> + "%-20s %3o %3u %3u %2u %c %3u %3u\n",
> + name, perm, uid, gid, flags, 'b',
> + MAJOR(entry->kdev), MINOR(entry->kdev));
> + break;
> + case S_IFIFO:
> + len = snprintf(buf, count,
> + "%-20s %3o %3u %3u %2u %c\n",
> + name, perm, uid, gid, flags, 'p');
> + break;
> + case S_IFSOCK:
> + len = snprintf(buf, count,
> + "%-20s %3o %3u %3u %2u %c\n",
> + name, perm, uid, gid, flags, 's');
> + break;
> + case S_IFDIR:
> + len = snprintf(buf, count,
> + "%-20s %3o %3u %3u %2u %c\n",
> + name, perm, uid, gid, flags, 'd');
> + break;
> + case S_IFLNK:
> + len = snprintf(buf, count,
> + "%-20s %3o %3u %3u %2u %c %s\n",
> + name, perm, uid, gid, flags, 'l',
> + entry->printable_symlink_data);
> + break;
> + case S_IFREG:
> + len = snprintf(buf, count,
> + "%-20s %3o %3u %3u %2u %c\n",
> + name, perm, uid, gid, flags, 'f');
> + break;
> + }
> + if (len < 0 || count <= len)
> + break;
> + count -= len;
> + buf += len;
> + head->avail += len;
> + }
> +}
> +
> +static int syaoran_trace_open(struct inode *inode, struct file *file)
> +{
> + struct syaoran_read_struct *head =
> + kzalloc(sizeof(*head), GFP_KERNEL);
> + if (!head)
> + return -ENOMEM;
> + head->sb = inode->i_sb;
> + head->read_all =
> + (strcmp(file->f_dentry->d_name.name, ".syaoran_all") == 0);
> + head->pos = &((struct syaoran_sb_info *) head->sb->s_fs_info)->list;
> + head->buf = kzalloc(PAGE_SIZE * 2, GFP_KERNEL);
> + if (!head->buf) {
> + kfree(head);
> + return -ENOMEM;
> + }
> + file->private_data = head;
> + return 0;
> +}
> +
> +static int syaoran_trace_release(struct inode *inode, struct file *file)
> +{
> + struct syaoran_read_struct *head = file->private_data;
> + kfree(head->buf);
> + kfree(head);
> + file->private_data = NULL;
> + return 0;
> +}
> +
> +static ssize_t syaoran_trace_read(struct file *file, char __user *buf,
> + size_t count, loff_t *ppos)
> +{
> + struct syaoran_read_struct *head =
> + (struct syaoran_read_struct *) file->private_data;
> + int len = head->avail;
> + char *cp = head->buf;
> + if (!access_ok(VERIFY_WRITE, buf, count))
> + return -EFAULT;
> + syaoran_read_table(head, cp + len, PAGE_SIZE * 2 - len);
> + len = head->avail;
> + if (len > count)
> + len = count;
> + if (len > 0) {
> + if (copy_to_user(buf, cp, len))
> + return -EFAULT;
> + head->avail -= len;
> + memmove(cp, cp + len, head->avail);
> + }
> + return len;
> +}
> +
> +static struct file_operations syaoran_trace_operations = {
> + .open = syaoran_trace_open,
> + .release = syaoran_trace_release,
> + .read = syaoran_trace_read,
> +};
> +
> +/* Create interface files for reading status. */
> +static int syaoran_create_tracelog(struct super_block *sb, const char *filename)
> +{
> + struct inode *inode;
> + struct dentry *base = dget(sb->s_root);
> + struct dentry *dentry = lookup_create2(filename, base, 0);
> + int error = PTR_ERR(dentry);
> + if (IS_ERR(dentry))
> + goto out;
> + inode = syaoran_get_inode(sb, S_IFREG | 0400, 0);
> + if (!inode)
> + error = -ENOSPC;
> + else {
> + /* Override file operation. */
> + inode->i_fop = &syaoran_trace_operations;
> + d_instantiate(dentry, inode);
> + dget(dentry); /* Extra count - pin the dentry in core */
> + error = 0;
> + }
> + dput(dentry);
> +out:
> + mutex_unlock(&base->d_inode->i_mutex);
> + dput(base);
> + return error;
> +}

2008-01-01 02:17:19

by Tetsuo Handa

Subject: Re: [PATCH][RFC] Simple tamper-proof device filesystem.

Hello.

Thank you for reviewing.

Serge E. Hallyn wrote:
> > This time, I'm implementing this filesystem as an extension to tmpfs
> > because what this filesystem does are nothing but check filename and
> > its attributes in addition to what tmpfs does.
>
> To integrate this nicer into tmpfs, at least define TMPFS_IS_MAC as 1
> and TMPFS_NOT_MAC as 0 and pass those values instead of just 1 and 0.
>
A question for everybody:
Not all users need this extension, so I am worried that integrating
it into tmpfs would increase memory usage needlessly.
May I implement this filesystem as an extension to tmpfs,
provided that users can enable or disable it via the kernel config?

> Again, I should think you'd actually want to take blkdev_open() from
> fs/block_dev.c and chrdev_open() from fs/char_dev.c. Surely your
> method of grabbing it here is not acceptable for upstream code.
I see.

Regards.

2008-01-06 06:20:26

by Tetsuo Handa

Subject: [PATCH][RFC] Simple tamper-proof device filesystem.

Hello.

Changes from previous posting:

(1) Added a kernel config option so that users can choose
whether to compile this filesystem.

I didn't receive any ACK/NACK on whether I may implement
this filesystem as an extension to tmpfs,
so I continued implementing it as an extension to tmpfs.

(2) Removed the indirect lookup of blkdev_open() and chrdev_open().

The previous posting used an indirect approach to call
blkdev_open() and chrdev_open() so that users could compile
this filesystem as a module without exporting blkdev_open()
from fs/block_dev.c and chrdev_open() from fs/char_dev.c.
But since tmpfs cannot be compiled as a module,
I changed it to call these functions directly.
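
For readers who did not follow the previous posting, the indirect approach can be sketched in userspace roughly as follows. This is only an illustration: struct ops, default_open() and open_count are hypothetical stand-ins (the kernel code copied *inode->i_fop into wrapped_def_blk_fops / wrapped_def_chr_fops and saved .open in org_blkdev_open / org_chrdev_open), and what the real wrappers do before forwarding is not shown here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct file_operations. */
struct ops {
    int (*open)(const char *path);
};

/* Stand-in for a default handler that is not exported to modules. */
static int default_open(const char *path)
{
    return path != NULL ? 0 : -1;
}

static const struct ops default_ops = { .open = default_open };

static struct ops wrapped_ops;             /* private, patched copy */
static int (*org_open)(const char *path);  /* saved original handler */
static unsigned int open_count;            /* bookkeeping done by the wrapper */

/* Wrapper: do the extra work, then forward to the saved original. */
static int wrapped_open(const char *path)
{
    open_count++;
    return org_open(path);
}

/* Copy the table, remember its open(), and substitute the wrapper. */
static void install_wrapper(const struct ops *orig)
{
    assert(orig != NULL && orig->open != NULL);
    wrapped_ops = *orig;
    org_open = wrapped_ops.open;
    wrapped_ops.open = wrapped_open;
}
```

After install_wrapper(&default_ops), calls made through wrapped_ops.open reach default_open() without its symbol ever being referenced by name, which is what made the module build possible. Calling the function directly, as this posting does, removes the copied table altogether.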

(3) Split the single file into three files.

syaoran_init.c: initialization part
syaoran_main.c: access control part
syaoran_debug.c: snapshot part

This patch is for 2.6.24-rc6-mm1.

Regards.
----------
Subject: Simple tamper-proof device filesystem.

The goal of this filesystem is to guarantee that
"applications using well-known device locations under /dev
get the device they want" (e.g. an application that accesses /dev/null can
always get a character special device with major=1 and minor=3).
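
As a rough userspace illustration of what this guarantee means, an application could verify a node as sketched below. The helper names node_is_device() and path_is_device() are made up for this example and are not part of the patch; with this filesystem in enforcing mode, such a check on /dev/null should never fail.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/stat.h>
#include <sys/sysmacros.h>
#include <sys/types.h>

/*
 * Return true if the stat result describes a node of the expected type
 * (S_IFCHR or S_IFBLK) with the expected device numbers.
 */
static bool node_is_device(const struct stat *st, mode_t type,
                           unsigned int maj, unsigned int min)
{
    assert(st != NULL);
    if ((st->st_mode & S_IFMT) != type)
        return false;
    return major(st->st_rdev) == maj && minor(st->st_rdev) == min;
}

/* Hypothetical helper: stat() a path and compare it with expectations. */
static bool path_is_device(const char *path, mode_t type,
                           unsigned int maj, unsigned int min)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return false;
    return node_is_device(&st, type, maj, min);
}
```

For example, path_is_device("/dev/null", S_IFCHR, 1, 3) expresses the /dev/null case mentioned above.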

Does this idea sound silly? It does, if you assume that root can do whatever
he/she wants to do. But this filesystem makes sense when used with
access control mechanisms like MAC (mandatory access control).
I want to use this filesystem in the case where a process with root privileges
has been hijacked but the behavior of the hijacked process is still restricted
by MAC.

Why not use FUSE?

Because /dev has to be available throughout the lifetime of the kernel.
It is not acceptable if /dev stops working due to SIGKILL or the OOM killer.

Why not use SELinux?

Because SELinux doesn't guarantee the pairing of a filename and its attributes.
As far as I know, no MAC implementation can handle the pairing of a filename
and its attributes.
I guess this is because:

Pairs of a filename and its attributes are conventionally considered
constant and reliable.

Describing this attribute enforcement information in a MAC's policy
would make the policy syntax complicated.

I want to add functionality that the MACs are missing.
Instead of adding this functionality per MAC,
I propose to add it as groundwork that can be combined with any MAC.

Why not drop CAP_MKNOD?

Dropping CAP_MKNOD is not enough to emulate this filesystem, because
a process can still use rename()/unlink() to break the pairing of a filename
and its attributes (e.g. mv /dev/sda1 /dev/sda1.tmp; mv /dev/sda2 /dev/sda1;
mv /dev/sda1.tmp /dev/sda2 or unlink /dev/null; touch /dev/null ).
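
The kind of check that defeats these tricks can be sketched in userspace as below. This is a simplified assumption of the idea, not the patch's syaoran_may_create_node(): the table contents, struct allowed_node and may_have_name() are invented for illustration, and rejecting unlisted names is an assumption about enforcing mode.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical, simplified policy entry: one permitted name/attribute pair. */
struct allowed_node {
    const char *name;        /* path relative to the mount point */
    mode_t type;             /* S_IFCHR, S_IFBLK, ... */
    unsigned int dev_major;  /* device numbers for S_IFCHR/S_IFBLK */
    unsigned int dev_minor;
};

/* Example table; a real one would come from the configuration file. */
static const struct allowed_node policy[] = {
    { "null", S_IFCHR, 1, 3 },
    { "sda1", S_IFBLK, 8, 1 },
    { "sda2", S_IFBLK, 8, 2 },
};

/* May a node with these attributes exist under this name? */
static bool may_have_name(const char *name, mode_t mode,
                          unsigned int maj, unsigned int min)
{
    size_t i;

    assert(name != NULL);
    for (i = 0; i < sizeof(policy) / sizeof(policy[0]); i++) {
        const struct allowed_node *p = &policy[i];

        if (strcmp(p->name, name) != 0)
            continue;
        return (mode & S_IFMT) == p->type &&
               p->dev_major == maj && p->dev_minor == min;
    }
    return false; /* assumption: names not listed are rejected */
}
```

With such a check applied to the destination of rename() and to every newly created node, "mv /dev/sda2 /dev/sda1" fails because block device 8:2 is not allowed under the name sda1, and "touch /dev/null" fails because a regular file is not allowed under the name null.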

This time, I'm implementing this filesystem as an extension to tmpfs
because all this filesystem does, in addition to what tmpfs does,
is check the filename and its attributes.
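
The permitted pairs are read from a configuration file at mount time. Judging from register_node_info() in the patch, each line has the space-separated fields "filename permission uid gid flags type", followed by "major minor" for type c or b, or by the link target for type l. A minimal userspace sketch of such a parser, under that assumption, is shown below; struct node_line, parse_node_line() and the fixed 63-byte limits are invented for this example, and the \ooo escape handling done by syaoran_unescape() is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical parsed form of one configuration line. */
struct node_line {
    char name[64];
    unsigned int perm, uid, gid, flags;
    char type;                        /* c b l d s p f */
    unsigned int dev_major, dev_minor;
    char symlink[64];
};

static bool parse_node_line(const char *line, struct node_line *out)
{
    int pos = 0;
    const char *rest;

    assert(line != NULL && out != NULL);
    memset(out, 0, sizeof(*out));
    /* Six mandatory fields; %n records where the optional part starts. */
    if (sscanf(line, "%63s %o %u %u %u %c%n", out->name, &out->perm,
               &out->uid, &out->gid, &out->flags, &out->type, &pos) != 6)
        return false;
    /* Only permission bits are allowed, and the type is a single character. */
    if (out->perm > 0777 || (line[pos] != ' ' && line[pos] != '\0'))
        return false;
    rest = line + pos;
    switch (out->type) {
    case 'c':
    case 'b':
        return sscanf(rest, "%u %u", &out->dev_major, &out->dev_minor) == 2;
    case 'l':
        return sscanf(rest, "%63s", out->symlink) == 1;
    case 'd':
    case 's':
    case 'p':
    case 'f':
        return true;
    default:
        return false;
    }
}
```

For instance, a made-up line such as "null 666 0 0 0 c 1 3" would describe /dev/null, while a line of type c without both device numbers is rejected.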

Signed-off-by: Tetsuo Handa <[email protected]>
---
fs/Kconfig | 18 +
fs/ramfs/inode.c | 177 ++++++++++++++
fs/ramfs/syaoran.h | 75 ++++++
fs/ramfs/syaoran_debug.c | 183 +++++++++++++++
fs/ramfs/syaoran_init.c | 568 +++++++++++++++++++++++++++++++++++++++++++++++
fs/ramfs/syaoran_main.c | 207 +++++++++++++++++
6 files changed, 1222 insertions(+), 6 deletions(-)

--- linux-2.6-mm.orig/fs/ramfs/inode.c
+++ linux-2.6-mm/fs/ramfs/inode.c
@@ -36,6 +36,20 @@
#include <asm/uaccess.h>
#include "internal.h"

+static struct inode *__ramfs_get_inode(struct super_block *sb, int mode,
+ dev_t dev, bool tmpfs_with_mac);
+
+#define TMPFS_WITH_MAC 1
+#define TMPFS_WITHOUT_MAC 0
+#include <linux/quotaops.h>
+
+#ifdef CONFIG_SYAORAN
+#include "syaoran.h"
+#include "syaoran_init.c"
+#include "syaoran_main.c"
+#include "syaoran_debug.c"
+#endif
+
/* some random number */
#define RAMFS_MAGIC 0x858458f6

@@ -51,6 +65,12 @@ static struct backing_dev_info ramfs_bac

struct inode *ramfs_get_inode(struct super_block *sb, int mode, dev_t dev)
{
+ return __ramfs_get_inode(sb, mode, dev, TMPFS_WITHOUT_MAC);
+}
+
+static struct inode *__ramfs_get_inode(struct super_block *sb, int mode,
+ dev_t dev, const bool tmpfs_with_mac)
+{
struct inode * inode = new_inode(sb);

if (inode) {
@@ -65,10 +85,18 @@ struct inode *ramfs_get_inode(struct sup
switch (mode & S_IFMT) {
default:
init_special_inode(inode, mode, dev);
+#ifdef CONFIG_SYAORAN
+ if (tmpfs_with_mac)
+ init_syaoran_inode(inode, mode);
+#endif
break;
case S_IFREG:
inode->i_op = &ramfs_file_inode_operations;
inode->i_fop = &ramfs_file_operations;
+#ifdef CONFIG_SYAORAN
+ if (tmpfs_with_mac)
+ init_syaoran_inode(inode, mode);
+#endif
break;
case S_IFDIR:
inode->i_op = &ramfs_dir_inode_operations;
@@ -79,6 +107,10 @@ struct inode *ramfs_get_inode(struct sup
break;
case S_IFLNK:
inode->i_op = &page_symlink_inode_operations;
+#ifdef CONFIG_SYAORAN
+ if (tmpfs_with_mac)
+ init_syaoran_inode(inode, mode);
+#endif
break;
}
}
@@ -92,9 +124,19 @@ struct inode *ramfs_get_inode(struct sup
static int
ramfs_mknod(struct inode *dir, struct dentry *dentry, int mode, dev_t dev)
{
- struct inode * inode = ramfs_get_inode(dir->i_sb, mode, dev);
+ struct inode *inode;
int error = -ENOSPC;

+#ifdef CONFIG_SYAORAN
+ /*** SYAORAN start. ***/
+ if (dir->i_sb->s_op == &syaoran_ops) {
+ if (syaoran_may_create_node(dentry, mode, dev) < 0)
+ return -EPERM;
+ inode = syaoran_get_inode(dir->i_sb, mode, dev);
+ /*** SYAORAN end. ***/
+ } else
+#endif
+ inode = ramfs_get_inode(dir->i_sb, mode, dev);
if (inode) {
if (dir->i_mode & S_ISGID) {
inode->i_gid = dir->i_gid;
@@ -127,7 +169,16 @@ static int ramfs_symlink(struct inode *
struct inode *inode;
int error = -ENOSPC;

- inode = ramfs_get_inode(dir->i_sb, S_IFLNK|S_IRWXUGO, 0);
+#ifdef CONFIG_SYAORAN
+ /*** SYAORAN start. ***/
+ if (dir->i_sb->s_op == &syaoran_ops) {
+ if (syaoran_may_create_node(dentry, S_IFLNK, 0) < 0)
+ return -EPERM;
+ inode = syaoran_get_inode(dir->i_sb, S_IFLNK|S_IRWXUGO, 0);
+ /*** SYAORAN end. ***/
+ } else
+#endif
+ inode = ramfs_get_inode(dir->i_sb, S_IFLNK|S_IRWXUGO, 0);
if (inode) {
int l = strlen(symname)+1;
error = page_symlink(inode, symname, l);
@@ -143,16 +194,130 @@ static int ramfs_symlink(struct inode *
return error;
}

+static int ramfs_link(struct dentry *old_dentry, struct inode *dir,
+ struct dentry *dentry)
+{
+#ifdef CONFIG_SYAORAN
+ struct inode *inode;
+ if (dir->i_sb->s_op != &syaoran_ops)
+ goto ok;
+ /*** SYAORAN start. ***/
+ inode = old_dentry->d_inode;
+ if (!inode ||
+ syaoran_may_create_node(dentry, inode->i_mode, inode->i_rdev))
+ return -EPERM;
+ /*** SYAORAN end. ***/
+ok:
+#endif
+ return simple_link(old_dentry, dir, dentry);
+}
+
+static int ramfs_unlink(struct inode *dir, struct dentry *dentry)
+{
+#ifdef CONFIG_SYAORAN
+ if (dir->i_sb->s_op != &syaoran_ops)
+ goto ok;
+ /*** SYAORAN start. ***/
+ if (syaoran_may_modify_node(dentry, MAY_DELETE))
+ return -EPERM;
+ /*** SYAORAN end. ***/
+ok:
+#endif
+ return simple_unlink(dir, dentry);
+}
+
+static int ramfs_rename(struct inode *old_dir, struct dentry *old_dentry,
+ struct inode *new_dir, struct dentry *new_dentry)
+{
+#ifdef CONFIG_SYAORAN
+ struct inode *inode;
+ if (old_dir->i_sb->s_op != &syaoran_ops)
+ goto ok;
+ /*** SYAORAN start. ***/
+ inode = old_dentry->d_inode;
+ if (!inode || syaoran_may_modify_node(old_dentry, MAY_DELETE) ||
+ syaoran_may_create_node(new_dentry, inode->i_mode, inode->i_rdev))
+ return -EPERM;
+ /*** SYAORAN end. ***/
+ok:
+#endif
+ return simple_rename(old_dir, old_dentry, new_dir, new_dentry);
+}
+
+static int ramfs_rmdir(struct inode *dir, struct dentry *dentry)
+{
+#ifdef CONFIG_SYAORAN
+ if (dir->i_sb->s_op != &syaoran_ops)
+ goto ok;
+ /*** SYAORAN start. ***/
+ if (syaoran_may_modify_node(dentry, MAY_DELETE))
+ return -EPERM;
+ /*** SYAORAN end. ***/
+ok:
+#endif
+ return simple_rmdir(dir, dentry);
+}
+
+/*
+ * Original tmpfs doesn't set ramfs_dir_inode_operations.setattr field.
+ * Now I'm setting the field to share tmpfs/rootfs/syaoran code.
+ * Side effect is that the checking order of notify_change() has changed from
+ * inode_change_ok() -> security_inode_setattr() ->
+ * DQUOT_TRANSFER() -> inode_setattr()
+ * to
+ * security_inode_setattr() -> inode_change_ok() ->
+ * DQUOT_TRANSFER() -> inode_setattr()
+ *
+ * Is this change problematic? If problematic, I'll stop sharing the field.
+ */
+static int ramfs_setattr(struct dentry *dentry, struct iattr *attr)
+{
+ unsigned int ia_valid = attr->ia_valid;
+ struct inode *inode = dentry->d_inode;
+ int error = inode_change_ok(inode, attr);
+#ifdef CONFIG_SYAORAN
+ unsigned int flags = 0;
+ if (inode->i_sb->s_op != &syaoran_ops)
+ goto ok;
+ /*** SYAORAN start. ***/
+ if (ia_valid & (ATTR_UID | ATTR_GID))
+ flags |= MAY_CHOWN;
+ if (ia_valid & ATTR_MODE)
+ flags |= MAY_CHMOD;
+ if (syaoran_may_modify_node(dentry, flags))
+ return -EPERM;
+ /*** SYAORAN end. ***/
+ok:
+#endif
+ if (!error) {
+ if ((ia_valid & ATTR_UID && attr->ia_uid != inode->i_uid) ||
+ (ia_valid & ATTR_GID && attr->ia_gid != inode->i_gid))
+ error = DQUOT_TRANSFER(inode, attr) ? -EDQUOT : 0;
+ if (!error)
+ error = inode_setattr(inode, attr);
+ }
+ return error;
+}
+
static const struct inode_operations ramfs_dir_inode_operations = {
.create = ramfs_create,
.lookup = simple_lookup,
- .link = simple_link,
- .unlink = simple_unlink,
+ /* Set link() hook for tracking link operation. */
+ .link = ramfs_link,
+ /* Set unlink() hook for tracking unlink operation. */
+ .unlink = ramfs_unlink,
+ /* Set symlink() hook for tracking symlink operation. */
.symlink = ramfs_symlink,
+ /* Set mkdir() hook for tracking mkdir operation. */
.mkdir = ramfs_mkdir,
- .rmdir = simple_rmdir,
+ /* Set rmdir() hook for tracking rmdir operation. */
+ .rmdir = ramfs_rmdir,
+ /* Set mknod() hook for tracking mknod operation. */
.mknod = ramfs_mknod,
- .rename = simple_rename,
+ /* Set rename() hook for tracking rename operation. */
+ .rename = ramfs_rename,
+ /* Set setattr() hook for tracking chmod/chown operations. */
+ .setattr = ramfs_setattr,
};

static const struct super_operations ramfs_ops = {
--- /dev/null
+++ linux-2.6-mm/fs/ramfs/syaoran.h
@@ -0,0 +1,75 @@
+/*
+ * fs/ramfs/syaoran.h
+ *
+ * Implementation of the Tamper-Proof Device Filesystem.
+ *
+ * Copyright (C) 2005-2008 NTT DATA CORPORATION
+ *
+ * Version: 1.5.3-pre 2008/01/06
+ */
+
+#ifndef SYAORAN_H
+#define SYAORAN_H
+
+#include <linux/namei.h>
+#include <linux/mm.h>
+
+static struct super_operations syaoran_ops;
+static void init_syaoran_inode(struct inode *inode, int mode);
+
+static int syaoran_create_tracelog(struct super_block *sb,
+ const char *filename);
+static int syaoran_may_modify_node(struct dentry *dentry, unsigned int flags);
+
+/* The following constants are used to restrict operations.*/
+
+#define MAY_CREATE 1 /* This file is allowed to be mknod()ed. */
+#define MAY_DELETE 2 /* This file is allowed to be unlink()ed. */
+#define MAY_CHMOD 4 /* This file is allowed to be chmod()ed. */
+#define MAY_CHOWN 8 /* This file is allowed to be chown()ed. */
+#define DEVICE_USED 16 /* This block or character device file is used. */
+#define NO_CREATE_AT_MOUNT 32 /* Don't create this file at mount(). */
+
+/* some random number */
+#define SYAORAN_MAGIC 0x2F646576 /* = '/dev' */
+
+struct dev_entry {
+ struct list_head list;
+ /* Binary form of pathname under mount point. Never NULL. */
+ char *name;
+ /*
+ * Mode and permissions. setuid/setgid/sticky bits are not supported.
+ */
+ mode_t mode;
+ uid_t uid;
+ gid_t gid;
+ dev_t kdev;
+ /*
+ * Binary form of initial contents for the symlink.
+ * NULL if not symlink.
+ */
+ char *symlink_data;
+ /* File access control flags. */
+ unsigned int flags;
+ /* Text form of pathname under mount point. Never NULL. */
+ const char *printable_name;
+ /*
+ * Text form of initial contents for the symlink.
+ * NULL if not symlink.
+ */
+ const char *printable_symlink_data;
+};
+
+struct syaoran_sb_info {
+ struct list_head list;
+ bool initialize_done; /* False if initialization is in progress. */
+ bool is_permissive_mode; /* True if permissive mode. */
+};
+
+static inline struct inode *syaoran_get_inode(struct super_block *sb,
+ int mode, dev_t dev)
+{
+ return __ramfs_get_inode(sb, mode, dev, TMPFS_WITH_MAC);
+}
+
+#endif
--- linux-2.6-mm.orig/fs/Kconfig
+++ linux-2.6-mm/fs/Kconfig
@@ -978,6 +978,24 @@ config TMPFS_POSIX_ACL

If you don't know what Access Control Lists are, say N.

+config SYAORAN
+ bool "Tamper-proof device filesystem support"
+ depends on TMPFS
+ help
+ If you mount this filesystem on the /dev directory instead of tmpfs,
+ you can guarantee the following:
+
+ "Applications using well-known device locations under /dev
+ get the device they want" (e.g. an application that accesses
+ /dev/null can always get a character special device
+ with major=1 and minor=3).
+
+ The list of possible combinations of filename and its attributes
+ that can exist on this filesystem is defined at mount time
+ using a configuration file.
+
+ If unsure, say N.
+
config HUGETLBFS
bool "HugeTLB file system support"
depends on X86 || IA64 || PPC64 || SPARC64 || (SUPERH && MMU) || BROKEN
--- /dev/null
+++ linux-2.6-mm/fs/ramfs/syaoran_debug.c
@@ -0,0 +1,183 @@
+/*
+ * fs/ramfs/syaoran_debug.c
+ *
+ * Implementation of the Tamper-Proof Device Filesystem.
+ *
+ * Copyright (C) 2005-2008 NTT DATA CORPORATION
+ *
+ * Version: 1.5.3-pre 2008/01/06
+ */
+/*
+ * The following structure and code are used for transferring data
+ * to interface files.
+ */
+
+#define list_for_each_cookie(pos, cookie, head) \
+ for ((cookie) || ((cookie) = (head)), pos = (cookie)->next; \
+ prefetch(pos->next), pos != (head) || ((cookie) = NULL); \
+ (cookie) = pos, pos = pos->next)
+
+struct syaoran_read_struct {
+ char *buf; /* Buffer for reading. */
+ int avail; /* Bytes available for reading. */
+ struct super_block *sb; /* The super_block of this partition. */
+ struct dev_entry *entry; /* The entry currently reading from. */
+ bool read_all; /* Dump all entries? */
+ struct list_head *pos; /* Current position. */
+};
+
+static void syaoran_read_table(struct syaoran_read_struct *head, char *buf,
+ int count)
+{
+ struct super_block *sb = head->sb;
+ struct syaoran_sb_info *info =
+ (struct syaoran_sb_info *) sb->s_fs_info;
+ struct list_head *pos;
+ const bool read_all = head->read_all;
+ if (!info)
+ return;
+ if (!head->pos)
+ return;
+ list_for_each_cookie(pos, head->pos, &info->list) {
+ struct dev_entry *entry =
+ list_entry(pos, struct dev_entry, list);
+ const unsigned int flags =
+ read_all ? entry->flags : entry->flags & ~DEVICE_USED;
+ const char *name = entry->printable_name;
+ const uid_t uid = entry->uid;
+ const gid_t gid = entry->gid;
+ const mode_t perm = entry->mode & 0777;
+ int len = 0;
+ switch (entry->mode & S_IFMT) {
+ case S_IFCHR:
+ if (!head->read_all && !(entry->flags & DEVICE_USED))
+ break;
+ len = snprintf(buf, count,
+ "%-20s %3o %3u %3u %2u %c %3u %3u\n",
+ name, perm, uid, gid, flags, 'c',
+ MAJOR(entry->kdev), MINOR(entry->kdev));
+ break;
+ case S_IFBLK:
+ if (!head->read_all && !(entry->flags & DEVICE_USED))
+ break;
+ len = snprintf(buf, count,
+ "%-20s %3o %3u %3u %2u %c %3u %3u\n",
+ name, perm, uid, gid, flags, 'b',
+ MAJOR(entry->kdev), MINOR(entry->kdev));
+ break;
+ case S_IFIFO:
+ len = snprintf(buf, count,
+ "%-20s %3o %3u %3u %2u %c\n",
+ name, perm, uid, gid, flags, 'p');
+ break;
+ case S_IFSOCK:
+ len = snprintf(buf, count,
+ "%-20s %3o %3u %3u %2u %c\n",
+ name, perm, uid, gid, flags, 's');
+ break;
+ case S_IFDIR:
+ len = snprintf(buf, count,
+ "%-20s %3o %3u %3u %2u %c\n",
+ name, perm, uid, gid, flags, 'd');
+ break;
+ case S_IFLNK:
+ len = snprintf(buf, count,
+ "%-20s %3o %3u %3u %2u %c %s\n",
+ name, perm, uid, gid, flags, 'l',
+ entry->printable_symlink_data);
+ break;
+ case S_IFREG:
+ len = snprintf(buf, count,
+ "%-20s %3o %3u %3u %2u %c\n",
+ name, perm, uid, gid, flags, 'f');
+ break;
+ }
+ if (len < 0 || count <= len)
+ break;
+ count -= len;
+ buf += len;
+ head->avail += len;
+ }
+}
+
+static int syaoran_trace_open(struct inode *inode, struct file *file)
+{
+ struct syaoran_read_struct *head =
+ kzalloc(sizeof(*head), GFP_KERNEL);
+ if (!head)
+ return -ENOMEM;
+ head->sb = inode->i_sb;
+ head->read_all =
+ (strcmp(file->f_dentry->d_name.name, ".syaoran_all") == 0);
+ head->pos = &((struct syaoran_sb_info *) head->sb->s_fs_info)->list;
+ head->buf = kzalloc(PAGE_SIZE * 2, GFP_KERNEL);
+ if (!head->buf) {
+ kfree(head);
+ return -ENOMEM;
+ }
+ file->private_data = head;
+ return 0;
+}
+
+static int syaoran_trace_release(struct inode *inode, struct file *file)
+{
+ struct syaoran_read_struct *head = file->private_data;
+ kfree(head->buf);
+ kfree(head);
+ file->private_data = NULL;
+ return 0;
+}
+
+static ssize_t syaoran_trace_read(struct file *file, char __user *buf,
+ size_t count, loff_t *ppos)
+{
+ struct syaoran_read_struct *head =
+ (struct syaoran_read_struct *) file->private_data;
+ int len = head->avail;
+ char *cp = head->buf;
+ if (!access_ok(VERIFY_WRITE, buf, count))
+ return -EFAULT;
+ syaoran_read_table(head, cp + len, PAGE_SIZE * 2 - len);
+ len = head->avail;
+ if (len > count)
+ len = count;
+ if (len > 0) {
+ if (copy_to_user(buf, cp, len))
+ return -EFAULT;
+ head->avail -= len;
+ memmove(cp, cp + len, head->avail);
+ }
+ return len;
+}
+
+static struct file_operations syaoran_trace_operations = {
+ .open = syaoran_trace_open,
+ .release = syaoran_trace_release,
+ .read = syaoran_trace_read,
+};
+
+/* Create interface files for reading status. */
+static int syaoran_create_tracelog(struct super_block *sb, const char *filename)
+{
+ struct inode *inode;
+ struct dentry *base = dget(sb->s_root);
+ struct dentry *dentry = lookup_create2(filename, base, 0);
+ int error = PTR_ERR(dentry);
+ if (IS_ERR(dentry))
+ goto out;
+ inode = syaoran_get_inode(sb, S_IFREG | 0400, 0);
+ if (!inode)
+ error = -ENOSPC;
+ else {
+ /* Override file operation. */
+ inode->i_fop = &syaoran_trace_operations;
+ d_instantiate(dentry, inode);
+ dget(dentry); /* Extra count - pin the dentry in core */
+ error = 0;
+ }
+ dput(dentry);
+out:
+ mutex_unlock(&base->d_inode->i_mutex);
+ dput(base);
+ return error;
+}
--- /dev/null
+++ linux-2.6-mm/fs/ramfs/syaoran_init.c
@@ -0,0 +1,568 @@
+/*
+ * fs/ramfs/syaoran_init.c
+ *
+ * Implementation of the Tamper-Proof Device Filesystem.
+ *
+ * Copyright (C) 2005-2008 NTT DATA CORPORATION
+ *
+ * Version: 1.5.3-pre 2008/01/06
+ */
+
+/*
+ * The following code is used for processing the policy file and
+ * creating initial nodes at mount time.
+ */
+
+/* lookup_create() without nameidata */
+static struct dentry *lookup_create2(const char *name, struct dentry *base,
+ const bool is_dir)
+{
+ struct dentry *dentry;
+ const int len = name ? strlen(name) : 0;
+ mutex_lock(&base->d_inode->i_mutex);
+ dentry = lookup_one_len(name, base, len);
+ if (IS_ERR(dentry))
+ goto fail;
+ if (!is_dir && name[len] && !dentry->d_inode)
+ goto enoent;
+ return dentry;
+enoent:
+ dput(dentry);
+ dentry = ERR_PTR(-ENOENT);
+fail:
+ return dentry;
+}
+
+static int fs_mkdir(const char *pathname, struct dentry *base, int mode,
+ uid_t user, gid_t group)
+{
+ struct dentry *dentry = lookup_create2(pathname, base, 1);
+ int error = PTR_ERR(dentry);
+ if (!IS_ERR(dentry)) {
+ error = vfs_mkdir(base->d_inode, dentry, mode);
+ if (!error) {
+ lock_kernel();
+ dentry->d_inode->i_uid = user;
+ dentry->d_inode->i_gid = group;
+ unlock_kernel();
+ }
+ dput(dentry);
+ }
+ mutex_unlock(&base->d_inode->i_mutex);
+ return error;
+}
+
+static int fs_mknod(const char *filename, struct dentry *base, int mode,
+ dev_t dev, uid_t user, gid_t group)
+{
+ struct dentry *dentry;
+ int error;
+ switch (mode & S_IFMT) {
+ case S_IFCHR:
+ case S_IFBLK:
+ case S_IFIFO:
+ case S_IFSOCK:
+ case S_IFREG:
+ break;
+ default:
+ return -EPERM;
+ }
+ dentry = lookup_create2(filename, base, 0);
+ error = PTR_ERR(dentry);
+ if (!IS_ERR(dentry)) {
+ error = vfs_mknod(base->d_inode, dentry, mode, dev);
+ if (!error) {
+ lock_kernel();
+ dentry->d_inode->i_uid = user;
+ dentry->d_inode->i_gid = group;
+ unlock_kernel();
+ }
+ dput(dentry);
+ }
+ mutex_unlock(&base->d_inode->i_mutex);
+ return error;
+}
+
+static int fs_symlink(const char *pathname, struct dentry *base,
+ char *oldname, int mode, uid_t user, gid_t group)
+{
+ struct dentry *dentry = lookup_create2(pathname, base, 0);
+ int error = PTR_ERR(dentry);
+ if (!IS_ERR(dentry)) {
+ error = vfs_symlink(base->d_inode, dentry, oldname, S_IALLUGO);
+ if (!error) {
+ lock_kernel();
+ dentry->d_inode->i_mode = mode;
+ dentry->d_inode->i_uid = user;
+ dentry->d_inode->i_gid = group;
+ unlock_kernel();
+ }
+ dput(dentry);
+ }
+ mutex_unlock(&base->d_inode->i_mutex);
+ return error;
+}
+
+/*
+ * Format string.
+ * Leading and trailing whitespace (any byte <= ' ' or >= 127) is removed.
+ * Runs of whitespace are collapsed into a single space.
+ */
+static void syaoran_normalize_line(unsigned char *buffer)
+{
+ unsigned char *sp = buffer;
+ unsigned char *dp = buffer;
+ bool first = 1;
+ while (*sp && (*sp <= ' ' || *sp >= 127))
+ sp++;
+ while (*sp) {
+ if (!first)
+ *dp++ = ' ';
+ first = 0;
+ while (*sp > ' ' && *sp < 127)
+ *dp++ = *sp++;
+ while (*sp && (*sp <= ' ' || *sp >= 127))
+ sp++;
+ }
+ *dp = '\0';
+}
+
+/* Convert text form of filename into binary form. */
+static void syaoran_unescape(char *filename)
+{
+ char *cp = filename;
+ char c, d, e;
+ if (!cp)
+ return;
+ while ((c = *filename++) != '\0') {
+ if (c != '\\') {
+ *cp++ = c;
+ continue;
+ }
+ if ((c = *filename++) == '\\') {
+ *cp++ = c;
+ continue;
+ }
+ if (c < '0' || c > '3')
+ break;
+ d = *filename++;
+ if (d < '0' || d > '7')
+ break;
+ e = *filename++;
+ if (e < '0' || e > '7')
+ break;
+ *(unsigned char *) cp++ = (unsigned char)
+ (((unsigned char) (c - '0') << 6) +
+ ((unsigned char) (d - '0') << 3) +
+ (unsigned char) (e - '0'));
+ }
+ *cp = '\0';
+}
+
+static inline char *strdup(const char *data)
+{
+ return kstrdup(data, GFP_KERNEL);
+}
+
+static int register_node_info(char *buffer, struct super_block *sb)
+{
+ enum {
+ ARG_FILENAME = 0,
+ ARG_PERMISSION = 1,
+ ARG_UID = 2,
+ ARG_GID = 3,
+ ARG_FLAGS = 4,
+ ARG_DEV_TYPE = 5,
+ ARG_SYMLINK_DATA = 6,
+ ARG_DEV_MAJOR = 6,
+ ARG_DEV_MINOR = 7,
+ MAX_ARG = 8
+ };
+ char *args[MAX_ARG];
+ int i;
+ int error = -EINVAL;
+ unsigned int perm, uid, gid, flags, major = 0, minor = 0;
+ struct syaoran_sb_info *info = (struct syaoran_sb_info *) sb->s_fs_info;
+ struct dev_entry *entry;
+ memset(args, 0, sizeof(args));
+ args[0] = buffer;
+ for (i = 1; i < MAX_ARG; i++) {
+ args[i] = strchr(args[i - 1] + 1, ' ');
+ if (!args[i])
+ break;
+ *args[i]++ = '\0';
+ }
+ /*
+ printk("<%s> <%s> <%s> <%s> <%s> <%s> <%s> <%s>\n",
+ args[0], args[1], args[2], args[3], args[4], args[5],
+ args[6], args[7]);
+ */
+ if (!args[ARG_FILENAME] || !args[ARG_PERMISSION] || !args[ARG_UID] ||
+ !args[ARG_GID] || !args[ARG_DEV_TYPE] || !args[ARG_FLAGS])
+ goto out;
+ if (sscanf(args[ARG_PERMISSION], "%o", &perm) != 1 || !(perm <= 0777)
+ || sscanf(args[ARG_UID], "%u", &uid) != 1
+ || sscanf(args[ARG_GID], "%u", &gid) != 1
+ || sscanf(args[ARG_FLAGS], "%u", &flags) != 1
+ || *(args[ARG_DEV_TYPE] + 1))
+ goto out;
+ switch (*args[ARG_DEV_TYPE]) {
+ case 'c':
+ perm |= S_IFCHR;
+ if (!args[ARG_DEV_MAJOR]
+ || sscanf(args[ARG_DEV_MAJOR], "%u", &major) != 1
+ || !args[ARG_DEV_MINOR]
+ || sscanf(args[ARG_DEV_MINOR], "%u", &minor) != 1)
+ goto out;
+ break;
+ case 'b':
+ perm |= S_IFBLK;
+ if (!args[ARG_DEV_MAJOR]
+ || sscanf(args[ARG_DEV_MAJOR], "%u", &major) != 1
+ || !args[ARG_DEV_MINOR]
+ || sscanf(args[ARG_DEV_MINOR], "%u", &minor) != 1)
+ goto out;
+ break;
+ case 'l':
+ perm |= S_IFLNK;
+ if (!args[ARG_SYMLINK_DATA])
+ goto out;
+ break;
+ case 'd':
+ perm |= S_IFDIR;
+ break;
+ case 's':
+ perm |= S_IFSOCK;
+ break;
+ case 'p':
+ perm |= S_IFIFO;
+ break;
+ case 'f':
+ perm |= S_IFREG;
+ break;
+ default:
+ goto out;
+ }
+ error = -ENOMEM;
+ entry = kzalloc(sizeof(*entry), GFP_KERNEL);
+ if (!entry)
+ goto out;
+ if (S_ISLNK(perm)) {
+ entry->printable_symlink_data = strdup(args[ARG_SYMLINK_DATA]);
+ if (!entry->printable_symlink_data)
+ goto out_freemem;
+ }
+ entry->printable_name = strdup(args[ARG_FILENAME]);
+ if (!entry->printable_name)
+ goto out_freemem;
+ if (S_ISLNK(perm)) {
+ entry->symlink_data = strdup(entry->printable_symlink_data);
+ if (!entry->symlink_data)
+ goto out_freemem;
+ syaoran_unescape(entry->symlink_data);
+ }
+ entry->name = strdup(entry->printable_name);
+ if (!entry->name)
+ goto out_freemem;
+ syaoran_unescape(entry->name);
+ /*
+ * Drop the trailing '/', because get_local_absolute_path() doesn't
+ * append one.
+ */
+ i = strlen(entry->name);
+ if (i && entry->name[i - 1] == '/')
+ entry->name[i - 1] = '\0';
+ entry->mode = perm;
+ entry->uid = uid;
+ entry->gid = gid;
+ entry->kdev = S_ISCHR(perm) || S_ISBLK(perm) ? MKDEV(major, minor) : 0;
+ entry->flags = flags;
+ list_add_tail(&entry->list, &info->list);
+ /* printk("Entry added.\n"); */
+ error = 0;
+out:
+ return error;
+out_freemem:
+ kfree(entry->printable_symlink_data);
+ kfree(entry->printable_name);
+ kfree(entry->symlink_data);
+ kfree(entry);
+ goto out;
+}
+
+static int read_config_file(struct file *file, struct super_block *sb)
+{
+ char *buffer;
+ int error = -ENOMEM;
+ if (!file)
+ return -EINVAL;
+ buffer = kzalloc(PAGE_SIZE, GFP_KERNEL);
+ if (buffer) {
+ int len;
+ char *cp;
+ unsigned long offset = 0;
+ while ((len = kernel_read(file, offset, buffer, PAGE_SIZE)) > 0
+ && (cp = memchr(buffer, '\n', len)) != NULL) {
+ *cp = '\0';
+ offset += cp - buffer + 1;
+ syaoran_normalize_line(buffer);
+ if (register_node_info(buffer, sb) == -ENOMEM)
+ goto out;
+ }
+ error = 0;
+ }
+out:
+ kfree(buffer);
+ return error;
+}
+
+static void make_node(struct dev_entry *entry, struct dentry *root)
+{
+ struct dentry *base = dget(root);
+ char *filename = entry->name;
+ char *name = filename;
+ unsigned int c;
+ const mode_t perm = entry->mode;
+ const uid_t uid = entry->uid;
+ const gid_t gid = entry->gid;
+ goto start;
+ while ((c = *(unsigned char *) filename) != '\0') {
+ if (c == '/') {
+ struct dentry *new_base;
+ const int len = filename - name;
+ *filename = '\0';
+ mutex_lock(&base->d_inode->i_mutex);
+ new_base = lookup_one_len(name, base, len);
+ mutex_unlock(&base->d_inode->i_mutex);
+ dput(base);
+ *filename = '/';
+ filename++;
+ if (IS_ERR(new_base))
+ return;
+ if (!new_base->d_inode ||
+ !S_ISDIR(new_base->d_inode->i_mode)) {
+ dput(new_base);
+ return;
+ }
+ base = new_base;
+start:
+ name = filename;
+ } else {
+ filename++;
+ }
+ }
+ filename = (char *) name;
+ if (S_ISLNK(perm)) {
+ fs_symlink(filename, base, entry->symlink_data, perm, uid, gid);
+ } else if (S_ISDIR(perm)) {
+ fs_mkdir(filename, base, perm ^ S_IFDIR, uid, gid);
+ } else if (S_ISSOCK(perm) || S_ISFIFO(perm) || S_ISREG(perm)) {
+ fs_mknod(filename, base, perm, 0, uid, gid);
+ } else if (S_ISCHR(perm) || S_ISBLK(perm)) {
+ fs_mknod(filename, base, perm, entry->kdev, uid, gid);
+ }
+ dput(base);
+}
+
+/* Create files according to the policy file. */
+static void syaoran_make_initial_nodes(struct super_block *sb)
+{
+ struct syaoran_sb_info *info;
+ struct dev_entry *entry;
+ if (!sb)
+ return;
+ info = (struct syaoran_sb_info *) sb->s_fs_info;
+ if (!info)
+ return;
+ if (info->is_permissive_mode) {
+ syaoran_create_tracelog(sb, ".syaoran");
+ syaoran_create_tracelog(sb, ".syaoran_all");
+ }
+ list_for_each_entry(entry, &info->list, list) {
+ if ((entry->flags & NO_CREATE_AT_MOUNT) == 0)
+ make_node(entry, sb->s_root);
+ }
+ info->initialize_done = 1;
+}
+
+/* Read policy file. */
+static int syaoran_initialize(struct super_block *sb, void *data)
+{
+ int error = -EINVAL;
+ struct file *f;
+ char *filename = (char *) data;
+ bool is_permissive_mode = 0;
+ struct syaoran_sb_info *p;
+ static bool first = 1;
+ if (first) {
+ first = 0;
+ printk(KERN_INFO "SYAORAN: 1.5.3-pre 2008/01/06\n");
+ }
+ if (!filename) {
+ printk(KERN_INFO "SYAORAN: Missing config-file path.\n");
+ return -EINVAL;
+ } else if (strncmp(filename, "accept=", 7) == 0) {
+ filename += 7;
+ is_permissive_mode = 1;
+ } else if (strncmp(filename, "enforce=", 8) == 0) {
+ filename += 8;
+ is_permissive_mode = 0;
+ } else {
+ printk(KERN_INFO "SYAORAN: Missing 'accept=' or 'enforce='.\n");
+ return -EINVAL;
+ }
+ f = open_pathname(AT_FDCWD, filename, O_RDONLY, 0600);
+ if (IS_ERR(f)) {
+ printk(KERN_INFO "SYAORAN: Can't open '%s'\n", filename);
+ return -EINVAL;
+ }
+ if (!S_ISREG(f->f_dentry->d_inode->i_mode))
+ goto out;
+ p = kzalloc(sizeof(*p), GFP_KERNEL);
+ if (!p)
+ goto out;
+ p->is_permissive_mode = is_permissive_mode;
+ sb->s_fs_info = p;
+ INIT_LIST_HEAD(&((struct syaoran_sb_info *) sb->s_fs_info)->list);
+ printk(KERN_INFO "SYAORAN: Reading '%s'\n", filename);
+ error = read_config_file(f, sb);
+out:
+ if (error)
+ printk(KERN_INFO "SYAORAN: Can't read '%s'\n", filename);
+ filp_close(f, NULL);
+ return error;
+}
+
+static int syaoran_fill_super(struct super_block *sb, void *data, int silent)
+{
+ struct inode *inode;
+ struct dentry *root;
+ int error;
+
+ sb->s_maxbytes = MAX_LFS_FILESIZE;
+ sb->s_blocksize = PAGE_CACHE_SIZE;
+ sb->s_blocksize_bits = PAGE_CACHE_SHIFT;
+ sb->s_magic = SYAORAN_MAGIC;
+ sb->s_op = &syaoran_ops;
+ sb->s_time_gran = 1;
+ error = syaoran_initialize(sb, data);
+ if (error < 0)
+ return error;
+ inode = syaoran_get_inode(sb, S_IFDIR | 0755, 0);
+ if (!inode)
+ return -ENOMEM;
+
+ root = d_alloc_root(inode);
+ if (!root) {
+ iput(inode);
+ return -ENOMEM;
+ }
+ sb->s_root = root;
+ syaoran_make_initial_nodes(sb);
+ return 0;
+}
+
+static int syaoran_get_sb(struct file_system_type *fs_type, int flags,
+ const char *dev_name, void *data,
+ struct vfsmount *mnt)
+{
+ return get_sb_nodev(fs_type, flags, data, syaoran_fill_super, mnt);
+}
+
+static void syaoran_put_super(struct super_block *sb)
+{
+ struct syaoran_sb_info *info;
+ struct dev_entry *entry;
+ struct dev_entry *tmp;
+ if (!sb)
+ return;
+ info = (struct syaoran_sb_info *) sb->s_fs_info;
+ if (!info)
+ return;
+ list_for_each_entry_safe(entry, tmp, &info->list, list) {
+ kfree(entry->name);
+ kfree(entry->symlink_data);
+ kfree(entry->printable_name);
+ kfree(entry->printable_symlink_data);
+ list_del(&entry->list);
+ /* printk("Entry removed.\n"); */
+ kfree(entry);
+ }
+ kfree(info);
+ sb->s_fs_info = NULL;
+ printk(KERN_DEBUG "%s: Unused memory freed.\n", __FUNCTION__);
+}
+
+static struct file_system_type syaoran_fs_type = {
+ .name = "syaoran",
+ .get_sb = syaoran_get_sb,
+ .kill_sb = kill_litter_super,
+};
+
+static struct file_operations wrapped_def_blk_fops;
+static struct file_operations wrapped_def_chr_fops;
+static struct inode_operations syaoran_file_inode_operations;
+static struct inode_operations syaoran_symlink_inode_operations;
+static int ramfs_setattr(struct dentry *dentry, struct iattr *attr);
+static const struct super_operations ramfs_ops;
+
+static void init_syaoran_inode(struct inode *inode, int mode)
+{
+ /* Set open() hook for tracking open request. */
+ if (S_ISBLK(mode))
+ inode->i_fop = &wrapped_def_blk_fops;
+ else if (S_ISCHR(mode))
+ inode->i_fop = &wrapped_def_chr_fops;
+ /*
+ * Set setattr() hook for tracking chmod/chown requests.
+ * The setattr() hook for directories is already set by
+ * ramfs_dir_inode_operations.
+ */
+ if (S_ISLNK(mode))
+ inode->i_op = &syaoran_symlink_inode_operations;
+ else
+ inode->i_op = &syaoran_file_inode_operations;
+}
+
+static int wrapped_blkdev_open(struct inode *inode, struct file *filp)
+{
+ int error = def_blk_fops.open(inode, filp);
+ if (error != -ENXIO)
+ syaoran_may_modify_node(filp->f_dentry, DEVICE_USED);
+ return error;
+}
+
+static int wrapped_chrdev_open(struct inode *inode, struct file *filp)
+{
+ int error = def_chr_fops.open(inode, filp);
+ if (error != -ENXIO)
+ syaoran_may_modify_node(filp->f_dentry, DEVICE_USED);
+ return error;
+}
+
+static int __init init_syaoran_fs(void)
+{
+ /* Set open() hook for tracking open operation of block devices. */
+ wrapped_def_blk_fops = def_blk_fops;
+ wrapped_def_blk_fops.open = wrapped_blkdev_open;
+ /* Set open() hook for tracking open operation of character devices. */
+ wrapped_def_chr_fops = def_chr_fops;
+ wrapped_def_chr_fops.open = wrapped_chrdev_open;
+ /* Set setattr() hook for tracking chmod/chown operations of file. */
+ syaoran_file_inode_operations = ramfs_file_inode_operations;
+ syaoran_file_inode_operations.setattr = ramfs_setattr;
+ /* Set setattr() hook for tracking chmod/chown operations of symlink. */
+ syaoran_symlink_inode_operations = page_symlink_inode_operations;
+ syaoran_symlink_inode_operations.setattr = ramfs_setattr;
+ /* Set umount() hook for freeing memory. */
+ syaoran_ops = ramfs_ops;
+ syaoran_ops.put_super = syaoran_put_super;
+ return register_filesystem(&syaoran_fs_type);
+}
+
+static void __exit exit_syaoran_fs(void)
+{
+ unregister_filesystem(&syaoran_fs_type);
+}
+module_init(init_syaoran_fs);
+module_exit(exit_syaoran_fs);
--- /dev/null
+++ linux-2.6-mm/fs/ramfs/syaoran_main.c
@@ -0,0 +1,207 @@
+/*
+ * fs/ramfs/syaoran_main.c
+ *
+ * Implementation of the Tamper-Proof Device Filesystem.
+ *
+ * Copyright (C) 2005-2008 NTT DATA CORPORATION
+ *
+ * Version: 1.5.3-pre 2008/01/06
+ */
+
+/* Get absolute pathname from mount point. */
+static int get_local_absolute_path(struct dentry *dentry, char *buffer,
+ int buflen)
+{
+ char *start = buffer;
+ char *end = buffer + buflen;
+ int namelen;
+
+ if (buflen < 256)
+ goto out;
+
+ *--end = '\0';
+ buflen--;
+ for (;;) {
+ struct dentry *parent;
+ if (IS_ROOT(dentry))
+ break;
+ parent = dentry->d_parent;
+ namelen = dentry->d_name.len;
+ buflen -= namelen + 1;
+ if (buflen < 0)
+ goto out;
+ end -= namelen;
+ memcpy(end, dentry->d_name.name, namelen);
+ *--end = '/';
+ dentry = parent;
+ }
+ if (*end == '/') {
+ buflen++;
+ end++;
+ }
+ namelen = dentry->d_name.len;
+ buflen -= namelen;
+ if (buflen < 0)
+ goto out;
+ end -= namelen;
+ memcpy(end, dentry->d_name.name, namelen);
+ memmove(start, end, strlen(end) + 1);
+ return 0;
+out:
+ return -ENOMEM;
+}
+
+/* Get absolute pathname of the given dentry from mount point. */
+static int local_realpath_from_dentry(struct dentry *dentry, char *newname,
+ int newname_len)
+{
+ int error;
+ struct dentry *d_dentry;
+ if (!dentry || !newname || newname_len <= 0)
+ return -EINVAL;
+ d_dentry = dget(dentry);
+ /***** CRITICAL SECTION START *****/
+ spin_lock(&dcache_lock);
+ error = get_local_absolute_path(d_dentry, newname, newname_len);
+ spin_unlock(&dcache_lock);
+ /***** CRITICAL SECTION END *****/
+ dput(d_dentry);
+ return error;
+}
+
+static int syaoran_check_flags(struct syaoran_sb_info *info,
+ struct dentry *dentry, int mode, int dev,
+ unsigned int flags)
+{
+ int error = -EPERM;
+ struct dev_entry *entry;
+ /*
+ * Since local_realpath_from_dentry() holds dcache_lock,
+ * allocating the buffer with kmalloc() would not improve concurrency.
+ * Therefore, a static buffer protected by a spinlock is used here.
+ */
+ static char filename[PAGE_SIZE];
+ static DEFINE_SPINLOCK(lock);
+ spin_lock(&lock);
+ memset(filename, 0, sizeof(filename));
+ if (local_realpath_from_dentry(dentry, filename, sizeof(filename) - 1))
+ goto out;
+ list_for_each_entry(entry, &info->list, list) {
+ if ((mode & S_IFMT) != (entry->mode & S_IFMT))
+ continue;
+ if ((S_ISBLK(mode) || S_ISCHR(mode)) && dev != entry->kdev)
+ continue;
+ if (strcmp(entry->name, filename + 1))
+ continue;
+ if (info->is_permissive_mode) {
+ entry->flags |= flags;
+ error = 0;
+ } else {
+ if ((entry->flags & flags) == flags)
+ error = 0;
+ }
+ break;
+ }
+out:
+ if (error && strlen(filename) < (sizeof(filename) / 4) - 16) {
+ const char *name;
+ const uid_t uid = current->fsuid;
+ const gid_t gid = current->fsgid;
+ const mode_t perm = mode & 0777;
+ flags &= ~DEVICE_USED;
+ {
+ char *end = filename + sizeof(filename) - 1;
+ const char *cp = strchr(filename, '\0') - 1;
+ while (cp > filename) {
+ const unsigned char c = *cp--;
+ if (c == '\\') {
+ *--end = '\\';
+ *--end = '\\';
+ } else if (c > ' ' && c < 127) {
+ *--end = c;
+ } else {
+ *--end = (c & 7) + '0';
+ *--end = ((c >> 3) & 7) + '0';
+ *--end = (c >> 6) + '0';
+ *--end = '\\';
+ }
+ }
+ name = end;
+ }
+ switch (mode & S_IFMT) {
+ case S_IFCHR:
+ printk(KERN_DEBUG
+ "SYAORAN-ERROR: %s %3o %3u %3u %2u %c %3u %3u\n",
+ name, perm, uid, gid, flags, 'c',
+ MAJOR(dev), MINOR(dev));
+ break;
+ case S_IFBLK:
+ printk(KERN_DEBUG
+ "SYAORAN-ERROR: %s %3o %3u %3u %2u %c %3u %3u\n",
+ name, perm, uid, gid, flags, 'b',
+ MAJOR(dev), MINOR(dev));
+ break;
+ case S_IFIFO:
+ printk(KERN_DEBUG
+ "SYAORAN-ERROR: %s %3o %3u %3u %2u %c\n",
+ name, perm, uid, gid, flags, 'p');
+ break;
+ case S_IFSOCK:
+ printk(KERN_DEBUG
+ "SYAORAN-ERROR: %s %3o %3u %3u %2u %c\n",
+ name, perm, uid, gid, flags, 's');
+ break;
+ case S_IFDIR:
+ printk(KERN_DEBUG
+ "SYAORAN-ERROR: %s %3o %3u %3u %2u %c\n",
+ name, perm, uid, gid, flags, 'd');
+ break;
+ case S_IFLNK:
+ printk(KERN_DEBUG
+ "SYAORAN-ERROR: %s %3o %3u %3u %2u %c %s\n",
+ name, perm, uid, gid, flags, 'l', "unknown");
+ break;
+ case S_IFREG:
+ printk(KERN_DEBUG
+ "SYAORAN-ERROR: %s %3o %3u %3u %2u %c\n",
+ name, perm, uid, gid, flags, 'f');
+ break;
+ }
+ }
+ spin_unlock(&lock);
+ return error;
+}
+
+/* Check whether the given dentry is allowed to mknod. */
+static int syaoran_may_create_node(struct dentry *dentry, int mode, int dev)
+{
+ struct syaoran_sb_info *info =
+ (struct syaoran_sb_info *) dentry->d_sb->s_fs_info;
+ if (!info) {
+ printk(KERN_DEBUG "%s: dentry->d_sb->s_fs_info == NULL\n",
+ __FUNCTION__);
+ return -EPERM;
+ }
+ if (!info->initialize_done)
+ return 0;
+ return syaoran_check_flags(info, dentry, mode, dev, MAY_CREATE);
+}
+
+/* Check whether the given dentry is allowed to chmod/chown/unlink. */
+static int syaoran_may_modify_node(struct dentry *dentry, unsigned int flags)
+{
+ struct syaoran_sb_info *info =
+ (struct syaoran_sb_info *) dentry->d_sb->s_fs_info;
+ if (!info) {
+ printk(KERN_DEBUG "%s: dentry->d_sb->s_fs_info == NULL\n",
+ __FUNCTION__);
+ return -EPERM;
+ }
+ if (flags == DEVICE_USED && !info->is_permissive_mode)
+ return 0;
+ if (!dentry->d_inode)
+ return -ENOENT;
+ return syaoran_check_flags(info, dentry, dentry->d_inode->i_mode,
+ dentry->d_inode->i_rdev, flags);
+}
+
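
For readers following register_node_info() above, the policy file format it
accepts can be illustrated outside the kernel. The following is only a minimal
userspace sketch, assuming one already-normalized line of the form
"name perm uid gid flags type [major minor | symlink-target]". The struct
policy_entry type, the parse_policy_line() helper and the sample lines are
hypothetical and are not part of the patch; escape sequences such as \040 are
left undecoded here.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical userspace view of one policy line; not the kernel's dev_entry. */
struct policy_entry {
	char name[256];      /* path relative to the mount point, still escaped */
	unsigned int perm;   /* permission bits, parsed as octal */
	unsigned int uid;
	unsigned int gid;
	unsigned int flags;  /* bitmask such as MAY_CREATE | MAY_DELETE */
	char type;           /* one of: c b l d s p f */
	unsigned int major;  /* only meaningful for 'c' and 'b' */
	unsigned int minor;  /* only meaningful for 'c' and 'b' */
	char symlink[256];   /* only meaningful for 'l', still escaped */
};

/* Returns 0 if the line is well formed, -1 otherwise. */
static int parse_policy_line(const char *line, struct policy_entry *e)
{
	char type[2] = "";
	int used = 0;

	memset(e, 0, sizeof(*e));
	/* The first six fields are mandatory for every node type. */
	if (sscanf(line, "%255s %o %u %u %u %1s%n", e->name, &e->perm,
		   &e->uid, &e->gid, &e->flags, type, &used) != 6)
		return -1;
	/* setuid/setgid/sticky bits are rejected, as in the patch. */
	if (e->perm > 0777)
		return -1;
	/* The type field must be exactly one character long. */
	if (line[used] != ' ' && line[used] != '\0')
		return -1;
	e->type = type[0];
	switch (e->type) {
	case 'c':
	case 'b':
		/* Device nodes need both a major and a minor number. */
		if (sscanf(line + used, "%u %u", &e->major, &e->minor) != 2)
			return -1;
		return 0;
	case 'l':
		/* Symlinks need a target. */
		if (sscanf(line + used, "%255s", e->symlink) != 1)
			return -1;
		return 0;
	case 'd':
	case 's':
	case 'p':
	case 'f':
		return 0;
	default:
		return -1;
	}
}
```

Under these assumptions, a line such as "null 666 0 0 0 c 1 3" would describe
/dev/null as a character device with major=1 and minor=3 that is created at
mount time and may not be deleted, chmod()ed or chown()ed afterwards, because
its flags field is 0.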

2008-01-06 06:34:54

by Willy Tarreau

Subject: Re: [PATCH][RFC] Simple tamper-proof device filesystem.

Hello,

On Sun, Jan 06, 2008 at 03:20:00PM +0900, Tetsuo Handa wrote:
> Hello.
>
> Changes from previous posting:
>
> (1) Added kernel config so that users can choose
> whether to compile this filesystem or not.
>
> I didn't receive any ACK/NACK regarding whether I'm permitted to
> implement this filesystem as an extension to tmpfs or not.
> So, I continued implementing this filesystem as an extension to tmpfs.
>
> (2) Removed indirect grabbing of blkdev_open() and chrdev_open().
>
> The previous posting was using indirect approach to call
> blkdev_open() and chrdev_open() so that users can compile
> this filesystem as a module without exporting blkdev_open()
> from fs/block_dev.c and chrdev_open() from fs/char_dev.c .
> But since tmpfs cannot be compiled as a module,
> I changed it to direct accessing.
>
> (3) Split the single file into three files.
>
> syaoran_init.c: initialization part
> syaoran_main.c: access control part
> syaoran_debug.c: taking snapshot part
>
> This patch is for 2.6.24-rc6-mm1.
>
> Regards.
> ----------
> Subject: Simple tamper-proof device filesystem.
>
> The goal of this filesystem is to guarantee that
> "applications using well-known device locations under /dev
> get the device they want" (e.g. an application that accesses /dev/null can
> always get a character special device with major=1 and minor=3).
>
> Does this idea sound silly? It does, if you assume that root can do whatever
> he/she wants to do. But this filesystem makes sense when used with
> access control mechanisms like MAC (mandatory access control).
> I want to use this filesystem in case where a process with root privilege was
> hijacked but the behavior of the hijacked process is still restricted by MAC.
>
> Why not use FUSE?
>
> Because /dev has to be available through the lifetime of the kernel.
> It is not acceptable if /dev stops working due to SIGKILL or OOM-killer.
>
> Why not use SELinux?
>
> Because SELinux doesn't guarantee filename and its attribute.
> As far as I know, no MAC implementation can handle filename and its attribute.
> I guess this is because
>
> Filename and its attributes pairs are conventionally considered as
> constant and reliable.
>
> It makes the MAC's policy syntax complicated to describe this attribute
> enforcement information in MAC's policy.
>
> I want to add functionality that the MACs are missing.
> Instead of adding this functionality per MAC,
> I propose to add it as ground work, to be combined with any MAC.
>
> Why not drop CAP_MKNOD?
>
> Dropping CAP_MKNOD is not enough for emulating this filesystem because
> a process can still rename()/unlink() to break filename and its attributes
> handling (e.g. mv /dev/sda1 /dev/sda1.tmp; mv /dev/sda2 /dev/sda1;
> mv /dev/sda1.tmp /dev/sda2 or unlink /dev/null; touch /dev/null ).
>
> This time, I'm implementing this filesystem as an extension to tmpfs
> because what this filesystem does are nothing but check filename and
> its attributes in addition to what tmpfs does.
>
> Signed-off-by: Tetsuo Handa <[email protected]>
> ---
> fs/Kconfig | 18 +
> fs/ramfs/inode.c | 177 ++++++++++++++
> fs/ramfs/syaoran.h | 75 ++++++
> fs/ramfs/syaoran_debug.c | 183 +++++++++++++++
> fs/ramfs/syaoran_init.c | 568 +++++++++++++++++++++++++++++++++++++++++++++++
> fs/ramfs/syaoran_main.c | 207 +++++++++++++++++
> 6 files changed, 1222 insertions(+), 6 deletions(-)

Your patch is very confusing. In your description, as well as in the
comments you talk about tmpfs, but your patch does not touch even one
line of tmpfs and only changes ramfs. Even your variables and arguments
refer to tmpfs. The Kconfig entry indicates that the feature depends
on TMPFS too.

Judging from the following comment :
* Original tmpfs doesn't set ramfs_dir_inode_operations.setattr field.

I suspect that you confuse both filesystems.
- ramfs is in fs/ramfs and is always compiled in; you cannot disable it.
- tmpfs is in mm/shmem.c and is optional. It also supports options that
ramfs does not (e.g. size), and its data may be swapped.
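
One way to see the difference from userspace is the filesystem magic number
reported by statfs(2). The following is only a sketch: the classify_fs_magic()
and classify_mount() helpers are hypothetical names, the ramfs value is the
RAMFS_MAGIC shown in fs/ramfs/inode.c, the tmpfs value is TMPFS_MAGIC from
<linux/magic.h>, and the syaoran value is the SYAORAN_MAGIC proposed in this
patch (so it exists only on a kernel with the patch applied).

```c
#include <stdio.h>
#include <sys/vfs.h>

#define RAMFS_MAGIC_VALUE   0x858458f6UL /* RAMFS_MAGIC in fs/ramfs/inode.c */
#define TMPFS_MAGIC_VALUE   0x01021994UL /* TMPFS_MAGIC in <linux/magic.h> */
#define SYAORAN_MAGIC_VALUE 0x2F646576UL /* SYAORAN_MAGIC from this patch */

/* Map a magic number to a filesystem name (hypothetical helper). */
static const char *classify_fs_magic(unsigned long magic)
{
	switch (magic & 0xffffffffUL) {
	case RAMFS_MAGIC_VALUE:
		return "ramfs";
	case TMPFS_MAGIC_VALUE:
		return "tmpfs";
	case SYAORAN_MAGIC_VALUE:
		return "syaoran";
	default:
		return "other";
	}
}

/* Report which of the filesystems above backs the given path. */
static const char *classify_mount(const char *path)
{
	struct statfs st;

	if (statfs(path, &st) != 0)
		return "error";
	return classify_fs_magic((unsigned long) st.f_type);
}
```

Calling classify_mount("/dev") on a running system would then tell which of
the two filesystems (or neither) is mounted there.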

Please understand that I'm not discussing the usefulness of your patch,
I'm just trying to avoid a huge confusion.

Regards,
Willy

> --- linux-2.6-mm.orig/fs/ramfs/inode.c
> +++ linux-2.6-mm/fs/ramfs/inode.c
> @@ -36,6 +36,20 @@
> #include <asm/uaccess.h>
> #include "internal.h"
>
> +static struct inode *__ramfs_get_inode(struct super_block *sb, int mode,
> + dev_t dev, bool tmpfs_with_mac);
> +
> +#define TMPFS_WITH_MAC 1
> +#define TMPFS_WITHOUT_MAC 0
> +#include <linux/quotaops.h>
> +
> +#ifdef CONFIG_SYAORAN
> +#include "syaoran.h"
> +#include "syaoran_init.c"
> +#include "syaoran_main.c"
> +#include "syaoran_debug.c"
> +#endif
> +
> /* some random number */
> #define RAMFS_MAGIC 0x858458f6
>
> @@ -51,6 +65,12 @@ static struct backing_dev_info ramfs_bac
>
> struct inode *ramfs_get_inode(struct super_block *sb, int mode, dev_t dev)
> {
> + return __ramfs_get_inode(sb, mode, dev, TMPFS_WITHOUT_MAC);
> +}
> +
> +static struct inode *__ramfs_get_inode(struct super_block *sb, int mode,
> + dev_t dev, const bool tmpfs_with_mac)
> +{
> struct inode * inode = new_inode(sb);
>
> if (inode) {
> @@ -65,10 +85,18 @@ struct inode *ramfs_get_inode(struct sup
> switch (mode & S_IFMT) {
> default:
> init_special_inode(inode, mode, dev);
> +#ifdef CONFIG_SYAORAN
> + if (tmpfs_with_mac)
> + init_syaoran_inode(inode, mode);
> +#endif
> break;
> case S_IFREG:
> inode->i_op = &ramfs_file_inode_operations;
> inode->i_fop = &ramfs_file_operations;
> +#ifdef CONFIG_SYAORAN
> + if (tmpfs_with_mac)
> + init_syaoran_inode(inode, mode);
> +#endif
> break;
> case S_IFDIR:
> inode->i_op = &ramfs_dir_inode_operations;
> @@ -79,6 +107,10 @@ struct inode *ramfs_get_inode(struct sup
> break;
> case S_IFLNK:
> inode->i_op = &page_symlink_inode_operations;
> +#ifdef CONFIG_SYAORAN
> + if (tmpfs_with_mac)
> + init_syaoran_inode(inode, mode);
> +#endif
> break;
> }
> }
> @@ -92,9 +124,19 @@ struct inode *ramfs_get_inode(struct sup
> static int
> ramfs_mknod(struct inode *dir, struct dentry *dentry, int mode, dev_t dev)
> {
> - struct inode * inode = ramfs_get_inode(dir->i_sb, mode, dev);
> + struct inode *inode;
> int error = -ENOSPC;
>
> +#ifdef CONFIG_SYAORAN
> + /*** SYAORAN start. ***/
> + if (dir->i_sb->s_op == &syaoran_ops) {
> + if (syaoran_may_create_node(dentry, mode, dev) < 0)
> + return -EPERM;
> + inode = syaoran_get_inode(dir->i_sb, mode, dev);
> + /*** SYAORAN end. ***/
> + } else
> +#endif
> + inode = ramfs_get_inode(dir->i_sb, mode, dev);
> if (inode) {
> if (dir->i_mode & S_ISGID) {
> inode->i_gid = dir->i_gid;
> @@ -127,7 +169,16 @@ static int ramfs_symlink(struct inode *
> struct inode *inode;
> int error = -ENOSPC;
>
> - inode = ramfs_get_inode(dir->i_sb, S_IFLNK|S_IRWXUGO, 0);
> +#ifdef CONFIG_SYAORAN
> + /*** SYAORAN start. ***/
> + if (dir->i_sb->s_op == &syaoran_ops) {
> + if (syaoran_may_create_node(dentry, S_IFLNK, 0) < 0)
> + return -EPERM;
> + inode = syaoran_get_inode(dir->i_sb, S_IFLNK|S_IRWXUGO, 0);
> + /*** SYAORAN end. ***/
> + } else
> +#endif
> + inode = ramfs_get_inode(dir->i_sb, S_IFLNK|S_IRWXUGO, 0);
> if (inode) {
> int l = strlen(symname)+1;
> error = page_symlink(inode, symname, l);
> @@ -143,16 +194,130 @@ static int ramfs_symlink(struct inode *
> return error;
> }
>
> +static int ramfs_link(struct dentry *old_dentry, struct inode *dir,
> + struct dentry *dentry)
> +{
> +#ifdef CONFIG_SYAORAN
> + struct inode *inode;
> + if (dir->i_sb->s_op != &syaoran_ops)
> + goto ok;
> + /*** SYAORAN start. ***/
> + inode = old_dentry->d_inode;
> + if (!inode ||
> + syaoran_may_create_node(dentry, inode->i_mode, inode->i_rdev))
> + return -EPERM;
> + /*** SYAORAN end. ***/
> +ok:
> +#endif
> + return simple_link(old_dentry, dir, dentry);
> +}
> +
> +static int ramfs_unlink(struct inode *dir, struct dentry *dentry)
> +{
> +#ifdef CONFIG_SYAORAN
> + if (dir->i_sb->s_op != &syaoran_ops)
> + goto ok;
> + /*** SYAORAN start. ***/
> + if (syaoran_may_modify_node(dentry, MAY_DELETE))
> + return -EPERM;
> + /*** SYAORAN end. ***/
> +ok:
> +#endif
> + return simple_unlink(dir, dentry);
> +}
> +
> +static int ramfs_rename(struct inode *old_dir, struct dentry *old_dentry,
> + struct inode *new_dir, struct dentry *new_dentry)
> +{
> +#ifdef CONFIG_SYAORAN
> + struct inode *inode;
> + if (old_dir->i_sb->s_op != &syaoran_ops)
> + goto ok;
> + /*** SYAORAN start. ***/
> + inode = old_dentry->d_inode;
> + if (!inode || syaoran_may_modify_node(old_dentry, MAY_DELETE) ||
> + syaoran_may_create_node(new_dentry, inode->i_mode, inode->i_rdev))
> + return -EPERM;
> + /*** SYAORAN end. ***/
> +ok:
> +#endif
> + return simple_rename(old_dir, old_dentry, new_dir, new_dentry);
> +}
> +
> +static int ramfs_rmdir(struct inode *dir, struct dentry *dentry)
> +{
> +#ifdef CONFIG_SYAORAN
> + if (dir->i_sb->s_op != &syaoran_ops)
> + goto ok;
> + /*** SYAORAN start. ***/
> + if (syaoran_may_modify_node(dentry, MAY_DELETE))
> + return -EPERM;
> + /*** SYAORAN end. ***/
> +ok:
> +#endif
> + return simple_rmdir(dir, dentry);
> +}
> +
> +/*
> + * Original tmpfs doesn't set ramfs_dir_inode_operations.setattr field.
> + * Now I'm setting the field to share tmpfs/rootfs/syaoran code.
> + * Side effect is that the checking order of notify_change() has changed from
> + * inode_change_ok() -> security_inode_setattr() ->
> + * DQUOT_TRANSFER() -> inode_setattr()
> + * to
> + * security_inode_setattr() -> inode_change_ok() ->
> + * DQUOT_TRANSFER() -> inode_setattr()
> + *
> + * Is this change problematic? If problematic, I'll stop sharing the field.
> + */
> +static int ramfs_setattr(struct dentry *dentry, struct iattr *attr)
> +{
> + unsigned int ia_valid = attr->ia_valid;
> + struct inode *inode = dentry->d_inode;
> + int error = inode_change_ok(inode, attr);
> +#ifdef CONFIG_SYAORAN
> + unsigned int flags = 0;
> + if (inode->i_sb->s_op != &syaoran_ops)
> + goto ok;
> + /*** SYAORAN start. ***/
> + if (ia_valid & (ATTR_UID | ATTR_GID))
> + flags |= MAY_CHOWN;
> + if (ia_valid & ATTR_MODE)
> + flags |= MAY_CHMOD;
> + if (syaoran_may_modify_node(dentry, flags))
> + return -EPERM;
> + /*** SYAORAN end. ***/
> +ok:
> +#endif
> + if (!error) {
> + if ((ia_valid & ATTR_UID && attr->ia_uid != inode->i_uid) ||
> + (ia_valid & ATTR_GID && attr->ia_gid != inode->i_gid))
> + error = DQUOT_TRANSFER(inode, attr) ? -EDQUOT : 0;
> + if (!error)
> + error = inode_setattr(inode, attr);
> + }
> + return error;
> +}
> +
> static const struct inode_operations ramfs_dir_inode_operations = {
> .create = ramfs_create,
> .lookup = simple_lookup,
> - .link = simple_link,
> - .unlink = simple_unlink,
> + /* Set link() hook for tracking link operation. */
> + .link = ramfs_link,
> + /* Set unlink() hook for tracking unlink operation. */
> + .unlink = ramfs_unlink,
> + /* Set symlink() hook for tracking symlink operation. */
> .symlink = ramfs_symlink,
> + /* Set mkdir() hook for tracking mkdir operation. */
> .mkdir = ramfs_mkdir,
> - .rmdir = simple_rmdir,
> + /* Set rmdir() hook for tracking rmdir operation. */
> + .rmdir = ramfs_rmdir,
> + /* Set mknod() hook for tracking mknod operation. */
> .mknod = ramfs_mknod,
> - .rename = simple_rename,
> + /* Set rename() hook for tracking rename operation. */
> + .rename = ramfs_rename,
> + /* Set setattr() hook for tracking chmod/chown operations. */
> + .setattr = ramfs_setattr,
> };
>
> static const struct super_operations ramfs_ops = {
> --- /dev/null
> +++ linux-2.6-mm/fs/ramfs/syaoran.h
> @@ -0,0 +1,75 @@
> +/*
> + * fs/ramfs/syaoran.h
> + *
> + * Implementation of the Tamper-Proof Device Filesystem.
> + *
> + * Copyright (C) 2005-2008 NTT DATA CORPORATION
> + *
> + * Version: 1.5.3-pre 2008/01/06
> + */
> +
> +#ifndef SYAORAN_H
> +#define SYAORAN_H
> +
> +#include <linux/namei.h>
> +#include <linux/mm.h>
> +
> +static struct super_operations syaoran_ops;
> +static void init_syaoran_inode(struct inode *inode, int mode);
> +
> +static int syaoran_create_tracelog(struct super_block *sb,
> + const char *filename);
> +static int syaoran_may_modify_node(struct dentry *dentry, unsigned int flags);
> +
> +/* The following constants are used to restrict operations.*/
> +
> +#define MAY_CREATE 1 /* This file is allowed to be mknod()ed. */
> +#define MAY_DELETE 2 /* This file is allowed to be unlink()ed. */
> +#define MAY_CHMOD 4 /* This file is allowed to be chmod()ed. */
> +#define MAY_CHOWN 8 /* This file is allowed to be chown()ed. */
> +#define DEVICE_USED 16 /* This block or character device file is used. */
> +#define NO_CREATE_AT_MOUNT 32 /* Don't create this file at mount(). */
> +
> +/* some random number */
> +#define SYAORAN_MAGIC 0x2F646576 /* = '/dev' */
> +
> +struct dev_entry {
> + struct list_head list;
> + /* Binary form of pathname under mount point. Never NULL. */
> + char *name;
> + /*
> + * Mode and permissions. setuid/setgid/sticky bits are not supported.
> + */
> + mode_t mode;
> + uid_t uid;
> + gid_t gid;
> + dev_t kdev;
> + /*
> + * Binary form of initial contents for the symlink.
> + * NULL if not symlink.
> + */
> + char *symlink_data;
> + /* File access control flags. */
> + unsigned int flags;
> + /* Text form of pathname under mount point. Never NULL. */
> + const char *printable_name;
> + /*
> + * Text form of initial contents for the symlink.
> + * NULL if not symlink.
> + */
> + const char *printable_symlink_data;
> +};
> +
> +struct syaoran_sb_info {
> + struct list_head list;
> + bool initialize_done; /* False if initialization is in progress. */
> + bool is_permissive_mode; /* True if permissive mode. */
> +};
> +
> +static inline struct inode *syaoran_get_inode(struct super_block *sb,
> + int mode, dev_t dev)
> +{
> + return __ramfs_get_inode(sb, mode, dev, TMPFS_WITH_MAC);
> +}
> +
> +#endif
> --- linux-2.6-mm.orig/fs/Kconfig
> +++ linux-2.6-mm/fs/Kconfig
> @@ -978,6 +978,24 @@ config TMPFS_POSIX_ACL
>
> If you don't know what Access Control Lists are, say N.
>
> +config SYAORAN
> + bool "Tamper-proof device filesystem support"
> + depends on TMPFS
> + help
> + If you mount this filesystem for /dev directory instead of tmpfs,
> + you can guarantee the following thing.
> +
> + "Applications using well-known device locations under /dev
> + get the device they want" (e.g. an application that accesses
> + /dev/null can always get a character special device
> + with major=1 and minor=3).
> +
> + The list of possible combinations of filename and its attributes
> + that can exist on this filesystem is defined at mount time
> + using a configuration file.
> +
> + If unsure, say N.
> +
> config HUGETLBFS
> bool "HugeTLB file system support"
> depends on X86 || IA64 || PPC64 || SPARC64 || (SUPERH && MMU) || BROKEN
> --- /dev/null
> +++ linux-2.6-mm/fs/ramfs/syaoran_debug.c
> @@ -0,0 +1,183 @@
> +/*
> + * fs/ramfs/syaoran_debug.c
> + *
> + * Implementation of the Tamper-Proof Device Filesystem.
> + *
> + * Copyright (C) 2005-2008 NTT DATA CORPORATION
> + *
> + * Version: 1.5.3-pre 2008/01/06
> + */
> +/*
> + * The following structure and code are used for transferring data
> + * to interface files.
> + */
> +
> +#define list_for_each_cookie(pos, cookie, head) \
> + for ((cookie) || ((cookie) = (head)), pos = (cookie)->next; \
> + prefetch(pos->next), pos != (head) || ((cookie) = NULL); \
> + (cookie) = pos, pos = pos->next)
> +
> +struct syaoran_read_struct {
> + char *buf; /* Buffer for reading. */
> + int avail; /* Bytes available for reading. */
> + struct super_block *sb; /* The super_block of this partition. */
> + struct dev_entry *entry; /* The entry currently reading from. */
> + bool read_all; /* Dump all entries? */
> + struct list_head *pos; /* Current position. */
> +};
> +
> +static void syaoran_read_table(struct syaoran_read_struct *head, char *buf,
> + int count)
> +{
> + struct super_block *sb = head->sb;
> + struct syaoran_sb_info *info =
> + (struct syaoran_sb_info *) sb->s_fs_info;
> + struct list_head *pos;
> + const bool read_all = head->read_all;
> + if (!info)
> + return;
> + if (!head->pos)
> + return;
> + list_for_each_cookie(pos, head->pos, &info->list) {
> + struct dev_entry *entry =
> + list_entry(pos, struct dev_entry, list);
> + const unsigned int flags =
> + read_all ? entry->flags : entry->flags & ~DEVICE_USED;
> + const char *name = entry->printable_name;
> + const uid_t uid = entry->uid;
> + const gid_t gid = entry->gid;
> + const mode_t perm = entry->mode & 0777;
> + int len = 0;
> + switch (entry->mode & S_IFMT) {
> + case S_IFCHR:
> + if (!head->read_all && !(entry->flags & DEVICE_USED))
> + break;
> + len = snprintf(buf, count,
> + "%-20s %3o %3u %3u %2u %c %3u %3u\n",
> + name, perm, uid, gid, flags, 'c',
> + MAJOR(entry->kdev), MINOR(entry->kdev));
> + break;
> + case S_IFBLK:
> + if (!head->read_all && !(entry->flags & DEVICE_USED))
> + break;
> + len = snprintf(buf, count,
> + "%-20s %3o %3u %3u %2u %c %3u %3u\n",
> + name, perm, uid, gid, flags, 'b',
> + MAJOR(entry->kdev), MINOR(entry->kdev));
> + break;
> + case S_IFIFO:
> + len = snprintf(buf, count,
> + "%-20s %3o %3u %3u %2u %c\n",
> + name, perm, uid, gid, flags, 'p');
> + break;
> + case S_IFSOCK:
> + len = snprintf(buf, count,
> + "%-20s %3o %3u %3u %2u %c\n",
> + name, perm, uid, gid, flags, 's');
> + break;
> + case S_IFDIR:
> + len = snprintf(buf, count,
> + "%-20s %3o %3u %3u %2u %c\n",
> + name, perm, uid, gid, flags, 'd');
> + break;
> + case S_IFLNK:
> + len = snprintf(buf, count,
> + "%-20s %3o %3u %3u %2u %c %s\n",
> + name, perm, uid, gid, flags, 'l',
> + entry->printable_symlink_data);
> + break;
> + case S_IFREG:
> + len = snprintf(buf, count,
> + "%-20s %3o %3u %3u %2u %c\n",
> + name, perm, uid, gid, flags, 'f');
> + break;
> + }
> + if (len < 0 || count <= len)
> + break;
> + count -= len;
> + buf += len;
> + head->avail += len;
> + }
> +}
> +
> +static int syaoran_trace_open(struct inode *inode, struct file *file)
> +{
> + struct syaoran_read_struct *head =
> + kzalloc(sizeof(*head), GFP_KERNEL);
> + if (!head)
> + return -ENOMEM;
> + head->sb = inode->i_sb;
> + head->read_all =
> + (strcmp(file->f_dentry->d_name.name, ".syaoran_all") == 0);
> + head->pos = &((struct syaoran_sb_info *) head->sb->s_fs_info)->list;
> + head->buf = kzalloc(PAGE_SIZE * 2, GFP_KERNEL);
> + if (!head->buf) {
> + kfree(head);
> + return -ENOMEM;
> + }
> + file->private_data = head;
> + return 0;
> +}
> +
> +static int syaoran_trace_release(struct inode *inode, struct file *file)
> +{
> + struct syaoran_read_struct *head = file->private_data;
> + kfree(head->buf);
> + kfree(head);
> + file->private_data = NULL;
> + return 0;
> +}
> +
> +static ssize_t syaoran_trace_read(struct file *file, char __user *buf,
> + size_t count, loff_t *ppos)
> +{
> + struct syaoran_read_struct *head =
> + (struct syaoran_read_struct *) file->private_data;
> + int len = head->avail;
> + char *cp = head->buf;
> + if (!access_ok(VERIFY_WRITE, buf, count))
> + return -EFAULT;
> + syaoran_read_table(head, cp + len, PAGE_SIZE * 2 - len);
> + len = head->avail;
> + if (len > count)
> + len = count;
> + if (len > 0) {
> + if (copy_to_user(buf, cp, len))
> + return -EFAULT;
> + head->avail -= len;
> + memmove(cp, cp + len, head->avail);
> + }
> + return len;
> +}
> +
> +static struct file_operations syaoran_trace_operations = {
> + .open = syaoran_trace_open,
> + .release = syaoran_trace_release,
> + .read = syaoran_trace_read,
> +};
> +
> +/* Create interface files for reading status. */
> +static int syaoran_create_tracelog(struct super_block *sb, const char *filename)
> +{
> + struct inode *inode;
> + struct dentry *base = dget(sb->s_root);
> + struct dentry *dentry = lookup_create2(filename, base, 0);
> + int error = PTR_ERR(dentry);
> + if (IS_ERR(dentry))
> + goto out;
> + inode = syaoran_get_inode(sb, S_IFREG | 0400, 0);
> + if (!inode)
> + error = -ENOSPC;
> + else {
> + /* Override file operation. */
> + inode->i_fop = &syaoran_trace_operations;
> + d_instantiate(dentry, inode);
> + dget(dentry); /* Extra count - pin the dentry in core */
> + error = 0;
> + }
> + dput(dentry);
> +out:
> + mutex_unlock(&base->d_inode->i_mutex);
> + dput(base);
> + return error;
> +}
> --- /dev/null
> +++ linux-2.6-mm/fs/ramfs/syaoran_init.c
> @@ -0,0 +1,568 @@
> +/*
> + * fs/ramfs/syaoran_init.c
> + *
> + * Implementation of the Tamper-Proof Device Filesystem.
> + *
> + * Copyright (C) 2005-2008 NTT DATA CORPORATION
> + *
> + * Version: 1.5.3-pre 2008/01/06
> + */
> +
> +/*
> + * The following code is used for processing the policy file and
> + * creating initial nodes at mount time.
> + */
> +
> +/* lookup_create() without nameidata */
> +static struct dentry *lookup_create2(const char *name, struct dentry *base,
> + const bool is_dir)
> +{
> + struct dentry *dentry;
> + const int len = name ? strlen(name) : 0;
> + mutex_lock(&base->d_inode->i_mutex);
> + dentry = lookup_one_len(name, base, len);
> + if (IS_ERR(dentry))
> + goto fail;
> + if (!is_dir && name[len] && !dentry->d_inode)
> + goto enoent;
> + return dentry;
> +enoent:
> + dput(dentry);
> + dentry = ERR_PTR(-ENOENT);
> +fail:
> + return dentry;
> +}
> +
> +static int fs_mkdir(const char *pathname, struct dentry *base, int mode,
> + uid_t user, gid_t group)
> +{
> + struct dentry *dentry = lookup_create2(pathname, base, 1);
> + int error = PTR_ERR(dentry);
> + if (!IS_ERR(dentry)) {
> + error = vfs_mkdir(base->d_inode, dentry, mode);
> + if (!error) {
> + lock_kernel();
> + dentry->d_inode->i_uid = user;
> + dentry->d_inode->i_gid = group;
> + unlock_kernel();
> + }
> + dput(dentry);
> + }
> + mutex_unlock(&base->d_inode->i_mutex);
> + return error;
> +}
> +
> +static int fs_mknod(const char *filename, struct dentry *base, int mode,
> + dev_t dev, uid_t user, gid_t group)
> +{
> + struct dentry *dentry;
> + int error;
> + switch (mode & S_IFMT) {
> + case S_IFCHR:
> + case S_IFBLK:
> + case S_IFIFO:
> + case S_IFSOCK:
> + case S_IFREG:
> + break;
> + default:
> + return -EPERM;
> + }
> + dentry = lookup_create2(filename, base, 0);
> + error = PTR_ERR(dentry);
> + if (!IS_ERR(dentry)) {
> + error = vfs_mknod(base->d_inode, dentry, mode, dev);
> + if (!error) {
> + lock_kernel();
> + dentry->d_inode->i_uid = user;
> + dentry->d_inode->i_gid = group;
> + unlock_kernel();
> + }
> + dput(dentry);
> + }
> + mutex_unlock(&base->d_inode->i_mutex);
> + return error;
> +}
> +
> +static int fs_symlink(const char *pathname, struct dentry *base,
> + char *oldname, int mode, uid_t user, gid_t group)
> +{
> + struct dentry *dentry = lookup_create2(pathname, base, 0);
> + int error = PTR_ERR(dentry);
> + if (!IS_ERR(dentry)) {
> + error = vfs_symlink(base->d_inode, dentry, oldname, S_IALLUGO);
> + if (!error) {
> + lock_kernel();
> + dentry->d_inode->i_mode = mode;
> + dentry->d_inode->i_uid = user;
> + dentry->d_inode->i_gid = group;
> + unlock_kernel();
> + }
> + dput(dentry);
> + }
> + mutex_unlock(&base->d_inode->i_mutex);
> + return error;
> +}
> +
> +/*
> + * Format string.
> + * Leading and trailing whitespaces are removed.
> + * Multiple whitespaces are packed into single space.
> + */
> +static void syaoran_normalize_line(unsigned char *buffer)
> +{
> + unsigned char *sp = buffer;
> + unsigned char *dp = buffer;
> + bool first = 1;
> + while (*sp && (*sp <= ' ' || *sp >= 127))
> + sp++;
> + while (*sp) {
> + if (!first)
> + *dp++ = ' ';
> + first = 0;
> + while (*sp > ' ' && *sp < 127)
> + *dp++ = *sp++;
> + while (*sp && (*sp <= ' ' || *sp >= 127))
> + sp++;
> + }
> + *dp = '\0';
> +}
> +
> +/* Convert text form of filename into binary form. */
> +static void syaoran_unescape(char *filename)
> +{
> + char *cp = filename;
> + char c, d, e;
> + if (!cp)
> + return;
> + while ((c = *filename++) != '\0') {
> + if (c != '\\') {
> + *cp++ = c;
> + continue;
> + }
> + if ((c = *filename++) == '\\') {
> + *cp++ = c;
> + continue;
> + }
> + if (c < '0' || c > '3')
> + break;
> + d = *filename++;
> + if (d < '0' || d > '7')
> + break;
> + e = *filename++;
> + if (e < '0' || e > '7')
> + break;
> + *(unsigned char *) cp++ = (unsigned char)
> + (((unsigned char) (c - '0') << 6) +
> + ((unsigned char) (d - '0') << 3) +
> + (unsigned char) (e - '0'));
> + }
> + *cp = '\0';
> +}
> +
> +static inline char *strdup(const char *data)
> +{
> + return kstrdup(data, GFP_KERNEL);
> +}
> +
> +static int register_node_info(char *buffer, struct super_block *sb)
> +{
> + enum {
> + ARG_FILENAME = 0,
> + ARG_PERMISSION = 1,
> + ARG_UID = 2,
> + ARG_GID = 3,
> + ARG_FLAGS = 4,
> + ARG_DEV_TYPE = 5,
> + ARG_SYMLINK_DATA = 6,
> + ARG_DEV_MAJOR = 6,
> + ARG_DEV_MINOR = 7,
> + MAX_ARG = 8
> + };
> + char *args[MAX_ARG];
> + int i;
> + int error = -EINVAL;
> + unsigned int perm, uid, gid, flags, major = 0, minor = 0;
> + struct syaoran_sb_info *info = (struct syaoran_sb_info *) sb->s_fs_info;
> + struct dev_entry *entry;
> + memset(args, 0, sizeof(args));
> + args[0] = buffer;
> + for (i = 1; i < MAX_ARG; i++) {
> + args[i] = strchr(args[i - 1] + 1, ' ');
> + if (!args[i])
> + break;
> + *args[i]++ = '\0';
> + }
> + /*
> + printk("<%s> <%s> <%s> <%s> <%s> <%s> <%s> <%s>\n",
> + args[0], args[1], args[2], args[3], args[4], args[5],
> + args[6], args[7]);
> + */
> + if (!args[ARG_FILENAME] || !args[ARG_PERMISSION] || !args[ARG_UID] ||
> + !args[ARG_GID] || !args[ARG_DEV_TYPE] || !args[ARG_FLAGS])
> + goto out;
> + if (sscanf(args[ARG_PERMISSION], "%o", &perm) != 1 || !(perm <= 0777)
> + || sscanf(args[ARG_UID], "%u", &uid) != 1
> + || sscanf(args[ARG_GID], "%u", &gid) != 1
> + || sscanf(args[ARG_FLAGS], "%u", &flags) != 1
> + || *(args[ARG_DEV_TYPE] + 1))
> + goto out;
> + switch (*args[ARG_DEV_TYPE]) {
> + case 'c':
> + perm |= S_IFCHR;
> + if (!args[ARG_DEV_MAJOR]
> + || sscanf(args[ARG_DEV_MAJOR], "%u", &major) != 1
> + || !args[ARG_DEV_MINOR]
> + || sscanf(args[ARG_DEV_MINOR], "%u", &minor) != 1)
> + goto out;
> + break;
> + case 'b':
> + perm |= S_IFBLK;
> + if (!args[ARG_DEV_MAJOR]
> + || sscanf(args[ARG_DEV_MAJOR], "%u", &major) != 1
> + || !args[ARG_DEV_MINOR]
> + || sscanf(args[ARG_DEV_MINOR], "%u", &minor) != 1)
> + goto out;
> + break;
> + case 'l':
> + perm |= S_IFLNK;
> + if (!args[ARG_SYMLINK_DATA])
> + goto out;
> + break;
> + case 'd':
> + perm |= S_IFDIR;
> + break;
> + case 's':
> + perm |= S_IFSOCK;
> + break;
> + case 'p':
> + perm |= S_IFIFO;
> + break;
> + case 'f':
> + perm |= S_IFREG;
> + break;
> + default:
> + goto out;
> + }
> + error = -ENOMEM;
> + entry = kzalloc(sizeof(*entry), GFP_KERNEL);
> + if (!entry)
> + goto out;
> + if (S_ISLNK(perm)) {
> + entry->printable_symlink_data = strdup(args[ARG_SYMLINK_DATA]);
> + if (!entry->printable_symlink_data)
> + goto out_freemem;
> + }
> + entry->printable_name = strdup(args[ARG_FILENAME]);
> + if (!entry->printable_name)
> + goto out_freemem;
> + if (S_ISLNK(perm)) {
> + entry->symlink_data = strdup(entry->printable_symlink_data);
> + if (!entry->symlink_data)
> + goto out_freemem;
> + syaoran_unescape(entry->symlink_data);
> + }
> + entry->name = strdup(entry->printable_name);
> + if (!entry->name)
> + goto out_freemem;
> + syaoran_unescape(entry->name);
> + /*
> + * Drop trailing '/', because get_local_absolute_path() doesn't
> + * append a trailing '/'.
> + */
> + i = strlen(entry->name);
> + if (i && entry->name[i - 1] == '/')
> + entry->name[i - 1] = '\0';
> + entry->mode = perm;
> + entry->uid = uid;
> + entry->gid = gid;
> + entry->kdev = S_ISCHR(perm) || S_ISBLK(perm) ? MKDEV(major, minor) : 0;
> + entry->flags = flags;
> + list_add_tail(&entry->list, &info->list);
> + /* printk("Entry added.\n"); */
> + error = 0;
> +out:
> + return error;
> +out_freemem:
> + kfree(entry->printable_symlink_data);
> + kfree(entry->printable_name);
> + kfree(entry->symlink_data);
> + kfree(entry);
> + goto out;
> +}
> +
> +static int read_config_file(struct file *file, struct super_block *sb)
> +{
> + char *buffer;
> + int error = -ENOMEM;
> + if (!file)
> + return -EINVAL;
> + buffer = kzalloc(PAGE_SIZE, GFP_KERNEL);
> + if (buffer) {
> + int len;
> + char *cp;
> + unsigned long offset = 0;
> + while ((len = kernel_read(file, offset, buffer, PAGE_SIZE)) > 0
> + && (cp = memchr(buffer, '\n', len)) != NULL) {
> + *cp = '\0';
> + offset += cp - buffer + 1;
> + syaoran_normalize_line(buffer);
> + if (register_node_info(buffer, sb) == -ENOMEM)
> + goto out;
> + }
> + error = 0;
> + }
> +out:
> + kfree(buffer);
> + return error;
> +}
> +
> +static void make_node(struct dev_entry *entry, struct dentry *root)
> +{
> + struct dentry *base = dget(root);
> + char *filename = entry->name;
> + char *name = filename;
> + unsigned int c;
> + const mode_t perm = entry->mode;
> + const uid_t uid = entry->uid;
> + const gid_t gid = entry->gid;
> + goto start;
> + while ((c = *(unsigned char *) filename) != '\0') {
> + if (c == '/') {
> + struct dentry *new_base;
> + const int len = filename - name;
> + *filename = '\0';
> + mutex_lock(&base->d_inode->i_mutex);
> + new_base = lookup_one_len(name, base, len);
> + mutex_unlock(&base->d_inode->i_mutex);
> + dput(base);
> + *filename = '/';
> + filename++;
> + if (IS_ERR(new_base))
> + return;
> + if (!new_base->d_inode ||
> + !S_ISDIR(new_base->d_inode->i_mode)) {
> + dput(new_base);
> + return;
> + }
> + base = new_base;
> +start:
> + name = filename;
> + } else {
> + filename++;
> + }
> + }
> + filename = (char *) name;
> + if (S_ISLNK(perm)) {
> + fs_symlink(filename, base, entry->symlink_data, perm, uid, gid);
> + } else if (S_ISDIR(perm)) {
> + fs_mkdir(filename, base, perm ^ S_IFDIR, uid, gid);
> + } else if (S_ISSOCK(perm) || S_ISFIFO(perm) || S_ISREG(perm)) {
> + fs_mknod(filename, base, perm, 0, uid, gid);
> + } else if (S_ISCHR(perm) || S_ISBLK(perm)) {
> + fs_mknod(filename, base, perm, entry->kdev, uid, gid);
> + }
> + dput(base);
> +}
> +
> +/* Create files according to the policy file. */
> +static void syaoran_make_initial_nodes(struct super_block *sb)
> +{
> + struct syaoran_sb_info *info;
> + struct dev_entry *entry;
> + if (!sb)
> + return;
> + info = (struct syaoran_sb_info *) sb->s_fs_info;
> + if (!info)
> + return;
> + if (info->is_permissive_mode) {
> + syaoran_create_tracelog(sb, ".syaoran");
> + syaoran_create_tracelog(sb, ".syaoran_all");
> + }
> + list_for_each_entry(entry, &info->list, list) {
> + if ((entry->flags & NO_CREATE_AT_MOUNT) == 0)
> + make_node(entry, sb->s_root);
> + }
> + info->initialize_done = 1;
> +}
> +
> +/* Read policy file. */
> +static int syaoran_initialize(struct super_block *sb, void *data)
> +{
> + int error = -EINVAL;
> + struct file *f;
> + char *filename = (char *) data;
> + bool is_permissive_mode = 0;
> + struct syaoran_sb_info *p;
> + static bool first = 1;
> + if (first) {
> + first = 0;
> + printk(KERN_INFO "SYAORAN: 1.5.3-pre 2008/01/06\n");
> + }
> + if (!filename) {
> + printk(KERN_INFO "SYAORAN: Missing config-file path.\n");
> + return -EINVAL;
> + } else if (strncmp(filename, "accept=", 7) == 0) {
> + filename += 7;
> + is_permissive_mode = 1;
> + } else if (strncmp(filename, "enforce=", 8) == 0) {
> + filename += 8;
> + is_permissive_mode = 0;
> + } else {
> + printk(KERN_INFO "SYAORAN: Missing 'accept=' or 'enforce='.\n");
> + return -EINVAL;
> + }
> + f = open_pathname(AT_FDCWD, filename, O_RDONLY, 0600);
> + if (IS_ERR(f)) {
> + printk(KERN_INFO "SYAORAN: Can't open '%s'\n", filename);
> + return -EINVAL;
> + }
> + if (!S_ISREG(f->f_dentry->d_inode->i_mode))
> + goto out;
> + p = kzalloc(sizeof(*p), GFP_KERNEL);
> + if (!p)
> + goto out;
> + p->is_permissive_mode = is_permissive_mode;
> + sb->s_fs_info = p;
> + INIT_LIST_HEAD(&((struct syaoran_sb_info *) sb->s_fs_info)->list);
> + printk(KERN_INFO "SYAORAN: Reading '%s'\n", filename);
> + error = read_config_file(f, sb);
> +out:
> + if (error)
> + printk(KERN_INFO "SYAORAN: Can't read '%s'\n", filename);
> + filp_close(f, NULL);
> + return error;
> +}
> +
> +static int syaoran_fill_super(struct super_block *sb, void *data, int silent)
> +{
> + struct inode *inode;
> + struct dentry *root;
> + int error;
> +
> + sb->s_maxbytes = MAX_LFS_FILESIZE;
> + sb->s_blocksize = PAGE_CACHE_SIZE;
> + sb->s_blocksize_bits = PAGE_CACHE_SHIFT;
> + sb->s_magic = SYAORAN_MAGIC;
> + sb->s_op = &syaoran_ops;
> + sb->s_time_gran = 1;
> + error = syaoran_initialize(sb, data);
> + if (error < 0)
> + return error;
> + inode = syaoran_get_inode(sb, S_IFDIR | 0755, 0);
> + if (!inode)
> + return -ENOMEM;
> +
> + root = d_alloc_root(inode);
> + if (!root) {
> + iput(inode);
> + return -ENOMEM;
> + }
> + sb->s_root = root;
> + syaoran_make_initial_nodes(sb);
> + return 0;
> +}
> +
> +static int syaoran_get_sb(struct file_system_type *fs_type, int flags,
> + const char *dev_name, void *data,
> + struct vfsmount *mnt)
> +{
> + return get_sb_nodev(fs_type, flags, data, syaoran_fill_super, mnt);
> +}
> +
> +static void syaoran_put_super(struct super_block *sb)
> +{
> + struct syaoran_sb_info *info;
> + struct dev_entry *entry;
> + struct dev_entry *tmp;
> + if (!sb)
> + return;
> + info = (struct syaoran_sb_info *) sb->s_fs_info;
> + if (!info)
> + return;
> + list_for_each_entry_safe(entry, tmp, &info->list, list) {
> + kfree(entry->name);
> + kfree(entry->symlink_data);
> + kfree(entry->printable_name);
> + kfree(entry->printable_symlink_data);
> + list_del(&entry->list);
> + /* printk("Entry removed.\n"); */
> + kfree(entry);
> + }
> + kfree(info);
> + sb->s_fs_info = NULL;
> + printk(KERN_DEBUG "%s: Unused memory freed.\n", __FUNCTION__);
> +}
> +
> +static struct file_system_type syaoran_fs_type = {
> + .name = "syaoran",
> + .get_sb = syaoran_get_sb,
> + .kill_sb = kill_litter_super,
> +};
> +
> +static struct file_operations wrapped_def_blk_fops;
> +static struct file_operations wrapped_def_chr_fops;
> +static struct inode_operations syaoran_file_inode_operations;
> +static struct inode_operations syaoran_symlink_inode_operations;
> +static int ramfs_setattr(struct dentry *dentry, struct iattr *attr);
> +static const struct super_operations ramfs_ops;
> +
> +static void init_syaoran_inode(struct inode *inode, int mode)
> +{
> + /* Set open() hook for tracking open request. */
> + if (S_ISBLK(mode))
> + inode->i_fop = &wrapped_def_blk_fops;
> + else if (S_ISCHR(mode))
> + inode->i_fop = &wrapped_def_chr_fops;
> + /*
> + * Set setattr() hook for tracking chmod/chown requests.
> + * The setattr() hook of a directory is already set by
> + * ramfs_dir_inode_operations.
> + */
> + if (S_ISLNK(mode))
> + inode->i_op = &syaoran_symlink_inode_operations;
> + else
> + inode->i_op = &syaoran_file_inode_operations;
> +}
> +
> +static int wrapped_blkdev_open(struct inode *inode, struct file *filp)
> +{
> + int error = def_blk_fops.open(inode, filp);
> + if (error != -ENXIO)
> + syaoran_may_modify_node(filp->f_dentry, DEVICE_USED);
> + return error;
> +}
> +
> +static int wrapped_chrdev_open(struct inode *inode, struct file *filp)
> +{
> + int error = def_chr_fops.open(inode, filp);
> + if (error != -ENXIO)
> + syaoran_may_modify_node(filp->f_dentry, DEVICE_USED);
> + return error;
> +}
> +
> +static int __init init_syaoran_fs(void)
> +{
> + /* Set open() hook for tracking open operation of block devices. */
> + wrapped_def_blk_fops = def_blk_fops;
> + wrapped_def_blk_fops.open = wrapped_blkdev_open;
> + /* Set open() hook for tracking open operation of character devices. */
> + wrapped_def_chr_fops = def_chr_fops;
> + wrapped_def_chr_fops.open = wrapped_chrdev_open;
> + /* Set setattr() hook for tracking chmod/chown operations of file. */
> + syaoran_file_inode_operations = ramfs_file_inode_operations;
> + syaoran_file_inode_operations.setattr = ramfs_setattr;
> + /* Set setattr() hook for tracking chmod/chown operations of symlink. */
> + syaoran_symlink_inode_operations = page_symlink_inode_operations;
> + syaoran_symlink_inode_operations.setattr = ramfs_setattr;
> + /* Set umount() hook for freeing memory. */
> + syaoran_ops = ramfs_ops;
> + syaoran_ops.put_super = syaoran_put_super;
> + return register_filesystem(&syaoran_fs_type);
> +}
> +
> +static void __exit exit_syaoran_fs(void)
> +{
> + unregister_filesystem(&syaoran_fs_type);
> +}
> +module_init(init_syaoran_fs);
> +module_exit(exit_syaoran_fs);
> --- /dev/null
> +++ linux-2.6-mm/fs/ramfs/syaoran_main.c
> @@ -0,0 +1,207 @@
> +/*
> + * fs/ramfs/syaoran_main.c
> + *
> + * Implementation of the Tamper-Proof Device Filesystem.
> + *
> + * Copyright (C) 2005-2008 NTT DATA CORPORATION
> + *
> + * Version: 1.5.3-pre 2008/01/06
> + */
> +
> +/* Get absolute pathname from mount point. */
> +static int get_local_absolute_path(struct dentry *dentry, char *buffer,
> + int buflen)
> +{
> + char *start = buffer;
> + char *end = buffer + buflen;
> + int namelen;
> +
> + if (buflen < 256)
> + goto out;
> +
> + *--end = '\0';
> + buflen--;
> + for (;;) {
> + struct dentry *parent;
> + if (IS_ROOT(dentry))
> + break;
> + parent = dentry->d_parent;
> + namelen = dentry->d_name.len;
> + buflen -= namelen + 1;
> + if (buflen < 0)
> + goto out;
> + end -= namelen;
> + memcpy(end, dentry->d_name.name, namelen);
> + *--end = '/';
> + dentry = parent;
> + }
> + if (*end == '/') {
> + buflen++;
> + end++;
> + }
> + namelen = dentry->d_name.len;
> + buflen -= namelen;
> + if (buflen < 0)
> + goto out;
> + end -= namelen;
> + memcpy(end, dentry->d_name.name, namelen);
> + memmove(start, end, strlen(end) + 1);
> + return 0;
> +out:
> + return -ENOMEM;
> +}
> +
> +/* Get absolute pathname of the given dentry from mount point. */
> +static int local_realpath_from_dentry(struct dentry *dentry, char *newname,
> + int newname_len)
> +{
> + int error;
> + struct dentry *d_dentry;
> + if (!dentry || !newname || newname_len <= 0)
> + return -EINVAL;
> + d_dentry = dget(dentry);
> + /***** CRITICAL SECTION START *****/
> + spin_lock(&dcache_lock);
> + error = get_local_absolute_path(d_dentry, newname, newname_len);
> + spin_unlock(&dcache_lock);
> + /***** CRITICAL SECTION END *****/
> + dput(d_dentry);
> + return error;
> +}
> +
> +static int syaoran_check_flags(struct syaoran_sb_info *info,
> + struct dentry *dentry, int mode, int dev,
> + unsigned int flags)
> +{
> + int error = -EPERM;
> + struct dev_entry *entry;
> + /*
> + * Since local_realpath_from_dentry() holds dcache_lock,
> + * allocating buffer using kmalloc() won't help improving concurrency.
> + * Therefore, I use static buffer here.
> + */
> + static char filename[PAGE_SIZE];
> + static DEFINE_SPINLOCK(lock);
> + spin_lock(&lock);
> + memset(filename, 0, sizeof(filename));
> + if (local_realpath_from_dentry(dentry, filename, sizeof(filename) - 1))
> + goto out;
> + list_for_each_entry(entry, &info->list, list) {
> + if ((mode & S_IFMT) != (entry->mode & S_IFMT))
> + continue;
> + if ((S_ISBLK(mode) || S_ISCHR(mode)) && dev != entry->kdev)
> + continue;
> + if (strcmp(entry->name, filename + 1))
> + continue;
> + if (info->is_permissive_mode) {
> + entry->flags |= flags;
> + error = 0;
> + } else {
> + if ((entry->flags & flags) == flags)
> + error = 0;
> + }
> + break;
> + }
> +out:
> + if (error && strlen(filename) < (sizeof(filename) / 4) - 16) {
> + const char *name;
> + const uid_t uid = current->fsuid;
> + const gid_t gid = current->fsgid;
> + const mode_t perm = mode & 0777;
> + flags &= ~DEVICE_USED;
> + {
> + char *end = filename + sizeof(filename) - 1;
> + const char *cp = strchr(filename, '\0') - 1;
> + while (cp > filename) {
> + const unsigned char c = *cp--;
> + if (c == '\\') {
> + *--end = '\\';
> + *--end = '\\';
> + } else if (c > ' ' && c < 127) {
> + *--end = c;
> + } else {
> + *--end = (c & 7) + '0';
> + *--end = ((c >> 3) & 7) + '0';
> + *--end = (c >> 6) + '0';
> + *--end = '\\';
> + }
> + }
> + name = end;
> + }
> + switch (mode & S_IFMT) {
> + case S_IFCHR:
> + printk(KERN_DEBUG
> + "SYAORAN-ERROR: %s %3o %3u %3u %2u %c %3u %3u\n",
> + name, perm, uid, gid, flags, 'c',
> + MAJOR(dev), MINOR(dev));
> + break;
> + case S_IFBLK:
> + printk(KERN_DEBUG
> + "SYAORAN-ERROR: %s %3o %3u %3u %2u %c %3u %3u\n",
> + name, perm, uid, gid, flags, 'b',
> + MAJOR(dev), MINOR(dev));
> + break;
> + case S_IFIFO:
> + printk(KERN_DEBUG
> + "SYAORAN-ERROR: %s %3o %3u %3u %2u %c\n",
> + name, perm, uid, gid, flags, 'p');
> + break;
> + case S_IFSOCK:
> + printk(KERN_DEBUG
> + "SYAORAN-ERROR: %s %3o %3u %3u %2u %c\n",
> + name, perm, uid, gid, flags, 's');
> + break;
> + case S_IFDIR:
> + printk(KERN_DEBUG
> + "SYAORAN-ERROR: %s %3o %3u %3u %2u %c\n",
> + name, perm, uid, gid, flags, 'd');
> + break;
> + case S_IFLNK:
> + printk(KERN_DEBUG
> + "SYAORAN-ERROR: %s %3o %3u %3u %2u %c %s\n",
> + name, perm, uid, gid, flags, 'l', "unknown");
> + break;
> + case S_IFREG:
> + printk(KERN_DEBUG
> + "SYAORAN-ERROR: %s %3o %3u %3u %2u %c\n",
> + name, perm, uid, gid, flags, 'f');
> + break;
> + }
> + }
> + spin_unlock(&lock);
> + return error;
> +}
> +
> +/* Check whether the given dentry is allowed to mknod. */
> +static int syaoran_may_create_node(struct dentry *dentry, int mode, int dev)
> +{
> + struct syaoran_sb_info *info =
> + (struct syaoran_sb_info *) dentry->d_sb->s_fs_info;
> + if (!info) {
> + printk(KERN_DEBUG "%s: dentry->d_sb->s_fs_info == NULL\n",
> + __FUNCTION__);
> + return -EPERM;
> + }
> + if (!info->initialize_done)
> + return 0;
> + return syaoran_check_flags(info, dentry, mode, dev, MAY_CREATE);
> +}
> +
> +/* Check whether the given dentry is allowed to chmod/chown/unlink. */
> +static int syaoran_may_modify_node(struct dentry *dentry, unsigned int flags)
> +{
> + struct syaoran_sb_info *info =
> + (struct syaoran_sb_info *) dentry->d_sb->s_fs_info;
> + if (!info) {
> + printk(KERN_DEBUG "%s: dentry->d_sb->s_fs_info == NULL\n",
> + __FUNCTION__);
> + return -EPERM;
> + }
> + if (flags == DEVICE_USED && !info->is_permissive_mode)
> + return 0;
> + if (!dentry->d_inode)
> + return -ENOENT;
> + return syaoran_check_flags(info, dentry, dentry->d_inode->i_mode,
> + dentry->d_inode->i_rdev, flags);
> +}
> +

2008-01-06 07:36:20

by Tetsuo Handa

Subject: Re: [PATCH][RFC] Simple tamper-proof device filesystem.

Hello.

Willy Tarreau wrote:
> Your patch is very confusing. In your description, as well as in the
> comments you talk about tmpfs, but your patch does not touch even one
> line of tmpfs and only changes ramfs. Even your variables and arguments
> refer to tmpfs. The Kconfig entry indicates that the feature depends
> on TMPFS too.
>
> Judging from the following comment :
> * Original tmpfs doesn't set ramfs_dir_inode_operations.setattr field.
>
> I suspect that you confuse both filesystems.
> - ramfs is in fs/ramfs and is always compiled in, you cannot disable it
> - tmpfs is in mm/shmem.c and is optional. It also supports options that
> ramfs does not (eg: size) and data may be swapped.
>
> Please understand that I'm not discussing the usefulness of your patch,
> I'm just trying to avoid a huge confusion.

Oh, I thought the filesystem mounted by "mount -t tmpfs none /tmp" is "tmpfs"
and the source code of "tmpfs" is located in fs/ramfs directory.
So, I should write the description as "an extension to ramfs" rather than
"an extension to tmpfs".
I'll fix it in next posting.

Thank you.

2008-01-06 07:54:31

by Willy Tarreau

Subject: Re: [PATCH][RFC] Simple tamper-proof device filesystem.

On Sun, Jan 06, 2008 at 04:36:06PM +0900, Tetsuo Handa wrote:
> Hello.
>
> Willy Tarreau wrote:
> > Your patch is very confusing. In your description, as well as in the
> > comments you talk about tmpfs, but your patch does not touch even one
> > line of tmpfs and only changes ramfs. Even your variables and arguments
> > refer to tmpfs. The Kconfig entry indicates that the feature depends
> > on TMPFS too.
> >
> > Judging from the following comment :
> > * Original tmpfs doesn't set ramfs_dir_inode_operations.setattr field.
> >
> > I suspect that you confuse both filesystems.
> > - ramfs is in fs/ramfs and is always compiled in, you cannot disable it
> > - tmpfs is in mm/shmem.c and is optional. It also supports options that
> > ramfs does not (eg: size) and data may be swapped.
> >
> > Please understand that I'm not discussing the usefulness of your patch,
> > I'm just trying to avoid a huge confusion.
>
> Oh, I thought the filesystem mounted by "mount -t tmpfs none /tmp" is "tmpfs"

Yes, that is a tmpfs.

> and the source code of "tmpfs" is located in fs/ramfs directory.

No, ramfs is what you get by "mount -t ramfs none /tmp" :-)
You will notice that "df" will not report your ramfs by default because it
reports zero blocks. But "mount" or "df /tmp" will report it.
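
The distinction can also be checked from user space. The following is a
minimal sketch, not part of the patch: it assumes Linux with glibc, uses
statfs(2), and compares f_type against the magic numbers that
<linux/magic.h> defines for ramfs and tmpfs. The EXAMPLE_* names and both
helper functions are made up for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/vfs.h>

/* Same values as RAMFS_MAGIC and TMPFS_MAGIC in <linux/magic.h>. */
#define EXAMPLE_RAMFS_MAGIC 0x858458f6UL
#define EXAMPLE_TMPFS_MAGIC 0x01021994UL

/* Map a filesystem magic number to a short name. */
static const char *classify_magic(unsigned long magic)
{
	switch (magic & 0xffffffffUL) {
	case EXAMPLE_RAMFS_MAGIC:
		return "ramfs";
	case EXAMPLE_TMPFS_MAGIC:
		return "tmpfs";
	default:
		return "other";
	}
}

/*
 * Return the kind of filesystem that `path` lives on, or NULL on error.
 * If `blocks` is non-NULL, it receives the total block count that the
 * filesystem reports (the number that "df" looks at).
 */
static const char *fs_kind(const char *path, unsigned long long *blocks)
{
	struct statfs sfs;

	assert(path != NULL);
	if (statfs(path, &sfs) != 0)
		return NULL;
	if (blocks)
		*blocks = (unsigned long long) sfs.f_blocks;
	return classify_magic((unsigned long) sfs.f_type);
}
```

Calling fs_kind("/tmp", &blocks) after each of the two mount commands
should return "tmpfs" or "ramfs" respectively; going by the description
above, the ramfs case is expected to report a block count of zero.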

> So, I should write the description as "an extension to ramfs" rather than
> "an extension to tmpfs".

and please also fix the comments, macros and variable names in the code, as they
are what confused me first.

> I'll fix it in next posting.

Thanks,
Willy

2008-01-06 15:20:28

by Tetsuo Handa

Subject: Re: [PATCH][RFC] Simple tamper-proof device filesystem.

Hello.

Changes from previous posting:

(1) I rebased this patch onto tmpfs.

I hadn't realized that I was building this patch on ramfs...

This patch is for 2.6.24-rc6-mm1.

Regards.
----------
Subject: Simple tamper-proof device filesystem.

The goal of this filesystem is to guarantee that
"applications using well-known device locations under /dev
get the device they want" (e.g. an application that accesses /dev/null can
always get a character special device with major=1 and minor=3).

Does this idea sound silly? Indeed it does, if you think that root can do whatever
he/she wants to do. But this filesystem makes sense when used with
access control mechanisms like MAC (mandatory access control).
I want to use this filesystem in the case where a process with root privilege has been
hijacked but the behavior of the hijacked process is still restricted by MAC.

Why not use FUSE?

Because /dev has to be available through the lifetime of the kernel.
It is not acceptable if /dev stops working due to SIGKILL or OOM-killer.

Why not use SELinux?

Because SELinux doesn't guarantee the pairing of a filename and its attributes.
As far as I know, no MAC implementation can enforce a filename together with its attributes.
I guess this is because

(1) Filename and attribute pairs are conventionally considered
constant and reliable.

(2) Describing this attribute enforcement information in the MAC's policy
would make the policy syntax complicated.

I want to add functionality that the MACs are missing.
Instead of adding this functionality per MAC,
I propose to add it as ground work, to be combined with any MAC.

Why not drop CAP_MKNOD?

Dropping CAP_MKNOD is not enough for emulating this filesystem because
a process can still use rename()/unlink() to break the pairing of a filename
and its attributes (e.g. mv /dev/sda1 /dev/sda1.tmp; mv /dev/sda2 /dev/sda1;
mv /dev/sda1.tmp /dev/sda2 or unlink /dev/null; touch /dev/null ).
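
The rename sequence above needs no mknod() at all. Here is a small userspace
sketch of it, under stated assumptions: regular files in a temporary
directory stand in for the device nodes so that no privilege is needed, the
one-byte tags 'A' and 'B' stand in for the device numbers, and all function
names are made up for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Write a one-byte tag into a new file; returns 0 on success. */
static int write_tag(const char *path, char tag)
{
	FILE *fp;

	assert(path != NULL);
	fp = fopen(path, "w");
	if (!fp)
		return -1;
	if (fputc(tag, fp) == EOF) {
		fclose(fp);
		return -1;
	}
	return fclose(fp) == 0 ? 0 : -1;
}

/* Read the one-byte tag back; returns the tag or -1 on failure. */
static int read_tag(const char *path)
{
	FILE *fp = fopen(path, "r");
	int c;

	if (!fp)
		return -1;
	c = fgetc(fp);
	fclose(fp);
	return c == EOF ? -1 : c;
}

/*
 * Swap two names with three rename() calls, mirroring
 * "mv sda1 sda1.tmp; mv sda2 sda1; mv sda1.tmp sda2".
 * Returns 0 if name "sda1" ends up with tag 'B' and "sda2" with tag 'A'.
 */
static int swap_names_demo(void)
{
	char dir[] = "/tmp/swapdemoXXXXXX";
	char a[64], b[64], tmp[64];
	int swapped = 0;

	if (!mkdtemp(dir))
		return -1;
	snprintf(a, sizeof(a), "%s/sda1", dir);
	snprintf(b, sizeof(b), "%s/sda2", dir);
	snprintf(tmp, sizeof(tmp), "%s/sda1.tmp", dir);

	if (write_tag(a, 'A') == 0 && write_tag(b, 'B') == 0 &&
	    rename(a, tmp) == 0 && rename(b, a) == 0 && rename(tmp, b) == 0)
		swapped = read_tag(a) == 'B' && read_tag(b) == 'A';

	/* Best-effort cleanup of everything the demo may have created. */
	unlink(a);
	unlink(b);
	unlink(tmp);
	rmdir(dir);
	return swapped ? 0 : -1;
}
```

After swap_names_demo() succeeds, each name refers to the other file's
contents, which is exactly the kind of name/attribute mismatch that dropping
CAP_MKNOD alone cannot prevent.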

This time, I'm implementing this filesystem as an extension to tmpfs
because all this filesystem does is check the filename and
its attributes in addition to what tmpfs does.
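
To make the check concrete: judging from register_node_info() and
syaoran_check_flags() in this patch, each policy line has the form
"name perm uid gid flags type [major minor]", and in enforcing mode an
operation is allowed only if every requested flag bit is set in the entry.
The following userspace sketch mirrors that logic under simplifying
assumptions: struct policy_entry, parse_policy_line() and is_allowed() are
hypothetical names, and symlink entries and backslash-octal escapes are
deliberately left out.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Flag values copied from the patch's mm/shmem_mac.h. */
#define MAY_CREATE          1
#define MAY_DELETE          2
#define MAY_CHMOD           4
#define MAY_CHOWN           8
#define DEVICE_USED        16
#define NO_CREATE_AT_MOUNT 32

/* Hypothetical userspace mirror of one policy entry. */
struct policy_entry {
	char name[256];
	unsigned int perm, uid, gid, flags;
	char type;			/* c, b, d, s, p or f in this sketch */
	unsigned int dev_major, dev_minor;
};

/*
 * Parse "<name> <octal perm> <uid> <gid> <flags> <type> [<major> <minor>]".
 * Returns 0 on success and -1 on a malformed or unsupported line.
 */
static int parse_policy_line(const char *line, struct policy_entry *e)
{
	int n;

	assert(line != NULL && e != NULL);
	memset(e, 0, sizeof(*e));
	n = sscanf(line, "%255s %o %u %u %u %c %u %u", e->name, &e->perm,
		   &e->uid, &e->gid, &e->flags, &e->type,
		   &e->dev_major, &e->dev_minor);
	if (n < 6 || e->perm > 0777)
		return -1;
	switch (e->type) {
	case 'c':
	case 'b':
		/* Device nodes must carry both major and minor numbers. */
		return n == 8 ? 0 : -1;
	case 'd':
	case 's':
	case 'p':
	case 'f':
		return 0;
	default:
		return -1;
	}
}

/* Enforcing-mode rule: all requested bits must be granted by the entry. */
static int is_allowed(const struct policy_entry *e, unsigned int requested)
{
	return (e->flags & requested) == requested;
}
```

For a made-up line such as "null 666 0 0 2 c 1 3", the entry grants
MAY_DELETE only, so is_allowed() accepts an unlink request but rejects a
combined MAY_DELETE | MAY_CHMOD request.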

Signed-off-by: Tetsuo Handa <[email protected]>
---
fs/Kconfig | 18 +
include/linux/shmem_fs.h | 5
mm/shmem.c | 124 +++++++++++
mm/shmem_mac.h | 57 +++++
mm/shmem_mac_debug.c | 183 +++++++++++++++++
mm/shmem_mac_init.c | 486 +++++++++++++++++++++++++++++++++++++++++++++++
mm/shmem_mac_main.c | 205 +++++++++++++++++++
7 files changed, 1077 insertions(+), 1 deletion(-)

--- linux-2.6-mm.orig/mm/shmem.c
+++ linux-2.6-mm/mm/shmem.c
@@ -736,11 +736,39 @@ static void shmem_truncate(struct inode
shmem_truncate_range(inode, inode->i_size, (loff_t)-1);
}

+#ifdef CONFIG_SYAORAN
+#include "shmem_mac.h"
+#include "shmem_mac_init.c"
+#include "shmem_mac_main.c"
+#include "shmem_mac_debug.c"
+
+static bool with_mac(struct super_block *sb)
+{
+ return sb->s_type == &syaoran_fs_type;
+}
+#else
+static inline bool with_mac(struct super_block *sb)
+{
+ return 0;
+}
+#endif
+
static int shmem_notify_change(struct dentry *dentry, struct iattr *attr)
{
struct inode *inode = dentry->d_inode;
struct page *page = NULL;
int error;
+#ifdef CONFIG_SYAORAN
+ if (with_mac(inode->i_sb)) {
+ unsigned int flags = 0;
+ if (attr->ia_valid & (ATTR_UID | ATTR_GID))
+ flags |= MAY_CHOWN;
+ if (attr->ia_valid & ATTR_MODE)
+ flags |= MAY_CHMOD;
+ if (syaoran_may_modify_node(dentry, flags))
+ return -EPERM;
+ }
+#endif

if (S_ISREG(inode->i_mode) && (attr->ia_valid & ATTR_SIZE)) {
if (attr->ia_size < inode->i_size) {
@@ -1515,6 +1543,10 @@ shmem_get_inode(struct super_block *sb,
default:
inode->i_op = &shmem_special_inode_operations;
init_special_inode(inode, mode, dev);
+#ifdef CONFIG_SYAORAN
+ if (with_mac(sb))
+ init_syaoran_inode(inode, mode);
+#endif
break;
case S_IFREG:
inode->i_op = &shmem_inode_operations;
@@ -1739,8 +1771,15 @@ static int shmem_statfs(struct dentry *d
static int
shmem_mknod(struct inode *dir, struct dentry *dentry, int mode, dev_t dev)
{
- struct inode *inode = shmem_get_inode(dir->i_sb, mode, dev);
+ struct inode *inode;
int error = -ENOSPC;
+#ifdef CONFIG_SYAORAN
+ if (with_mac(dir->i_sb)) {
+ if (syaoran_may_create_node(dentry, mode, dev) < 0)
+ return -EPERM;
+ }
+#endif
+ inode = shmem_get_inode(dir->i_sb, mode, dev);

if (inode) {
error = security_inode_init_security(inode, dir, NULL, NULL,
@@ -1792,6 +1831,13 @@ static int shmem_link(struct dentry *old
{
struct inode *inode = old_dentry->d_inode;
int ret;
+#ifdef CONFIG_SYAORAN
+ if (with_mac(inode->i_sb)) {
+ if (syaoran_may_create_node(dentry, inode->i_mode,
+ inode->i_rdev))
+ return -EPERM;
+ }
+#endif

/*
* No ordinary (disk based) filesystem counts links as inodes;
@@ -1815,6 +1861,12 @@ out:
static int shmem_unlink(struct inode *dir, struct dentry *dentry)
{
struct inode *inode = dentry->d_inode;
+#ifdef CONFIG_SYAORAN
+ if (with_mac(inode->i_sb)) {
+ if (syaoran_may_modify_node(dentry, MAY_DELETE))
+ return -EPERM;
+ }
+#endif

if (inode->i_nlink > 1 && !S_ISDIR(inode->i_mode))
shmem_free_inode(inode->i_sb);
@@ -1830,6 +1882,12 @@ static int shmem_rmdir(struct inode *dir
{
if (!simple_empty(dentry))
return -ENOTEMPTY;
+#ifdef CONFIG_SYAORAN
+ if (with_mac(dir->i_sb)) {
+ if (syaoran_may_modify_node(dentry, MAY_DELETE))
+ return -EPERM;
+ }
+#endif

drop_nlink(dentry->d_inode);
drop_nlink(dir);
@@ -1849,6 +1907,14 @@ static int shmem_rename(struct inode *ol

if (!simple_empty(new_dentry))
return -ENOTEMPTY;
+#ifdef CONFIG_SYAORAN
+ if (with_mac(inode->i_sb)) {
+ if (syaoran_may_modify_node(old_dentry, MAY_DELETE) ||
+ syaoran_may_create_node(new_dentry, inode->i_mode,
+ inode->i_rdev))
+ return -EPERM;
+ }
+#endif

if (new_dentry->d_inode) {
(void) shmem_unlink(new_dir, new_dentry);
@@ -1880,6 +1946,12 @@ static int shmem_symlink(struct inode *d
if (len > PAGE_CACHE_SIZE)
return -ENAMETOOLONG;

+#ifdef CONFIG_SYAORAN
+ if (with_mac(dir->i_sb)) {
+ if (syaoran_may_create_node(dentry, S_IFLNK, 0) < 0)
+ return -EPERM;
+ }
+#endif
inode = shmem_get_inode(dir->i_sb, S_IFLNK|S_IRWXUGO, 0);
if (!inode)
return -ENOSPC;
@@ -1952,6 +2024,9 @@ static void shmem_put_link(struct dentry
static const struct inode_operations shmem_symlink_inline_operations = {
.readlink = generic_readlink,
.follow_link = shmem_follow_link_inline,
+#ifdef CONFIG_SYAORAN
+ .setattr = shmem_notify_change,
+#endif
};

static const struct inode_operations shmem_symlink_inode_operations = {
@@ -1959,6 +2034,9 @@ static const struct inode_operations shm
.readlink = generic_readlink,
.follow_link = shmem_follow_link,
.put_link = shmem_put_link,
+#ifdef CONFIG_SYAORAN
+ .setattr = shmem_notify_change,
+#endif
};

#ifdef CONFIG_TMPFS_POSIX_ACL
@@ -2152,6 +2230,13 @@ static int shmem_parse_options(char *opt
} else if (!strcmp(this_char,"mpol")) {
if (shmem_parse_mpol(value,policy,policy_nodes))
goto bad_val;
+#ifdef CONFIG_SYAORAN
+ /* These options are interpreted by SYAORAN filesystem. */
+ } else if (!strcmp(this_char, "accept")) {
+ this_char[6] = '=';
+ } else if (!strcmp(this_char, "enforce")) {
+ this_char[7] = '=';
+#endif
} else {
printk(KERN_ERR "tmpfs: Bad mount option %s\n",
this_char);
@@ -2215,6 +2300,24 @@ out:

static void shmem_put_super(struct super_block *sb)
{
+#ifdef CONFIG_SYAORAN
+ struct shmem_sb_info *info = SHMEM_SB(sb);
+ struct dev_entry *entry;
+ struct dev_entry *tmp;
+ if (!with_mac(sb))
+ goto no_mac;
+ list_for_each_entry_safe(entry, tmp, &info->list, list) {
+ kfree(entry->name);
+ kfree(entry->symlink_data);
+ kfree(entry->printable_name);
+ kfree(entry->printable_symlink_data);
+ list_del(&entry->list);
+ /* printk("Entry removed.\n"); */
+ kfree(entry);
+ }
+ printk(KERN_DEBUG "%s: Unused memory freed.\n", __FUNCTION__);
+no_mac:
+#endif
kfree(sb->s_fs_info);
sb->s_fs_info = NULL;
}
@@ -2279,6 +2382,15 @@ static int shmem_fill_super(struct super
sb->s_xattr = shmem_xattr_handlers;
sb->s_flags |= MS_POSIXACL;
#endif
+#ifdef CONFIG_SYAORAN
+ if (with_mac(sb)) {
+ int error = syaoran_initialize(sb, data);
+ if (error) {
+ err = error;
+ goto failed;
+ }
+ }
+#endif

inode = shmem_get_inode(sb, S_IFDIR | mode, 0);
if (!inode)
@@ -2289,6 +2401,10 @@ static int shmem_fill_super(struct super
if (!root)
goto failed_iput;
sb->s_root = root;
+#ifdef CONFIG_SYAORAN
+ if (with_mac(sb))
+ syaoran_make_initial_nodes(sb);
+#endif
return 0;

failed_iput:
@@ -2401,6 +2517,9 @@ static const struct inode_operations shm
.removexattr = generic_removexattr,
.permission = shmem_permission,
#endif
+#ifdef CONFIG_SYAORAN
+ .setattr = shmem_notify_change,
+#endif
};

static const struct inode_operations shmem_special_inode_operations = {
@@ -2412,6 +2531,9 @@ static const struct inode_operations shm
.removexattr = generic_removexattr,
.permission = shmem_permission,
#endif
+#ifdef CONFIG_SYAORAN
+ .setattr = shmem_notify_change,
+#endif
};

static const struct super_operations shmem_ops = {
--- /dev/null
+++ linux-2.6-mm/mm/shmem_mac.h
@@ -0,0 +1,57 @@
+/*
+ * mm/shmem_mac.h
+ *
+ * Implementation of the Tamper-Proof Device Filesystem.
+ *
+ * Copyright (C) 2005-2008 NTT DATA CORPORATION
+ *
+ * Version: 1.5.3-pre 2008/01/06
+ */
+
+#include <linux/namei.h>
+#include <linux/mm.h>
+
+static void init_syaoran_inode(struct inode *inode, int mode);
+
+static int syaoran_create_tracelog(struct super_block *sb,
+ const char *filename);
+static int syaoran_may_modify_node(struct dentry *dentry, unsigned int flags);
+
+static struct inode *
+shmem_get_inode(struct super_block *sb, int mode, dev_t dev);
+
+/* The following constants are used to restrict operations. */
+
+#define MAY_CREATE 1 /* This file is allowed to be mknod()ed. */
+#define MAY_DELETE 2 /* This file is allowed to be unlink()ed. */
+#define MAY_CHMOD 4 /* This file is allowed to be chmod()ed. */
+#define MAY_CHOWN 8 /* This file is allowed to be chown()ed. */
+#define DEVICE_USED 16 /* This block or character device file is used. */
+#define NO_CREATE_AT_MOUNT 32 /* Don't create this file at mount(). */
+
+struct dev_entry {
+ struct list_head list;
+ /* Binary form of pathname under mount point. Never NULL. */
+ char *name;
+ /*
+ * Mode and permissions. setuid/setgid/sticky bits are not supported.
+ */
+ mode_t mode;
+ uid_t uid;
+ gid_t gid;
+ dev_t kdev;
+ /*
+ * Binary form of initial contents for the symlink.
+ * NULL if not symlink.
+ */
+ char *symlink_data;
+ /* File access control flags. */
+ unsigned int flags;
+ /* Text form of pathname under mount point. Never NULL. */
+ const char *printable_name;
+ /*
+ * Text form of initial contents for the symlink.
+ * NULL if not symlink.
+ */
+ const char *printable_symlink_data;
+};
--- /dev/null
+++ linux-2.6-mm/mm/shmem_mac_debug.c
@@ -0,0 +1,183 @@
+/*
+ * mm/shmem_mac_debug.c
+ *
+ * Implementation of the Tamper-Proof Device Filesystem.
+ *
+ * Copyright (C) 2005-2008 NTT DATA CORPORATION
+ *
+ * Version: 1.5.3-pre 2008/01/06
+ */
+/*
+ * The following structure and code are used for transferring data
+ * to interface files.
+ */
+
+#define list_for_each_cookie(pos, cookie, head) \
+ for ((cookie) || ((cookie) = (head)), pos = (cookie)->next; \
+ prefetch(pos->next), pos != (head) || ((cookie) = NULL); \
+ (cookie) = pos, pos = pos->next)
+
+struct syaoran_read_struct {
+ char *buf; /* Buffer for reading. */
+ int avail; /* Bytes available for reading. */
+ struct super_block *sb; /* The super_block of this partition. */
+ struct dev_entry *entry; /* The entry currently reading from. */
+ bool read_all; /* Dump all entries? */
+ struct list_head *pos; /* Current position. */
+};
+
+static void syaoran_read_table(struct syaoran_read_struct *head, char *buf,
+ int count)
+{
+ struct super_block *sb = head->sb;
+ struct shmem_sb_info *info =
+ (struct shmem_sb_info *) sb->s_fs_info;
+ struct list_head *pos;
+ const bool read_all = head->read_all;
+ if (!info)
+ return;
+ if (!head->pos)
+ return;
+ list_for_each_cookie(pos, head->pos, &info->list) {
+ struct dev_entry *entry =
+ list_entry(pos, struct dev_entry, list);
+ const unsigned int flags =
+ read_all ? entry->flags : entry->flags & ~DEVICE_USED;
+ const char *name = entry->printable_name;
+ const uid_t uid = entry->uid;
+ const gid_t gid = entry->gid;
+ const mode_t perm = entry->mode & 0777;
+ int len = 0;
+ switch (entry->mode & S_IFMT) {
+ case S_IFCHR:
+ if (!head->read_all && !(entry->flags & DEVICE_USED))
+ break;
+ len = snprintf(buf, count,
+ "%-20s %3o %3u %3u %2u %c %3u %3u\n",
+ name, perm, uid, gid, flags, 'c',
+ MAJOR(entry->kdev), MINOR(entry->kdev));
+ break;
+ case S_IFBLK:
+ if (!head->read_all && !(entry->flags & DEVICE_USED))
+ break;
+ len = snprintf(buf, count,
+ "%-20s %3o %3u %3u %2u %c %3u %3u\n",
+ name, perm, uid, gid, flags, 'b',
+ MAJOR(entry->kdev), MINOR(entry->kdev));
+ break;
+ case S_IFIFO:
+ len = snprintf(buf, count,
+ "%-20s %3o %3u %3u %2u %c\n",
+ name, perm, uid, gid, flags, 'p');
+ break;
+ case S_IFSOCK:
+ len = snprintf(buf, count,
+ "%-20s %3o %3u %3u %2u %c\n",
+ name, perm, uid, gid, flags, 's');
+ break;
+ case S_IFDIR:
+ len = snprintf(buf, count,
+ "%-20s %3o %3u %3u %2u %c\n",
+ name, perm, uid, gid, flags, 'd');
+ break;
+ case S_IFLNK:
+ len = snprintf(buf, count,
+ "%-20s %3o %3u %3u %2u %c %s\n",
+ name, perm, uid, gid, flags, 'l',
+ entry->printable_symlink_data);
+ break;
+ case S_IFREG:
+ len = snprintf(buf, count,
+ "%-20s %3o %3u %3u %2u %c\n",
+ name, perm, uid, gid, flags, 'f');
+ break;
+ }
+ if (len < 0 || count <= len)
+ break;
+ count -= len;
+ buf += len;
+ head->avail += len;
+ }
+}
+
+static int syaoran_trace_open(struct inode *inode, struct file *file)
+{
+ struct syaoran_read_struct *head =
+ kzalloc(sizeof(*head), GFP_KERNEL);
+ if (!head)
+ return -ENOMEM;
+ head->sb = inode->i_sb;
+ head->read_all =
+ (strcmp(file->f_dentry->d_name.name, ".syaoran_all") == 0);
+ head->pos = &(SHMEM_SB(head->sb)->list);
+ head->buf = kzalloc(PAGE_SIZE * 2, GFP_KERNEL);
+ if (!head->buf) {
+ kfree(head);
+ return -ENOMEM;
+ }
+ file->private_data = head;
+ return 0;
+}
+
+static int syaoran_trace_release(struct inode *inode, struct file *file)
+{
+ struct syaoran_read_struct *head = file->private_data;
+ kfree(head->buf);
+ kfree(head);
+ file->private_data = NULL;
+ return 0;
+}
+
+static ssize_t syaoran_trace_read(struct file *file, char __user *buf,
+ size_t count, loff_t *ppos)
+{
+ struct syaoran_read_struct *head =
+ (struct syaoran_read_struct *) file->private_data;
+ int len = head->avail;
+ char *cp = head->buf;
+ if (!access_ok(VERIFY_WRITE, buf, count))
+ return -EFAULT;
+ syaoran_read_table(head, cp + len, PAGE_SIZE * 2 - len);
+ len = head->avail;
+ if (len > count)
+ len = count;
+ if (len > 0) {
+ if (copy_to_user(buf, cp, len))
+ return -EFAULT;
+ head->avail -= len;
+ memmove(cp, cp + len, head->avail);
+ }
+ return len;
+}
+
+static struct file_operations syaoran_trace_operations = {
+ .open = syaoran_trace_open,
+ .release = syaoran_trace_release,
+ .read = syaoran_trace_read,
+};
+
+/* Create interface files for reading status. */
+static int syaoran_create_tracelog(struct super_block *sb, const char *filename)
+{
+ struct inode *inode;
+ struct dentry *base = dget(sb->s_root);
+ struct dentry *dentry = lookup_create2(filename, base, 0);
+ int error = PTR_ERR(dentry);
+ if (IS_ERR(dentry))
+ goto out;
+ inode = shmem_get_inode(sb, S_IFREG | 0400, 0);
+ if (!inode)
+ error = -ENOSPC;
+ else {
+ /* Override file operation. */
+ inode->i_fop = &syaoran_trace_operations;
+ d_instantiate(dentry, inode);
+ dget(dentry); /* Extra count - pin the dentry in core */
+ error = 0;
+ }
+ dput(dentry);
+out:
+ mutex_unlock(&base->d_inode->i_mutex);
+ dput(base);
+ return error;
+}
--- /dev/null
+++ linux-2.6-mm/mm/shmem_mac_init.c
@@ -0,0 +1,486 @@
+/*
+ * mm/shmem_mac_init.c
+ *
+ * Implementation of the Tamper-Proof Device Filesystem.
+ *
+ * Copyright (C) 2005-2008 NTT DATA CORPORATION
+ *
+ * Version: 1.5.3-pre 2008/01/06
+ */
+
+/*
+ * The following code is used for processing the policy file and
+ * creating initial nodes at mount time.
+ */
+
+/* lookup_create() without nameidata */
+static struct dentry *lookup_create2(const char *name, struct dentry *base,
+ const bool is_dir)
+{
+ struct dentry *dentry;
+ const int len = name ? strlen(name) : 0;
+ mutex_lock(&base->d_inode->i_mutex);
+ dentry = lookup_one_len(name, base, len);
+ if (IS_ERR(dentry))
+ goto fail;
+ if (!is_dir && name[len] && !dentry->d_inode)
+ goto enoent;
+ return dentry;
+enoent:
+ dput(dentry);
+ dentry = ERR_PTR(-ENOENT);
+fail:
+ return dentry;
+}
+
+static int fs_mkdir(const char *pathname, struct dentry *base, int mode,
+ uid_t user, gid_t group)
+{
+ struct dentry *dentry = lookup_create2(pathname, base, 1);
+ int error = PTR_ERR(dentry);
+ if (!IS_ERR(dentry)) {
+ error = vfs_mkdir(base->d_inode, dentry, mode);
+ if (!error) {
+ lock_kernel();
+ dentry->d_inode->i_uid = user;
+ dentry->d_inode->i_gid = group;
+ unlock_kernel();
+ }
+ dput(dentry);
+ }
+ mutex_unlock(&base->d_inode->i_mutex);
+ return error;
+}
+
+static int fs_mknod(const char *filename, struct dentry *base, int mode,
+ dev_t dev, uid_t user, gid_t group)
+{
+ struct dentry *dentry;
+ int error;
+ switch (mode & S_IFMT) {
+ case S_IFCHR:
+ case S_IFBLK:
+ case S_IFIFO:
+ case S_IFSOCK:
+ case S_IFREG:
+ break;
+ default:
+ return -EPERM;
+ }
+ dentry = lookup_create2(filename, base, 0);
+ error = PTR_ERR(dentry);
+ if (!IS_ERR(dentry)) {
+ error = vfs_mknod(base->d_inode, dentry, mode, dev);
+ if (!error) {
+ lock_kernel();
+ dentry->d_inode->i_uid = user;
+ dentry->d_inode->i_gid = group;
+ unlock_kernel();
+ }
+ dput(dentry);
+ }
+ mutex_unlock(&base->d_inode->i_mutex);
+ return error;
+}
+
+static int fs_symlink(const char *pathname, struct dentry *base,
+ char *oldname, int mode, uid_t user, gid_t group)
+{
+ struct dentry *dentry = lookup_create2(pathname, base, 0);
+ int error = PTR_ERR(dentry);
+ if (!IS_ERR(dentry)) {
+ error = vfs_symlink(base->d_inode, dentry, oldname, S_IALLUGO);
+ if (!error) {
+ lock_kernel();
+ dentry->d_inode->i_mode = mode;
+ dentry->d_inode->i_uid = user;
+ dentry->d_inode->i_gid = group;
+ unlock_kernel();
+ }
+ dput(dentry);
+ }
+ mutex_unlock(&base->d_inode->i_mutex);
+ return error;
+}
+
+/*
+ * Format string.
+ * Leading and trailing whitespace is removed.
+ * Runs of whitespace are collapsed into a single space.
+ */
+static void syaoran_normalize_line(unsigned char *buffer)
+{
+ unsigned char *sp = buffer;
+ unsigned char *dp = buffer;
+ bool first = 1;
+ while (*sp && (*sp <= ' ' || *sp >= 127))
+ sp++;
+ while (*sp) {
+ if (!first)
+ *dp++ = ' ';
+ first = 0;
+ while (*sp > ' ' && *sp < 127)
+ *dp++ = *sp++;
+ while (*sp && (*sp <= ' ' || *sp >= 127))
+ sp++;
+ }
+ *dp = '\0';
+}
+
+/* Convert text form of filename into binary form. */
+static void syaoran_unescape(char *filename)
+{
+ char *cp = filename;
+ char c, d, e;
+ if (!cp)
+ return;
+ while ((c = *filename++) != '\0') {
+ if (c != '\\') {
+ *cp++ = c;
+ continue;
+ }
+ if ((c = *filename++) == '\\') {
+ *cp++ = c;
+ continue;
+ }
+ if (c < '0' || c > '3')
+ break;
+ d = *filename++;
+ if (d < '0' || d > '7')
+ break;
+ e = *filename++;
+ if (e < '0' || e > '7')
+ break;
+ *(unsigned char *) cp++ = (unsigned char)
+ (((unsigned char) (c - '0') << 6) +
+ ((unsigned char) (d - '0') << 3) +
+ (unsigned char) (e - '0'));
+ }
+ *cp = '\0';
+}
+
+static inline char *strdup(const char *data)
+{
+ return kstrdup(data, GFP_KERNEL);
+}
+
+static int register_node_info(char *buffer, struct super_block *sb)
+{
+ enum {
+ ARG_FILENAME = 0,
+ ARG_PERMISSION = 1,
+ ARG_UID = 2,
+ ARG_GID = 3,
+ ARG_FLAGS = 4,
+ ARG_DEV_TYPE = 5,
+ ARG_SYMLINK_DATA = 6,
+ ARG_DEV_MAJOR = 6,
+ ARG_DEV_MINOR = 7,
+ MAX_ARG = 8
+ };
+ char *args[MAX_ARG];
+ int i;
+ int error = -EINVAL;
+ unsigned int perm, uid, gid, flags, major = 0, minor = 0;
+ struct shmem_sb_info *info = SHMEM_SB(sb);
+ struct dev_entry *entry;
+ memset(args, 0, sizeof(args));
+ args[0] = buffer;
+ for (i = 1; i < MAX_ARG; i++) {
+ args[i] = strchr(args[i - 1] + 1, ' ');
+ if (!args[i])
+ break;
+ *args[i]++ = '\0';
+ }
+ /*
+ printk("<%s> <%s> <%s> <%s> <%s> <%s> <%s> <%s>\n",
+ args[0], args[1], args[2], args[3], args[4], args[5],
+ args[6], args[7]);
+ */
+ if (!args[ARG_FILENAME] || !args[ARG_PERMISSION] || !args[ARG_UID] ||
+ !args[ARG_GID] || !args[ARG_DEV_TYPE] || !args[ARG_FLAGS])
+ goto out;
+ if (sscanf(args[ARG_PERMISSION], "%o", &perm) != 1 || !(perm <= 0777)
+ || sscanf(args[ARG_UID], "%u", &uid) != 1
+ || sscanf(args[ARG_GID], "%u", &gid) != 1
+ || sscanf(args[ARG_FLAGS], "%u", &flags) != 1
+ || *(args[ARG_DEV_TYPE] + 1))
+ goto out;
+ switch (*args[ARG_DEV_TYPE]) {
+ case 'c':
+ perm |= S_IFCHR;
+ if (!args[ARG_DEV_MAJOR]
+ || sscanf(args[ARG_DEV_MAJOR], "%u", &major) != 1
+ || !args[ARG_DEV_MINOR]
+ || sscanf(args[ARG_DEV_MINOR], "%u", &minor) != 1)
+ goto out;
+ break;
+ case 'b':
+ perm |= S_IFBLK;
+ if (!args[ARG_DEV_MAJOR]
+ || sscanf(args[ARG_DEV_MAJOR], "%u", &major) != 1
+ || !args[ARG_DEV_MINOR]
+ || sscanf(args[ARG_DEV_MINOR], "%u", &minor) != 1)
+ goto out;
+ break;
+ case 'l':
+ perm |= S_IFLNK;
+ if (!args[ARG_SYMLINK_DATA])
+ goto out;
+ break;
+ case 'd':
+ perm |= S_IFDIR;
+ break;
+ case 's':
+ perm |= S_IFSOCK;
+ break;
+ case 'p':
+ perm |= S_IFIFO;
+ break;
+ case 'f':
+ perm |= S_IFREG;
+ break;
+ default:
+ goto out;
+ }
+ error = -ENOMEM;
+ entry = kzalloc(sizeof(*entry), GFP_KERNEL);
+ if (!entry)
+ goto out;
+ if (S_ISLNK(perm)) {
+ entry->printable_symlink_data = strdup(args[ARG_SYMLINK_DATA]);
+ if (!entry->printable_symlink_data)
+ goto out_freemem;
+ }
+ entry->printable_name = strdup(args[ARG_FILENAME]);
+ if (!entry->printable_name)
+ goto out_freemem;
+ if (S_ISLNK(perm)) {
+ entry->symlink_data = strdup(entry->printable_symlink_data);
+ if (!entry->symlink_data)
+ goto out_freemem;
+ syaoran_unescape(entry->symlink_data);
+ }
+ entry->name = strdup(entry->printable_name);
+ if (!entry->name)
+ goto out_freemem;
+ syaoran_unescape(entry->name);
+ /*
+ * Drop trailing '/', because get_local_absolute_path() doesn't append
+ * a trailing '/'.
+ */
+ i = strlen(entry->name);
+ if (i && entry->name[i - 1] == '/')
+ entry->name[i - 1] = '\0';
+ entry->mode = perm;
+ entry->uid = uid;
+ entry->gid = gid;
+ entry->kdev = S_ISCHR(perm) || S_ISBLK(perm) ? MKDEV(major, minor) : 0;
+ entry->flags = flags;
+ list_add_tail(&entry->list, &info->list);
+ /* printk("Entry added.\n"); */
+ error = 0;
+out:
+ return error;
+out_freemem:
+ kfree(entry->printable_symlink_data);
+ kfree(entry->printable_name);
+ kfree(entry->symlink_data);
+ kfree(entry);
+ goto out;
+}
+
+static int read_config_file(struct file *file, struct super_block *sb)
+{
+ char *buffer;
+ int error = -ENOMEM;
+ if (!file)
+ return -EINVAL;
+ buffer = kzalloc(PAGE_SIZE, GFP_KERNEL);
+ if (buffer) {
+ int len;
+ char *cp;
+ unsigned long offset = 0;
+ while ((len = kernel_read(file, offset, buffer, PAGE_SIZE)) > 0
+ && (cp = memchr(buffer, '\n', len)) != NULL) {
+ *cp = '\0';
+ offset += cp - buffer + 1;
+ syaoran_normalize_line(buffer);
+ if (register_node_info(buffer, sb) == -ENOMEM)
+ goto out;
+ }
+ error = 0;
+ }
+out:
+ kfree(buffer);
+ return error;
+}
+
+static void make_node(struct dev_entry *entry, struct dentry *root)
+{
+ struct dentry *base = dget(root);
+ char *filename = entry->name;
+ char *name = filename;
+ unsigned int c;
+ const mode_t perm = entry->mode;
+ const uid_t uid = entry->uid;
+ const gid_t gid = entry->gid;
+ goto start;
+ while ((c = *(unsigned char *) filename) != '\0') {
+ if (c == '/') {
+ struct dentry *new_base;
+ const int len = filename - name;
+ *filename = '\0';
+ mutex_lock(&base->d_inode->i_mutex);
+ new_base = lookup_one_len(name, base, len);
+ mutex_unlock(&base->d_inode->i_mutex);
+ dput(base);
+ *filename = '/';
+ filename++;
+ if (IS_ERR(new_base))
+ return;
+ if (!new_base->d_inode ||
+ !S_ISDIR(new_base->d_inode->i_mode)) {
+ dput(new_base);
+ return;
+ }
+ base = new_base;
+start:
+ name = filename;
+ } else {
+ filename++;
+ }
+ }
+ filename = (char *) name;
+ if (S_ISLNK(perm)) {
+ fs_symlink(filename, base, entry->symlink_data, perm, uid, gid);
+ } else if (S_ISDIR(perm)) {
+ fs_mkdir(filename, base, perm ^ S_IFDIR, uid, gid);
+ } else if (S_ISSOCK(perm) || S_ISFIFO(perm) || S_ISREG(perm)) {
+ fs_mknod(filename, base, perm, 0, uid, gid);
+ } else if (S_ISCHR(perm) || S_ISBLK(perm)) {
+ fs_mknod(filename, base, perm, entry->kdev, uid, gid);
+ }
+ dput(base);
+}
+
+/* Create files according to the policy file. */
+static void syaoran_make_initial_nodes(struct super_block *sb)
+{
+ struct shmem_sb_info *info;
+ struct dev_entry *entry;
+ if (!sb)
+ return;
+ info = SHMEM_SB(sb);
+ if (!info)
+ return;
+ if (info->is_permissive_mode) {
+ syaoran_create_tracelog(sb, ".syaoran");
+ syaoran_create_tracelog(sb, ".syaoran_all");
+ }
+ list_for_each_entry(entry, &info->list, list) {
+ if ((entry->flags & NO_CREATE_AT_MOUNT) == 0)
+ make_node(entry, sb->s_root);
+ }
+ info->initialize_done = 1;
+}
+
+/* Read policy file. */
+static int syaoran_initialize(struct super_block *sb, void *data)
+{
+ int error = -EINVAL;
+ struct file *f;
+ char *filename = (char *) data;
+ bool is_permissive_mode = 0;
+ struct shmem_sb_info *p = SHMEM_SB(sb);
+ static bool first = 1;
+ if (first) {
+ first = 0;
+ printk(KERN_INFO "SYAORAN: 1.5.3-pre 2008/01/06\n");
+ }
+ INIT_LIST_HEAD(&(SHMEM_SB(sb)->list));
+ if (!filename) {
+ printk(KERN_INFO "SYAORAN: Missing config-file path.\n");
+ return -EINVAL;
+ } else if (strncmp(filename, "accept=", 7) == 0) {
+ filename += 7;
+ is_permissive_mode = 1;
+ } else if (strncmp(filename, "enforce=", 8) == 0) {
+ filename += 8;
+ is_permissive_mode = 0;
+ } else {
+ printk(KERN_INFO "SYAORAN: Missing 'accept=' or 'enforce='.\n");
+ return -EINVAL;
+ }
+ f = open_pathname(AT_FDCWD, filename, O_RDONLY, 0600);
+ if (IS_ERR(f)) {
+ printk(KERN_INFO "SYAORAN: Can't open '%s'\n", filename);
+ return -EINVAL;
+ }
+ if (!S_ISREG(f->f_dentry->d_inode->i_mode))
+ goto out;
+ p->is_permissive_mode = is_permissive_mode;
+ printk(KERN_INFO "SYAORAN: Reading '%s'\n", filename);
+ error = read_config_file(f, sb);
+out:
+ if (error)
+ printk(KERN_INFO "SYAORAN: Can't read '%s'\n", filename);
+ filp_close(f, NULL);
+ return error;
+}
+
+static int shmem_get_sb(struct file_system_type *fs_type, int flags,
+ const char *dev_name, void *data,
+ struct vfsmount *mnt);
+
+static struct file_system_type syaoran_fs_type = {
+ .name = "syaoran",
+ .get_sb = shmem_get_sb,
+ .kill_sb = kill_litter_super,
+};
+
+static struct file_operations wrapped_def_blk_fops;
+static struct file_operations wrapped_def_chr_fops;
+
+static void init_syaoran_inode(struct inode *inode, int mode)
+{
+ /* Set open() hook for tracking open request. */
+ if (S_ISBLK(mode))
+ inode->i_fop = &wrapped_def_blk_fops;
+ else if (S_ISCHR(mode))
+ inode->i_fop = &wrapped_def_chr_fops;
+}
+
+static int wrapped_blkdev_open(struct inode *inode, struct file *filp)
+{
+ int error = def_blk_fops.open(inode, filp);
+ if (error != -ENXIO)
+ syaoran_may_modify_node(filp->f_dentry, DEVICE_USED);
+ return error;
+}
+
+static int wrapped_chrdev_open(struct inode *inode, struct file *filp)
+{
+ int error = def_chr_fops.open(inode, filp);
+ if (error != -ENXIO)
+ syaoran_may_modify_node(filp->f_dentry, DEVICE_USED);
+ return error;
+}
+
+static int __init init_syaoran_fs(void)
+{
+ /* Set open() hook for tracking open operation of block devices. */
+ wrapped_def_blk_fops = def_blk_fops;
+ wrapped_def_blk_fops.open = wrapped_blkdev_open;
+ /* Set open() hook for tracking open operation of character devices. */
+ wrapped_def_chr_fops = def_chr_fops;
+ wrapped_def_chr_fops.open = wrapped_chrdev_open;
+ return register_filesystem(&syaoran_fs_type);
+}
+
+static void __exit exit_syaoran_fs(void)
+{
+ unregister_filesystem(&syaoran_fs_type);
+}
+module_init(init_syaoran_fs);
+module_exit(exit_syaoran_fs);
--- /dev/null
+++ linux-2.6-mm/mm/shmem_mac_main.c
@@ -0,0 +1,205 @@
+/*
+ * mm/shmem_mac_main.c
+ *
+ * Implementation of the Tamper-Proof Device Filesystem.
+ *
+ * Copyright (C) 2005-2008 NTT DATA CORPORATION
+ *
+ * Version: 1.5.3-pre 2008/01/06
+ */
+
+/* Get absolute pathname from mount point. */
+static int get_local_absolute_path(struct dentry *dentry, char *buffer,
+ int buflen)
+{
+ char *start = buffer;
+ char *end = buffer + buflen;
+ int namelen;
+
+ if (buflen < 256)
+ goto out;
+
+ *--end = '\0';
+ buflen--;
+ for (;;) {
+ struct dentry *parent;
+ if (IS_ROOT(dentry))
+ break;
+ parent = dentry->d_parent;
+ namelen = dentry->d_name.len;
+ buflen -= namelen + 1;
+ if (buflen < 0)
+ goto out;
+ end -= namelen;
+ memcpy(end, dentry->d_name.name, namelen);
+ *--end = '/';
+ dentry = parent;
+ }
+ if (*end == '/') {
+ buflen++;
+ end++;
+ }
+ namelen = dentry->d_name.len;
+ buflen -= namelen;
+ if (buflen < 0)
+ goto out;
+ end -= namelen;
+ memcpy(end, dentry->d_name.name, namelen);
+ memmove(start, end, strlen(end) + 1);
+ return 0;
+out:
+ return -ENOMEM;
+}
+
+/* Get absolute pathname of the given dentry from mount point. */
+static int local_realpath_from_dentry(struct dentry *dentry, char *newname,
+ int newname_len)
+{
+ int error;
+ struct dentry *d_dentry;
+ if (!dentry || !newname || newname_len <= 0)
+ return -EINVAL;
+ d_dentry = dget(dentry);
+ /***** CRITICAL SECTION START *****/
+ spin_lock(&dcache_lock);
+ error = get_local_absolute_path(d_dentry, newname, newname_len);
+ spin_unlock(&dcache_lock);
+ /***** CRITICAL SECTION END *****/
+ dput(d_dentry);
+ return error;
+}
+
+static int syaoran_check_flags(struct shmem_sb_info *info,
+ struct dentry *dentry, int mode, int dev,
+ unsigned int flags)
+{
+ int error = -EPERM;
+ struct dev_entry *entry;
+ /*
+ * Since local_realpath_from_dentry() holds dcache_lock,
+ * allocating a buffer using kmalloc() won't help improve concurrency.
+ * Therefore, I use a static buffer here.
+ */
+ static char filename[PAGE_SIZE];
+ static DEFINE_SPINLOCK(lock);
+ spin_lock(&lock);
+ memset(filename, 0, sizeof(filename));
+ if (local_realpath_from_dentry(dentry, filename, sizeof(filename) - 1))
+ goto out;
+ list_for_each_entry(entry, &info->list, list) {
+ if ((mode & S_IFMT) != (entry->mode & S_IFMT))
+ continue;
+ if ((S_ISBLK(mode) || S_ISCHR(mode)) && dev != entry->kdev)
+ continue;
+ if (strcmp(entry->name, filename + 1))
+ continue;
+ if (info->is_permissive_mode) {
+ entry->flags |= flags;
+ error = 0;
+ } else {
+ if ((entry->flags & flags) == flags)
+ error = 0;
+ }
+ break;
+ }
+out:
+ if (error && strlen(filename) < (sizeof(filename) / 4) - 16) {
+ const char *name;
+ const uid_t uid = current->fsuid;
+ const gid_t gid = current->fsgid;
+ const mode_t perm = mode & 0777;
+ flags &= ~DEVICE_USED;
+ {
+ char *end = filename + sizeof(filename) - 1;
+ const char *cp = strchr(filename, '\0') - 1;
+ while (cp > filename) {
+ const unsigned char c = *cp--;
+ if (c == '\\') {
+ *--end = '\\';
+ *--end = '\\';
+ } else if (c > ' ' && c < 127) {
+ *--end = c;
+ } else {
+ *--end = (c & 7) + '0';
+ *--end = ((c >> 3) & 7) + '0';
+ *--end = (c >> 6) + '0';
+ *--end = '\\';
+ }
+ }
+ name = end;
+ }
+ switch (mode & S_IFMT) {
+ case S_IFCHR:
+ printk(KERN_DEBUG
+ "SYAORAN-ERROR: %s %3o %3u %3u %2u %c %3u %3u\n",
+ name, perm, uid, gid, flags, 'c',
+ MAJOR(dev), MINOR(dev));
+ break;
+ case S_IFBLK:
+ printk(KERN_DEBUG
+ "SYAORAN-ERROR: %s %3o %3u %3u %2u %c %3u %3u\n",
+ name, perm, uid, gid, flags, 'b',
+ MAJOR(dev), MINOR(dev));
+ break;
+ case S_IFIFO:
+ printk(KERN_DEBUG
+ "SYAORAN-ERROR: %s %3o %3u %3u %2u %c\n",
+ name, perm, uid, gid, flags, 'p');
+ break;
+ case S_IFSOCK:
+ printk(KERN_DEBUG
+ "SYAORAN-ERROR: %s %3o %3u %3u %2u %c\n",
+ name, perm, uid, gid, flags, 's');
+ break;
+ case S_IFDIR:
+ printk(KERN_DEBUG
+ "SYAORAN-ERROR: %s %3o %3u %3u %2u %c\n",
+ name, perm, uid, gid, flags, 'd');
+ break;
+ case S_IFLNK:
+ printk(KERN_DEBUG
+ "SYAORAN-ERROR: %s %3o %3u %3u %2u %c %s\n",
+ name, perm, uid, gid, flags, 'l', "unknown");
+ break;
+ case S_IFREG:
+ printk(KERN_DEBUG
+ "SYAORAN-ERROR: %s %3o %3u %3u %2u %c\n",
+ name, perm, uid, gid, flags, 'f');
+ break;
+ }
+ }
+ spin_unlock(&lock);
+ return error;
+}
+
+/* Check whether the given dentry is allowed to mknod. */
+static int syaoran_may_create_node(struct dentry *dentry, int mode, int dev)
+{
+ struct shmem_sb_info *info = SHMEM_SB(dentry->d_sb);
+ if (!info) {
+ printk(KERN_DEBUG "%s: dentry->d_sb->s_fs_info == NULL\n",
+ __FUNCTION__);
+ return -EPERM;
+ }
+ if (!info->initialize_done)
+ return 0;
+ return syaoran_check_flags(info, dentry, mode, dev, MAY_CREATE);
+}
+
+/* Check whether the given dentry is allowed to chmod/chown/unlink. */
+static int syaoran_may_modify_node(struct dentry *dentry, unsigned int flags)
+{
+ struct shmem_sb_info *info = SHMEM_SB(dentry->d_sb);
+ if (!info) {
+ printk(KERN_DEBUG "%s: dentry->d_sb->s_fs_info == NULL\n",
+ __FUNCTION__);
+ return -EPERM;
+ }
+ if (flags == DEVICE_USED && !info->is_permissive_mode)
+ return 0;
+ if (!dentry->d_inode)
+ return -ENOENT;
+ return syaoran_check_flags(info, dentry, dentry->d_inode->i_mode,
+ dentry->d_inode->i_rdev, flags);
+}
+
--- linux-2.6-mm.orig/fs/Kconfig
+++ linux-2.6-mm/fs/Kconfig
@@ -978,6 +978,24 @@ config TMPFS_POSIX_ACL

If you don't know what Access Control Lists are, say N.

+config SYAORAN
+ bool "Tamper-proof device filesystem support"
+ depends on TMPFS
+ help
+ If you mount this filesystem for /dev directory instead of tmpfs,
+ you can guarantee the following thing.
+
+ "Applications using well-known device locations under /dev
+ get the device they want" (e.g. an application that accesses
+ /dev/null can always get a character special device
+ with major=1 and minor=3).
+
+ The list of possible combinations of filename and its attributes
+ that can exist on this filesystem is defined at mount time
+ using a configuration file.
+
+ If unsure, say N.
+
config HUGETLBFS
bool "HugeTLB file system support"
depends on X86 || IA64 || PPC64 || SPARC64 || (SUPERH && MMU) || BROKEN
--- linux-2.6-mm.orig/include/linux/shmem_fs.h
+++ linux-2.6-mm/include/linux/shmem_fs.h
@@ -33,6 +33,11 @@ struct shmem_sb_info {
int policy; /* Default NUMA memory alloc policy */
nodemask_t policy_nodes; /* nodemask for preferred and bind */
spinlock_t stat_lock;
+#ifdef CONFIG_SYAORAN
+ struct list_head list; /* List of filename/attributes pairs. */
+ bool initialize_done; /* False if initialization is in progress. */
+ bool is_permissive_mode; /* True if permissive mode. */
+#endif
};

static inline struct shmem_inode_info *SHMEM_I(struct inode *inode)

2008-01-07 17:09:56

by Valdis Klētnieks

Subject: Re: [PATCH][RFC] Simple tamper-proof device filesystem.

On Sun, 06 Jan 2008 15:20:00 +0900, Tetsuo Handa said:

> --- linux-2.6-mm.orig/fs/ramfs/inode.c
> +++ linux-2.6-mm/fs/ramfs/inode.c
> @@ -36,6 +36,20 @@
> #include <asm/uaccess.h>
> #include "internal.h"
>
> +static struct inode *__ramfs_get_inode(struct super_block *sb, int mode,
> + dev_t dev, bool tmpfs_with_mac);
> +
> +#define TMPFS_WITH_MAC 1
> +#define TMPFS_WITHOUT_MAC 0
> +#include <linux/quotaops.h>
> +
> +#ifdef CONFIG_SYAORAN
> +#include "syaoran.h"
> +#include "syaoran_init.c"
> +#include "syaoran_main.c"
> +#include "syaoran_debug.c"
> +#endif

Ouch. The .c files should generally be built into their own .o files and
then the Makefile should do something like

obj-$(CONFIG_SYAORAN) += syaoran.o

unless there's *really* good reasons for including .c files (such as an
otherwise-messy variable-namespace issue or similar).

Also, has this been double-checked to Do The Right Thing if you have
*two* instances of ramfs mounted, one with Syaoran and one without? I don't
know the code well enough to know if you found *all* the places you need
something like:

> - inode = ramfs_get_inode(dir->i_sb, S_IFLNK|S_IRWXUGO, 0);
> +#ifdef CONFIG_SYAORAN
> + /*** SYAORAN start. ***/
> + if (dir->i_sb->s_op == &syaoran_ops) {
> + if (syaoran_may_create_node(dentry, S_IFLNK, 0) < 0)
> + return -EPERM;
> + inode = syaoran_get_inode(dir->i_sb, S_IFLNK|S_IRWXUGO, 0);
> + /*** SYAORAN end. ***/
> + } else
> +#endif
> + inode = ramfs_get_inode(dir->i_sb, S_IFLNK|S_IRWXUGO, 0);

(incidentally, all of these should probably be abstracted into a helper
function that's 'static inline' so we have just one #ifdef in the definition
in a .h file, and none in open .c code).

Similarly for other places you have #ifdef CONFIG_ in ramfs .c code - see if
you can abstract it out.

> +/*
> + * Original tmpfs doesn't set ramfs_dir_inode_operations.setattr field.
> + * Now I'm setting the field to share tmpfs/rootfs/syaoran code.

Question for the audience: *should* ramfs set that field so setattr works
on ramfs (even if it's just a stub similar to the SELinux fscontext= mount
stuff)?

Question for Tetsuo: What happens to this code if somebody actually does the
above change?

> --- linux-2.6-mm.orig/fs/Kconfig
> +++ linux-2.6-mm/fs/Kconfig
> @@ -978,6 +978,24 @@ config TMPFS_POSIX_ACL

> + "Applications using well-known device locations under /dev
> + get the device they want" (e.g. an application that accesses
> + /dev/null can always get a character special device
> + with major=1 and minor=3).

This should say "will always get", not "can always", as this code will
mandate, rather than just make possible.

> + The list of possible combinations of filename and its attributes
> + that can exist on this filesystem is defined at mount time
> + using a configuration file.

The format of this file needs to be documented. I'm not terribly thrilled by
the idea of passing a file to be read by the kernel, but I also understand
that if it isn't done before mount, you have a race condition between the
mount and the load. Perhaps write some configfs code so that you can
'mount /configfs; cat config.file > /configfs/syaoran; mount -t syaoran'?

Similarly, it looks like you create your debug files inside the ramfs - that
is probably a bad idea and possibly can exhaust resources. Convert it to
use debugfs instead?

> + if (!filename) {
> + printk(KERN_INFO "SYAORAN: Missing config-file path.\n");
> + return -EINVAL;

Does this (and the code right after) Do The Right Thing if somebody does this:

mount -t syaoran -o noatime,noexec /some/path

(I admit not knowing if mount options common to all mounts are stripped out
by the VFS code or passed down to this code).

Or even worse, "-o noatime,accept=/some/path/ramfs.cfg"?

> + f = open_pathname(AT_FDCWD, filename, O_RDONLY, 0600);
> + if (IS_ERR(f)) {
> + printk(KERN_INFO "SYAORAN: Can't open '%s'\n", filename);
> + return -EINVAL;
> + }

Does this do what you think it does if run in a chroot process or if
some creative person does "accept=../../path/to/bad_data.cfg"?

That printk should be KERN_ERR, I think.

That's all that's immediately obvious to me - somebody who actually understands
the filesystem code better will probably need to review it for all the stuff
I missed before it can be included.



2008-01-07 20:37:54

by Indan Zupancic

Subject: Re: [PATCH][RFC] Simple tamper-proof device filesystem.

Hello,

Some questions:

On Sun, January 6, 2008 16:20, Tetsuo Handa wrote:
> I want to use this filesystem in case where a process with root privilege was
> hijacked but the behavior of the hijacked process is still restricted by MAC.

1) If the behaviour can be controlled, why can't the process be
disallowed to change anything badly in /dev? Like disallowing anything
from modifying existing nodes that weren't created by that process.
That would have practically the same effect as your filesystem,
won't it?

Or phrased differently, if the MAC system used can't protect /dev, it
won't be able to protect other directories either, and if it can't
protect e.g. my homedir, doesn't it make the whole MAC system
ineffective? And if the MAC system used is ineffective, your
filesystem is useless and you've bigger problems to fix.

2) The MAC system may not be able to guarantee certain combinations
of device names and properties, but isn't that policy that shouldn't
be in the kernel anyway? But if it is, shouldn't all device nodes be
checked? That is, shouldn't it be a global check instead of a filesystem
specific one?

3) Code efficiency. Thousand lines of code just to close one very specific
attack, which can be done in lots of different other ways that all need
to be prevented by the MAC system. (mounting over it, intercepting open
calls, duping the fd, etc.) Is it worth it?

I really don't care how you try to protect your system, but I don't think
this is an effective way to do it.

Good luck,

Indan

2008-01-08 13:51:04

by Tetsuo Handa

Subject: Re: [PATCH][RFC] Simple tamper-proof device filesystem.

Hello.

[email protected] wrote:
> Ouch. The .c files should generally be built into their own .o files and
> then the Makefile should do something like
>
> obj-$(CONFIG_SYAORAN) += syaoran.o
>
> unless there's *really* good reasons for including .c files (such as an
> otherwise-messy variable-namespace issue or similar).
Yes. The final implementation will do so.
This is a temporary hack to keep all functions and variables "static".

> Also, has this been double-checked to Do The Right Thing if you have
> *two* instances of ramfs mounted, one with Syaoran and one without?
Yes. The memory for superblock is allocated for each instance.
Thus, mounting one as syaoran and the other as tmpfs won't cause problems.

> (incidentally, all of these should probably be abstracted into a helper
> function that's 'static inline' so we have just one #ifdef in the definition
> in a .h file, and none in open .c code).
Oh, good idea.

> Similarly for other places you have #ifdef CONFIG_ in ramfs .c code - see if
> you can abstract it out.
This patch replaces the previous patch and
this patch modifies only tmpfs (fs/shm*) files.
I'm no longer modifying ramfs (fs/ramfs/*) files.

> > +/*
> > + * Original tmpfs doesn't set ramfs_dir_inode_operations.setattr field.
> > + * Now I'm setting the field to share tmpfs/rootfs/syaoran code.
>
> Question for the audience: *should* ramfs set that field so setattr works
> on ramfs (even if it's just a stub similar to the SELinux fscontext= mount
> stuff)?
>
> Question for Tetsuo: What happens to this code if somebody actually does the
> above change?
Please forget this question.
I'm no longer setting "ramfs_dir_inode_operations.setattr" field.

> > + "Applications using well-known device locations under /dev
> > + get the device they want" (e.g. an application that accesses
> > + /dev/null can always get a character special device
> > + with major=1 and minor=3).
>
> This should say "will always get", not "can always", as this code will
> mandate, rather than just make possible.
OK.

> > + The list of possible combinations of filename and its attributes
> > + that can exist on this filesystem is defined at mount time
> > + using a configuration file.
>
> The format of this file needs to be documented.
Yes. It is a line-by-line processable format defined as:

filename permission owner group flags type [ symlink_data | major minor ]

where flags is a bitwise combination of

* 1: Allow creation of the file.
* 2: Allow deletion of the file.
* 4: Allow changing permissions of the file.
* 8: Allow changing owner or group of the file.
* 16: For internal use. Remembers whether this file is opened or not.
* 32: Don't create this file at mount time.

and here are some example entries:

pts 755 0 0 0 d
shm 755 0 0 0 d
fd 777 0 0 0 l /proc/self/fd
stdin 777 0 0 0 l /proc/self/fd/0
stdout 777 0 0 0 l /proc/self/fd/1
stderr 777 0 0 0 l /proc/self/fd/2
null 666 0 0 0 c 1 3
zero 666 0 0 0 c 1 5
random 644 0 0 0 c 1 8
urandom 644 0 0 0 c 1 9
tty 666 0 0 0 c 5 0
tty0 600 0 0 12 c 4 0
cdrom 777 0 0 3 l /dev/scd0
console 600 0 0 1 c 5 1
hda 660 0 6 0 b 3 0
hda1 660 0 6 0 b 3 1
initctl 600 0 0 3 p
log 666 0 0 15 s
rtc 644 0 0 0 c 10 135
ptmx 666 0 0 0 c 5 2
ram 777 0 0 3 l /dev/ram0
ram0 660 0 6 0 b 1 0
ram1 660 0 6 0 b 1 1
sda 660 0 6 0 b 8 0
initrd 660 0 6 1 b 1 250

Full documentation of this filesystem is at
http://tomoyo.sourceforge.jp/en/1.5.x/policy-syaoran.html
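
To make the line format above concrete, here is a rough userspace sketch of parsing one entry. It is not the parser from the patch: `parse_policy_line()` and the struct layout are invented for illustration, only `MAY_CREATE` and `DEVICE_USED` are names that appear in the patch (the other flag names are assumptions), and the sketch ignores any escaping of special characters in filenames.

```c
#include <stdio.h>
#include <string.h>

/* Flag bits as listed above. Only MAY_CREATE and DEVICE_USED are names
 * taken from the patch; the rest are assumed for this sketch. */
enum {
	MAY_CREATE = 1,
	MAY_DELETE = 2,
	MAY_CHMOD = 4,
	MAY_CHOWN = 8,
	DEVICE_USED = 16,
	NO_CREATE_AT_MOUNT = 32,
};

struct policy_entry {
	char name[64];
	unsigned int perm, uid, gid, flags;
	char type;			/* c, b, p, s, d, l or f */
	unsigned int major, minor;	/* only for c and b */
	char symlink_data[128];		/* only for l */
};

/* Returns 0 on success, -1 if the line is malformed. */
int parse_policy_line(const char *line, struct policy_entry *e)
{
	int used = 0;

	memset(e, 0, sizeof(*e));
	/* Common prefix: filename permission owner group flags type */
	if (sscanf(line, "%63s %o %u %u %u %c%n", e->name, &e->perm,
		   &e->uid, &e->gid, &e->flags, &e->type, &used) != 6)
		return -1;
	line += used;
	switch (e->type) {
	case 'c':
	case 'b':
		/* Device nodes carry "major minor". */
		return sscanf(line, "%u %u", &e->major, &e->minor) == 2 ? 0 : -1;
	case 'l':
		/* Symlinks carry their target. */
		return sscanf(line, "%127s", e->symlink_data) == 1 ? 0 : -1;
	case 'p':
	case 's':
	case 'd':
	case 'f':
		return 0;
	default:
		return -1;
	}
}
```

For example, "cdrom 777 0 0 3 l /dev/scd0" parses to a symlink entry whose flags value 3 means both creation and deletion are allowed.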

> I'm not terribly thrilled by
> the idea of passing a file to be read by the kernel, but I also understand
> that if it isn't done before mount, you have a race condition between the
> mount and the load.
What race condition is possible?
Are you worrying that the file gets modified while reading?

> Perhaps write some configfs code so that you can
> 'mount /configfs; cat config.file > /configfs/syaoran; mount -t syaoran'?
If you worry that the file gets modified while reading in kernel space,
you will also worry that the file gets modified while doing
"cat config.file > /configfs/syaoran".

To use configfs (or whatever approach that is done before mount syscall),
some tag for associating "list of permitted entries" and "mount point" is needed
so that an administrator can mount this filesystem for each chroot'ed environment
with different "list of permitted entries" (e.g. /dev with all entries,
/var/jail1/dev with only "null", /var/jail2/dev with only "null" and "random").

It would be possible to pass "list of permitted entries" (which is the content of
a config file) through mount syscall's parameter. But since the number of entries
in "list of permitted entries" is not constant, it sometimes requires much memory
for passing whole entries at once upon mount syscall.

I wonder why many kernel developers hate opening files in kernel space.
I think it won't cause bugs as long as the file stays open only within a single syscall.
I'm using a path to config file as a tag for associating "list of permitted entries"
and "mount point". I'm opening a config file and reading the file and closing the file
within a single mount() syscall.

> Similarly, it looks like you create your debug files inside the ramfs - that
> is probably a bad idea and possibly can exhaust resources. Convert it to
> use debugfs instead?
It is named "debug", but it is not a debug interface.
It is an interface for obtaining snapshots of "flags" values.
The kernel updates "flags" values if this filesystem is mounted with the
"accept=" option. (The kernel doesn't update them if mounted with the "enforce=" option.)

> Does this (and the code right after) Do The Right Thing if somebody does this:
>
> mount -t syaoran -o noatime,noexec /some/path
>
> (I admit not knowing if mount options common to all mounts are stripped out
> by the VFS code or passed down to this code).
Yes. /bin/mount parses the option string. It passes common mount options like
"noatime" and "noexec" in the "mountflags" argument of mount(2)
and passes the remaining options in the "data" argument of mount(2).

> Or even worse, "-o noatime,accept=/some/path/ramfs.cfg"?
In that case, mount(2) receives MS_NOATIME stored in "mountflags" and
"accept=/some/path/ramfs.cfg" stored in "data".
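
A simplified illustration of that split, assuming only a handful of generic options. This is not the actual /bin/mount code; `split_mount_options()` is a hypothetical helper that only shows which part of the -o string ends up as MS_* bits and which part reaches the filesystem as "data".

```c
#include <string.h>
#include <sys/mount.h>

/* Split a comma-separated -o string: generic options become MS_* bits,
 * everything else is copied into data (silently dropped if it does not fit). */
unsigned long split_mount_options(const char *opts, char *data, size_t datalen)
{
	static const struct {
		const char *name;
		unsigned long flag;
	} generic[] = {
		{ "ro", MS_RDONLY }, { "nosuid", MS_NOSUID },
		{ "nodev", MS_NODEV }, { "noexec", MS_NOEXEC },
		{ "noatime", MS_NOATIME },
	};
	const size_t n = sizeof(generic) / sizeof(generic[0]);
	unsigned long flags = 0;
	size_t used = 0;
	const char *p = opts;

	if (datalen)
		data[0] = '\0';
	while (*p) {
		size_t len = strcspn(p, ",");	/* length of this option */
		size_t i;

		for (i = 0; i < n; i++)
			if (strlen(generic[i].name) == len &&
			    !strncmp(p, generic[i].name, len))
				break;
		if (i < n) {
			flags |= generic[i].flag;
		} else if (len && used + len + 2 <= datalen) {
			if (used)
				data[used++] = ',';
			memcpy(data + used, p, len);
			used += len;
			data[used] = '\0';
		}
		p += len;
		if (*p == ',')
			p++;
	}
	return flags;
}
```

With "noatime,accept=/some/path/ramfs.cfg" the function returns MS_NOATIME and leaves "accept=/some/path/ramfs.cfg" in data, which is the only part the filesystem's option parser would see.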

> > + f = open_pathname(AT_FDCWD, filename, O_RDONLY, 0600);
> > + if (IS_ERR(f)) {
> > + printk(KERN_INFO "SYAORAN: Can't open '%s'\n", filename);
> > + return -EINVAL;
> > + }
>
> Does this do what you think it does if run in a chroot process or if
> some creative person does "accept=../../path/to/bad_data.cfg"?
sys_open() calls open_pathname() with AT_FDCWD.
So, it is the same thing as calling
open("../../path/to/bad_data.cfg", O_RDONLY) from the userland.

> That printk should be KERN_ERR, I think.
Maybe. But I think KERN_WARNING is enough because this is not such an urgent error.


> That's all that's immediately obvious to me - somebody who actually understands
> the filesystem code better will probably need to review it for all the stuff
> I missed before it can be included.
Thank you.

2008-01-08 13:51:25

by Tetsuo Handa

Subject: Re: [PATCH][RFC] Simple tamper-proof device filesystem.

Hello.


Indan Zupancic wrote:
> > I want to use this filesystem in case where a process with root privilege was
> > hijacked but the behavior of the hijacked process is still restricted by MAC.
>
> 1) If the behaviour can be controlled, why can't the process be
> disallowed to change anything badly in /dev? Like disallowing anything
> from modifying existing nodes that weren't created by that process.
> That would have practically the same effect as your filesystem,
> won't it?
MAC system can prevent hijacked processes from changing anything badly in /dev .
But MAC system can't prevent hijacked processes from doing
"mv /dev/hda1 /dev/hda1.tmp; mv /dev/hda2 /dev/hda1; mv /dev/hda1.tmp /dev/hda2"
if permissions to rename device nodes in /dev are given to hijacked processes.
This is because MAC implementation doesn't check filename/attribute pairs.

But this filesystem can prevent hijacked processes from doing
"mv /dev/hda1 /dev/hda1.tmp; mv /dev/hda2 /dev/hda1; mv /dev/hda1.tmp /dev/hda2"
even if permissions to rename device nodes in /dev are given to hijacked processes.

This filesystem is not designed to
"forbid modifying nodes if that process doesn't need to modify nodes".
This filesystem is designed to
"forbid breaking filename/attribute pairs of nodes
even if that process needs to (or is permitted to) modify nodes".

> Or phrased differently, if the MAC system used can't protect /dev, it
> won't be able to protect other directories either, and if it can't
> protect e.g. my homedir, doesn't it make the whole MAC system
> ineffective? And if the MAC system used is ineffective, your
> filesystem is useless and you've bigger problems to fix.
You can use "nodev" mount option to prevent attackers from opening device files.
You can use MAC system to prevent attackers from mounting partitions (other than
/dev partition) without "nodev" option.


> 2) The MAC system may not be able to guarantee certain combinations
> of device names and properties, but isn't that policy that shouldn't
> be in the kernel anyway? But if it is, shouldn't all device nodes be
> checked? That is, shouldn't it be a global check instead of a filesystem
> specific one?
I think the reason why MAC system doesn't handle filename/attributes pairs is that:

Filename and its attributes pairs are conventionally considered as
constant and reliable.

It makes the MAC's policy syntax complicated to describe this attribute
enforcement information in MAC's policy.

Thus, this should be a global check. But usually device nodes are only in /dev .



> 3) Code efficiency. Thousand lines of code just to close one very specific
> attack, which can be done in lots of different other ways that all need
> to be prevented by the MAC system. (mounting over it, intercepting open
> calls, duping the fd, etc.) Is it worth it?
This filesystem is doing what MAC system is not doing.
So, please don't complain about inability of this filesystem to close all attacks.
You can use MAC system to prevent attackers from mounting other filesystem
over this filesystem.

The filename/attribute pairs are something like system call entry tables.
The application will go wrong if __NR_read is mapped to sys_write() and
__NR_write is mapped to sys_read().
Userland applications access special functionalities (e.g. /dev/zero and /dev/random)
by name (i.e. syscall numbers). Therefore, keeping the filename/attribute pairs
tamper-proof is important.

You recognize that there is a threat that device nodes may have irregular
attributes (e.g. /dev/null existing as a regular file), don't you?
You don't deny implementing mechanisms somehow to avoid such threat, do you?
OK. Then the matter is the comparison of code efficiency.

This patch is less than 1100 lines in total.
Large part of this patch is for parsing and managing policy file.
If you try to extend every MAC implementation (SELinux, SMACK, AppArmor, TOMOYO)
so that they can handle filename/attributes pairs (i.e. expand policy file's syntax
and both in-kernel and userland data structures, manage strings with variant length
and non-printable characters etc.), I think that modification exceeds this patch.
I think guaranteeing filename/attribute pairs in filesystem layer can keep
MAC system implementation simple and compact.
http://www.mail-archive.com/[email protected]/msg10653.html


Thank you.

2008-01-08 15:47:22

by Indan Zupancic

Subject: Re: [PATCH][RFC] Simple tamper-proof device filesystem.

Hi Tetsuo,

I think you focus too much on your way of enforcing filename/attributes
pairs. The same can be achieved by creating the device nodes with
expected attributes, and preventing processes from changing those files.
This because expected combinations are known beforehand. And once
those files are present, the MAC system used doesn't have to have special
device nodes attributes support. Protecting those files is enough to
guarantee filename/attributes pairs.

On Tue, January 8, 2008 14:50, Tetsuo Handa wrote:
> Hello.
>
>
> Indan Zupancic wrote:
>> > I want to use this filesystem in case where a process with root privilege was
>> > hijacked but the behavior of the hijacked process is still restricted by MAC.
>>
>> 1) If the behaviour can be controlled, why can't the process be
>> disallowed to change anything badly in /dev? Like disallowing anything
>> from modifying existing nodes that weren't created by that process.
>> That would have practically the same effect as your filesystem,
>> won't it?
> MAC system can prevent hijacked processes from changing anything badly in /dev .
> But MAC system can't prevent hijacked processes from doing
> "mv /dev/hda1 /dev/hda1.tmp; mv /dev/hda2 /dev/hda1; mv /dev/hda1.tmp /dev/hda2"
> if permissions to rename device nodes in /dev are given to hijacked processes.
> This is because MAC implementation doesn't check filename/attribute pairs.

No, this is because rename permission was given for files for which it shouldn't have been.
Either you want a process to manage device names and attributes, and then you
give it permission to do that, or you want to enforce certain filename/attribute
pairs and then you just do it yourself.

Will your filesystem prevent the trivial case of

rm /dev/hda1
ln -s /dev/hda2 /dev/hda1

>
> But this filesystem can prevent hijacked processes from doing
> "mv /dev/hda1 /dev/hda1.tmp; mv /dev/hda2 /dev/hda1; mv /dev/hda1.tmp /dev/hda2"
> even if permissions to rename device nodes in /dev are given to hijacked processes.

Rename permission can be given for /dev in general, but prohibited for
certain files in /dev, the ones you want to have specific attributes.
It isn't all or nothing.

>
> This filesystem is not designed to
> "forbid modifying nodes if that process doesn't need to modify nodes".
> This filesystem is designed to
> "forbid breaking filename/attribute pairs of nodes
> even if that process needs to (or is permitted to) modify nodes".

It's "forbid modifying certain nodes that process needn't to modify"
versus "forbid breaking filename/attribute pairs of certain nodes".

Both have the same effect, except that the first one is generic and
can be done by existing MAC systems, while the second one needs
a special filesystem and a handful of MAC rules to make it effective.


>> 2) The MAC system may not be able to guarantee certain combinations
>> of device names and properties, but isn't that policy that shouldn't
>> be in the kernel anyway? But if it is, shouldn't all device nodes be
>> checked? That is, shouldn't it be a global check instead of a filesystem
>> specific one?
> I think the reason why MAC system doesn't handle filename/attributes pairs is that:
>
> Filename and its attributes pairs are conventionally considered as
> constant and reliable.
>
> It makes the MAC's policy syntax complicated to describe this attribute
> enforcement information in MAC's policy.
>
> Thus, this should be a global check. But usually device nodes are only in /dev .

It doesn't matter where they are, it's that a different fs than yours could be
mounted over it. You say a MAC can prevent that from happening, but a
MAC can also prevent all processes except for udev from modifying /dev.
Done globally instead of as a filesystem it can actually guarantee name/attr
pairs, now it can't even do that on its own.

>
>> 3) Code efficiency. Thousand lines of code just to close one very specific
>> attack, which can be done in lots of different other ways that all need
>> to be prevented by the MAC system. (mounting over it, intercepting open
>> calls, duping the fd, etc.) Is it worth it?
> This filesystem is doing what MAC system is not doing.
> So, please don't complain about inability of this filesystem to close all attacks.

I don't. What I complain about is that it's too specific and does it one chosen
job badly. It lacks abstraction. As far as I can see any decent MAC can achieve
the same end result as your filesystem, without directly enforcing name/attr
pairs.

> You can use MAC system to prevent attackers from mounting other filesystem
> over this filesystem.
>
> The filename/attribute pairs are something like system call entry tables.
> The application will go wrong if __NR_read is mapped to sys_write() and
> __NR_write is mapped to sys_read().
> Userland applications access special functionalities (e.g. /dev/zero and /dev/random)
> by name (i.e. syscall numbers). Therefore, keeping the filename/attribute pairs
> tamper-proof is important.
>
> You recognize that there is a threat that device nodes may have irregular
> attributes (e.g. /dev/null existing as a regular file), don't you?
> You don't deny implementing mechanisms somehow to avoid such threat, do you?
> OK. Then the matter is the comparison of code efficiency.

The thing is, all special device nodes that are expected to exist by applications
are known beforehand. Thus they can be created statically and can be protected
against any modifications with any MAC system.

The dynamic nodes aren't known beforehand, so applications can't expect anything
specific. And for things like usb-sticks and whatnot, so what if the app gets hda2
instead of the proper sdc1? It shouldn't matter, because at that point the
malicious process has access to the device anyway, so all potential harm that
could've been caused by the confusion (if any, which I doubt) it could do itself already.

>
> This patch is less than 1100 lines in total.
> Large part of this patch is for parsing and managing policy file.

That doesn't make it better.

> If you try to extend every MAC implementation (SELinux, SMACK, AppArmor, TOMOYO)
> so that they can handle filename/attributes pairs (i.e. expand policy file's syntax
> and both in-kernel and userland data structures, manage strings with variant length
> and non-printable characters etc.), I think that modification exceeds this patch.
> I think guaranteeing filename/attribute pairs in filesystem layer can keep
> MAC system implementation simple and compact.
> http://www.mail-archive.com/[email protected]/msg10653.html

That's because the way you would do it in MACs is the same wrong way as
you do it now.

Call me silly, but implementing your checks in udev, or whatever handles /dev,
and disallowing everything else from modifying /dev would also have the same
effect. Or if you don't trust udevd write your own tiny replacement which does
the checking, I'm sure that can be done in little extra code. Or modify udev so
that it doesn't handle /dev directly, but passes it to your daemon who does the
checks you want.

Because all your filesystem does is handling the case that udev is exploited,
when a proper MAC system is used.

There are so many ways to achieve the same goal, but better, and if you're really
serious about guarantees it shouldn't be in the filesystem anyway.

Greetings,

Indan

2008-01-09 04:39:42

by Tetsuo Handa

Subject: Re: [PATCH][RFC] Simple tamper-proof device filesystem.

Hello.

Indan Zupancic wrote:
> I think you focus too much on your way of enforcing filename/attributes
> pairs.
So?

> The same can be achieved by creating the device nodes with
> expected attributes, and preventing processes from changing those files.
The device nodes have to be deletable if some process (including udev) needs to delete.
Thus, you cannot unconditionally prevent processes from changing those files.

> This because expected combinations are known beforehand.
Yes.

> And once those files are present, the MAC system used doesn't have to have special
> device nodes attributes support. Protecting those files is enough to
> guarantee filename/attributes pairs.
If the MAC system needn't support this filesystem's functionality,
who creates those files with a guarantee of the expected attributes? Does udev?
If udev is exploited, who can guarantee them?

> No, this is because rename permission was given for files that it shouldn't had.
Do you think all MAC implementations have the same granularity and functionality?
I don't think so. Not all MAC implementations can control access with such granularity.
This filesystem is designed to be combined with any MAC,
although the MAC used with this filesystem should be able to restrict
namespace manipulation requests so that this filesystem remains mounted on /dev
and visible to userland applications.

> Either you want a process to manage device names and attributes, and then you
> give it permission to do that, or you want to enforce certain filename/attribute
> pairs and then you just do it yourself.
If I modify udev to enforce certain filename/attribute pairs and the modified udev
was exploited, who can guarantee?
"Don't trust userland application" is the basis of restricting access in kernel space.
If you can trust userland application, you don't need in-kernel access control.


> Will your filesystem prevent the trivial case of
>
> rm /dev/hda1
> ln -s /dev/hda2 /dev/hda1
>
Of course. To permit the above operation, the following permissions are needed.

hda1 660 0 6 2 b 3 1
hda1 777 0 0 33 l .

> Rename permission can be given for /dev in general, but prohibited for
> certain files in /dev, the ones you want to have specific attributes.
> It isn't all or nothing.
Do you think all MAC implementations can prohibit renaming for certain files in /dev ?

> It's "forbid modifying certain nodes that process needn't to modify"
> versus "forbid breaking filename/attribute pairs of certain nodes".
>
> Both have the same effect, except that the first one is generic and
> can be done by existing MAC systems, while the second one needs
> a special filesystem and a handful of MAC rules to make it effective.
Do you think all MAC implementations can do that?
I think the first one is implementation specific and the second one is generic.

> It doesn't matter where they are, it's that a different fs than yours could be
> mounted over it. You say a MAC can prevent that from happening, but a
> MAC can also prevent all processes except for udev from modifying /dev.
But MAC cannot prevent udev from modifying /dev . And what if udev is exploited?
Not all MACs can enforce access control over all processes with the granularity
you are talking about. And what if a process exists that cannot be controlled with
your boolean level granularity (e.g. an administrator running his/her
administrative applications that require modification of /dev )?

A crazy example of administrative applications:
(Please don't say "Don't use such crazy application".)

#! /bin/sh
rm -f /dev/either-null-or-zero
read
mknod /dev/either-null-or-zero c 1 $REPLY && echo "Administrative task finished successfully." | mail root

This filesystem can guarantee /dev/either-null-or-zero is either char-1-3 or char-1-5 by using a policy

either-null-or-zero 666 0 0 3 c 1 3
either-null-or-zero 666 0 0 35 c 1 5
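
How those two entries are meant to be evaluated can be sketched roughly as follows. This is a userspace illustration, not the code from the patch: `check_flags()` and the table layout are invented here, the comparison of permission, owner and group is left out, and the only part taken from the patch is the `(entry->flags & flags) == flags` test.

```c
#include <string.h>

/* MAY_CREATE is a name from the patch; the other two are assumed. */
enum { MAY_CREATE = 1, MAY_DELETE = 2, NO_CREATE_AT_MOUNT = 32 };

struct policy_entry {
	const char *name;
	char type;
	unsigned int major, minor;
	unsigned int flags;
};

/* The two policy lines above, reduced to the fields this sketch compares. */
static const struct policy_entry policy[] = {
	{ "either-null-or-zero", 'c', 1, 3, 3 },
	{ "either-null-or-zero", 'c', 1, 5, 35 },
};

/* Returns 0 if some entry matches the name and attributes and grants
 * every requested flag bit, otherwise -1. */
int check_flags(const char *name, char type, unsigned int major,
		unsigned int minor, unsigned int wanted)
{
	size_t i;

	for (i = 0; i < sizeof(policy) / sizeof(policy[0]); i++) {
		const struct policy_entry *e = &policy[i];

		if (strcmp(e->name, name) || e->type != type ||
		    e->major != major || e->minor != minor)
			continue;
		if ((e->flags & wanted) == wanted)
			return 0;
	}
	return -1;
}
```

Under this table, creating the node as char 1:3 or char 1:5 is accepted, while creating it as char 1:8 is refused because no entry with those attributes exists.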

The boolean level granularity (e.g. forbid all processes except for udev ,
and modify udev to perform name/attribute pair enforcement) is not generic.
Userland application sometimes misbehaves.
I assume kernel process doesn't misbehave.
If you doubt my assumption, you have to doubt in-kernel MAC implementation too.

> I don't. What I complain about is that it's too specific and does it one chosen
> job badly. It lacks abstraction. As far as I can see any decent MAC can achieve
> the same end result as your filesystem, without directly enforcing name/attr
> pairs.
Can SELinux guarantee the same result as my filesystem even if udev or
administrative programs have to be able to modify /dev ?

> The thing is, all special device nodes that are expected to exist by applications
> are known beforehand.
Yes.

> Thus they can be created statically and can be protected
> against any modifications with any MAC system.
But sometimes some modifications need to be permitted.
Who can guarantee that there is no application (other than udev)
that creates/deletes /dev/zero instead of /dev/either-null-or-zero?

> The dynamic nodes aren't known beforehand, so applications can't expect anything
> specific. And for things like usb-sticks and whatnot, so what if the app gets hda2
> instead of the proper sdc1? It shouldn't matter, because at that point the
> malicious process has access to the device anyway, so all potential harm that could've been
> caused by the confusion (if any, which I doubt) it could do itself already.
Yes, they are the boundary.

> Call me silly, but implementing your checks in udev, or whatever handles /dev,
> and disallowing everything else from modifying /dev would also have the same
> effect. Or if you don't trust udevd write your own tiny replacement which does
> the checking, I'm sure that can be done in little extra code. Or modify udev so
> that it doesn't handle /dev directly, but passes it to your daemon who does the
> checks you want.
If everyone could always get the source code, modify it,
and keep it error-free, I wouldn't need an in-kernel implementation.
As I said, userland applications sometimes misbehave.
I trust only in-kernel access control implementations
to guarantee name/attribute pairs.


Thanks.

2008-01-09 05:04:32

by Valdis Klētnieks

Subject: Re: [PATCH][RFC] Simple tamper-proof device filesystem.

On Tue, 08 Jan 2008 22:50:43 +0900, Tetsuo Handa said:

> Yes. It is a line-by-line processable format defined as:
>
> filename permission owner group flags type [ symlink_data | major minor ]
>
> where flags are bit-wised combinations of
>
> * 1: Allow creation of the file.
> * 2: Allow deletion of the file.
> * 4: Allow changing permissions of the file.
> * 8: Allow changing owner or group of the file.
> * 16: For internal use. Remembers whether this file is opened or not.
> * 32: Don't create this file at mount time.
>
> and here are some example entries:
>
> pts 755 0 0 0 d

Good summary - probably should add that to the patch, drop it into
Documentation/syaoran-config.txt or similar...
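
To make the quoted format concrete, here is a minimal Python sketch of a
line parser. It is not taken from the patch: the names PolicyEntry,
parse_policy_line and flag_names are hypothetical, the type letters handled
(c, b, l, d) are only those that appear in this thread, and it assumes that
fields are whitespace-separated and filenames contain no spaces.

```python
from dataclasses import dataclass
from typing import List, Optional

# Flag bits as listed in the summary quoted above (labels are made up here).
FLAG_NAMES = {
    1: "may_create",
    2: "may_delete",
    4: "may_chmod",
    8: "may_chown",
    16: "internal_open_marker",
    32: "no_create_at_mount",
}


@dataclass
class PolicyEntry:
    name: str
    mode: int                      # permission bits, parsed as octal
    uid: int
    gid: int
    flags: int
    kind: str                      # "c", "b", "l", "d", ...
    major: Optional[int] = None    # only for "c" and "b"
    minor: Optional[int] = None    # only for "c" and "b"
    target: Optional[str] = None   # symlink data, only for "l"


def parse_policy_line(line: str) -> PolicyEntry:
    """Parse: filename permission owner group flags type [symlink_data | major minor]."""
    fields = line.split()
    if len(fields) < 6:
        raise ValueError("too few fields: %r" % line)
    name, mode, uid, gid, flags, kind = fields[:6]
    entry = PolicyEntry(name, int(mode, 8), int(uid), int(gid), int(flags), kind)
    rest = fields[6:]
    if kind in ("c", "b"):
        if len(rest) != 2:
            raise ValueError("device entry needs major and minor: %r" % line)
        entry.major, entry.minor = int(rest[0]), int(rest[1])
    elif kind == "l":
        if len(rest) != 1:
            raise ValueError("symlink entry needs symlink data: %r" % line)
        entry.target = rest[0]
    return entry


def flag_names(flags: int) -> List[str]:
    """Decode the bit-wise flag combination into readable labels."""
    return [label for bit, label in FLAG_NAMES.items() if flags & bit]
```

With the entries that appear in this thread, flags 35 decode to 1 + 2 + 32:
the node may be created and deleted, and is not created at mount time.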

> > the idea of passing a file to be read by the kernel, but I also understand
> > that if it isn't done before mount, you have a race condition between the
> > mount and the load.
> What race condition is possible?
> Are you worrying that the file gets modified while reading?

Modification while reading *is* an issue, but can probably be worked around
with some clever locking. The race condition I was thinking of was if you
had the mount and the policy load be 2 separate events, you could see:

(a) issue mount request
(b) do something malicious in /dev while..
(c) load the policy that would have prevented (b).

This is partly why SELinux has init load the policy *very* early on, before
any other userspace has had a chance to run and do things that would have
been prevented by policy.

>> Does this do what you think it does if run in a chroot process or if
>> some creative person does "accept=../../path/to/bad_data.cfg"?
> sys_open() calls open_pathname() with AT_FDCWD.
> So, it is the same thing as calling
> open("../../path/to/bad_data.cfg", O_RDONLY) from the userland.

Which basically ends up meaning that anybody who can trick the mount into
happening can reset the permitted list and create (for example) a mode 666
entry for a hard drive, and go scribbling around at will. Note that you
don't seem to do any sanity checking on the path (for instance, that each
component is owned by root, and not world-writable) - so anybody who finds
a way to get the mount to happen can supply their own list in /home/joeuser/blat
or /tmp/surprise-mount-list or wherever.
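
As a rough illustration of the kind of path sanity check meant here, the
following Python sketch walks from the file up to the root and rejects the
path if any component is not owned by uid 0 or is world-writable. It is a
userland model only, not code from the patch: path_is_trustworthy and its
stat_fn parameter are hypothetical names, os.stat follows symlinks, and a
real in-kernel check would have to be done during the path walk itself to
avoid races.

```python
import os
import stat


def path_is_trustworthy(path, stat_fn=os.stat):
    """Return True if the file and every directory above it is owned by
    root (uid 0) and is not world-writable.

    stat_fn is injectable so the check can be exercised without real files;
    any error raised by stat_fn (e.g. a missing file) is propagated.
    """
    current = os.path.abspath(path)
    while True:
        st = stat_fn(current)
        if st.st_uid != 0 or (st.st_mode & stat.S_IWOTH):
            return False
        parent = os.path.dirname(current)
        if parent == current:      # reached the filesystem root
            return True
        current = parent
```

Under this rule anything below /tmp is rejected even if the file itself is
owned by root, because /tmp is world-writable. Whether that is the right
policy (and whether group-writable components should also be rejected) is a
design choice this sketch does not settle.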

>> That printk should be KERN_ERR, I think.
> Maybe. But I think KERN_WARNING is enough because this is not such an urgent error.

OK, I can live with WARNING. You just want to be sure it's above INFO...



2008-01-09 06:26:55

by Tetsuo Handa

Subject: Re: [PATCH][RFC] Simple tamper-proof device filesystem.

Hello.

Valdis Klētnieks wrote:
> Good summary - probably should add that to the patch, drop it into
> Documentation/syaoran-config.txt or similar...
I see.

> Modification while reading *is* an issue, but can probably be worked around
> with some clever locking. The race condition I was thinking of was if you
> had the mount and the policy load be 2 separate events, you could see:
>
> (a) issue mount request
> (b) do something malicious in /dev while..
> (c) load the policy that would have prevented (b).
>
> This is partly why SELinux has init load the policy *very* early on, before
> any other userspace have had a chance to run and do things that would have
> been prevented by policy.
So, you are suggesting that the policy be loaded before the mount() request,
so that this filesystem can prevent attackers from doing something malicious
by minimizing the latency between the userland process's call to mount() and
the nodes becoming visible to userland processes (i.e. by implementing the
load as a non-blocking operation).

I didn't take such cases into account.
My assumed usage of this filesystem is to run a script like

#!/bin/sh
mount -t syaoran -o accept=/etc/ccs/syaoran.conf none /dev
exec /sbin/init "$@"

by passing "init=/path/to/this/script" on the kernel command line,
so that /sbin/init can create /dev/initctl on this filesystem.
If you mount this filesystem after /sbin/init starts,
the mount will shadow the /dev/initctl already opened by /sbin/init.

> Which basically ends up meaning that anybody who can trick the mount into
> happening can reset the permitted list and create (for example) a mode 666
> entry for a hard drive, and go scribbling around at will. Note that you
> don't seem to do any sanity checking on the path (for instance, that each
> component is owned by root, and not world-writable) - so anybody who finds
> a way to get the mount to happen can supply their own list in /home/joeuser/blat
> or /tmp/surprise-mount-list or wherever.
I assume that being able to reach this code means the caller of mount() is root.
But are patches to allow mount() by non-root users in progress? http://lkml.org/lkml/2008/1/8/131
Maybe I should add some sanity checking on the path.

Thank you.

2008-01-09 14:00:17

by Indan Zupancic

Subject: Re: [PATCH][RFC] Simple tamper-proof device filesystem.

Hello,

On Wed, January 9, 2008 05:39, Tetsuo Handa wrote:
> Hello.
>
> Indan Zupancic wrote:
>> I think you focus too much on your way of enforcing filename/attributes
>> pairs.
> So?

So that you miss alternatives and don't see the bigger picture.

>
>> The same can be achieved by creating the device nodes with
>> expected attributes, and preventing processes from changing those files.
> The device nodes have to be deletable if some process (including udev) needs
> to delete.
> Thus, you cannot unconditionally prevent processes from changing those files.
>
>> This because expected combinations are known beforehand.
> Yes.
>
>> And once those files are present, the MAC system used doesn't have to have
>> special
>> device nodes attributes support. Protecting those files is enough to
>> guarantee filename/attributes pairs.
> If MAC system needn't to support this filesystem's functionality,
> who creates those files with warrantee of expected attributes? The udev does?
> If udev is exploited, who can guarantee?

The person that would write the config file for your fs, the one who wants
that guarantee.

>
>> No, this is because rename permission was given for files that it shouldn't
>> had.
> Do you think all MAC implementation have the same granularity and
> functionalities?
> I don't think so. Not all MAC implementation can control with such
> granularity.
> This filesystem is designed to be combined with any MAC,
> although the MAC used with this filesystem should be able to restrict
> namespace manipulation requests so that this filesystem can remain /dev
> and visible to userland applications.

Good point, but I assume they all have at least directory granularity, and then
/dev/ can be static and udev and others can have free rein in e.g. /dev/dynamic/.
Just use subdirs for the dynamic stuff and this granularity problem is, with
slight inconvenience, solved.

>
>> Either you want a process to manage device names and attributes, and then
>> you
>> give it permission to do that, or you want to enforce certain
>> filename/attribute
>> pairs and then you just do it yourself.
> If I modify udev to enforce certain filename/attribute pairs and the modified
> udev
> was exploited, who can guarantee?
> "Don't trust userland application" is the basis of restricting access in
> kernel space.
> If you can trust userland application, you don't need in-kernel access
> control.

Funny, I thought that it was in the kernel because that's the way to protect
processes against each other, the fs against processes, and for performance
reasons.

Exploits are in code, and where that code is doesn't matter that much, either
kernel or userspace, though if it's exploitable you'd rather not have it in the
kernel. So I think it's more secure if the checking would be done by udev than
in a special filesystem, even if that means that you're screwed if udev is
exploited. Of course you fully trust your own code, naturally.

A tiny daemon that communicates with udev, does the checking you have
now, and creates the node if it's ok is really not much more code than your fs,
so it's just as hard to exploit. Then if udev is hacked you have the same
guarantee as you have now.

I can think of more alternatives that are as secure or more secure than the
current solution.

>
>
>> Will your filesystem prevent the trivial case of
>>
>> rm /dev/hda1
>> ln -s /dev/hda2 /dev/hda1
>>
> Of course. To permit the above operation, the following permissions are
> needed.
>
> hda1 660 0 6 2 b 3 1
> hda1 777 0 0 33 l .

Yes, I should've read the code before asking that, instead of the other way
round.

>
>> Rename permission can be given for /dev in general, but prohibited for
>> certain files in /dev, the ones you want to have specific attributes.
>> It isn't all or nothing.
> Do you think all MAC implementation can prohibit renaming for certain files in
> /dev ?
>
>> It's "forbid modifying certain nodes that process needn't to modify"
>> versus "forbid breaking filename/attribute pairs of certain nodes".
>>
>> Both have the same effect, except that the first one is generic and
>> can be done by existing MAC systems, while the second one needs
>> a special filesystem and a handful of MAC rules to make it effective.
> Do you think all MAC implementation can do?
> I think the first one is implementation specific and the second one is
> generic.

Protecting certain files from being modified seems to me more generic than
enforcing filename/attributes pairs on device nodes. And if they can't do it,
surely they can do it per directory, and then using subdirs solves it.

>
>> It doesn't matter where they are, it's that a different fs than yours could
>> be
>> mounted over it. You say a MAC can prevent that from happening, but a
>> MAC can also prevent all processes except for udev from modifying /dev.
> But MAC cannot prevent udev from modifying /dev . And what if exploited?
> Not all MAC can enforce access control over all processes with the granularity
> you are talking. And what if a process that cannot be controlled with your
> boolean level granularity exists (e.g. an administrator running his/her
> administrative applications that require modification of /dev )?
>
> A crazy example of administrative applications:
> (Please don't say "Don't use such crazy application".)
>
> #! /bin/sh
> rm -f /dev/either-null-or-zero
> read
> mknod /dev/either-null-or-zero c 1 $REPLY && echo "Administrative task
> finished successfully." | mail root
>
> This filesystem can guarantee /dev/either-null-or-zero is either char-1-3 or
> char-1-5 by using a policy
>
> either-null-or-zero 666 0 0 3 c 1 3
> either-null-or-zero 666 0 0 35 c 1 5
>
> The boolean level granularity (e.g. forbid all processes except for udev ,
> and modify udev to perform name/attribute pair enforcement) is not generic.

This is one solution. The other is to protect the files you want to guarantee
with MAC, and then all apps, not only udev, can do whatever they want except for
breaking the guaranteed filename/attributes pairs. And if that can't happen
within /dev on a per-filename basis, then it can happen per directory, and apps
may create nodes only in certain subdirs of /dev/, instead of in /dev/ itself.

And those other programs could be taught to create the nodes via udev, which
does the checking, or they all modify a /dyndev/ and a daemon that does the
checking copies nodes over to the real /dev/ when they're sane. There are plenty
of ways to solve those details.

rm -f /dev/either-null-or-zero

as said before, if this is possible then the MAC config used is wrong. Exactly
the same as for your filesystem with

mknod /dev/tmp1 c 1 X
mount --bind /dev/tmp1 /dev/either-null-or-zero

and you count on the MAC to prevent that.

And as for that app, if you trust it to create device nodes, why don't you
trust it to make the right nodes too? If an administrator wants something other
than 3 or 5, you're breaking something. The worst part is that, as an
administrator app, it's made for policy handling, but you just moved that
somewhere else, basically making the app useless and spreading config stuff around.

This app is unrealistic anyway. The standard way to choose between /dev/null
and /dev/zero is to open one instead of the other, instead of changing the
content. The interface is also crap because if the choice is really between
null and zero, the argument shouldn't be a cryptic number but the choice itself.
And then it is two if checks with fixed mknod commands. If you trust this script
you get what you deserve when something other than 3 or 5 is passed, because
it's obvious that that's possible.

> Userland application sometimes misbehaves.
> I assume kernel process doesn't misbehave.
> If you doubt my assumption, you have to doubt in-kernel MAC implementation
> too.

See above. But who says that the MAC used can provide the additional protection
that's needed to make your fs work at all?

>
>> I don't. What I complain about is that it's too specific and does it one
>> chosen
>> job badly. It lacks abstraction. As far as I can see any decent MAC can
>> achieve
>> the same end result as your filesystem, without directly enforcing name/attr
>> pairs.
> Can SELinux guarantee the same result as my filesystem even if udev or
> administrative programs have to be able to modify /dev ?

More, because your filesystem doesn't guarantee anything at all on its own.
But assuming the MAC is decent enough to protect your fs from being bypassed,
I'm sure it can do what's needed fine without your fs. I can't answer for SELinux
because I don't know it well. But I trust it can protect files and/or
directories, and that's all that's needed to achieve the same end result.

>
>> The thing is, all special device nodes that are expected to exist by
>> applications
>> are known beforehand.
> Yes.
>
>> Thus they can be created statically and can be protected
>> against any modifications with any MAC system.
> But sometimes some modifications needs to be permitted.
> Who can guarantee that there is no application (other than udev)
> that creates/deletes /dev/zero instead of /dev/either-null-or-zero ?

I hope the MAC can do it. If not, I hope it can protect /dev/ and all
modifications need to be done in subdirs of dev, which practically
already happens anyway. And if the MAC can't even do that, I think
it's a useless piece of junk.

>
>> The dynamic nodes aren't known beforehand, so applications can't expect
>> anything
>> specific. And for things like usb-sticks andwhatnot, so what if the app gets
>> hda2
>> instead of the proper sdc1? It shouldn't matter, because at that point the
>> malicious process has access to the device anyway, so all potential harm
>> that could've been
>> caused by the confusion (if any, which I doubt) it could do itself already.
> Yes, they are the boundary.
>
>> Call me silly, but implementing your checks in udev, or whatever handles
>> /dev,
>> and disallowing everything else from modifying /dev would also have the same
>> effect. Or if you don't trust udevd write your own tiny replacement which
>> does
>> the checking, I'm sure that can be done in little extra code. Or modify udev
>> so
>> that it doesn't handle /dev directly, but passes it to your daemon who does
>> the
>> ckecks you want.
> If everyone can always get source code and modify the source code
> and make the code always error-free, I don't need in-kernel implementation.
> As I said, userland application sometimes misbehaves.
> I trust only in-kernel access control implementations
> about guaranteeing name/attributes pairs.

You seem to assume that the in-kernel implementation is suddenly
guaranteed bugfree. And the alternative above doesn't require any
source code except that of udev, which is available. And it doesn't need to
be more bugfree than your current code either.

You didn't answer my question why the checking isn't done globally
if it's so important.

Greetings,

Indan

2008-01-09 23:38:54

by Serge E. Hallyn

Subject: Re: [PATCH][RFC] Simple tamper-proof device filesystem.

Quoting Indan Zupancic:
> Hello,
>
> On Wed, January 9, 2008 05:39, Tetsuo Handa wrote:
> > Hello.
> >
> > Indan Zupancic wrote:
> >> I think you focus too much on your way of enforcing filename/attributes
> >> pairs.
> > So?
>
> So that you miss alternatives and don't see the bigger picture.

These emails again are getting really long, but I think the gist of
Indan's suggestion can be concisely summarized:

"To confine process P3 to /dev/hda2 being 'b 3 2', create
/dev/p3, launch P3 in a new mounts namespace, mount --bind
/dev/p3 /dev, exec what you want p3 running, and have
MAC prevent umount /dev/p3."

This is a neat idea, but Tetsuo's rebuttal is

"P3 may be legacy code needing to create or delete
/dev/floppy, where -EPERM confuses P3 and prevents
it working correctly."

Indan's idea is interesting and I like it, but is there an answer to
Tetsuo's problem with it?

thanks,
-serge

PS - Indan, you also said in essence "if P3 can be trusted to create
/dev/floppy why can't it be trusted to create /dev/hda1". I trust that,
phrased that way, the question answers itself?

2008-01-10 01:06:47

by Indan Zupancic

Subject: Re: [PATCH][RFC] Simple tamper-proof device filesystem.

On Thu, January 10, 2008 00:08, Serge E. Hallyn wrote:
> These emails again are getting really long, but I think the gist of
> Indan's suggestion can be concisely summarized:

No worry, I wasn't planning on extending it, I've said what I've to say.

Except...

>
> "To confine process P3 to /dev/hda2 being 'b 3 2', create
> /dev/p3, launch P3 in a new mounts namespace, mount --bind
> /dev/p3 /dev, exec what you want p3 running, and have
> MAC prevent umount /dev/p3."
>
> This is a neat idea, but Tetsuo's rebuttal is
>
> "P3 may be legacy code needing to create or delete
> /dev/floppy, where -EPERM confuses P3 and prevents
> it working correctly."
>
> Indan's idea is interesting and I like it, but is there an answer to
> Tetsuo's problem with it?

...that I didn't mean that, but something simpler:

/dev/ directory protected from any modifications by MAC,

/dev/* all the nodes that need to have guaranteed name/attribute pairs,
like /dev/null, /dev/zero, /dev/random, etc. and:

/dev/dynamic/ being a dir where apps who really need to create/modify
device nodes can do whatever they want to do. It can be multiple dirs
too, like /dev/snd/, /dev/input/ etc.
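
The layout above boils down to a path rule that a MAC with directory
granularity could express. A minimal Python sketch of that rule follows;
may_modify and DYNAMIC_DIRS are hypothetical names, the directory list is
just the examples from this mail, and symlinks are not resolved (a real MAC
would decide on the resolved object inside the kernel).

```python
import posixpath

# Directories in which applications may create, delete or change nodes.
DYNAMIC_DIRS = ("/dev/dynamic", "/dev/snd", "/dev/input")


def may_modify(path, dynamic_dirs=DYNAMIC_DIRS):
    """Return True only for paths strictly below one of the dynamic dirs.

    The path is normalized first, so "/dev/dynamic/../null" is treated as
    "/dev/null" and refused. The dynamic directories themselves are also
    refused, since they are entries of the protected /dev/.
    """
    norm = posixpath.normpath(path)
    return any(norm.startswith(d + "/") for d in dynamic_dirs)
```

Everything directly in /dev/ (null, zero, random, ...) stays read-only for
all processes, while nodes below the listed subdirectories are unrestricted.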

I guess this covers about 96% of the use cases of this tamper-proof dev fs.

You can think of unlikely cases that aren't solved by this, but those can
be solved in another way if really wanted (like a checking daemon,
modified udev, shadow /dev/, to name a few).

But I think doing more is getting ridiculous, because if a process can
create a device node, it can also access it and do whatever harm could
be done by the confusion caused by unexpected name/attribute pairs.

As for information snooping, that's mostly about /dev/null or other
things that are known beforehand.

> PS - Indan, you also said in essence "if P3 can be trusted to create
> /dev/floppy why can't it be trusted to create /dev/hda1". I trust that,
> phrased that way, the question answers itself?

Not exactly. If there's a process that dynamically creates certain device
nodes, and it wants to create one that doesn't fit the rules, you can't
know if it's wrong or if your rules are wrong. The process has a certain
policy of naming/creating the devices, but you also have a policy at the
kernel side with this fs. If it mismatches you don't know which one is
right.

If you trust a process to create /dev/hd*, you can also trust it to create
the proper /dev/hdXn, no need to verify if /dev/hda1 is really 3 1.

The whole thing about filename/attribute pairs is that it's about what
applications expect. There aren't many expectations about dynamically
created device nodes which might not always be there, because their
name isn't stable.

The use case for this fs is a malicious app that can create device nodes,
and we're worried about mismatching name/attribute pairs. Not about
our data, or anything else. Call me an optimist, but I think you don't
need to worry about name/attribute pairs.

Greetings,

Indan

2008-01-10 05:08:53

by Tetsuo Handa

Subject: Re: [PATCH][RFC] Simple tamper-proof device filesystem.

Hello.

Indan Zupancic wrote:
> Good point, but I assume they all have at least a directory granularity, and then
> /dev/ can be static and udev and other can have free reign in e.g. /dev/dynamic/.
> Just use subdirs for the dynamic stuff and this granularity problem is, with
> slight inconvenience, solved.

It seems to me that the alternatives you are proposing involve modifying
userland applications. But my premise is
"Don't require modification of userland applications".
In other words, I want to implement this without asking applications
to use /dev/dynamic/ or something similar.
This filesystem is intended to provide support for legacy applications.
(In fact, this filesystem in TOMOYO Linux is for kernel 2.4.30/2.6.11 and later.)



> Exploits are in code, and where that code is doesn't matter that much, either
> kernel or userspace, though if it's exploitable you'll rather not have it in the
> kernel. So I think it's more secure if the checking would be done by udev than
> in a special filesystem, even if that means that you're screwed if udev is
> exploited. Of course you fully trust your own code, naturally.

I'm keeping the mechanism as simple as possible
so that there is little room (e.g. for a buffer overflow) for exploits to run.



> A tiny daemon that communicates with udev and does the checking you have
> now, and if ok it creates the node is really not much more code than your fs,
> so as hard to exploit too. Then if udev is hacked you have the same guarantee
> as you have now.

Using a tiny daemon that communicates with udev is not sufficient.
udev is not the only application that modifies /dev files.
At the very least, the tiny daemon would have to communicate with the kernel
so that all requests are checked by the tiny daemon.
But using a tiny daemon (which is a process running in userland)
causes a lot of trouble.
See the block after the "---------- boundary ----------" in this posting.

My premise is "Don't require a userland process's assistance",
as written under "Why not use FUSE?".



> Protecting certain files from being modified seems to me more generic than
> enforcing filename/attributes pairs on device nodes.
OK. You are speaking from the point of view of "what it can do".
I thought you were saying that "enforcing filename/attributes pairs
from outside this filesystem (e.g. by MAC) is more flexible than doing it inside this filesystem".



> rm -f /dev/either-null-or-zero
>
> as said before, if this is possible then the MAC config used is wrong. Exactly
> the same as for your filesystem with
>
> mknod /dev/tmp1 c 1 X
> mount --bind /dev/tmp1 /dev/either-null-or-zero
>
> and you count on the MAC to prevent that.

An administrator asks MAC to prevent processes
(except specific processes that need to do "rm -f /dev/either-null-or-zero")
from doing "rm -f /dev/either-null-or-zero".

An administrator asks this filesystem to prevent processes from doing
"mknod /dev/tmp1 c 1 X".

An administrator asks MAC to prevent processes from doing
"mount --bind /dev/tmp1 /dev/either-null-or-zero".



> And as for that app, if you trust it to create device nodes, why don't you
> trust it to make the right nodes too?

If that app has a bug that triggers
mknod /dev/either-null-or-zero 1$REPLY
instead of
mknod /dev/either-null-or-zero $REPLY
under an unexpected circumstance, it will create unwanted nodes.
Thus I don't trust the app.
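
To spell out how a white-list catches this kind of bug, here is a small
Python sketch. It is a model under assumptions, not the patch's code:
ALLOWED, mknod_allowed and buggy_minor are hypothetical names, and the bug is
read as a stray "1" being prefixed to the minor number taken from $REPLY.

```python
# Hypothetical in-memory form of the two policy lines for this name:
#   either-null-or-zero 666 0 0 3  c 1 3
#   either-null-or-zero 666 0 0 35 c 1 5
ALLOWED = {
    "either-null-or-zero": {("c", 1, 3), ("c", 1, 5)},
}


def mknod_allowed(name, kind, major, minor, allowed=ALLOWED):
    """Return True only if (kind, major, minor) is listed for this name."""
    return (kind, major, minor) in allowed.get(name, set())


def buggy_minor(reply):
    """Model the bug: a stray '1' ends up in front of the minor number."""
    return int("1" + reply)
```

If the administrator types 3, the correct request (c, 1, 3) is accepted,
while the buggy request (c, 1, 13) matches no entry and is refused, whatever
the application itself believes it is doing.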



> If an administrator wants something else than
> 3 or 5, you're breaking something.
That's the fate of white-list based access control.

Does this filesystem sound too strict to support dynamic devices?
Maybe this filesystem should be able to permit creation of device nodes
that are not listed in the policy file.



> > Can SELinux guarantee the same result as my filesystem even if udev or
> > administrative programs have to be able to modify /dev ?
>
> More, because your filesystem doesn't guarantee anything at all on its own.
> But assuming the MAC is decent enough to protect your fs from being bypassed,
> I'm sure it can do what's needed fine without your fs. I can't answer for SELinux
> because I don't know it well. But I trust it can protect files and/or
> directories, and that's all that's needed to achieve the same end result.

I don't know SELinux well, but judging from an example
(found by Googling "selinux allow mknod")

allow udev_t self:capability { chown dac_override dac_read_search fowner fsetid sys_admin sys_nice mknod net_raw net_admin sys_rawio };

I can't find a place to specify filename/attributes pairs in this syntax.
So, if a process that is permitted to create device nodes misbehaves,
it will generate unexpected filename/attribute pairs.
I think SELinux can't guarantee the same result as my filesystem.



> You seem to assume that the in-kernel implementation is suddenly
> guaranteed bugfree.
I keep the implementation as simple as possible.



From your next posting:
> But I think doing more is getting ridiculous, because if a process can
> create a device node, it can also access it and do whatever harm could
> be done by the confusion caused by unexpected name/attribute pairs.

FYI, being able to create a device node is different from being able to access it
and do whatever harm. You need read and/or write permission to open that device.
It is possible to write an application that can create a device node but cannot open it.
What this filesystem is trying to do is
"guarantee filename/attribute pairs for device nodes".


---------- boundary ----------


This filesystem is a kind of MAC implementation dedicated to
guaranteeing filename/attributes pairs.
But to avoid confusion with generic MAC implementations
(which perform access control based on the subject's rights and
the object's attributes, such as SELinux),
I'm not calling this filesystem a MAC implementation.

The problem arises from the Unix convention that special functions
(e.g. a function that always returns EOF on read()) are associated with
filenames (e.g. /dev/null) and applications invoke those special functions
through the filenames.

If the way of calling special functions were
int fd = opendev("char", "1", "3", O_RDONLY);
instead of
int fd = open("/dev/null", O_RDONLY);
I wouldn't need to implement this filesystem.
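
The name/attribute pairing an application silently relies on when it calls
open("/dev/null", ...) can be written down as an explicit check. The Python
sketch below only illustrates the pairing; node_matches and its stat_fn
parameter are hypothetical names, and checking from userland after the fact
is racy, so this is not a substitute for enforcement at mknod() time.

```python
import os
import stat


def node_matches(path, kind, major, minor, stat_fn=os.stat):
    """Return True if path is a device node of the given kind ("c" or "b")
    with the given major and minor numbers."""
    st = stat_fn(path)
    if kind == "c":
        type_ok = stat.S_ISCHR(st.st_mode)
    elif kind == "b":
        type_ok = stat.S_ISBLK(st.st_mode)
    else:
        raise ValueError("unsupported kind: %r" % kind)
    return (type_ok
            and os.major(st.st_rdev) == major
            and os.minor(st.st_rdev) == minor)
```

On a system where /dev/null is the conventional character device 1:3,
node_matches("/dev/null", "c", 1, 3) is what an application would want to be
true every time it opens that name.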

And the association of filename and attributes is performed
when a device node is created by mknod() (e.g. shmem_get_inode() for tmpfs).
So, I believe it is natural to implement filename/attributes pair enforcement
in the same layer (i.e. filesystem layer).

Fortunately, administrators use a dedicated partition for /dev
(i.e. mount tmpfs on /dev and let udev manipulate device nodes).
So, I can implement this filename/attributes enforcement within tmpfs.

Implementing this filename/attributes enforcement outside the filesystem layer
causes a lot of trouble (i.e. the amount of source code that developers have to write
and the amount of policy that administrators have to configure).

If you don't rely on help from a userland process,
you have to be careful with the following points.

Guarantee the MAC implementation has fine enough granularity to support
filename/attribute information for mknod permission.

Guarantee the MAC policy never contains an entry that can cause a
filename/attribute mismatch (e.g. a permission to link from
/dev/hda1 (which has block-3-1 attributes) to /dev/hda2 (which has
block-3-2 attributes)).

If you rely on help from a userland process, you have to be careful with
the following points.

Guarantee that the daemon process the kernel is communicating with is genuine.
(If the daemon is a fake, no one can guarantee anything.)

Guarantee that the daemon process the kernel is communicating with
is free from segmentation faults, signals, and ptrace.
(If the daemon dies from SIGSEGV or SIGKILL, the kernel can no longer judge.
If the daemon is controlled by another process using ptrace(),
the kernel will receive fake responses from the daemon.
And the daemon will surely be killed by a shutdown script in the /etc/init.d/ directory.)

Do you want to make developers implement these functionalities and
make administrators configure all the policy for them?

I'm sure it is impossible.
Not every MAC supports these functionalities.
Not all administrators have enough knowledge and skill to configure the policy.

But implementing this filename/attributes enforcement in the filesystem layer
causes less trouble. It just requires the following:

Keep this filesystem mounted on /dev always visible to userland.

In other words,

Prevent unauthorized users/processes from mounting other filesystem over this filesystem.
Prevent unauthorized users/processes from unmounting this filesystem.

This is much easier, and administrators are more likely to configure a proper policy for this filesystem.

Regards.

2008-01-10 23:06:18

by Indan Zupancic

Subject: Re: [PATCH][RFC] Simple tamper-proof device filesystem.

On Thu, January 10, 2008 05:57, Tetsuo Handa wrote:
> It seems to me that the alternatives you are proposing include
> modification of userland applications. But my assumption is
> that "Don't require modification of userland applications".

If you want a secure system it isn't that unreasonable to expect
applications to not do brain dead things, so not requiring any
modifications or config changes seems a bit optimistic to me.

> In other words, I want to implement without asking applications
> to use /dev/dynamic/ or something.
> This filesystem is intended to provide support for legacy applications.
> (In fact, this filesystem in TOMOYO Linux is for kernel 2.4.30/2.6.11 and
> later.)

Legacy applications should cope with a static /dev/.

What is the advantage of your filesystem compared to a static /dev/?

> Use of a tiny daemon that communicates with udev is not sufficient.
> The udev is not the only application that modifies /dev files.

Oh, it isn't? Which other applications do modify /dev files? I'd like to
hear about a few, no matter how obscure or proprietary. And please
tell how many of those will stop working with a static /dev with all
nodes they might create already existing.

> At least, the tiny daemon should communicate with the kernel
> so that all requests are checked by the tiny daemon.

No, why should the kernel be involved? The tiny daemon would be
the only one allowed to modify /dev/, so all mknod commands will
be done by it. Of course it means that you might need to modify
the two or three apps wanting to create device nodes, or you can
make an LD_PRELOAD lib that intercepts mknod commands and
sends them to the daemon.

The amount of code will be the current parsing code + a few hundred
lines of code, including the preloaded library.

> But use of the tiny daemon (which is a process running in userland)
> causes a lot of troubles.

No, it doesn't, and most of those problems are true for all programs
that access /dev! If those are straced or whatever they can be forced
to open the wrong file, practically breaking the filename/attribute pairs.
So all security you think you need to have for the daemon process is
the same security you already need for all processes anyway to protect
them against each other.

>> If an administrator wants something else than
>> 3 or 5, you're breaking something.
> That's the fate of white-list based access control.
>
> Does this filesystem sound too strict to support dynamic device?
> May be this filesystem should be able to permit creation of device
> nodes that are not listed in the policy file.

Actually, I assumed that was the case, because if it's strictly white-list
based it's almost the same as a static /dev with some nodes hidden.
Without that it has even less value, because it just complicates matters
compared to a normal static dev.

I thought it checked that if a device name was in the list, it has the
correct attributes, and was free to create nodes without restricted
names.

> From your next posting:
>> But I think doing more is getting ridiculous, because if a process can
>> create a device node, it can also access it and do whatever harm could
>> be done by the confusion caused by unexpected name/attribute pairs.
>
> FYI. Being able to create a device node is different from being able to access
> it and do whatever harm. You will need read and/or write permission to open
> that device.

Yes, but as the process creates the device it can also choose the file mode and
probably also ownership. And as it creates a new file there likely aren't strict
MAC rules in place restricting the process from reading or writing to it. So
yes, you're right, but in practise it isn't as easy to close that hole, especially
not if the application isn't very clean and single purpose. If it creates the node
it probably wants to use it too, and that means read/write access. Even if it can
live without it, it could give access to the node to another process and let the
other process do the dirty work. Very tricky.

Greetings,

Indan

2008-01-11 08:47:13

by Tetsuo Handa

Subject: Re: [PATCH][RFC] Simple tamper-proof device filesystem.

Hello.



Indan Zupancic wrote:
> > It seems to me that the alternatives you are proposing include
> > modification of userland applications. But my assumption is
> > that "Don't require modification of userland applications".
>
> If you want a secure system it isn't that unreasonable to expect
> applications to not do brain dead things, so not requiring any
> modifications or config changes seems a bit optimistic to me.

It depends.
Some users have to continue using brain dead legacy applications
without modification because ...

the application's source code is not available.

the distributor no longer supports the application.

the application is too difficult/complicated to reconstruct.

For cases where you can expect "application won't do brain dead things"
and/or "we can reconstruct application", your approach is OK.



> > In other words, I want to implement without asking applications
> > to use /dev/dynamic/ or something.
> > This filesystem is intended to provide support for legacy applications.
> > (In fact, this filesystem in TOMOYO Linux is for kernel 2.4.30/2.6.11 and
> > later.)
>
> Legacy applications should cope with a static /dev/.
> What is the advantage of your filesystem compared to a static /dev/?

I assume "a static /dev/" means a /dev/ directory in 2.4 kernels.
This filesystem's advantage:

(1) Can guarantee filename/attribute pairs.

A process with "root" privilege can do
"mv /dev/hda1 /dev/hda1.tmp; mv /dev/hda2 /dev/hda1; mv /dev/hda1.tmp /dev/hda2"
if /dev is in / partition or is a devfs partition, whereas
a process with "root" privilege cannot do
"mv /dev/hda1 /dev/hda1.tmp; mv /dev/hda2 /dev/hda1; mv /dev/hda1.tmp /dev/hda2"
if /dev is this filesystem unless granted by the configuration file.

So, you can guarantee that /dev/hda1 is block-3-1 and /dev/hda2 is block-3-2 .
(e.g. "mount /dev/hda1 /home" won't mount block-3-2 partition on /home .)

(2) Can keep nodes that need not be deleted/modified read-only.

A process with "root" privilege can delete /dev/null on / partition or
on devfs partition, whereas a process with "root" privilege cannot delete
/dev/null on this filesystem unless granted by the configuration file.

So, you can guarantee that a node which need not be deleted/modified
won't be deleted/modified.
(e.g. /dev/null is always there with char-1-3 attribute.)

(3) Can hide unwanted device nodes.

A process with "root" privilege can create new nodes on / partition or on devfs,
whereas a process with "root" privilege cannot create new nodes on this filesystem
that are not specified by the configuration file.

So, you can expose specific nodes selectively.
(e.g. Allow accessing /dev/hda1 , but forbid accessing /dev/hda2 .)
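
The three advantages above boil down to one table lookup. The following is only a rough userland sketch in Python, not the kernel patch itself; the table contents and the names POLICY, may_mknod and may_rename are hypothetical, and the real configuration file format is not reproduced here.

```python
import stat

# Hypothetical policy table: node name -> (file type, major, minor).
# The entries mirror the examples used in this mail.
POLICY = {
    "null": (stat.S_IFCHR, 1, 3),
    "hda1": (stat.S_IFBLK, 3, 1),
    "hda2": (stat.S_IFBLK, 3, 2),
}

def may_mknod(name, ftype, major, minor, policy=POLICY):
    """Allow creation only if the name is listed with exactly these attributes."""
    return policy.get(name) == (ftype, major, minor)

def may_rename(old, new, ftype, major, minor, policy=POLICY):
    """A rename gives an existing node a new name, so the new name must be
    listed with the attributes the node already has. `old` is unused in this
    sketch and is kept only to mirror the rename() arguments."""
    return may_mknod(new, ftype, major, minor, policy)
```

With such a table, renaming the block-3-2 node to "hda1" is refused, and so is creating any name that is not listed at all.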



> > Use of a tiny daemon that communicates with udev is not sufficient.
> > The udev is not the only application that modifies /dev files.
>
> Oh, it isn't? Which other applications do modify /dev files? I'd like to
> hear about a few, no matter how obscure or proprietary. And please
> tell how many of those will stop working with a static /dev with all
> nodes they might create already existing.

I don't know. I'm not using rare software.



> > At least, the tiny daemon should communicate with the kernel
> > so that all requests are checked by the tiny daemon.
>
> No, why should the kernel be involved? The tiny daemon would be
> the only one allowed to modify /dev/, so all mknod commands will
> be done by it. Of course it means that you might need to modify
> the two or three apps wanting to create device nodes, or you can
> make an LD_PRELOAD lib that intercepts mknod commands and
> sends them to the daemon.

No. The kernel must be involved.

Suppose the tiny daemon is the only one allowed to modify /dev/ .
"foo" requests "mknod /dev/null" from chroot() environment.
"bar" requests "mknod /dev/null" from clone(CLONE_FS) + mount() environment.

How can the daemon know where to create the node?
How can the daemon determine whether the requested pathname is
in /dev directory or not?
The process that requests "mknod" and the process that performs "mknod"
are not always using the same "/" directory.
The daemon must not forbid creation of /dev/null if the realpath() is
/tmp/dev/null (i.e. "mknod /dev/null" after "chroot /tmp"),
because the daemon is not asked to manage /tmp/dev directory.
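
To make the chroot case concrete, here is a minimal Python sketch of the mapping the daemon would need. It assumes the daemon somehow knows the requester's root directory, which is exactly the information it may not have; host_view and is_managed are hypothetical names, and the code is pure string arithmetic that ignores symlinks and mount namespaces.

```python
import posixpath

def host_view(root, requested):
    """Map a path as seen by a process whose "/" is `root` to the host's view."""
    # Normalise inside the requester's world first, so ".." cannot escape it.
    inner = posixpath.normpath(posixpath.join("/", requested))
    return posixpath.normpath(posixpath.join(root, inner.lstrip("/")))

def is_managed(root, requested, managed="/dev"):
    """True if the request falls inside the directory the daemon manages."""
    path = host_view(root, requested)
    return path == managed or path.startswith(managed + "/")
```

For a requester with root "/tmp", host_view("/tmp", "/dev/null") yields "/tmp/dev/null", so is_managed() is false and the daemon must not apply the /dev policy to it.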

Who can guarantee that the daemon can access all namespaces?
The process that requests "mknod" and the process that performs "mknod"
are not always using the same namespace.

If "foo" or "bar" is a statically linked or suid-root application
(where LD_PRELOAD is ignored), they would attempt to create device nodes
directly (i.e. call sys_mknod() instead of communicating with the daemon)
and abort due to failure.
This affects not only applications that want to create device nodes in /dev/ ,
but also all applications that want to modify entries in /dev/ .


From the beginning, the kernel is deeply involved because in-kernel MAC
is essential to realize "only the tiny daemon can modify /dev/".
Why not do this "filename/attribute" checking in the kernel too?



> The amount of code will be the current parsing code + a few hundred
> lines of code, including the preloaded library.

You will be bothered with "what is the realpath of /dev/null?" and
"how can I reach the realpath?" because you have to manage
namespace information.
In the LSM list, there was a discussion of
"How to implement (pathname based) AppArmor using (label based) SELinux",
and somebody proposed to use a daemon that immediately updates labels of
accessed files.
But I think such a daemon cannot always access all namespaces.



> >> If an administrator wants something else than
> >> 3 or 5, you're breaking something.
> > That's the fate of white-list based access control.
> >
> > Does this filesystem sound too strict to support dynamic device?
> > May be this filesystem should be able to permit creation of device
> > nodes that are not listed in the policy file.
>
> Actually, I assumed that was the case, because if it's strictly white-list
> based it's almost the same as a static /dev with some nodes hidden.
> Without it has even less value, because it just complicates matters
> compared to a normal static dev.
>
> I thought it checked that if a device name was in the list, it has the
> correct attributes, and was free to create nodes without restricted
> names.

OK. I'll consider adding this feature.
But I'd like to use approach (B) to keep the advantage (3).

(A) White-listing + Black-listing approach.

"Permit any operations if the filename didn't appear
in the configuration file".

(B) White-listing + Wild-card approach.

"Support wildcards and permit operations only if
the filename-with-wildcard/attributes-with-wildcard appeared
in the configuration file".
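
The difference between (A) and (B) can be sketched in a few lines of Python. This is only an illustration: the rule tuples, the shell-style patterns (matched with fnmatch) and the function names are assumptions, not the actual configuration syntax.

```python
from fnmatch import fnmatchcase

# Hypothetical rules: (name pattern, type, major pattern, minor pattern).
RULES = [
    ("null", "char", "1", "3"),
    ("tty[0-9]", "char", "4", "*"),
]

def _matches(rule, name, ftype, major, minor):
    """True if the request matches the rule in name and in every attribute."""
    pattern, rtype, rmajor, rminor = rule
    return (fnmatchcase(name, pattern) and ftype == rtype
            and fnmatchcase(str(major), rmajor)
            and fnmatchcase(str(minor), rminor))

def allowed_a(name, ftype, major, minor, rules=RULES):
    """(A): a name that matches no rule is unrestricted."""
    named = [r for r in rules if fnmatchcase(name, r[0])]
    if not named:
        return True
    return any(_matches(r, name, ftype, major, minor) for r in named)

def allowed_b(name, ftype, major, minor, rules=RULES):
    """(B): only requests that match some rule are permitted."""
    return any(_matches(r, name, ftype, major, minor) for r in rules)
```

Under (A) an unlisted name such as "mydev" may be created with any attributes, while under (B) it is refused, which is why only (B) keeps advantage (3).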



Thanks.

2008-01-11 12:22:34

by Indan Zupancic

Subject: Re: [PATCH][RFC] Simple tamper-proof device filesystem.

Hi,

On Fri, January 11, 2008 09:46, Tetsuo Handa wrote:
> It depends.
> Some users have to continue using brain dead legacy applications
> without modification because ...
>
> the application's source code is not available.

Source isn't needed, as long as the vendor has it.

> the distributor no longer supports the application.

Then why should anyone else support it?

> the application is too difficult/complicated to reconstruct.

Then you can't trust it and it shouldn't have permission to do
potentially dangerous things in /dev/ either. Even if you can
contain the device node creation, it most likely does other
potentially dangerous things too. As a whole it can't be trusted.

> I assume "a static /dev/" means a /dev/ directory in 2.4 kernels.
> This filesystem's advantage:

I'm not talking about devfs, I'm talking about a real static /dev.
I'm using it now and it works fine (I let udev manage /udev/ to see
what it's doing).

> (1) Can guarantee filename/attribute pairs.

Wrong. All nodes are created and thus there's never a need to create
new nodes. So /dev/ can't be modified by anyone. This works because
all nodes that anyone might want to create already exist.

> (2) Can keep nodes that needn't to be deleted/modified for read-only.

This would also be true for all nodes in a static /dev I think.

> (3) Can hide unwanted device nodes.

In a static /dev you only create the nodes you want. It's true that it
can't hide nodes for hardware that doesn't exist (other than deleting
the nodes manually), but that was the norm for years before the
whole dynamic /dev thing caught on.

> I don't know. I'm not using rare software.

It doesn't have to be rare, anything is fine. You don't know
anything else than udev? (And shell commands like mknod etc.)

Then why all the talk about mysterious apps that might need to
do all kind of crazy things in /dev?

> No. The kernel must be involved.

> Who can guarantee that the daemon can access all namespaces?
> The process who requests "mknod" and the process who performs "mknod"
> are not always using the same namespace.

This is true on a theoretical level. But practically I think you can either
run multiple daemons, one for each namespace where you want to
control /dev/, or if you really want one daemon you can pass the
directory fd to it where the node should be created and use mknodat().
I believe that crosses namespaces correctly.

If the daemon can't be contacted or doesn't want to do a mknod for you,
the preloaded lib can fallback to doing the mknod itself, though normally
that would be disallowed by MAC.

But I think that the chance that any process needs to create device nodes
in a chroot is at the level of fairy existence.

> If "foo" or "bar" is a statically linked or suid-root application
> (where LD_PRELOAD is ignored), they would attempt to create device nodes
> directly (i.e. call sys_mknod() instead of communicating with the daemon)
> and abort due to failure.
> Not only applications who wants to create device nodes in /dev/ ,
> but also all applications who wants to modify entries in /dev/ .

If the preloaded library is setuid, it will also work for setuid programs.
It's true that it won't work for statically linked apps, but so what?

Device node creating apps are rare enough, let alone the ones that are
also statically linked. Nice theoretical problem, but I don't think anyone
will care in practice.

> From the beginning, the kernel is deeply involved because in-kernel MAC
> is essential to realize "only the tiny daemon can modify /dev/".
> Why not do this "filename/attribute" checking in the kernel too?

That "only the tiny daemon can modify /dev/" is done with MAC rules,
the ones that should be the default for all applications except udev by
default already. For the kernel nothing changes.

>> The amount of code will be the current parsing code + a few hundred
>> lines of code, including the preloaded library.
>
> You will be bothered with "what is the realpath of /dev/null?" and
> "how can I reach the realpath?" because you have to manage
> namespace information.

Or ignore the problem and see if it's a real problem or a nice theoretical
case. And when it turns out to be a real problem, there are probably
ways to fix it (See above). But you know what exactly is needed only
after problems do turn up.

> OK. I'll consider adding this feature.
> But I'd like to use approach (B) to keep the advantage (3).
>
> (A) White-listing + Black-listing approach.
>
> "Permit any operations if the filename didn't appear
> in the configuration file".
>
> (B) White-listing + Wild-card approach.
>
> "Support wildcard and permit only operations if
> the filename-with-wildcard/attributes-with-wildcard appeared
> in the configuration file".

With this the filesystem at least adds some unique abilities.

If anyone really needs it and where/how it should be implemented is
another matter.

Without it it's a glorified and complicated drop-in replacement for
a static /dev/.

Regards,

Indan

2008-01-11 14:05:25

by Tetsuo Handa

Subject: Re: [PATCH][RFC] Simple tamper-proof device filesystem.

Hello.



Indan Zupancic wrote:
> That "only the tiny daemon can modify /dev/" is done with MAC rules,
> the ones that should be the default for all applications except udev by
> default already. For the kernel nothing changes.

OK. You assume use of MAC with sufficiently fine-grained access control.



> Wrong. All nodes are created and thus there's never a need to create
> new nodes. So /dev/ can't be modified by anyone. This works because
> all nodes that anyone might want to create already exist.

Already existing is not enough.
These nodes have to be deletable if requested by appropriate process.
These nodes have to be protected by MAC from directly calling
mknod()/rename()/unlink()/link()/mount() etc.



> This is true on a theoretical level. But practically I think you can either
> run multiple daemons, one for each namespace where you want to control /dev/,

If the daemon does not exist in that namespace?

> or if you really want one daemon you can pass the
> directory fd to it where the node should be created and use mknodat().
> I believe that crosses namespaces correctly.

The "fd" passed to mknodat() is used for starting from the
specified directory instead of the current directory.
The object obtained by resolving the rest of "pathname" depends on
the "/" of the calling process.

If /var/jail/dev/dyndev/link is a symlink to /dev ,
a process in chroot("/var/jail/") + chdir("/") will get "/var/jail/dev/node"
and a process not in chroot("/var/jail/") + chdir("/") will get "/dev/node"
by resolving mknodat(fd_for_"/var/jail/", "dev/dyndev/link/node") .
If the process is in the chroot() but the daemon is not in the chroot() ,
the daemon will create nodes in a wrong location.
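
The symlink case above can be replayed with a toy resolver in Python. It is a simulation over a dictionary, not a real path walk: resolve and its arguments are hypothetical, and it deliberately ignores "..", relative symlink targets and symlink loops.

```python
def resolve(start, relpath, root, symlinks):
    """Resolve `relpath` from host directory `start` for a process whose "/"
    is the host directory `root`. `symlinks` maps host paths to targets."""
    current = start.rstrip("/") or "/"
    for part in [p for p in relpath.split("/") if p]:
        candidate = current.rstrip("/") + "/" + part
        target = symlinks.get(candidate)
        if target is not None:
            # An absolute symlink target restarts from the caller's own root,
            # which is why the same request lands in different places.
            candidate = root.rstrip("/") + target
        current = candidate
    return current
```

With symlinks = {"/var/jail/dev/dyndev/link": "/dev"}, the same request "dev/dyndev/link/node" resolves to "/var/jail/dev/node" for root "/var/jail" and to "/dev/node" for root "/".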

So, you let the LD_PRELOAD library resolve all directory components
before passing the "fd" to the daemon using UNIX domain socket
so that the daemon won't create nodes in a wrong location.

OK. It looks workable, although I'm not taking race conditions into account.



> But I think that the chance that any process needs to create device nodes
> in a chroot is at the level of fairy existence.

It is not only mknod(); rename()/unlink()/link()/mount(bind) etc. may also
cause filename/attribute mismatches.

How can the daemon know whether the request is trying to manipulate nodes
in /dev directory or not?
If "mount --bind /dev/ /var/dir/" is used, the daemon must check
filename/attribute pair when mknod("/var/dir/null") is requested
because permitting the request will modify /dev state.
If "mount --bind /dev/ /var/dir/" is not used, the daemon must not check
filename/attribute pair when mknod("/var/dir/null") is requested
because permitting the request will not modify /dev state.
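
One conceivable userland check for the bind-mount case, sketched in Python, is to compare the device and inode numbers of the two directories. This assumes the daemon can see both paths in its own namespace, and the answer is only valid at the instant of the call, so it does nothing about races; touches_managed_dir is a hypothetical name.

```python
import os

def touches_managed_dir(parent_dir, managed_dir="/dev"):
    """Return True if parent_dir and managed_dir are the same directory object.

    os.path.samefile() compares st_dev and st_ino, so a bind mount of
    managed_dir onto parent_dir makes this True even though the names differ.
    """
    return os.path.samefile(parent_dir, managed_dir)
```

The daemon would have to repeat this for every directory named in every request, and a mount or unmount between the check and the actual mknodat() would still invalidate the result.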



What does the daemon do? It receives requests from the LD_PRELOAD library
using a UNIX domain socket, checks the filename/attribute pair, and issues
mknodat()/renameat()/unlinkat()/linkat() etc. when the combination is appropriate?

What does the LD_PRELOAD library do? It intercepts all pathname related syscalls
(except open()), resolves directory components, determines whether the request is
trying to manipulate nodes in the /dev directory, and forwards the request to the
daemon using a UNIX domain socket?

"Make the daemon and the LD_PRELOAD library bug-and-race free and
develop the MAC policy for the daemon and the LD_PRELOAD library"
and "Make this filesystem bug-and-race free". Which one is easier?



Regards.

2008-01-11 14:47:14

by Lennart Sorensen

Subject: Re: [PATCH][RFC] Simple tamper-proof device filesystem.

On Fri, Jan 11, 2008 at 11:05:07PM +0900, Tetsuo Handa wrote:
> Not only mknod() but also rename()/unlink()/link()/mount(bind) etc. that may
> cause filename/attribute mismatching.
>
> How can the daemon know whether the request is trying to manipulate nodes
> in /dev directory or not?
> If "mount --bind /dev/ /var/dir/" is used, the daemon must check
> filename/attribute pair when mknod("/var/dir/null") is requested
> because permitting the request will modify /dev state.
> If "mount --bind /dev/ /var/dir/" is not used, the daemon must not check
> filename/attribute pair when mknod("/var/dir/null") is requested
> because permitting the request will not modify /dev state.
>
>
>
> What does the daemon do? It receives requests from the LD_PRELOAD library
> using a UNIX domain socket, checks the filename/attribute pair, and issues
> mknodat()/renameat()/unlinkat()/linkat() etc. when the combination is appropriate?
>
> What does the LD_PRELOAD library do? It intercepts all pathname related syscalls
> (except open()), resolves directory components, determines whether the request is
> trying to manipulate nodes in the /dev directory, and forwards the request to the
> daemon using a UNIX domain socket?
>
> "Make the daemon and the LD_PRELOAD library bug-and-race free and
> develop the MAC policy for the daemon and the LD_PRELOAD library"
> and "Make this filesystem bug-and-race free". Which one is easier?

I think a good question is:

What kind of idiot wrote a program that thinks it is allowed to go
messing with the contents of /dev? There simply can't be a good reason
for an application to do that. Device nodes should match up with
devices, so as long as the device nodes exist for all your devices, then
everything should just work and no one should ever have a reason to go
changing things for any reason.

Perhaps the real solution is a preload library that blocks the idiotic
program from touching anything in /dev with anything other than
open/close/read/write.

Of course it could also help to simply tell people what this stupid
program is actually doing and why it should be allowed to mess in places
it doesn't belong.

--
Len Sorensen