LinuxLists.cc - [patch 0/2] [RFC] Simple tamper-proof device filesystem.

Subject: [patch 1/2] [RFC] Simple tamper-proof device filesystem.

A brief description of SYAORAN:

SYAORAN stands for "Simple Yet All-important Object Realizing Abiding
Nexus". SYAORAN is a filesystem for /dev with Mandatory Access Control.

/dev needs to be writable, but this means that files on /dev might be
tampered with. SYAORAN can restrict combinations of (pathname, attribute)
that the system can create. The attribute is one of directory, regular
file, FIFO, UNIX domain socket, symbolic link, character or block device
file with major/minor device numbers.

SYAORAN can ensure /dev/null is a character device file with major=1 minor=3.
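
To make the (pathname, attribute) check concrete, here is a rough userspace
sketch of the decision made in enforcing mode. It is only a model of the
idea, not the kernel code in this patch: struct policy_entry, sample_policy
and policy_allows() are hypothetical names, and the type letters follow the
policy file format (c, b, d, ...).

```c
#include <string.h>

/* Flag values as defined in syaoran.h. */
#define MAY_CREATE 1
#define MAY_DELETE 2

/* Hypothetical in-memory form of one policy entry. */
struct policy_entry {
    const char *name;     /* pathname relative to the mount point */
    char type;            /* 'c', 'b', 'd', 'l', 's', 'p' or 'f' */
    unsigned int major;   /* only meaningful for 'c' and 'b' */
    unsigned int minor;
    unsigned int flags;   /* operations permitted on this node */
};

/* Hypothetical sample policy, for illustration only. */
static const struct policy_entry sample_policy[] = {
    { "null", 'c', 1, 3, MAY_CREATE | MAY_DELETE },
    { "sda",  'b', 8, 0, MAY_CREATE },
};

/*
 * Return 1 if an entry with the same name, type and (for device files)
 * device numbers permits every operation in "wanted", 0 otherwise.
 */
static int policy_allows(const struct policy_entry *table, size_t n,
                         const char *name, char type,
                         unsigned int major, unsigned int minor,
                         unsigned int wanted)
{
    size_t i;

    for (i = 0; i < n; i++) {
        const struct policy_entry *e = &table[i];

        if (e->type != type)
            continue;
        if ((type == 'c' || type == 'b') &&
            (e->major != major || e->minor != minor))
            continue;
        if (strcmp(e->name, name) != 0)
            continue;
        /* The first matching entry decides. */
        return (e->flags & wanted) == wanted;
    }
    return 0;
}
```

With this sample table, creating "null" as a character device 1:3 is
permitted, while creating "null" with any other type or device number is
refused because no entry matches.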

Policy specifications for this filesystem are at
http://tomoyo.sourceforge.jp/en/1.5.x/policy-syaoran.html
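
As read by RegisterNodeInfo() in this patch, each policy line consists of
space-separated fields: pathname (relative to the mount point), octal
permission, uid, gid, flags, a one-letter type, and then either major and
minor numbers (types c and b) or the symlink target (type l). The following
is a simplified userspace sketch of that parsing, under the assumption that
the line has already been normalized to single spaces. struct policy_line
and parse_policy_line() are hypothetical names, and the \ooo octal escapes
handled by UnEscape() are left out.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical parsed form of one policy line. */
struct policy_line {
    char name[256];          /* pathname relative to the mount point */
    unsigned int perm;       /* permission bits, written in octal */
    unsigned int uid, gid;
    unsigned int flags;      /* bitmask of MAY_CREATE, MAY_DELETE, ... */
    char type;               /* c, b, l, d, s, p or f */
    unsigned int major, minor;
    char symlink_data[256];
};

/* Return 0 on success, -1 if the line is malformed. */
static int parse_policy_line(const char *line, struct policy_line *out)
{
    char type[2];
    int used = 0;

    memset(out, 0, sizeof(*out));
    if (sscanf(line, "%255s %o %u %u %u %1s%n", out->name, &out->perm,
               &out->uid, &out->gid, &out->flags, type, &used) != 6)
        return -1;
    /* setuid/setgid/sticky bits are not supported. */
    if (out->perm > 0777)
        return -1;
    /* The type field must be exactly one character long. */
    if (line[used] != '\0' && line[used] != ' ')
        return -1;
    out->type = type[0];
    switch (out->type) {
    case 'c':
    case 'b':
        if (sscanf(line + used, "%u %u", &out->major, &out->minor) != 2)
            return -1;
        break;
    case 'l':
        if (sscanf(line + used, "%255s", out->symlink_data) != 1)
            return -1;
        break;
    case 'd':
    case 's':
    case 'p':
    case 'f':
        break;
    default:
        return -1;
    }
    return 0;
}
```

For example, the (made-up) line "null 666 0 0 0 c 1 3" would describe
/dev/null as a character device with major=1 minor=3, mode 0666, owned by
uid 0 and gid 0, with no operations permitted after mount.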

Why not use FUSE?

Because /dev has to be available throughout the lifetime of the kernel.
It is not acceptable if /dev stops working due to SIGKILL or the OOM-killer.

Why not use SELinux?

Because SELinux doesn't guarantee the binding between a filename and its
attribute. The purpose of this filesystem is to ensure that a given filename
always has the expected attribute (e.g. /dev/null is guaranteed to be a
character device file with major=1 and minor=3).
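
For reference, the property being guaranteed can be observed from userspace
with stat(). The helper below is a minimal sketch for Linux/glibc and is not
part of this patch; is_char_device() is a hypothetical name.

```c
#include <sys/stat.h>
#include <sys/sysmacros.h>
#include <sys/types.h>

/*
 * Return 1 if "path" is a character device with the given major/minor
 * numbers, 0 if it exists but is something else, -1 if stat() fails.
 */
static int is_char_device(const char *path, unsigned int maj,
                          unsigned int min)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return -1;
    if (!S_ISCHR(st.st_mode))
        return 0;
    return major(st.st_rdev) == maj && minor(st.st_rdev) == min;
}
```

On a system where /dev has not been tampered with,
is_char_device("/dev/null", 1, 3) is expected to return 1.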

Signed-off-by: Tetsuo Handa <[email protected]>
---
fs/syaoran/syaoran.c | 338 +++++++++++++++++
fs/syaoran/syaoran.h | 964 +++++++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 1302 insertions(+)

--- /dev/null
+++ linux-2.6.24-rc5/fs/syaoran/syaoran.c
@@ -0,0 +1,338 @@
+/*
+ * fs/syaoran/syaoran.c
+ *
+ * Implementation of the Tamper-Proof Device Filesystem.
+ *
+ * Portions Copyright (C) 2005-2007 NTT DATA CORPORATION
+ *
+ * Version: 1.5.3-pre 2007/12/16
+ *
+ * This filesystem is developed using the ramfs implementation.
+ *
+ */
+/*
+ * Resizable simple ram filesystem for Linux.
+ *
+ * Copyright (C) 2000 Linus Torvalds.
+ * 2000 Transmeta Corp.
+ *
+ * Usage limits added by David Gibson, Linuxcare Australia.
+ * This file is released under the GPL.
+ */
+
+/*
+ * NOTE! This filesystem is probably most useful
+ * not as a real filesystem, but as an example of
+ * how virtual filesystems can be written.
+ *
+ * It doesn't get much simpler than this. Consider
+ * that this file implements the full semantics of
+ * a POSIX-compliant read-write filesystem.
+ *
+ * Note in particular how the filesystem does not
+ * need to implement any data structures of its own
+ * to keep track of the virtual data: using the VFS
+ * caches is sufficient.
+ */
+
+#include <linux/module.h>
+#include <linux/fs.h>
+#include <linux/pagemap.h>
+#include <linux/highmem.h>
+#include <linux/time.h>
+#include <linux/init.h>
+#include <linux/string.h>
+#include <linux/backing-dev.h>
+#include <linux/sched.h>
+#include <linux/uaccess.h>
+
+static struct super_operations syaoran_ops;
+static struct address_space_operations syaoran_aops;
+static struct inode_operations syaoran_file_inode_operations;
+static struct inode_operations syaoran_dir_inode_operations;
+static struct inode_operations syaoran_symlink_inode_operations;
+static struct file_operations syaoran_file_operations;
+
+static struct backing_dev_info syaoran_backing_dev_info = {
+ .ra_pages = 0, /* No readahead */
+ .capabilities = BDI_CAP_NO_ACCT_DIRTY | BDI_CAP_NO_WRITEBACK |
+ BDI_CAP_MAP_DIRECT | BDI_CAP_MAP_COPY |
+ BDI_CAP_READ_MAP | BDI_CAP_WRITE_MAP | BDI_CAP_EXEC_MAP,
+};
+
+#include "syaoran.h"
+
+static struct inode *syaoran_get_inode(struct super_block *sb, int mode,
+ dev_t dev)
+{
+ struct inode *inode = new_inode(sb);
+
+ if (inode) {
+ struct timespec now = CURRENT_TIME;
+ inode->i_mode = mode;
+ inode->i_uid = current->fsuid;
+ inode->i_gid = current->fsgid;
+ inode->i_blocks = 0;
+ inode->i_mapping->a_ops = &syaoran_aops;
+ inode->i_mapping->backing_dev_info = &syaoran_backing_dev_info;
+ inode->i_atime = now;
+ inode->i_mtime = now;
+ inode->i_ctime = now;
+ switch (mode & S_IFMT) {
+ default:
+ init_special_inode(inode, mode, dev);
+ if (S_ISBLK(mode))
+ inode->i_fop = &wrapped_def_blk_fops;
+ else if (S_ISCHR(mode))
+ inode->i_fop = &wrapped_def_chr_fops;
+ inode->i_op = &syaoran_file_inode_operations;
+ break;
+ case S_IFREG:
+ inode->i_op = &syaoran_file_inode_operations;
+ inode->i_fop = &syaoran_file_operations;
+ break;
+ case S_IFDIR:
+ inode->i_op = &syaoran_dir_inode_operations;
+ inode->i_fop = &simple_dir_operations;
+ /*
+ * directory inodes start off with i_nlink == 2
+ * (for "." entry)
+ */
+ inode->i_nlink++;
+ break;
+ case S_IFLNK:
+ inode->i_op = &syaoran_symlink_inode_operations;
+ break;
+ }
+ }
+ return inode;
+}
+
+/*
+ * File creation. Allocate an inode, and we're done..
+ */
+/* SMP-safe */
+static int syaoran_mknod(struct inode *dir, struct dentry *dentry, int mode,
+ dev_t dev)
+{
+ struct inode *inode;
+ int error = -ENOSPC;
+ if (MayCreateNode(dentry, mode, dev) < 0)
+ return -EPERM;
+ inode = syaoran_get_inode(dir->i_sb, mode, dev);
+ if (inode) {
+ if (dir->i_mode & S_ISGID) {
+ inode->i_gid = dir->i_gid;
+ if (S_ISDIR(mode))
+ inode->i_mode |= S_ISGID;
+ }
+ d_instantiate(dentry, inode);
+ dget(dentry); /* Extra count - pin the dentry in core */
+ error = 0;
+ }
+ return error;
+}
+
+static int syaoran_mkdir(struct inode *dir, struct dentry *dentry, int mode)
+{
+ int retval = syaoran_mknod(dir, dentry, mode | S_IFDIR, 0);
+ if (!retval)
+ dir->i_nlink++;
+ return retval;
+}
+
+static int syaoran_create(struct inode *dir, struct dentry *dentry, int mode,
+ struct nameidata *nd)
+{
+ return syaoran_mknod(dir, dentry, mode | S_IFREG, 0);
+}
+
+static int syaoran_symlink(struct inode *dir, struct dentry *dentry,
+ const char *symname)
+{
+ struct inode *inode;
+ int error = -ENOSPC;
+ if (MayCreateNode(dentry, S_IFLNK, 0) < 0)
+ return -EPERM;
+ inode = syaoran_get_inode(dir->i_sb, S_IFLNK|S_IRWXUGO, 0);
+ if (inode) {
+ int l = strlen(symname)+1;
+ error = page_symlink(inode, symname, l);
+ if (!error) {
+ if (dir->i_mode & S_ISGID)
+ inode->i_gid = dir->i_gid;
+ d_instantiate(dentry, inode);
+ dget(dentry);
+ } else
+ iput(inode);
+ }
+ return error;
+}
+
+static int syaoran_link(struct dentry *old_dentry, struct inode *dir,
+ struct dentry *dentry)
+{
+ struct inode *inode = old_dentry->d_inode;
+ if (!inode || MayCreateNode(dentry, inode->i_mode, inode->i_rdev) < 0)
+ return -EPERM;
+ return simple_link(old_dentry, dir, dentry);
+}
+
+static int syaoran_unlink(struct inode *dir, struct dentry *dentry)
+{
+ if (MayModifyNode(dentry, MAY_DELETE) < 0)
+ return -EPERM;
+ return simple_unlink(dir, dentry);
+}
+
+static int syaoran_rename(struct inode *old_dir, struct dentry *old_dentry,
+ struct inode *new_dir, struct dentry *new_dentry)
+{
+ struct inode *inode = old_dentry->d_inode;
+ if (!inode || MayModifyNode(old_dentry, MAY_DELETE) < 0 ||
+ MayCreateNode(new_dentry, inode->i_mode, inode->i_rdev) < 0)
+ return -EPERM;
+ return simple_rename(old_dir, old_dentry, new_dir, new_dentry);
+}
+
+static int syaoran_rmdir(struct inode *dir, struct dentry *dentry)
+{
+ if (MayModifyNode(dentry, MAY_DELETE) < 0)
+ return -EPERM;
+ return simple_rmdir(dir, dentry);
+}
+
+static int syaoran_setattr(struct dentry *dentry, struct iattr *attr)
+{
+ struct inode *inode = dentry->d_inode;
+ int error = inode_change_ok(inode, attr);
+ if (!error) {
+ unsigned int ia_valid = attr->ia_valid;
+ unsigned int flags = 0;
+ if (ia_valid & (ATTR_UID | ATTR_GID))
+ flags |= MAY_CHOWN;
+ if (ia_valid & ATTR_MODE)
+ flags |= MAY_CHMOD;
+ if (MayModifyNode(dentry, flags) < 0)
+ return -EPERM;
+ if (!error)
+ error = inode_setattr(inode, attr);
+ }
+ return error;
+}
+
+/*
+ * Copied from mm/page-writeback.c since
+ * __set_page_dirty_no_writeback() is not exported.
+ */
+static int syaoran_set_page_dirty_no_writeback(struct page *page)
+{
+ if (!PageDirty(page))
+ SetPageDirty(page);
+ return 0;
+}
+
+static struct address_space_operations syaoran_aops = {
+ .readpage = simple_readpage,
+ .write_begin = simple_write_begin,
+ .write_end = simple_write_end,
+ .set_page_dirty = syaoran_set_page_dirty_no_writeback,
+};
+
+static struct file_operations syaoran_file_operations = {
+ .aio_read = generic_file_aio_read,
+ .read = do_sync_read,
+ .aio_write = generic_file_aio_write,
+ .write = do_sync_write,
+ .mmap = generic_file_mmap,
+ .fsync = simple_sync_file,
+ .splice_read = generic_file_splice_read,
+ .llseek = generic_file_llseek,
+};
+
+static struct inode_operations syaoran_file_inode_operations = {
+ .getattr = simple_getattr,
+ .setattr = syaoran_setattr,
+};
+
+static struct inode_operations syaoran_dir_inode_operations = {
+ .create = syaoran_create,
+ .lookup = simple_lookup,
+ .link = syaoran_link,
+ .unlink = syaoran_unlink,
+ .symlink = syaoran_symlink,
+ .mkdir = syaoran_mkdir,
+ .rmdir = syaoran_rmdir,
+ .mknod = syaoran_mknod,
+ .rename = syaoran_rename,
+ .setattr = syaoran_setattr,
+};
+
+static struct inode_operations syaoran_symlink_inode_operations = {
+ .readlink = generic_readlink,
+ .follow_link = page_follow_link_light,
+ .put_link = page_put_link,
+ .setattr = syaoran_setattr,
+};
+
+static struct super_operations syaoran_ops = {
+ .statfs = simple_statfs,
+ .drop_inode = generic_delete_inode,
+ .put_super = syaoran_put_super,
+};
+
+static int syaoran_fill_super(struct super_block *sb, void *data, int silent)
+{
+ struct inode *inode;
+ struct dentry *root;
+ int error;
+
+ sb->s_maxbytes = MAX_LFS_FILESIZE;
+ sb->s_blocksize = PAGE_CACHE_SIZE;
+ sb->s_blocksize_bits = PAGE_CACHE_SHIFT;
+ sb->s_magic = SYAORAN_MAGIC;
+ sb->s_op = &syaoran_ops;
+ sb->s_time_gran = 1;
+ error = Syaoran_Initialize(sb, data);
+ if (error < 0)
+ return error;
+ inode = syaoran_get_inode(sb, S_IFDIR | 0755, 0);
+ if (!inode)
+ return -ENOMEM;
+
+ root = d_alloc_root(inode);
+ if (!root) {
+ iput(inode);
+ return -ENOMEM;
+ }
+ sb->s_root = root;
+ MakeInitialNodes(sb);
+ return 0;
+}
+
+static int syaoran_get_sb(struct file_system_type *fs_type,
+ int flags, const char *dev_name, void *data, struct vfsmount *mnt)
+{
+ return get_sb_nodev(fs_type, flags, data, syaoran_fill_super, mnt);
+}
+
+static struct file_system_type syaoran_fs_type = {
+ .owner = THIS_MODULE,
+ .name = "syaoran",
+ .get_sb = syaoran_get_sb,
+ .kill_sb = kill_litter_super,
+};
+
+static int __init init_syaoran_fs(void)
+{
+ return register_filesystem(&syaoran_fs_type);
+}
+
+static void __exit exit_syaoran_fs(void)
+{
+ unregister_filesystem(&syaoran_fs_type);
+}
+module_init(init_syaoran_fs);
+module_exit(exit_syaoran_fs);
+
+MODULE_LICENSE("GPL");
--- /dev/null
+++ linux-2.6.24-rc5/fs/syaoran/syaoran.h
@@ -0,0 +1,964 @@
+/*
+ * fs/syaoran/syaoran.h
+ *
+ * Implementation of the Tamper-Proof Device Filesystem.
+ *
+ * Copyright (C) 2005-2007 NTT DATA CORPORATION
+ *
+ * Version: 1.5.3-pre 2007/12/16
+ *
+ * A brief description of SYAORAN:
+ *
+ * SYAORAN stands for "Simple Yet All-important Object Realizing Abiding
+ * Nexus". SYAORAN is a filesystem for /dev with Mandatory Access Control.
+ *
+ * /dev needs to be writable, but this means that files on /dev might be
+ * tampered with. SYAORAN can restrict combinations of (pathname, attribute)
+ * that the system can create. The attribute is one of directory, regular
+ * file, FIFO, UNIX domain socket, symbolic link, character or block device
+ * file with major/minor device numbers.
+ *
+ * Why not use FUSE?
+ *
+ * Because /dev has to be available throughout the lifetime of the kernel.
+ * It is not acceptable if /dev stops working due to SIGKILL or the OOM-killer.
+ */
+
+#ifndef _LINUX_SYAORAN_H
+#define _LINUX_SYAORAN_H
+
+#include <linux/namei.h>
+#include <linux/mm.h>
+
+/***** SYAORAN start. *****/
+
+#define list_for_each_cookie(pos, cookie, head) \
+ for ((cookie) || ((cookie) = (head)), pos = (cookie)->next; \
+ prefetch(pos->next), pos != (head) || ((cookie) = NULL); \
+ (cookie) = pos, pos = pos->next)
+
+/* The following constants are used to restrict operations. */
+
+#define MAY_CREATE 1 /* This file is allowed to mknod() */
+#define MAY_DELETE 2 /* This file is allowed to unlink() */
+#define MAY_CHMOD 4 /* This file is allowed to chmod() */
+#define MAY_CHOWN 8 /* This file is allowed to chown() */
+#define DEVICE_USED 16 /* This block or character device file is used. */
+#define NO_CREATE_AT_MOUNT 32 /* Don't create this file at mount(). */
+
+/* some random number */
+#define SYAORAN_MAGIC 0x2F646576 /* = '/dev' */
+
+static void syaoran_put_super(struct super_block *sb);
+static int Syaoran_Initialize(struct super_block *sb, void *data);
+static void MakeInitialNodes(struct super_block *sb);
+static int MayCreateNode(struct dentry *dentry, int mode, int dev);
+static int MayModifyNode(struct dentry *dentry, unsigned int flags);
+static int syaoran_create_tracelog(struct super_block *sb,
+ const char *filename);
+
+/* Wraps blkdev_open() to trace open operation for block devices. */
+static int (*org_blkdev_open) (struct inode *inode, struct file *filp);
+static struct file_operations wrapped_def_blk_fops;
+
+static int wrapped_blkdev_open(struct inode *inode, struct file *filp)
+{
+ int error = org_blkdev_open(inode, filp);
+ if (error != -ENXIO)
+ MayModifyNode(filp->f_dentry, DEVICE_USED);
+ return error;
+}
+
+/* Wraps chrdev_open() to trace open operation for character devices. */
+static int (*org_chrdev_open) (struct inode *inode, struct file *filp);
+static struct file_operations wrapped_def_chr_fops;
+
+static int wrapped_chrdev_open(struct inode *inode, struct file *filp)
+{
+ int error = org_chrdev_open(inode, filp);
+ if (error != -ENXIO)
+ MayModifyNode(filp->f_dentry, DEVICE_USED);
+ return error;
+}
+
+/* lookup_create() without nameidata. Called only during initialization. */
+static struct dentry *lookup_create2(const char *name, struct dentry *base,
+ const bool is_dir)
+{
+ struct dentry *dentry;
+ const int len = name ? strlen(name) : 0;
+ mutex_lock(&base->d_inode->i_mutex);
+ dentry = lookup_one_len(name, base, len);
+ if (IS_ERR(dentry))
+ goto fail;
+ if (!is_dir && name[len] && !dentry->d_inode)
+ goto enoent;
+ return dentry;
+enoent:
+ dput(dentry);
+ dentry = ERR_PTR(-ENOENT);
+fail:
+ return dentry;
+}
+
+/* mkdir(). Called only during initialization. */
+static int fs_mkdir(const char *pathname, struct dentry *base, int mode,
+ uid_t user, gid_t group)
+{
+ struct dentry *dentry = lookup_create2(pathname, base, 1);
+ int error = PTR_ERR(dentry);
+ if (!IS_ERR(dentry)) {
+ error = vfs_mkdir(base->d_inode, dentry, mode);
+ if (!error) {
+ lock_kernel();
+ dentry->d_inode->i_uid = user;
+ dentry->d_inode->i_gid = group;
+ unlock_kernel();
+ }
+ dput(dentry);
+ }
+ mutex_unlock(&base->d_inode->i_mutex);
+ return error;
+}
+
+/* mknod(). Called only during initialization. */
+static int fs_mknod(const char *filename, struct dentry *base, int mode,
+ dev_t dev, uid_t user, gid_t group)
+{
+ struct dentry *dentry;
+ int error;
+ switch (mode & S_IFMT) {
+ case S_IFCHR:
+ case S_IFBLK:
+ case S_IFIFO:
+ case S_IFSOCK:
+ case S_IFREG:
+ break;
+ default:
+ return -EPERM;
+ }
+ dentry = lookup_create2(filename, base, 0);
+ error = PTR_ERR(dentry);
+ if (!IS_ERR(dentry)) {
+ error = vfs_mknod(base->d_inode, dentry, mode, dev);
+ if (!error) {
+ lock_kernel();
+ dentry->d_inode->i_uid = user;
+ dentry->d_inode->i_gid = group;
+ unlock_kernel();
+ }
+ dput(dentry);
+ }
+ mutex_unlock(&base->d_inode->i_mutex);
+ return error;
+}
+
+/* symlink(). Called only during initialization. */
+static int fs_symlink(const char *pathname, struct dentry *base,
+ char *oldname, int mode, uid_t user, gid_t group)
+{
+ struct dentry *dentry = lookup_create2(pathname, base, 0);
+ int error = PTR_ERR(dentry);
+ if (!IS_ERR(dentry)) {
+ error = vfs_symlink(base->d_inode, dentry, oldname, S_IALLUGO);
+ if (!error) {
+ lock_kernel();
+ dentry->d_inode->i_mode = mode;
+ dentry->d_inode->i_uid = user;
+ dentry->d_inode->i_gid = group;
+ unlock_kernel();
+ }
+ dput(dentry);
+ }
+ mutex_unlock(&base->d_inode->i_mutex);
+ return error;
+}
+
+/*
+ * Format string.
+ * Leading and trailing whitespace is removed.
+ * Runs of whitespace are collapsed into a single space.
+ */
+static void NormalizeLine(unsigned char *buffer)
+{
+ unsigned char *sp = buffer;
+ unsigned char *dp = buffer;
+ bool first = 1;
+ while (*sp && (*sp <= ' ' || *sp >= 127))
+ sp++;
+ while (*sp) {
+ if (!first)
+ *dp++ = ' ';
+ first = 0;
+ while (*sp > ' ' && *sp < 127)
+ *dp++ = *sp++;
+ while (*sp && (*sp <= ' ' || *sp >= 127))
+ sp++;
+ }
+ *dp = '\0';
+}
+
+/* Convert text form of filename into binary form. */
+static void UnEscape(char *filename)
+{
+ char *cp = filename;
+ char c, d, e;
+ if (!cp)
+ return;
+ while ((c = *filename++) != '\0') {
+ if (c != '\\') {
+ *cp++ = c;
+ continue;
+ }
+ if ((c = *filename++) == '\\') {
+ *cp++ = c;
+ continue;
+ }
+ if (c < '0' || c > '3')
+ break;
+ d = *filename++;
+ if (d < '0' || d > '7')
+ break;
+ e = *filename++;
+ if (e < '0' || e > '7')
+ break;
+ *(unsigned char *) cp++ = (unsigned char)
+ (((unsigned char) (c - '0') << 6) +
+ ((unsigned char) (d - '0') << 3) +
+ (unsigned char) (e - '0'));
+ }
+ *cp = '\0';
+}
+
+struct dev_entry {
+ struct list_head list;
+ /* Binary form of pathname under mount point. Never NULL. */
+ char *name;
+ /*
+ * Mode and permissions. setuid/setgid/sticky bits are not supported.
+ */
+ mode_t mode;
+ uid_t uid;
+ gid_t gid;
+ dev_t kdev;
+ /*
+ * Binary form of initial contents for the symlink.
+ * NULL if not symlink.
+ */
+ char *symlink_data;
+ /* File access control flags. */
+ unsigned int flags;
+ /* Text form of pathname under mount point. Never NULL. */
+ const char *printable_name;
+ /*
+ * Text form of initial contents for the symlink.
+ * NULL if not symlink.
+ */
+ const char *printable_symlink_data;
+};
+
+struct syaoran_sb_info {
+ struct list_head list;
+ bool initialize_done; /* False if initialization is in progress. */
+ bool is_permissive_mode; /* True if permissive mode. */
+};
+
+static inline char *strdup(const char *data)
+{
+ return kstrdup(data, GFP_KERNEL);
+}
+
+static int RegisterNodeInfo(char *buffer, struct super_block *sb)
+{
+ enum {
+ ARG_FILENAME = 0,
+ ARG_PERMISSION = 1,
+ ARG_UID = 2,
+ ARG_GID = 3,
+ ARG_FLAGS = 4,
+ ARG_DEV_TYPE = 5,
+ ARG_SYMLINK_DATA = 6,
+ ARG_DEV_MAJOR = 6,
+ ARG_DEV_MINOR = 7,
+ MAX_ARG = 8
+ };
+ char *args[MAX_ARG];
+ int i;
+ int error = -EINVAL;
+ unsigned int perm, uid, gid, flags, major = 0, minor = 0;
+ struct syaoran_sb_info *info = (struct syaoran_sb_info *) sb->s_fs_info;
+ struct dev_entry *entry;
+ memset(args, 0, sizeof(args));
+ args[0] = buffer;
+ for (i = 1; i < MAX_ARG; i++) {
+ args[i] = strchr(args[i - 1] + 1, ' ');
+ if (!args[i])
+ break;
+ *args[i]++ = '\0';
+ }
+ /*
+ printk("<%s> <%s> <%s> <%s> <%s> <%s> <%s> <%s>\n",
+ args[0], args[1], args[2], args[3], args[4], args[5],
+ args[6], args[7]);
+ */
+ if (!args[ARG_FILENAME] || !args[ARG_PERMISSION] || !args[ARG_UID] ||
+ !args[ARG_GID] || !args[ARG_DEV_TYPE] || !args[ARG_FLAGS])
+ goto out;
+ if (sscanf(args[ARG_PERMISSION], "%o", &perm) != 1 || !(perm <= 0777)
+ || sscanf(args[ARG_UID], "%u", &uid) != 1
+ || sscanf(args[ARG_GID], "%u", &gid) != 1
+ || sscanf(args[ARG_FLAGS], "%u", &flags) != 1
+ || *(args[ARG_DEV_TYPE] + 1))
+ goto out;
+ switch (*args[ARG_DEV_TYPE]) {
+ case 'c':
+ perm |= S_IFCHR;
+ if (!args[ARG_DEV_MAJOR]
+ || sscanf(args[ARG_DEV_MAJOR], "%u", &major) != 1
+ || !args[ARG_DEV_MINOR]
+ || sscanf(args[ARG_DEV_MINOR], "%u", &minor) != 1)
+ goto out;
+ break;
+ case 'b':
+ perm |= S_IFBLK;
+ if (!args[ARG_DEV_MAJOR]
+ || sscanf(args[ARG_DEV_MAJOR], "%u", &major) != 1
+ || !args[ARG_DEV_MINOR]
+ || sscanf(args[ARG_DEV_MINOR], "%u", &minor) != 1)
+ goto out;
+ break;
+ case 'l':
+ perm |= S_IFLNK;
+ if (!args[ARG_SYMLINK_DATA])
+ goto out;
+ break;
+ case 'd':
+ perm |= S_IFDIR;
+ break;
+ case 's':
+ perm |= S_IFSOCK;
+ break;
+ case 'p':
+ perm |= S_IFIFO;
+ break;
+ case 'f':
+ perm |= S_IFREG;
+ break;
+ default:
+ goto out;
+ }
+ error = -ENOMEM;
+ entry = kzalloc(sizeof(*entry), GFP_KERNEL);
+ if (!entry)
+ goto out;
+ if (S_ISLNK(perm)) {
+ entry->printable_symlink_data = strdup(args[ARG_SYMLINK_DATA]);
+ if (!entry->printable_symlink_data)
+ goto out_freemem;
+ }
+ entry->printable_name = strdup(args[ARG_FILENAME]);
+ if (!entry->printable_name)
+ goto out_freemem;
+ if (S_ISLNK(perm)) {
+ entry->symlink_data = strdup(entry->printable_symlink_data);
+ if (!entry->symlink_data)
+ goto out_freemem;
+ UnEscape(entry->symlink_data);
+ }
+ entry->name = strdup(entry->printable_name);
+ if (!entry->name)
+ goto out_freemem;
+ UnEscape(entry->name);
+ /*
+ * Drop trailing '/', for GetLocalAbsolutePath() doesn't append
+ * trailing '/'.
+ */
+ i = strlen(entry->name);
+ if (i && entry->name[i - 1] == '/')
+ entry->name[i - 1] = '\0';
+ entry->mode = perm;
+ entry->uid = uid;
+ entry->gid = gid;
+ entry->kdev = S_ISCHR(perm) || S_ISBLK(perm) ? MKDEV(major, minor) : 0;
+ entry->flags = flags;
+ list_add_tail(&entry->list, &info->list);
+ /* printk("Entry added.\n"); */
+ error = 0;
+out:
+ return error;
+out_freemem:
+ kfree(entry->printable_symlink_data);
+ kfree(entry->printable_name);
+ kfree(entry->symlink_data);
+ kfree(entry);
+ goto out;
+}
+
+static void syaoran_put_super(struct super_block *sb)
+{
+ struct syaoran_sb_info *info;
+ struct dev_entry *entry;
+ struct dev_entry *tmp;
+ if (!sb)
+ return;
+ info = (struct syaoran_sb_info *) sb->s_fs_info;
+ if (!info)
+ return;
+ list_for_each_entry_safe(entry, tmp, &info->list, list) {
+ kfree(entry->name);
+ kfree(entry->symlink_data);
+ kfree(entry->printable_name);
+ kfree(entry->printable_symlink_data);
+ list_del(&entry->list);
+ /* printk("Entry removed.\n"); */
+ kfree(entry);
+ }
+ kfree(info);
+ sb->s_fs_info = NULL;
+ printk(KERN_DEBUG "%s: Unused memory freed.\n", __FUNCTION__);
+}
+
+static int ReadConfigFile(struct file *file, struct super_block *sb)
+{
+ char *buffer;
+ int error = -ENOMEM;
+ if (!file)
+ return -EINVAL;
+ buffer = kzalloc(PAGE_SIZE, GFP_KERNEL);
+ if (buffer) {
+ int len;
+ char *cp;
+ unsigned long offset = 0;
+ while ((len = kernel_read(file, offset, buffer, PAGE_SIZE)) > 0
+ && (cp = memchr(buffer, '\n', len)) != NULL) {
+ *cp = '\0';
+ offset += cp - buffer + 1;
+ NormalizeLine(buffer);
+ if (RegisterNodeInfo(buffer, sb) == -ENOMEM)
+ goto out;
+ }
+ error = 0;
+ }
+out:
+ kfree(buffer);
+ return error;
+}
+
+static void MakeNode(struct dev_entry *entry, struct dentry *root)
+{
+ struct dentry *base = dget(root);
+ char *filename = entry->name;
+ char *name = filename;
+ unsigned int c;
+ const mode_t perm = entry->mode;
+ const uid_t uid = entry->uid;
+ const gid_t gid = entry->gid;
+ goto start;
+ while ((c = *(unsigned char *) filename) != '\0') {
+ if (c == '/') {
+ struct dentry *new_base;
+ const int len = filename - name;
+ *filename = '\0';
+ mutex_lock(&base->d_inode->i_mutex);
+ new_base = lookup_one_len(name, base, len);
+ mutex_unlock(&base->d_inode->i_mutex);
+ dput(base);
+ *filename = '/';
+ filename++;
+ if (IS_ERR(new_base))
+ return;
+ if (!new_base->d_inode ||
+ !S_ISDIR(new_base->d_inode->i_mode)) {
+ dput(new_base);
+ return;
+ }
+ base = new_base;
+start:
+ name = filename;
+ } else {
+ filename++;
+ }
+ }
+ filename = (char *) name;
+ if (S_ISLNK(perm)) {
+ fs_symlink(filename, base, entry->symlink_data, perm, uid, gid);
+ } else if (S_ISDIR(perm)) {
+ fs_mkdir(filename, base, perm ^ S_IFDIR, uid, gid);
+ } else if (S_ISSOCK(perm) || S_ISFIFO(perm) || S_ISREG(perm)) {
+ fs_mknod(filename, base, perm, 0, uid, gid);
+ } else if (S_ISCHR(perm) || S_ISBLK(perm)) {
+ fs_mknod(filename, base, perm, entry->kdev, uid, gid);
+ }
+ dput(base);
+}
+
+/* Create files according to the policy file. */
+static void MakeInitialNodes(struct super_block *sb)
+{
+ struct syaoran_sb_info *info;
+ struct dev_entry *entry;
+ if (!sb)
+ return;
+ info = (struct syaoran_sb_info *) sb->s_fs_info;
+ if (!info)
+ return;
+ if (info->is_permissive_mode) {
+ syaoran_create_tracelog(sb, ".syaoran");
+ syaoran_create_tracelog(sb, ".syaoran_all");
+ }
+ list_for_each_entry(entry, &info->list, list) {
+ if ((entry->flags & NO_CREATE_AT_MOUNT) == 0)
+ MakeNode(entry, sb->s_root);
+ }
+ info->initialize_done = 1;
+}
+
+/* Read policy file. */
+static int Syaoran_Initialize(struct super_block *sb, void *data)
+{
+ int error = -EINVAL;
+ static bool first = 1;
+ if (first) {
+ first = 0;
+ printk(KERN_INFO "SYAORAN: 1.5.3-pre 2007/12/16\n");
+ }
+ {
+ struct inode *inode = new_inode(sb);
+ if (!inode)
+ return -ENOMEM;
+ /* Create /dev/ram0 to get the value of blkdev_open(). */
+ init_special_inode(inode, S_IFBLK | 0666, MKDEV(1, 0));
+ wrapped_def_blk_fops = *inode->i_fop;
+ iput(inode);
+ org_blkdev_open = wrapped_def_blk_fops.open;
+ wrapped_def_blk_fops.open = wrapped_blkdev_open;
+ }
+ {
+ struct inode *inode = new_inode(sb);
+ if (!inode)
+ return -ENOMEM;
+ /* Create /dev/null to get the value of chrdev_open(). */
+ init_special_inode(inode, S_IFCHR | 0666, MKDEV(1, 3));
+ wrapped_def_chr_fops = *inode->i_fop;
+ iput(inode);
+ org_chrdev_open = wrapped_def_chr_fops.open;
+ wrapped_def_chr_fops.open = wrapped_chrdev_open;
+ }
+ if (data) {
+ struct file *f;
+ char *filename = (char *) data;
+ bool is_permissive_mode = 0;
+ if (strncmp(filename, "accept=", 7) == 0) {
+ filename += 7;
+ is_permissive_mode = 1;
+ } else if (strncmp(filename, "enforce=", 8) == 0) {
+ filename += 8;
+ is_permissive_mode = 0;
+ } else {
+ printk(KERN_INFO
+ "SYAORAN: Missing 'accept=' or 'enforce='.\n");
+ return -EINVAL;
+ }
+ f = filp_open(filename, O_RDONLY, 0600);
+ if (!IS_ERR(f)) {
+ struct syaoran_sb_info *p;
+ if (!S_ISREG(f->f_dentry->d_inode->i_mode))
+ goto out;
+ p = kzalloc(sizeof(*p), GFP_KERNEL);
+ if (!p)
+ goto out;
+ p->is_permissive_mode = is_permissive_mode;
+ sb->s_fs_info = p;
+ INIT_LIST_HEAD(&((struct syaoran_sb_info *)
+ sb->s_fs_info)->list);
+ printk(KERN_INFO "SYAORAN: Reading '%s'\n", filename);
+ error = ReadConfigFile(f, sb);
+out:
+ if (error)
+ printk(KERN_INFO "SYAORAN: Can't read '%s'\n",
+ filename);
+ filp_close(f, NULL);
+ } else {
+ printk(KERN_INFO "SYAORAN: Can't open '%s'\n",
+ filename);
+ }
+ } else {
+ printk(KERN_INFO "SYAORAN: Missing config-file path.\n");
+ }
+ return error;
+}
+
+/* Get absolute pathname from mount point. */
+static int GetLocalAbsolutePath(struct dentry *dentry, char *buffer, int buflen)
+{
+ char *start = buffer;
+ char *end = buffer + buflen;
+ int namelen;
+
+ if (buflen < 256)
+ goto out;
+
+ *--end = '\0';
+ buflen--;
+ for (;;) {
+ struct dentry *parent;
+ if (IS_ROOT(dentry))
+ break;
+ parent = dentry->d_parent;
+ namelen = dentry->d_name.len;
+ buflen -= namelen + 1;
+ if (buflen < 0)
+ goto out;
+ end -= namelen;
+ memcpy(end, dentry->d_name.name, namelen);
+ *--end = '/';
+ dentry = parent;
+ }
+ if (*end == '/') {
+ buflen++;
+ end++;
+ }
+ namelen = dentry->d_name.len;
+ buflen -= namelen;
+ if (buflen < 0)
+ goto out;
+ end -= namelen;
+ memcpy(end, dentry->d_name.name, namelen);
+ memmove(start, end, strlen(end) + 1);
+ return 0;
+out:
+ return -ENOMEM;
+}
+
+/* Get absolute pathname of the given dentry from mount point. */
+static int local_realpath_from_dentry(struct dentry *dentry, char *newname,
+ int newname_len)
+{
+ int error;
+ struct dentry *d_dentry;
+ if (!dentry || !newname || newname_len <= 0)
+ return -EINVAL;
+ d_dentry = dget(dentry);
+ /***** CRITICAL SECTION START *****/
+ spin_lock(&dcache_lock);
+ error = GetLocalAbsolutePath(d_dentry, newname, newname_len);
+ spin_unlock(&dcache_lock);
+ /***** CRITICAL SECTION END *****/
+ dput(d_dentry);
+ return error;
+}
+
+static int CheckFlags(struct syaoran_sb_info *info, struct dentry *dentry,
+ int mode, int dev, unsigned int flags)
+{
+ int error = -EPERM;
+ /*
+ * I use static buffer, for local_realpath_from_dentry() needs
+ * dcache_lock.
+ */
+ static char filename[PAGE_SIZE];
+ static DEFINE_SPINLOCK(lock);
+ spin_lock(&lock);
+ memset(filename, 0, sizeof(filename));
+ if (local_realpath_from_dentry(dentry, filename, sizeof(filename) - 1)
+ == 0) {
+ struct dev_entry *entry;
+ list_for_each_entry(entry, &info->list, list) {
+ if ((mode & S_IFMT) != (entry->mode & S_IFMT))
+ continue;
+ if ((S_ISBLK(mode) || S_ISCHR(mode)) &&
+ dev != entry->kdev)
+ continue;
+ if (strcmp(entry->name, filename + 1))
+ continue;
+ if (info->is_permissive_mode) {
+ entry->flags |= flags;
+ error = 0;
+ } else {
+ if ((entry->flags & flags) == flags)
+ error = 0;
+ }
+ break;
+ }
+ }
+ if (error && strlen(filename) < (sizeof(filename) / 4) - 16) {
+ const char *name;
+ const uid_t uid = current->fsuid;
+ const gid_t gid = current->fsgid;
+ const mode_t perm = mode & 0777;
+ flags &= ~DEVICE_USED;
+ {
+ char *end = filename + sizeof(filename) - 1;
+ const char *cp = strchr(filename, '\0') - 1;
+ while (cp > filename) {
+ const unsigned char c = *cp--;
+ if (c == '\\') {
+ *--end = '\\';
+ *--end = '\\';
+ } else if (c > ' ' && c < 127) {
+ *--end = c;
+ } else {
+ *--end = (c & 7) + '0';
+ *--end = ((c >> 3) & 7) + '0';
+ *--end = (c >> 6) + '0';
+ *--end = '\\';
+ }
+ }
+ name = end;
+ }
+ switch (mode & S_IFMT) {
+ case S_IFCHR:
+ printk(KERN_DEBUG
+ "SYAORAN-ERROR: %s %3o %3u %3u %2u %c %3u %3u\n",
+ name, perm, uid, gid, flags, 'c',
+ MAJOR(dev), MINOR(dev));
+ break;
+ case S_IFBLK:
+ printk(KERN_DEBUG
+ "SYAORAN-ERROR: %s %3o %3u %3u %2u %c %3u %3u\n",
+ name, perm, uid, gid, flags, 'b',
+ MAJOR(dev), MINOR(dev));
+ break;
+ case S_IFIFO:
+ printk(KERN_DEBUG
+ "SYAORAN-ERROR: %s %3o %3u %3u %2u %c\n",
+ name, perm, uid, gid, flags, 'p');
+ break;
+ case S_IFSOCK:
+ printk(KERN_DEBUG
+ "SYAORAN-ERROR: %s %3o %3u %3u %2u %c\n",
+ name, perm, uid, gid, flags, 's');
+ break;
+ case S_IFDIR:
+ printk(KERN_DEBUG
+ "SYAORAN-ERROR: %s %3o %3u %3u %2u %c\n",
+ name, perm, uid, gid, flags, 'd');
+ break;
+ case S_IFLNK:
+ printk(KERN_DEBUG
+ "SYAORAN-ERROR: %s %3o %3u %3u %2u %c %s\n",
+ name, perm, uid, gid, flags, 'l', "unknown");
+ break;
+ case S_IFREG:
+ printk(KERN_DEBUG
+ "SYAORAN-ERROR: %s %3o %3u %3u %2u %c\n",
+ name, perm, uid, gid, flags, 'f');
+ break;
+ }
+ }
+ spin_unlock(&lock);
+ return error;
+}
+
+/* Check whether the given dentry is allowed to mknod. */
+static int MayCreateNode(struct dentry *dentry, int mode, int dev)
+{
+ struct syaoran_sb_info *info =
+ (struct syaoran_sb_info *) dentry->d_sb->s_fs_info;
+ if (!info) {
+ printk(KERN_DEBUG "%s: dentry->d_sb->s_fs_info == NULL\n",
+ __FUNCTION__);
+ return -EPERM;
+ }
+ if (!info->initialize_done)
+ return 0;
+ return CheckFlags(info, dentry, mode, dev, MAY_CREATE);
+}
+
+/* Check whether the given dentry is allowed to chmod/chown/unlink. */
+static int MayModifyNode(struct dentry *dentry, unsigned int flags)
+{
+ struct syaoran_sb_info *info =
+ (struct syaoran_sb_info *) dentry->d_sb->s_fs_info;
+ if (!info) {
+ printk(KERN_DEBUG "%s: dentry->d_sb->s_fs_info == NULL\n",
+ __FUNCTION__);
+ return -EPERM;
+ }
+ if (flags == DEVICE_USED && !info->is_permissive_mode)
+ return 0;
+ if (!dentry->d_inode)
+ return -ENOENT;
+ return CheckFlags(info, dentry, dentry->d_inode->i_mode,
+ dentry->d_inode->i_rdev, flags);
+}
+
+/*
+ * The following structure and code are used for transferring data
+ * to interface files.
+ */
+
+struct syaoran_read_struct {
+ char *buf; /* Buffer for reading. */
+ int avail; /* Bytes available for reading. */
+ struct super_block *sb; /* The super_block of this partition. */
+ struct dev_entry *entry; /* The entry currently reading from. */
+ bool read_all; /* Dump all entries? */
+ struct list_head *pos; /* Current position. */
+};
+
+static void ReadTable(struct syaoran_read_struct *head, char *buf, int count)
+{
+ struct super_block *sb = head->sb;
+ struct syaoran_sb_info *info =
+ (struct syaoran_sb_info *) sb->s_fs_info;
+ struct list_head *pos;
+ const bool read_all = head->read_all;
+ if (!info)
+ return;
+ if (!head->pos)
+ return;
+ list_for_each_cookie(pos, head->pos, &info->list) {
+ struct dev_entry *entry =
+ list_entry(pos, struct dev_entry, list);
+ const unsigned int flags =
+ read_all ? entry->flags : entry->flags & ~DEVICE_USED;
+ const char *name = entry->printable_name;
+ const uid_t uid = entry->uid;
+ const gid_t gid = entry->gid;
+ const mode_t perm = entry->mode & 0777;
+ int len = 0;
+ switch (entry->mode & S_IFMT) {
+ case S_IFCHR:
+ if (!head->read_all && !(entry->flags & DEVICE_USED))
+ break;
+ len = snprintf(buf, count,
+ "%-20s %3o %3u %3u %2u %c %3u %3u\n",
+ name, perm, uid, gid, flags, 'c',
+ MAJOR(entry->kdev), MINOR(entry->kdev));
+ break;
+ case S_IFBLK:
+ if (!head->read_all && !(entry->flags & DEVICE_USED))
+ break;
+ len = snprintf(buf, count,
+ "%-20s %3o %3u %3u %2u %c %3u %3u\n",
+ name, perm, uid, gid, flags, 'b',
+ MAJOR(entry->kdev), MINOR(entry->kdev));
+ break;
+ case S_IFIFO:
+ len = snprintf(buf, count,
+ "%-20s %3o %3u %3u %2u %c\n",
+ name, perm, uid, gid, flags, 'p');
+ break;
+ case S_IFSOCK:
+ len = snprintf(buf, count,
+ "%-20s %3o %3u %3u %2u %c\n",
+ name, perm, uid, gid, flags, 's');
+ break;
+ case S_IFDIR:
+ len = snprintf(buf, count,
+ "%-20s %3o %3u %3u %2u %c\n",
+ name, perm, uid, gid, flags, 'd');
+ break;
+ case S_IFLNK:
+ len = snprintf(buf, count,
+ "%-20s %3o %3u %3u %2u %c %s\n",
+ name, perm, uid, gid, flags, 'l',
+ entry->printable_symlink_data);
+ break;
+ case S_IFREG:
+ len = snprintf(buf, count,
+ "%-20s %3o %3u %3u %2u %c\n",
+ name, perm, uid, gid, flags, 'f');
+ break;
+ }
+ if (len < 0 || count <= len)
+ break;
+ count -= len;
+ buf += len;
+ head->avail += len;
+ }
+}
+
+static int syaoran_trace_open(struct inode *inode, struct file *file)
+{
+ struct syaoran_read_struct *head =
+ kzalloc(sizeof(*head), GFP_KERNEL);
+ if (!head)
+ return -ENOMEM;
+ head->sb = inode->i_sb;
+ head->read_all =
+ (strcmp(file->f_dentry->d_name.name, ".syaoran_all") == 0);
+ head->pos = &((struct syaoran_sb_info *) head->sb->s_fs_info)->list;
+ head->buf = kzalloc(PAGE_SIZE * 2, GFP_KERNEL);
+ if (!head->buf) {
+ kfree(head);
+ return -ENOMEM;
+ }
+ file->private_data = head;
+ return 0;
+}
+
+static int syaoran_trace_release(struct inode *inode, struct file *file)
+{
+ struct syaoran_read_struct *head = file->private_data;
+ kfree(head->buf);
+ kfree(head);
+ file->private_data = NULL;
+ return 0;
+}
+
+static ssize_t syaoran_trace_read(struct file *file, char __user *buf,
+ size_t count, loff_t *ppos)
+{
+ struct syaoran_read_struct *head =
+ (struct syaoran_read_struct *) file->private_data;
+ int len = head->avail;
+ char *cp = head->buf;
+ if (!access_ok(VERIFY_WRITE, buf, count))
+ return -EFAULT;
+ ReadTable(head, cp + len, PAGE_SIZE * 2 - len);
+ len = head->avail;
+ if (len > count)
+ len = count;
+ if (len > 0) {
+ if (copy_to_user(buf, cp, len))
+ return -EFAULT;
+ head->avail -= len;
+ memmove(cp, cp + len, head->avail);
+ }
+ return len;
+}
+
+static struct file_operations syaoran_trace_operations = {
+ .open = syaoran_trace_open,
+ .release = syaoran_trace_release,
+ .read = syaoran_trace_read,
+};
+
+/* Create interface files for reading status. */
+static int syaoran_create_tracelog(struct super_block *sb,
+ const char *filename)
+{
+ struct dentry *base = dget(sb->s_root);
+ struct dentry *dentry = lookup_create2(filename, base, 0);
+ int error = PTR_ERR(dentry);
+ if (!IS_ERR(dentry)) {
+ struct inode *inode = new_inode(sb);
+ if (inode) {
+ struct timespec now = CURRENT_TIME;
+ inode->i_mode = S_IFREG | 0400;
+ inode->i_uid = 0;
+ inode->i_gid = 0;
+ inode->i_blocks = 0;
+ inode->i_mapping->a_ops = &syaoran_aops;
+ inode->i_mapping->backing_dev_info =
+ &syaoran_backing_dev_info;
+ inode->i_op = &syaoran_file_inode_operations;
+ inode->i_atime = now;
+ inode->i_mtime = now;
+ inode->i_ctime = now;
+ inode->i_fop = &syaoran_trace_operations;
+ d_instantiate(dentry, inode);
+ dget(dentry); /* Extra count - pin the dentry in core */
+ error = 0;
+ }
+ dput(dentry);
+ }
+ mutex_unlock(&base->d_inode->i_mutex);
+ dput(base);
+ return error;
+}
+
+/***** SYAORAN end. *****/
+#endif

2007-12-16 11:04:05

Subject: [patch 2/2] [RFC] Simple tamper-proof device filesystem.

Signed-off-by: Tetsuo Handa <[email protected]>
---
fs/Kconfig | 21 +++++++++++++++++++++
fs/Makefile | 1 +
2 files changed, 22 insertions(+)

--- linux-2.6.24-rc5.orig/fs/Kconfig
+++ linux-2.6.24-rc5/fs/Kconfig
@@ -1555,6 +1555,27 @@ config UFS_DEBUG
Y here. This will result in _many_ additional debugging messages to be
written to the system log.

+config SYAORAN_FS
+ tristate "SYAORAN (Tamper-Proof Device Filesystem) support"
+ help
+ Say Y or M here to support the Tamper-Proof Device Filesystem.
+
+ SYAORAN stands for
+ "Simple Yet All-important Object Realizing Abiding Nexus".
+ SYAORAN is a filesystem for /dev with Mandatory Access Control.
+
+ The system can't work if /dev is read-only.
+ Therefore you need to mount a writable filesystem (such as tmpfs)
+ for /dev if root fs is read-only.
+
+ But the writable /dev means that files on /dev might be tampered.
+ For example, if /dev/null is deleted and re-created as a symbolic
+ link to /dev/hda by an attacker, the contents of the IDE HDD
+ will be destroyed at a blow.
+
+ SYAORAN can ensure /dev/null is a character device file
+ with major=1 minor=3.
+
endmenu

menuconfig NETWORK_FILESYSTEMS
--- linux-2.6.24-rc5.orig/fs/Makefile
+++ linux-2.6.24-rc5/fs/Makefile
@@ -118,3 +118,4 @@ obj-$(CONFIG_HPPFS) += hppfs/
obj-$(CONFIG_DEBUG_FS) += debugfs/
obj-$(CONFIG_OCFS2_FS) += ocfs2/
obj-$(CONFIG_GFS2_FS) += gfs2/
+obj-$(CONFIG_SYAORAN_FS) += syaoran/syaoran.o

2007-12-16 11:21:47

Subject: Re: [patch 1/2] [RFC] Simple tamper-proof device filesystem.

Tetsuo Handa wrote:
> /dev needs to be writable, but this means that files on /dev might be
> tampered with.

I infer that you mean /dev needs to be writable by anyone, not by just
its owner or owner and group (conventionally root/root.) This goes
against conventional wisdom, which is that /dev must be writable only by
the administrator. Why do you say otherwise?

2007-12-16 11:26:56

Subject: Re: [patch 1/2] [RFC] Simple tamper-proof device filesystem.

Hello.

David Newall wrote:
> Tetsuo Handa wrote:
> > /dev needs to be writable, but this means that files on /dev might be
> > tampered with.
>
> I infer that you mean /dev needs to be writable by anyone, not by just
> its owner or owner and group (conventionally root/root.) This goes
> against conventional wisdom, which is that /dev must be writable only by
> the administrator. Why do you say otherwise?
I didn't mean that "/dev is writable by everybody".
I meant that "/dev must be mounted for read-write mode"
(even if one wants to mount / for read-only mode).

Regards.

2007-12-16 11:31:20

Subject: Re: [patch 1/2] [RFC] Simple tamper-proof device filesystem.

Tetsuo Handa wrote:
> David Newall wrote:
>
>> Tetsuo Handa wrote:
>>
>>> /dev needs to be writable, but this means that files on /dev might be
>>> tampered with.
>>>
>> I infer that you mean /dev needs to be writable by anyone, not by just
>> its owner or owner and group (conventionally root/root.) This goes
>> against conventional wisdom, which is that /dev must be writable only by
>> the administrator. Why do you say otherwise?
>>
> I didn't mean that "/dev is writable by everybody".
>

Glad to hear it! :)

> I meant that "/dev must be mounted for read-write mode"
>

Again, why?

2007-12-16 11:36:27

Subject: Re: [patch 1/2] [RFC] Simple tamper-proof device filesystem.

Hello.

> > I meant that "/dev must be mounted for read-write mode"
>
> Again, why?

You can mount the / partition read-only if you wish.
But you cannot make the /dev directory read-only.
You won't be able to log in to the system because /sbin/mingetty
fails to chown/chmod /dev/tty* if /dev is mounted read-only.

2007-12-16 11:58:40

Subject: Re: [patch 1/2] [RFC] Simple tamper-proof device filesystem.

Tetsuo Handa wrote:
>>> I meant that "/dev must be mounted for read-write mode"
>>>
>> Again, why?
>>
>
> You won't be able to login to the system because /sbin/mingetty
> fails to "chown/chmod" /dev/tty* if /dev is mounted for read-only mode.
>

Good point. So, if only root can modify files in /dev, what's the
problem you're fixing? (I'm sure you tried to explain this in your
original post, but your reasons weren't clear to me.)

2007-12-16 12:03:46

Subject: Re: [patch 1/2] [RFC] Simple tamper-proof device filesystem.

Hello.

David Newall wrote:
> > You won't be able to login to the system because /sbin/mingetty
> > fails to "chown/chmod" /dev/tty* if /dev is mounted for read-only mode.
>
> Good point. So, if only root can modify files in /dev, what's the
> problem you're fixing? (I'm sure you tried to explain this in your
> original post, but your reasons weren't clear to me.)

In 2003, I was trying to make the / partition read-only to prevent tampering
with system files. Policy-based mandatory access control (such as SELinux) is
one way to prevent tampering, but managing the policy was a daunting task.
So I tried to store the / partition on a read-only medium so that
system files could not be tampered with.

When I took part in Security Stadium 2003 on the defense side,
I was using devfs for the /dev directory. The files in the /dev directory
were deleted by attackers and the administrator was unable to log in.
So I developed this filesystem so that attackers who have gained root privileges
can't tamper with files in the /dev directory.
Not many systems mount the / partition read-only,
so there may be little need for a read-only / partition.

But this filesystem is still useful when combined with
policy-based mandatory access control (such as SELinux or TOMOYO Linux),
because it guarantees what policy-based mandatory access control
can't guarantee (i.e. the binding between a filename and its attributes).

2007-12-16 12:15:12

Subject: Re: [patch 1/2] [RFC] Simple tamper-proof device filesystem.

> But use of this filesystem is still valid when this filesystem is used with
> policy based mandatory access control (such as SELinux, TOMOYO Linux)
> because this filesystem guarantees where policy based mandatory access control
> can't guarantee (i.e. filename and its attribute).
>
Policy-based mandatory access control guarantees that
"Only Bob can create a block device file named sda1 in the /dev directory".
But it can't guarantee that /dev/sda1 will have the block-8-1 attribute.
If Bob is malicious and creates /dev/sda1 with the block-8-2 attribute,
other applications that depend on the attributes of /dev/sda1 go wrong.
So this filesystem guarantees that /dev/sda1 has the block-8-1 attribute.
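
To make the (filename, attribute) binding concrete, here is a minimal
user-space sketch of a whitelist lookup. It is not taken from the patch:
the dev_rule structure, the rules table and may_create() are hypothetical
names, and the two entries only mirror the /dev/null and /dev/sda1 examples
used in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical policy entry: one allowed (name, attribute) pair. */
struct dev_rule {
	const char *name;       /* name relative to /dev */
	char type;              /* 'c' = character device, 'b' = block device */
	unsigned int dev_major; /* expected major number */
	unsigned int dev_minor; /* expected minor number */
};

/* Illustrative table; a real policy would be loaded at mount time. */
static const struct dev_rule rules[] = {
	{ "null", 'c', 1, 3 },
	{ "sda1", 'b', 8, 1 },
};

/*
 * Return 1 if creating "name" with the given attributes matches the rule
 * bound to that name, 0 otherwise. Names without a rule are refused,
 * which is what makes this a whitelist.
 */
static int may_create(const char *name, char type,
		      unsigned int dev_major, unsigned int dev_minor)
{
	size_t i;

	assert(name != NULL);
	for (i = 0; i < sizeof(rules) / sizeof(rules[0]); i++) {
		const struct dev_rule *r = &rules[i];

		if (strcmp(r->name, name) == 0)
			return r->type == type &&
			       r->dev_major == dev_major &&
			       r->dev_minor == dev_minor;
	}
	return 0;
}
```

With this table, may_create("sda1", 'b', 8, 1) is accepted while
may_create("sda1", 'b', 8, 2) is refused, which is the Bob scenario above.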

2007-12-16 17:32:01

by Indan Zupancic

Subject: Re: [patch 1/2] [RFC] Simple tamper-proof device filesystem.

Hi,

On Sun, December 16, 2007 13:03, Tetsuo Handa wrote:
> Hello.
>
> David Newall wrote:
>> > You won't be able to login to the system because /sbin/mingetty
>> > fails to "chown/chmod" /dev/tty* if /dev is mounted for read-only mode.
>>
>> Good point. So, if only root can modify files in /dev, what's the
>> problem you're fixing? (I'm sure you tried to explain this in your
>> original post, but your reasons weren't clear to me.)
>
> In 2003, I was trying to make / partition read-only to avoid tampering system
> files.
> Use of policy based mandatory access control (such as SELinux) is
> one of ways to avoid tampering, but management of policy was a daunting task.
> So, I tried to store / partition in a read-only medium so that
> the system is free from tampering system files.
>
> When I attended at Security Stadium 2003 as a defense side,
> I was using devfs for /dev directory. The files in /dev directory
> were deleted by attackers and the administrator was unable to login.
> So I developed this filesystem so that attackers who got root privilege
> can't tamper files in /dev directory.

What prevents them from mounting tmpfs on top of /dev, bypassing your fs?

Also, if they have root there are plenty of ways to prevent an administrator
from logging in, e.g. using iptables or changing the password.

Greetings,

Indan

2007-12-16 19:48:55

by Al Viro

Subject: Re: [patch 1/2] [RFC] Simple tamper-proof device filesystem.

On Sun, Dec 16, 2007 at 05:52:08PM +0100, Indan Zupancic wrote:

> What prevents them from mounting tmpfs on top of /dev, bypassing your fs?

Or binding /dev/null over nodes they want to get rid of...

> Also, if they have root there are plenty of ways to prevent an administrator
> from logging in, e.g. using iptables or changing the password.

Indeed.

BTW, tmpfs with root marked append-only and populated in normal ways on boot
would get a comparable effect without spending so much effort. Still won't
really help if attacker gains root, but then neither will your variant.

2007-12-17 00:40:36

Subject: Re: [patch 1/2] [RFC] Simple tamper-proof device filesystem.

Hello.

Indan Zupancic wrote:
> What prevents them from mounting tmpfs on top of /dev, bypassing your fs?
Mandatory access control (MAC) prevents them from mounting tmpfs on top of /dev .
MAC mediates namespace manipulation requests such as mount()/umount().

> Also, if they have root there are plenty of ways to prevent an administrator
> from logging in, e.g. using iptables or changing the password.
MAC mediates execution of /sbin/iptables or /usr/bin/passwd .

So, use of this filesystem alone is meaningless because
attackers with root privileges can do what you are saying.
But use of this filesystem with MAC is still valid because
MAC can prevent attackers with root privileges from doing what you are saying.

Regards.

2007-12-17 02:12:56

by David Wagner

Subject: Re: [patch 1/2] [RFC] Simple tamper-proof device filesystem.

Tetsuo Handa writes:
>When I attended at Security Stadium 2003 as a defense side,
>I was using devfs for /dev directory. The files in /dev directory
>were deleted by attackers and the administrator was unable to login.

If the attacker gets full administrator-level access on your machine,
there are a gazillion ways the attacker can prevent other admins from
logging on. This patch can't prevent that. It sounds like this patch
is trying to solve a fundamentally unsolveable problem.

A useful slogan: "Don't forbid what you cannot prevent."

2007-12-17 06:00:50

Subject: Re: [patch 1/2] [RFC] Simple tamper-proof device filesystem.

Tetsuo Handa wrote:
> If Bob is malicious and creates /dev/sda1 with block-8-2 attribute [...]

Bob can't do that. Only root can.

2007-12-17 06:42:59

Subject: Re: [patch 1/2] [RFC] Simple tamper-proof device filesystem.

Hello.

David Wagner wrote:
> If the attacker gets full administrator-level access on your machine,
> there are a gazillion ways the attacker can prevent other admins from
> logging on. This patch can't prevent that. It sounds like this patch
> is trying to solve a fundamentally unsolveable problem.

Please be aware that I'm saying "if this filesystem is used with MAC".

Without MAC, an attacker who has gained root privileges can do whatever he/she wants.
But with MAC, an attacker who has gained root privileges can't do whatever he/she wants.
Only actions permitted by the MAC policy are allowed for that attacker.

I'm not saying that
"this filesystem can prevent attackers from mounting other filesystem over this filesystem",
nor "this filesystem can prevent attackers from executing /sbin/iptables or /usr/bin/passwd".
They are MAC's business.
What this filesystem can do is "guarantee filename and its attribute".

If MAC(such as SELinux, TOMOYO Linux) allows attackers to
"mount other filesystem over this filesystem", this filesystem is no longer tamper-proof.
But as long as MAC prevents attackers from mounting other filesystem over this filesystem,
this filesystem can remain tamper-proof.

Regards.

2007-12-17 08:38:49

by David Wagner

Subject: Re: [patch 1/2] [RFC] Simple tamper-proof device filesystem.

David Wagner wrote:
> If the attacker gets full administrator-level access on your machine,
> there are a gazillion ways the attacker can prevent other admins from
> logging on. This patch can't prevent that. It sounds like this patch
> is trying to solve a fundamentally unsolveable problem.

Tetsuo Handa wrote:
> Please be aware that I'm saying "if this filesystem is used with MAC".

I'm aware. I'm sticking with my argument.

I doubt that we're likely to see any MAC system that is strict enough
to prevent an attacker with administrator access from locking out other
admins, and yet is loose enough to be useful in practice. I think
the proposed patch is like sticking a thumb in the dike and is trying
to solve a problem that cannot be solved with any reasonable application
of effort. I think if the attacker has gotten administrator level, then
we'll never be able to prevent the attacker from doing all sorts of bad
things we don't like, like locking out other admins. Of course if we
have a proposed defense that only stops one particular attack pathway
but leaves dozens others open, it's always convenient to say that "the
other attack pathways aren't my problem, that's the MAC's business".
Sure, if we want to hypothesize the existence of a "magic fairy dust"
MAC system that somehow closes every other path via which admin-level
attackers could lock out other admins, except for this one pathway, then
this patch might make sense. But I see no reason to expect ordinary
MAC systems to have that property.

Trying to put in place a defense that only prevents one particular attack
path, when there are a thousand other ways an attacker might achieve the
same ends, does not seem like a good way to go about securing your system.
For every one attack path that you shut down, the attacker can probably
think up a dozen new paths that you haven't shut down yet. That isn't
a good basis for security.

Personally, I'd argue that we should learn a different lesson from
the attack you experienced. The lesson is not "oh boy, we better shut
down this particular way that the attacker misused administrator-level
access". I think a better lesson is "let's think about ways to reduce
the likelihood that attackers will get administrator-level access,
because once the attacker has administrator-level access, the attacker
can do a lot of harm".

>If MAC(such as SELinux, TOMOYO Linux) allows attackers to
>"mount other filesystem over this filesystem", this filesystem is no
>longer tamper-proof.
>But as long as MAC prevents attackers from mounting other filesystem
>over this filesystem,
>this filesystem can remain tamper-proof.

But the point is that it's not enough just to prevent attackers
from mounting other filesystems over this filesystem. I can think
of all sorts of ways that an admin-level attacker might be able to
prevent other administrators from logging in. If your defense strategy
involves trying to enumerate all of those possible ways and then shut
them down one by one, you're relying upon a defense strategy known as
"blacklisting". Blacklisting has a terrible track record in the
security field, because it's too easy to overlook one pathway.

2007-12-17 11:45:16

by Indan Zupancic

Subject: Re: [patch 1/2] [RFC] Simple tamper-proof device filesystem.

Hi,

On Mon, December 17, 2007 01:40, Tetsuo Handa wrote:
> Hello.
>
> Indan Zupancic wrote:
>> What prevents them from mounting tmpfs on top of /dev, bypassing your fs?
> Mandatory access control (MAC) prevents them from mounting tmpfs on top of
> /dev .
> MAC mediates namespace manipulation requests such as mount()/umount().
>
>> Also, if they have root there are plenty of ways to prevent an administrator
>> from logging in, e.g. using iptables or changing the password.
> MAC mediates execution of /sbin/iptables or /usr/bin/passwd .
>
> So, use of this filesystem alone is meaningless because
> attackers with root privileges can do what you are saying.
> But use of this filesystem with MAC is still valid because
> MAC can prevent attackers with root privileges from doing what you are saying.

If MAC can avoid all that, then why can't it also avoid tampering with /dev?
What security does your filesystem add at all, if it's useless without a MAC
doing all the hard work?

I think you can better spend your time on read-only bind mounts.

Greetings,

Indan

2007-12-17 12:59:45

Subject: Re: [patch 1/2] [RFC] Simple tamper-proof device filesystem.

Hello.

Indan Zupancic wrote:
> If MAC can avoid all that, then why can't it also avoid tampering with /dev?

If a MAC implementation handled the pairing of a filename with its attributes,
this filesystem would not be needed.
But I don't know of any MAC implementation that handles this pair.

SELinux's granularity is "allow foo_t to create block device file in dev_t directory".
TOMOYO's granularity is "allow foo to create block device file named /dev/sda1".
Neither enforces the pairing of a filename with its attributes,
so an attacker with root privileges can create fake device files
if the MAC policy permits him/her to create device files.

It would be possible to handle this pair within the MAC policy
by extending the policy syntax,
but offloading it to the filesystem keeps the MAC policy syntax simple,
because filename/attribute pairs are conventionally constant.
You won't let foo_t create /dev/sda1 with block-8-1 attributes
and let bar_t create /dev/sda1 with block-8-2 attributes, will you?
You don't want to add attribute information to every entry in the MAC policy, do you?
It is redundant to describe this attribute enforcement in the MAC policy
unless you want to break the conventional filename/attribute pairs.

> What security does your filesystem add at all, if it's useless without a MAC
> doing all the hard work?
It allows the / partition to be mounted read-only.
It allows filename/attribute pairs to be enforced on /dev, which prevents
/dev/null spoofing (creating /dev/null as a regular file for eavesdropping purposes).

This filesystem adds enforcement of filename/attribute pairs,
but the enforcement can be bypassed if this filesystem is used without MAC.
The enforcement cannot be bypassed
if this filesystem is used with MAC.
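
For comparison, this is roughly what an application would have to do in
user space if it could not trust the filesystem to keep /dev/null bound to
"character device, major=1 minor=3". It is only a sketch under the
assumption of a Linux/glibc system (major() and minor() come from
<sys/sysmacros.h>); is_chardev() and looks_like_dev_null() are hypothetical
helper names, not part of the patch.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/stat.h>
#include <sys/sysmacros.h>
#include <sys/types.h>

/* Return 1 if *st describes a character device with the given numbers. */
static int is_chardev(const struct stat *st,
		      unsigned int want_major, unsigned int want_minor)
{
	assert(st != NULL);
	return S_ISCHR(st->st_mode) &&
	       major(st->st_rdev) == want_major &&
	       minor(st->st_rdev) == want_minor;
}

/*
 * Return 1 if "path" currently is the conventional null device.
 * lstat() is used so that a symbolic link (e.g. one pointing at a disk)
 * does not pass the check.
 */
static int looks_like_dev_null(const char *path)
{
	struct stat st;

	assert(path != NULL);
	if (lstat(path, &st) != 0)
		return 0;
	return is_chardev(&st, 1, 3);
}
```

A check like this is racy (the node can be replaced between lstat() and
open()), which is one reason to enforce the binding in the filesystem
rather than in every application.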

Regards.

2007-12-17 13:08:15

by Al Boldi

Subject: Re: [patch 1/2] [RFC] Simple tamper-proof device filesystem.

Indan Zupancic wrote:
> On Mon, December 17, 2007 01:40, Tetsuo Handa wrote:
> > So, use of this filesystem alone is meaningless because
> > attackers with root privileges can do what you are saying.
> > But use of this filesystem with MAC is still valid because
> > MAC can prevent attackers with root privileges from doing what you are
> > saying.
>
> If MAC can avoid all that, then why can't it also avoid tampering with
> /dev? What security does your filesystem add at all, if it's useless
> without a MAC doing all the hard work?

I think the answer is obvious: Tetsuo wants to add functionality that the
MACs are missing. So, instead of adding this functionality per MAC, he
proposes to add it as ground work, to be combined with any MAC.

> I think you can better spend your time on read-only bind mounts.

That would be too coarse.

Thanks!

--
Al

2007-12-17 13:16:56

Subject: Re: [patch 1/2] [RFC] Simple tamper-proof device filesystem.

Hello.

Al Boldi wrote:
> I think the answer is obvious: Tetsuo wants to add functionality that the
> MACs are missing. So, instead of adding this functionality per MAC, he
> proposes to add it as ground work, to be combined with any MAC.
Yes, that's right.

This filesystem is designed to be used with TOMOYO Linux,
but this filesystem can be used with other MAC implementations too.

Thank you.

2007-12-17 13:32:21

Subject: Re: [patch 1/2] [RFC] Simple tamper-proof device filesystem.

( This is a reply to http://lkml.org/lkml/2007/12/17/27 .)

Hello.

David Wagner wrote:
> But the point is that it's not enough just to prevent attackers
> from mounting other filesystems over this filesystem. I can think
> of all sorts of ways that an admin-level attacker might be able to
> prevent other administrators from logging in. If your defense strategy
> involves trying to enumerate all of those possible ways and then shut
> them down one by one, you're relying upon a defense strategy known as
> "blacklisting". Blacklisting has a terrible track record in the
> security field, because it's too easy to overlook one pathway.
Of course, I assume whitelisting.
SELinux, TOMOYO Linux and many other MAC implementations use a
whitelisting approach, and this filesystem takes a whitelisting approach too.

This filesystem handles what MAC implementations don't handle.
In other words, it closes a remaining hole.

I'm proposing:

Don't you think it is dangerous to assume that files in the /dev directory
have the appropriate binding between filename and attributes?
MAC can restrict which processes can create files in the /dev directory,
but MAC doesn't enforce the binding between filename and attributes.
So, how about enforcing that binding in the filesystem layer?

Regards.

To David Wagner:
Could you please Cc: me so that I can reply to your message?
I can't reply to your message since I'm reading this ml in daily digest mode.

2007-12-17 20:11:30

Subject: Re: [patch 1/2] [RFC] Simple tamper-proof device filesystem.

Quoting Tetsuo Handa ([email protected]):
> A brief description about SYAORAN:
>
> SYAORAN stands for "Simple Yet All-important Object Realizing Abiding
> Nexus". SYAORAN is a filesystem for /dev with Mandatory Access Control.
>
> /dev needs to be writable, but this means that files on /dev might be
> tampered with. SYAORAN can restrict combinations of (pathname, attribute)
> that the system can create. The attribute is one of directory, regular
> file, FIFO, UNIX domain socket, symbolic link, character or block device
> file with major/minor device numbers.
>
> SYAORAN can ensure /dev/null is a character device file with major=1 minor=3.
>
> Policy specifications for this filesystem is at
> http://tomoyo.sourceforge.jp/en/1.5.x/policy-syaoran.html
>
> Why not use FUSE?
>
> Because /dev has to be available through the lifetime of the kernel.
> It is not acceptable if /dev stops working due to SIGKILL or OOM-killer.
>
> Why not use SELinux?
>
> Because SELinux doesn't guarantee filename and its attribute.
> The purpose of this filesystem is to ensure filename and its attribute
> (e.g. /dev/null is guaranteed to be a character device file
> with major=1 and minor=3).

We need something similar for system containers (like vservers). We
will likely want root in a container to be confined to a certain set
of devices.

For starters we expect to use the capability bounding sets (see
http://lkml.org/lkml/2007/11/26/206). So a container will have a static
/dev predefined, and CAP_MKNOD will be removed from its capability
bounding set so that root in a container cannot create any more new
devices.
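
As a rough illustration of the bounding-set idea, the sketch below uses
prctl(PR_CAPBSET_READ) and prctl(PR_CAPBSET_DROP), the interface available
in Linux 2.6.25 and later. Whether the patch set linked above exposes
exactly this interface is an assumption here, and cap_in_bounding_set() and
drop_from_bounding_set() are hypothetical wrapper names.

```c
#include <assert.h>
#include <linux/capability.h>	/* CAP_MKNOD, CAP_LAST_CAP */
#include <sys/prctl.h>		/* prctl(), PR_CAPBSET_READ, PR_CAPBSET_DROP */

/*
 * Return 1 if "cap" is in the calling thread's capability bounding set,
 * 0 if it is not, and -1 on error.
 */
static int cap_in_bounding_set(int cap)
{
	assert(cap >= 0 && cap <= CAP_LAST_CAP);
	return prctl(PR_CAPBSET_READ, (unsigned long) cap, 0UL, 0UL, 0UL);
}

/*
 * Remove "cap" from the bounding set of the calling thread.
 * Returns 0 on success and -1 on failure (for example EPERM when the
 * caller lacks CAP_SETPCAP). The drop cannot be undone.
 */
static int drop_from_bounding_set(int cap)
{
	assert(cap >= 0 && cap <= CAP_LAST_CAP);
	return prctl(PR_CAPBSET_DROP, (unsigned long) cap, 0UL, 0UL, 0UL);
}
```

A container setup tool would call drop_from_bounding_set(CAP_MKNOD) before
starting the container's init, so that no process inside can regain the
ability to create device nodes.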

For future more sophisticated device controls, two similar approaches
have been suggested (one by me, see
https://lists.linux-foundation.org/pipermail/containers/2007-September/007423.html
and
https://lists.linux-foundation.org/pipermail/containers/2007-November/008589.html
). Both actually control the devices a process can create period,
rather than trying to control at the filesystem. And yes, these both
lack the feature in your solution that for instance 'c 1 3' must be
called null, which appears to be the kind of guarantee apparmor likes to
provide.

To use your approach, i guess we would have to use selinux (or tomoyo)
to enforce that devices may only be created under /dev?

-serge

2007-12-18 00:16:16

Subject: Re: [patch 1/2] [RFC] Simple tamper-proof device filesystem.

Hello.

Serge E. Hallyn wrote:
> CAP_MKNOD will be removed from its capability
I think it is not enough because the root can rename/unlink device files
(mv /dev/sda1 /dev/tmp; mv /dev/sda2 /dev/sda1; mv /dev/tmp /dev/sda2).

> To use your approach, i guess we would have to use selinux (or tomoyo)
> to enforce that devices may only be created under /dev?
Anyone can use this filesystem alone.
But using it with MAC (or any other access control mechanism that prevents
attackers from unmounting/overlaying this filesystem) is recommended.
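
The rename example above can be modelled in user space as follows. This is
a sketch of one way a name-bound policy could treat rename, not the code
from the patch; struct node, the policy table and may_rename() are
hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* A device node, or a policy rule binding a name to attributes. */
struct node {
	const char *name;       /* name relative to /dev */
	char type;              /* 'c' or 'b' */
	unsigned int dev_major;
	unsigned int dev_minor;
};

/* Illustrative policy: each name is bound to exactly one attribute set. */
static const struct node policy[] = {
	{ "sda1", 'b', 8, 1 },
	{ "sda2", 'b', 8, 2 },
};

/* Look up the rule bound to "name"; NULL if the name is not whitelisted. */
static const struct node *find_rule(const char *name)
{
	size_t i;

	for (i = 0; i < sizeof(policy) / sizeof(policy[0]); i++)
		if (strcmp(policy[i].name, name) == 0)
			return &policy[i];
	return NULL;
}

/*
 * A rename is acceptable only if the node's attributes equal what the
 * policy binds to the destination name.
 */
static int may_rename(const struct node *n, const char *new_name)
{
	const struct node *rule;

	assert(n != NULL && new_name != NULL);
	rule = find_rule(new_name);
	return rule != NULL &&
	       rule->type == n->type &&
	       rule->dev_major == n->dev_major &&
	       rule->dev_minor == n->dev_minor;
}
```

In this model the swap fails at its first step: /dev/tmp has no rule, so
moving sda1 there is refused, and moving the 8:2 node onto the name sda1 is
refused as well.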

2007-12-18 00:40:08

Subject: Re: [patch 1/2] [RFC] Simple tamper-proof device filesystem.

Quoting Tetsuo Handa ([email protected]):
> Hello.
>
> Serge E. Hallyn wrote:
> > CAP_MKNOD will be removed from its capability
> I think it is not enough because the root can rename/unlink device files
> (mv /dev/sda1 /dev/tmp; mv /dev/sda2 /dev/sda1; mv /dev/tmp /dev/sda2).

Sure but that doesn't bother us :)

The admin in the container has his own /dev directory and can do what he
likes with the devices he's allowed to have. He just shouldn't have
access to others. If he wants to rename /dev/sda1 to /dev/sda5 that's
his choice.

> > To use your approach, i guess we would have to use selinux (or tomoyo)
> > to enforce that devices may only be created under /dev?
> Everyone can use this filesystem alone.

Sure but it is worthless alone.

No?

What will keep the container admin from doing 'mknod /root/hda1 b 3 1'?

> But use with MAC (or whatever access control mechanisms that prevent
> attackers from unmounting/overlaying this filesystem) is recommended.

-serge

2007-12-18 01:41:41

Subject: Re: [patch 1/2] [RFC] Simple tamper-proof device filesystem.

I hate to bring this up again, but what if the admin in the container
mounts an external file system (e.g. nfs, usb, loop mount from a file,
or via fuse), and that file system already has a device that we would
like to ban inside that container?

Since we will have to keep a white- (or black-) list of devices
that are permitted in a container anyway, and that list may even change
per container, why not enforce the access control at the VFS layer?
It's safer in the long run.

Oren.

Serge E. Hallyn wrote:
> Quoting Tetsuo Handa ([email protected]):
>> Hello.
>>
>> Serge E. Hallyn wrote:
>>> CAP_MKNOD will be removed from its capability
>> I think it is not enough because the root can rename/unlink device files
>> (mv /dev/sda1 /dev/tmp; mv /dev/sda2 /dev/sda1; mv /dev/tmp /dev/sda2).
>
> Sure but that doesn't bother us :)
>
> The admin in the container has his own /dev directory and can do what he
> likes with the devices he's allowed to have. He just shouldn't have
> access to others. If he wants to rename /dev/sda1 to /dev/sda5 that's
> his choice.
>
>>> To use your approach, i guess we would have to use selinux (or tomoyo)
>>> to enforce that devices may only be created under /dev?
>> Everyone can use this filesystem alone.
>
> Sure but it is worthless alone.
>
> No?
>
> What will keep the container admin from doing 'mknod /root/hda1 b 3 1'?
>
>> But use with MAC (or whatever access control mechanisms that prevent
>> attackers from unmounting/overlaying this filesystem) is recommended.
>
> -serge
> _______________________________________________
> Containers mailing list
> [email protected]
> https://lists.linux-foundation.org/mailman/listinfo/containers

2007-12-18 01:56:15

Subject: Re: [patch 1/2] [RFC] Simple tamper-proof device filesystem.

Quoting Serge E. Hallyn ([email protected]):
> Quoting Tetsuo Handa ([email protected]):
> > Hello.
> >
> > Serge E. Hallyn wrote:
> > > CAP_MKNOD will be removed from its capability
> > I think it is not enough because the root can rename/unlink device files
> > (mv /dev/sda1 /dev/tmp; mv /dev/sda2 /dev/sda1; mv /dev/tmp /dev/sda2).
>
> Sure but that doesn't bother us :)
>
> The admin in the container has his own /dev directory and can do what he
> likes with the devices he's allowed to have. He just shouldn't have
> access to others. If he wants to rename /dev/sda1 to /dev/sda5 that's
> his choice.
>
> > > To use your approach, i guess we would have to use selinux (or tomoyo)
> > > to enforce that devices may only be created under /dev?
> > Everyone can use this filesystem alone.
>
> Sure but it is worthless alone.
>
> No?

Oh, no, I'm sorry - I was thinking in terms of my requirements again.
But your requirements are to ensure that an application accessing a
device at a well-known location gets what it expects.

So then the main question is still the one I think Al had asked - what
keeps a rogue CAP_SYS_MOUNT process from doing
mount --bind /dev/hda1 /dev/null ?

thanks,
-serge

> What will keep the container admin from doing 'mknod /root/hda1 b 3 1'?
>
> > But use with MAC (or whatever access control mechanisms that prevent
> > attackers from unmounting/overlaying this filesystem) is recommended.
>
> -serge

2007-12-18 02:09:44

Subject: Re: [patch 1/2] [RFC] Simple tamper-proof device filesystem.

Quoting Oren Laadan ([email protected]):
>
> I hate to bring this again, but what if the admin in the container
> mounts an external file system (eg. nfs, usb, loop mount from a file,
> or via fuse), and that file system already has a device that we would
> like to ban inside that container ?

Miklos' user mount patches enforced that if !capable(CAP_MKNOD),
then mnt->mnt_flags |= MNT_NODEV. So that's no problem.
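
The rule quoted above can be modelled in userspace as a minimal sketch. This is
not the code from the user mount patches: the function names are hypothetical,
the capability check is reduced to a plain bool, and the userspace flag
MS_NODEV from <sys/mount.h> stands in for the kernel-internal MNT_NODEV.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/mount.h>  /* MS_NODEV, MS_BIND */

/*
 * Hypothetical helper: compute the flags for a mount requested by a caller.
 * A caller without CAP_MKNOD (modelled as a bool here) always gets MS_NODEV
 * added, so device nodes on the new mount cannot be accessed.
 */
unsigned long user_mount_flags(unsigned long requested, bool has_cap_mknod)
{
	if (!has_cap_mknod)
		requested |= MS_NODEV;
	return requested;
}

/* Usage sketch: only the unprivileged bind mount is forced to nodev. */
void user_mount_flags_example(void)
{
	assert(user_mount_flags(MS_BIND, false) & MS_NODEV);
	assert(!(user_mount_flags(MS_BIND, true) & MS_NODEV));
}
```

The check is all-or-nothing per mount: it cannot allow some device nodes on the
mounted filesystem while banning others.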

But that's been pulled out of -mm! ? Crap.

> Since anyway we will have to keep a white- (or black-) list of devices
> that are permitted in a container, and that list may even change
> per container -- why not enforce the access control at the VFS layer ?
> It's safer in the long run.

By that you mean more along the lines of Pavel's patch than my whitelist
LSM, or you actually mean Tetsuo's filesystem (i assume you don't mean that
by 'vfs layer' :), or something different entirely?

thanks,
-serge

2007-12-18 02:26:47

Subject: Re: [patch 1/2] [RFC] Simple tamper-proof device filesystem.

Hello.

Serge E. Hallyn wrote:
> But your requirements are to ensure that an application accessing a
> device at a well-known location gets what it expects.

Yes. That's the purpose of this filesystem.

> So then the main question is still the one I think Al had asked - what
> keeps a rogue CAP_SYS_MOUNT process from doing
> mount --bind /dev/hda1 /dev/null ?

Excuse me, but I guess you meant "mount --bind /dev/ /root/" or something
because a mount operation requires directories.
MAC can prevent a rogue CAP_SYS_MOUNT process from doing
"mount --bind /dev/ /root/".
For example, regarding TOMOYO Linux, you need to give
"allow_mount /dev/ /root/ --bind 0" permission
to permit "mount --bind /dev/ /root/" request.

Did you mean "ln -s /dev/hda1 /dev/null" or "ln /dev/hda1 /dev/null"?
No problem. MAC can prevent such requests too.

Regards.

2007-12-18 02:54:52

Subject: Re: [patch 1/2] [RFC] Simple tamper-proof device filesystem.

Quoting Tetsuo Handa ([email protected]):
> Hello.
>
> Serge E. Hallyn wrote:
> > But your requirements are to ensure that an application accessing a
> > device at a well-known location gets what it expects.
>
> Yes. That's the purpose of this filesystem.
>
>
> > So then the main question is still the one I think Al had asked - what
> > keeps a rogue CAP_SYS_MOUNT process from doing
> > mount --bind /dev/hda1 /dev/null ?
>
> Excuse me, but I guess you meant "mount --bind /dev/ /root/" or something
> because a mount operation requires directories.

Nope, try

touch /root/hda1
ls -l /root/hda1
mount --bind /dev/hda1 /root/hda1
ls -l /root/hda1

But I see tomoyo prevents that

> MAC can prevent a rogue CAP_SYS_MOUNT process from doing
> "mount --bind /dev/ /root/".
> For example, regarding TOMOYO Linux, you need to give
> "allow_mount /dev/ /root/ --bind 0" permission
> to permit "mount --bind /dev/ /root/" request.

Ok, that answers my question. Thanks.

(I won't go into "who gets to say allow_mount" :)

> Did you mean "ln -s /dev/hda1 /dev/null" or "ln /dev/hda1 /dev/null"?
> No problem. MAC can prevent such requests too.

Then it sounds like this filesystem is something Tomoyo can use.

thanks,
-serge

2007-12-18 03:01:55

Subject: Re: [patch 1/2] [RFC] Simple tamper-proof device filesystem.

Serge E. Hallyn wrote:
> Quoting Oren Laadan ([email protected]):
>> I hate to bring this again, but what if the admin in the container
>> mounts an external file system (eg. nfs, usb, loop mount from a file,
>> or via fuse), and that file system already has a device that we would
>> like to ban inside that container ?
>
> Miklos' user mount patches enforced that if !capable(CAP_MKNOD),
> then mnt->mnt_flags |= MNT_NODEV. So that's no problem.

Yes, that works to disallow all device files from a mounted file system.

But it's a black and white thing: either they are all banned or allowed;
you can't have some devices allowed and others not, depending on type.
A scenario where this may be useful is, for instance, if we want some apps in
the container to execute within a pre-made chroot (sub)tree within that
container.

>
> But that's been pulled out of -mm! ? Crap.
>
>> Since anyway we will have to keep a white- (or black-) list of devices
>> that are permitted in a container, and that list may even change
>> per container -- why not enforce the access control at the VFS layer ?
>> It's safer in the long run.
>
> By that you mean more along the lines of Pavel's patch than my whitelist
> LSM, or you actually mean Tetsuo's filesystem (i assume you don't mean that
> by 'vfs layer' :), or something different entirely?

:)

By 'vfs' I mean at open() time, and not at mount(), or mknod() time.
Either yours or Pavel's; I tend to prefer not to use LSM as it may
collide with future security modules.

Oren.

>
> thanks,
> -serge

2007-12-18 03:40:53

Subject: Re: [patch 1/2] [RFC] Simple tamper-proof device filesystem.

Hello.

Serge E. Hallyn wrote:
> Nope, try
>
> touch /root/hda1
> ls -l /root/hda1
> mount --bind /dev/hda1 /root/hda1
> ls -l /root/hda1

[root@sakura ~]# touch /root/hda1
[root@sakura ~]# ls -l /root/hda1
-rw-r--r-- 1 root root 0 Dec 18 12:04 /root/hda1
[root@sakura ~]# mount --bind /dev/hda1 /root/hda1
[root@sakura ~]# ls -l /root/hda1
brw-r----- 1 root disk 3, 1 Dec 18 2007 /root/hda1

Oh, surprising.
I didn't know mount() accepts a non-directory as a mount point.
But I think this is not a mount operation
because I can't see the contents of /dev/hda1 through /root/hda1 .
Can I see the contents of /dev/hda1 through /root/hda1 ?

> Then it sounds like this filesystem is something Tomoyo can use.

In 2003 I had the / partition mounted read-only so that the admin couldn't do
'mknod /root/hda1 b 3 1', and I named this approach
"Security Advancement Know-how Upon Readonly Approach for Linux" or SAKURA Linux.
This filesystem (SYAORAN) was developed to make /dev writable and tamper-proof
when the / partition is read-only or protected by MAC.
TOMOYO is a pathname-based MAC implementation, and
SAKURA and SYAORAN were merged into TOMOYO Linux. ;-)

Regards.

2007-12-18 15:23:00

by Radoslaw Szkodzinski

Subject: Re: [patch 1/2] [RFC] Simple tamper-proof device filesystem.

On Mon, 17 Dec 2007 16:05:31 +0300
Al Boldi <[email protected]> wrote:

> Indan Zupancic wrote:
> > On Mon, December 17, 2007 01:40, Tetsuo Handa wrote:
> > I think you can better spend your time on read-only bind mounts.
>
> That would be too coarse.
>

Actually, who needs to create device nodes? Just prohibit everyone from
creating them, except "installer" and "udev" personality.
This means removing CAP_MKNOD on a global scale.
(OTOH, both don't need CAP_SYS_ADMIN. Maybe udev needs
CAP_SYS_MODULE...)

Now, stopping people from faking hotplug events is totally another
story. Is that currently possible?

2007-12-18 15:33:24

by Radoslaw Szkodzinski

Subject: Re: [patch 1/2] [RFC] Simple tamper-proof device filesystem.

On Mon, 17 Dec 2007 16:30:54 +1030
David Newall <[email protected]> wrote:

> Tetsuo Handa wrote:
> > If Bob is malicious and creates /dev/sda1 with block-8-2 attribute [...]
>
> Bob can't do that. Only root can.

Not even root can, if you remove the capability from him. Only udev can.
(which possibly doesn't have to run as root, given correct capability
set?)

Of course root may be able to change the configuration of udev to
create device nodes of his liking if you allow that...

2007-12-18 15:36:34

by Pavel Machek

Subject: Re: [patch 1/2] [RFC] Simple tamper-proof device filesystem.

> Why not use SELinux?
>
> Because SELinux doesn't guarantee filename and its attribute.
> The purpose of this filesystem is to ensure filename and its attribute
> (e.g. /dev/null is guaranteed to be a character device file
> with major=1 and minor=3).

Why not improve selinux to be able to assign label of new file based
on directory label and name?
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2007-12-18 15:55:38

by Valdis Klētnieks

Subject: Re: [patch 1/2] [RFC] Simple tamper-proof device filesystem.

On Thu, 06 Dec 2007 15:29:07 GMT, Pavel Machek said:
>
> > Why not use SELinux?
> >
> > Because SELinux doesn't guarantee filename and its attribute.
> > The purpose of this filesystem is to ensure filename and its attribute
> > (e.g. /dev/null is guaranteed to be a character device file
> > with major=1 and minor=3).
>
> Why not improve selinux to be able to assign label of new file based
> on directory label and name?

The problem isn't the label, it's the *other* attributes...

What happens if /dev/null has the correct SELinux label, but the major/minor
is 1,27 rather than 1,3?

2007-12-18 16:43:36

by Casey Schaufler

Subject: Re: [patch 1/2] [RFC] Simple tamper-proof device filesystem.

--- [email protected] wrote:

> On Thu, 06 Dec 2007 15:29:07 GMT, Pavel Machek said:
> >
> > > Why not use SELinux?
> > >
> > > Because SELinux doesn't guarantee filename and its attribute.
> > > The purpose of this filesystem is to ensure filename and its attribute
> > > (e.g. /dev/null is guaranteed to be a character device file
> > > with major=1 and minor=3).
> >
> > Why not improve selinux to be able to assign label of new file based
> > on directory label and name?
>
> The problem isn't the label, it's the *other* attributes...
>
> What happens if /dev/null has the correct SELinux label, but the major/minor
> is 1,27 rather than 1,3?

Isn't this the kind of thing that Bastille is good for?

Casey Schaufler
[email protected]

2007-12-19 09:45:27

by Pavel Emelyanov

Subject: Re: [patch 1/2] [RFC] Simple tamper-proof device filesystem.

Oren Laadan wrote:
> Serge E. Hallyn wrote:
>> Quoting Oren Laadan ([email protected]):
>>> I hate to bring this again, but what if the admin in the container
>>> mounts an external file system (eg. nfs, usb, loop mount from a file,
>>> or via fuse), and that file system already has a device that we would
>>> like to ban inside that container ?
>> Miklos' user mount patches enforced that if !capable(CAP_MKNOD),
>> then mnt->mnt_flags |= MNT_NODEV. So that's no problem.
>
> Yes, that works to disallow all device files from a mounted file system.
>
> But it's a black and white thing: either they are all banned or allowed;
> you can't have some devices allowed and others not, depending on type.
> A scenario where this may be useful is, for instance, if we want some apps in
> the container to execute within a pre-made chroot (sub)tree within that
> container.
>
>> But that's been pulled out of -mm! ? Crap.
>>
>>> Since anyway we will have to keep a white- (or black-) list of devices
>>> that are permitted in a container, and that list may even change
>>> per container -- why not enforce the access control at the VFS layer ?
>>> It's safer in the long run.
>> By that you mean more along the lines of Pavel's patch than my whitelist
>> LSM, or you actually mean Tetsuo's filesystem (i assume you don't mean that
>> by 'vfs layer' :), or something different entirely?
>
> :)
>
> By 'vfs' I mean at open() time, and not at mount(), or mknod() time.
> Either yours or Pavel's; I tend to prefer not to use LSM as it may
> collide with future security modules.

Oren, AFAIS you've seen my patches for device access controller, right?

Maybe we can revisit the issue then and try to come to agreement on what
kind of model and implementation we all want?

> Oren.
>
>> thanks,
>> -serge

2007-12-19 12:11:26

Subject: Re: [patch 1/2] [RFC] Simple tamper-proof device filesystem.

Hello.

Radoslaw Szkodzinski (AstralStorm) wrote:
> Actually, who needs to create device nodes? Just prohibit everyone from
> creating them, except "installer" and "udev" personality.
> This means removing CAP_MKNOD on a global scale.

What happens if the root tampers with udev's configuration file?
The udev will create inappropriate (i.e. filename with unexpected attributes)
device nodes, won't it?

Also, creating device nodes is not the only threat.
The root can do
# mv /dev/sda1 /dev/tmp; mv /dev/sda2 /dev/sda1; mv /dev/tmp /dev/sda2
to rename/unlink device nodes.

After all, revoking CAP_MKNOD is not enough for guaranteeing
filename and its attributes.

This filesystem is designed to guarantee filename and its attributes,
but this filesystem has additional access control capability.
You can forbid mknod/unlink /dev/null if you want nobody to do so.
You can forbid chmod/chown /dev/null if you want nobody to do so.

Well... it is not fair to refer only to udev's configuration file.
If the configuration file of this filesystem is tampered with,
this filesystem will create inappropriate device nodes.
So, some access control mechanism for protecting configuration files
is recommended for both udev and this filesystem.

Regards.

2007-12-19 14:11:08

Subject: Re: [patch 1/2] [RFC] Simple tamper-proof device filesystem.

Quoting Pavel Emelyanov ([email protected]):
> Oren Laadan wrote:
> > Serge E. Hallyn wrote:
> >> Quoting Oren Laadan ([email protected]):
> >>> I hate to bring this again, but what if the admin in the container
> >>> mounts an external file system (eg. nfs, usb, loop mount from a file,
> >>> or via fuse), and that file system already has a device that we would
> >>> like to ban inside that container ?
> >> Miklos' user mount patches enforced that if !capable(CAP_MKNOD),
> >> then mnt->mnt_flags |= MNT_NODEV. So that's no problem.
> >
> > Yes, that works to disallow all device files from a mounted file system.
> >
> > But it's a black and white thing: either they are all banned or allowed;
> > you can't have some devices allowed and others not, depending on type.
> > A scenario where this may be useful is, for instance, if we want some apps in
> > the container to execute within a pre-made chroot (sub)tree within that
> > container.
> >
> >> But that's been pulled out of -mm! ? Crap.
> >>
> >>> Since anyway we will have to keep a white- (or black-) list of devices
> >>> that are permitted in a container, and that list may even change
> >>> per container -- why not enforce the access control at the VFS layer ?
> >>> It's safer in the long run.
> >> By that you mean more along the lines of Pavel's patch than my whitelist
> >> LSM, or you actually mean Tetsuo's filesystem (i assume you don't mean that
> >> by 'vfs layer' :), or something different entirely?
> >
> > :)
> >
> > By 'vfs' I mean at open() time, and not at mount(), or mknod() time.
> > Either yours or Pavel's; I tend to prefer not to use LSM as it may
> > collide with future security modules.
>
> Oren, AFAIS you've seen my patches for device access controller, right?
>
> Maybe we can revisit the issue then and try to come to agreement on what
> kind of model and implementation we all want?

That would be great, Pavel. I do prefer your solution over my LSM, so
if we can get an elegant block device control right in the vfs code that
would be my preference.

The only thing that makes me keep wanting to go back to an LSM is the
fact that the code defining the whitelist seems out of place in the vfs.
But I guess that's actually separated into a modular cgroup, with the
actual enforcement built in at the vfs. So that's really the best
solution.

-serge

2007-12-19 14:13:45

Subject: Re: [patch 1/2] [RFC] Simple tamper-proof device filesystem.

Quoting Oren Laadan ([email protected]):
>
> Serge E. Hallyn wrote:
> > Quoting Oren Laadan ([email protected]):
> >> I hate to bring this again, but what if the admin in the container
> >> mounts an external file system (eg. nfs, usb, loop mount from a file,
> >> or via fuse), and that file system already has a device that we would
> >> like to ban inside that container ?
> >
> > Miklos' user mount patches enforced that if !capable(CAP_MKNOD),
> > then mnt->mnt_flags |= MNT_NODEV. So that's no problem.
>
> Yes, that works to disallow all device files from a mounted file system.
>
> But it's a black and white thing: either they are all banned or allowed;
> you can't have some devices allowed and others not, depending on type.
> A scenario where this may be useful is, for instance, if we want some apps in
> the container to execute within a pre-made chroot (sub)tree within that
> container.

Yes, it's workable short-term, and we've always said that a more
complete solution would be worked on later, as people have time.

> > But that's been pulled out of -mm! ? Crap.
> >
> >> Since anyway we will have to keep a white- (or black-) list of devices
> >> that are permitted in a container, and that list may even change
> >> per container -- why not enforce the access control at the VFS layer ?
> >> It's safer in the long run.
> >
> > By that you mean more along the lines of Pavel's patch than my whitelist
> > LSM, or you actually mean Tetsuo's filesystem (i assume you don't mean that
> > by 'vfs layer' :), or something different entirely?
>
> :)
>
> By 'vfs' I mean at open() time, and not at mount(), or mknod() time.
> Either yours or Pavel's; I tend to prefer not to use LSM as it may
> collide with future security modules.

Yeah I keep waffling. The LSM is so simple... but i do prefer Pavel's
patch. Let's keep pursuing that.

-serge

2007-12-19 19:15:08

by Radoslaw Szkodzinski

Subject: Re: [patch 1/2] [RFC] Simple tamper-proof device filesystem.

On Wed, 19 Dec 2007 21:11:11 +0900
Tetsuo Handa <[email protected]> wrote:

> Hello.
>
> Radoslaw Szkodzinski (AstralStorm) wrote:
> > Actually, who needs to create device nodes? Just prohibit everyone from
> > creating them, except "installer" and "udev" personality.
> > This means removing CAP_MKNOD on a global scale.
>
> What happens if the root tampers with udev's configuration file?
> The udev will create inappropriate (i.e. filename with unexpected attributes)
> device nodes, won't it?

Yes. But root doesn't need access to these files, at least not usually.
Create a separate user for editing config files - much lower
probability of breakage. Remove almost all capabilities from root and
profit.

> After all, revoking CAP_MKNOD is not enough for guaranteeing
> filename and its attributes.
>
> This filesystem is designed to guarantee filename and its attributes,
> but this filesystem has additional access control capability.
> You can forbid mknod/unlink /dev/null if you want nobody to do so.
> You can forbid chmod/chown /dev/null if you want nobody to do so.

You can forbid all operations on /dev (except udev) with an ACL.
So, what is the need for this filesystem?

2007-12-19 23:44:47

Subject: Re: [patch 1/2] [RFC] Simple tamper-proof device filesystem.

Quoting Tetsuo Handa ([email protected]):
> A brief description about SYAORAN:
>
> SYAORAN stands for "Simple Yet All-important Object Realizing Abiding
> Nexus". SYAORAN is a filesystem for /dev with Mandatory Access Control.

I apologize if I'm committing a faux pas by asking this, but any chance
of renaming this to something like strictdev or sdev, or at least with
'dev' in it somewhere?

Maybe the fs will sell like hotcakes and everyone will know what SYAORAN
means by next year, but just in case that doesn't happen, there is
absolutely nothing in the name that would tell me I should bother to
look at it...

> /dev needs to be writable, but this means that files on /dev might be
> tampered with. SYAORAN can restrict combinations of (pathname, attribute)
> that the system can create. The attribute is one of directory, regular
> file, FIFO, UNIX domain socket, symbolic link, character or block device
> file with major/minor device numbers.
>
> SYAORAN can ensure /dev/null is a character device file with major=1 minor=3.
>
> Policy specifications for this filesystem are at
> http://tomoyo.sourceforge.jp/en/1.5.x/policy-syaoran.html
>
> Why not use FUSE?
>
> Because /dev has to be available through the lifetime of the kernel.
> It is not acceptable if /dev stops working due to SIGKILL or OOM-killer.
>
> Why not use SELinux?
>
> Because SELinux doesn't guarantee filename and its attribute.
> The purpose of this filesystem is to ensure filename and its attribute
> (e.g. /dev/null is guaranteed to be a character device file
> with major=1 and minor=3).
>
> Signed-off-by: Tetsuo Handa <[email protected]>
> ---
> fs/syaoran/syaoran.c | 338 +++++++++++++++++
> fs/syaoran/syaoran.h | 964 +++++++++++++++++++++++++++++++++++++++++++++++++++
> 2 files changed, 1302 insertions(+)
>
> --- /dev/null
> +++ linux-2.6.24-rc5/fs/syaoran/syaoran.c
> @@ -0,0 +1,338 @@
> +/*
> + * fs/syaoran/syaoran.c
> + *
> + * Implementation of the Tamper-Proof Device Filesystem.
> + *
> + * Portions Copyright (C) 2005-2007 NTT DATA CORPORATION
> + *
> + * Version: 1.5.3-pre 2007/12/16
> + *
> + * This filesystem is developed using the ramfs implementation.
> + *
> + */
> +/*
> + * Resizable simple ram filesystem for Linux.
> + *
> + * Copyright (C) 2000 Linus Torvalds.
> + * 2000 Transmeta Corp.
> + *
> + * Usage limits added by David Gibson, Linuxcare Australia.
> + * This file is released under the GPL.
> + */
> +
> +/*
> + * NOTE! This filesystem is probably most useful
> + * not as a real filesystem, but as an example of
> + * how virtual filesystems can be written.
> + *
> + * It doesn't get much simpler than this. Consider
> + * that this file implements the full semantics of
> + * a POSIX-compliant read-write filesystem.
> + *
> + * Note in particular how the filesystem does not
> + * need to implement any data structures of its own
> + * to keep track of the virtual data: using the VFS
> + * caches is sufficient.
> + */
> +
> +#include <linux/module.h>
> +#include <linux/fs.h>
> +#include <linux/pagemap.h>
> +#include <linux/highmem.h>
> +#include <linux/time.h>
> +#include <linux/init.h>
> +#include <linux/string.h>
> +#include <linux/backing-dev.h>
> +#include <linux/sched.h>
> +#include <linux/uaccess.h>
> +
> +static struct super_operations syaoran_ops;
> +static struct address_space_operations syaoran_aops;
> +static struct inode_operations syaoran_file_inode_operations;
> +static struct inode_operations syaoran_dir_inode_operations;
> +static struct inode_operations syaoran_symlink_inode_operations;
> +static struct file_operations syaoran_file_operations;
> +
> +static struct backing_dev_info syaoran_backing_dev_info = {
> + .ra_pages = 0, /* No readahead */
> + .capabilities = BDI_CAP_NO_ACCT_DIRTY | BDI_CAP_NO_WRITEBACK |
> + BDI_CAP_MAP_DIRECT | BDI_CAP_MAP_COPY |
> + BDI_CAP_READ_MAP | BDI_CAP_WRITE_MAP | BDI_CAP_EXEC_MAP,
> +};
> +
> +#include "syaoran.h"
> +
> +static struct inode *syaoran_get_inode(struct super_block *sb, int mode,
> + dev_t dev)
> +{
> + struct inode *inode = new_inode(sb);
> +
> + if (inode) {
> + struct timespec now = CURRENT_TIME;
> + inode->i_mode = mode;
> + inode->i_uid = current->fsuid;
> + inode->i_gid = current->fsgid;
> + inode->i_blocks = 0;
> + inode->i_mapping->a_ops = &syaoran_aops;
> + inode->i_mapping->backing_dev_info = &syaoran_backing_dev_info;
> + inode->i_atime = now;
> + inode->i_mtime = now;
> + inode->i_ctime = now;
> + switch (mode & S_IFMT) {
> + default:
> + init_special_inode(inode, mode, dev);
> + if (S_ISBLK(mode))
> + inode->i_fop = &wrapped_def_blk_fops;
> + else if (S_ISCHR(mode))
> + inode->i_fop = &wrapped_def_chr_fops;
> + inode->i_op = &syaoran_file_inode_operations;
> + break;
> + case S_IFREG:
> + inode->i_op = &syaoran_file_inode_operations;
> + inode->i_fop = &syaoran_file_operations;
> + break;
> + case S_IFDIR:
> + inode->i_op = &syaoran_dir_inode_operations;
> + inode->i_fop = &simple_dir_operations;
> + /*
> + * directory inodes start off with i_nlink == 2
> + * (for "." entry)
> + */
> + inode->i_nlink++;
> + break;
> + case S_IFLNK:
> + inode->i_op = &syaoran_symlink_inode_operations;
> + break;
> + }
> + }
> + return inode;
> +}
> +
> +/*
> + * File creation. Allocate an inode, and we're done..
> + */
> +/* SMP-safe */
> +static int syaoran_mknod(struct inode *dir, struct dentry *dentry, int mode,
> + dev_t dev)
> +{
> + struct inode *inode;
> + int error = -ENOSPC;
> + if (MayCreateNode(dentry, mode, dev) < 0)
> + return -EPERM;
> + inode = syaoran_get_inode(dir->i_sb, mode, dev);
> + if (inode) {
> + if (dir->i_mode & S_ISGID) {
> + inode->i_gid = dir->i_gid;
> + if (S_ISDIR(mode))
> + inode->i_mode |= S_ISGID;
> + }
> + d_instantiate(dentry, inode);
> + dget(dentry); /* Extra count - pin the dentry in core */
> + error = 0;
> + }
> + return error;
> +}
> +
> +static int syaoran_mkdir(struct inode *dir, struct dentry *dentry, int mode)
> +{
> + int retval = syaoran_mknod(dir, dentry, mode | S_IFDIR, 0);
> + if (!retval)
> + dir->i_nlink++;
> + return retval;
> +}
> +
> +static int syaoran_create(struct inode *dir, struct dentry *dentry, int mode,
> + struct nameidata *nd)
> +{
> + return syaoran_mknod(dir, dentry, mode | S_IFREG, 0);
> +}
> +
> +static int syaoran_symlink(struct inode *dir, struct dentry *dentry,
> + const char *symname)
> +{
> + struct inode *inode;
> + int error = -ENOSPC;
> + if (MayCreateNode(dentry, S_IFLNK, 0) < 0)
> + return -EPERM;
> + inode = syaoran_get_inode(dir->i_sb, S_IFLNK|S_IRWXUGO, 0);
> + if (inode) {
> + int l = strlen(symname)+1;
> + error = page_symlink(inode, symname, l);
> + if (!error) {
> + if (dir->i_mode & S_ISGID)
> + inode->i_gid = dir->i_gid;
> + d_instantiate(dentry, inode);
> + dget(dentry);
> + } else
> + iput(inode);
> + }
> + return error;
> +}
> +
> +static int syaoran_link(struct dentry *old_dentry, struct inode *dir,
> + struct dentry *dentry)
> +{
> + struct inode *inode = old_dentry->d_inode;
> + if (!inode || MayCreateNode(dentry, inode->i_mode, inode->i_rdev) < 0)
> + return -EPERM;
> + return simple_link(old_dentry, dir, dentry);
> +}
> +
> +static int syaoran_unlink(struct inode *dir, struct dentry *dentry)
> +{
> + if (MayModifyNode(dentry, MAY_DELETE) < 0)
> + return -EPERM;
> + return simple_unlink(dir, dentry);
> +}
> +
> +static int syaoran_rename(struct inode *old_dir, struct dentry *old_dentry,
> + struct inode *new_dir, struct dentry *new_dentry)
> +{
> + struct inode *inode = old_dentry->d_inode;
> + if (!inode || MayModifyNode(old_dentry, MAY_DELETE) < 0 ||
> + MayCreateNode(new_dentry, inode->i_mode, inode->i_rdev) < 0)
> + return -EPERM;
> + return simple_rename(old_dir, old_dentry, new_dir, new_dentry);
> +}
> +
> +static int syaoran_rmdir(struct inode *dir, struct dentry *dentry)
> +{
> + if (MayModifyNode(dentry, MAY_DELETE) < 0)
> + return -EPERM;
> + return simple_rmdir(dir, dentry);
> +}
> +
> +static int syaoran_setattr(struct dentry *dentry, struct iattr *attr)
> +{
> + struct inode *inode = dentry->d_inode;
> + int error = inode_change_ok(inode, attr);
> + if (!error) {
> + unsigned int ia_valid = attr->ia_valid;
> + unsigned int flags = 0;
> + if (ia_valid & (ATTR_UID | ATTR_GID))
> + flags |= MAY_CHOWN;
> + if (ia_valid & ATTR_MODE)
> + flags |= MAY_CHMOD;
> + if (MayModifyNode(dentry, flags) < 0)
> + return -EPERM;
> + if (!error)
> + error = inode_setattr(inode, attr);
> + }
> + return error;
> +}
> +
> +/*
> + * Copied from mm/page-writeback.c since
> + * __set_page_dirty_no_writeback() is not exported.
> + */
> +static int syaoran_set_page_dirty_no_writeback(struct page *page)
> +{
> + if (!PageDirty(page))
> + SetPageDirty(page);
> + return 0;
> +}
> +
> +static struct address_space_operations syaoran_aops = {
> + .readpage = simple_readpage,
> + .write_begin = simple_write_begin,
> + .write_end = simple_write_end,
> + .set_page_dirty = syaoran_set_page_dirty_no_writeback,
> +};
> +
> +static struct file_operations syaoran_file_operations = {
> + .aio_read = generic_file_aio_read,
> + .read = do_sync_read,
> + .aio_write = generic_file_aio_write,
> + .write = do_sync_write,
> + .mmap = generic_file_mmap,
> + .fsync = simple_sync_file,
> + .splice_read = generic_file_splice_read,
> + .llseek = generic_file_llseek,
> +};
> +
> +static struct inode_operations syaoran_file_inode_operations = {
> + .getattr = simple_getattr,
> + .setattr = syaoran_setattr,
> +};
> +
> +static struct inode_operations syaoran_dir_inode_operations = {
> + .create = syaoran_create,
> + .lookup = simple_lookup,
> + .link = syaoran_link,
> + .unlink = syaoran_unlink,
> + .symlink = syaoran_symlink,
> + .mkdir = syaoran_mkdir,
> + .rmdir = syaoran_rmdir,
> + .mknod = syaoran_mknod,
> + .rename = syaoran_rename,
> + .setattr = syaoran_setattr,
> +};
> +
> +static struct inode_operations syaoran_symlink_inode_operations = {
> + .readlink = generic_readlink,
> + .follow_link = page_follow_link_light,
> + .put_link = page_put_link,
> + .setattr = syaoran_setattr,
> +};
> +
> +static struct super_operations syaoran_ops = {
> + .statfs = simple_statfs,
> + .drop_inode = generic_delete_inode,
> + .put_super = syaoran_put_super,
> +};
> +
> +static int syaoran_fill_super(struct super_block *sb, void *data, int silent)
> +{
> + struct inode *inode;
> + struct dentry *root;
> + int error;
> +
> + sb->s_maxbytes = MAX_LFS_FILESIZE;
> + sb->s_blocksize = PAGE_CACHE_SIZE;
> + sb->s_blocksize_bits = PAGE_CACHE_SHIFT;
> + sb->s_magic = SYAORAN_MAGIC;
> + sb->s_op = &syaoran_ops;
> + sb->s_time_gran = 1;
> + error = Syaoran_Initialize(sb, data);
> + if (error < 0)
> + return error;
> + inode = syaoran_get_inode(sb, S_IFDIR | 0755, 0);
> + if (!inode)
> + return -ENOMEM;
> +
> + root = d_alloc_root(inode);
> + if (!root) {
> + iput(inode);
> + return -ENOMEM;
> + }
> + sb->s_root = root;
> + MakeInitialNodes(sb);
> + return 0;
> +}
> +
> +static int syaoran_get_sb(struct file_system_type *fs_type,
> + int flags, const char *dev_name, void *data, struct vfsmount *mnt)
> +{
> + return get_sb_nodev(fs_type, flags, data, syaoran_fill_super, mnt);
> +}
> +
> +static struct file_system_type syaoran_fs_type = {
> + .owner = THIS_MODULE,
> + .name = "syaoran",
> + .get_sb = syaoran_get_sb,
> + .kill_sb = kill_litter_super,
> +};
> +
> +static int __init init_syaoran_fs(void)
> +{
> + return register_filesystem(&syaoran_fs_type);
> +}
> +
> +static void __exit exit_syaoran_fs(void)
> +{
> + unregister_filesystem(&syaoran_fs_type);
> +}
> +module_init(init_syaoran_fs);
> +module_exit(exit_syaoran_fs);
> +
> +MODULE_LICENSE("GPL");
> --- /dev/null
> +++ linux-2.6.24-rc5/fs/syaoran/syaoran.h
> @@ -0,0 +1,964 @@
> +/*
> + * fs/syaoran/internal.h

That's not what the diff says it's called :)

Also much of this .h file could really stand to be in other
files like syaoran/read_config.c, syaoran/super.c, and
syaoran/debug.c.

> + *
> + * Implementation of the Tamper-Proof Device Filesystem.
> + *
> + * Copyright (C) 2005-2007 NTT DATA CORPORATION
> + *
> + * Version: 1.5.3-pre 2007/12/16
> + *
> + * A brief description about SYAORAN:
> + *
> + * SYAORAN stands for "Simple Yet All-important Object Realizing Abiding
> + * Nexus". SYAORAN is a filesystem for /dev with Mandatory Access Control.
> + *
> + * /dev needs to be writable, but this means that files on /dev might be
> + * tampered with. SYAORAN can restrict combinations of (pathname, attribute)
> + * that the system can create. The attribute is one of directory, regular
> + * file, FIFO, UNIX domain socket, symbolic link, character or block device
> + * file with major/minor device numbers.
> + *
> + * Why not use FUSE?
> + *
> + * Because /dev has to be available through the lifetime of the kernel.
> + * It is not acceptable if /dev stops working due to SIGKILL or OOM-killer .
> + */
> +
> +#ifndef _LINUX_SYAORAN_H
> +#define _LINUX_SYAORAN_H
> +
> +#include <linux/namei.h>
> +#include <linux/mm.h>
> +
> +/***** SYAORAN start. *****/
> +
> +#define list_for_each_cookie(pos, cookie, head) \
> + for ((cookie) || ((cookie) = (head)), pos = (cookie)->next; \
> + prefetch(pos->next), pos != (head) || ((cookie) = NULL); \
> + (cookie) = pos, pos = pos->next)
> +
> +/* The following constants are used to restrict operations.*/
> +
> +#define MAY_CREATE 1 /* This file is allowed to mknod() */
> +#define MAY_DELETE 2 /* This file is allowed to unlink() */
> +#define MAY_CHMOD 4 /* This file is allowed to chmod() */
> +#define MAY_CHOWN 8 /* This file is allowed to chown() */
> +#define DEVICE_USED 16 /* This block or character device file is used. */
> +#define NO_CREATE_AT_MOUNT 32 /* Don't create this file at mount(). */
> +
> +/* some random number */
> +#define SYAORAN_MAGIC 0x2F646576 /* = '/dev' */
> +
> +static void syaoran_put_super(struct super_block *sb);
> +static int Syaoran_Initialize(struct super_block *sb, void *data);
> +static void MakeInitialNodes(struct super_block *sb);
> +static int MayCreateNode(struct dentry *dentry, int mode, int dev);
> +static int MayModifyNode(struct dentry *dentry, unsigned int flags);
> +static int syaoran_create_tracelog(struct super_block *sb,
> + const char *filename);
> +
> +/* Wraps blkdev_open() to trace open operation for block devices. */
> +static int (*org_blkdev_open) (struct inode *inode, struct file *filp);
> +static struct file_operations wrapped_def_blk_fops;
> +
> +static int wrapped_blkdev_open(struct inode *inode, struct file *filp)
> +{
> + int error = org_blkdev_open(inode, filp);
> + if (error != -ENXIO)
> + MayModifyNode(filp->f_dentry, DEVICE_USED);
> + return error;
> +}
> +
> +/* Wraps chrdev_open() to trace open operation for character devices. */
> +static int (*org_chrdev_open) (struct inode *inode, struct file *filp);
> +static struct file_operations wrapped_def_chr_fops;
> +
> +static int wrapped_chrdev_open(struct inode *inode, struct file *filp)
> +{
> + int error = org_chrdev_open(inode, filp);
> + if (error != -ENXIO)
> + MayModifyNode(filp->f_dentry, DEVICE_USED);
> + return error;
> +}
> +
> +/* lookup_create() without nameidata. Called only while initialization. */
> +static struct dentry *lookup_create2(const char *name, struct dentry *base,
> + const bool is_dir)
> +{
> + struct dentry *dentry;
> + const int len = name ? strlen(name) : 0;
> + mutex_lock(&base->d_inode->i_mutex);
> + dentry = lookup_one_len(name, base, len);
> + if (IS_ERR(dentry))
> + goto fail;
> + if (!is_dir && name[len] && !dentry->d_inode)
> + goto enoent;
> + return dentry;
> +enoent:
> + dput(dentry);
> + dentry = ERR_PTR(-ENOENT);
> +fail:
> + return dentry;
> +}
> +
> +/* mkdir(). Called only while initialization. */
> +static int fs_mkdir(const char *pathname, struct dentry *base, int mode,
> + uid_t user, gid_t group)
> +{
> + struct dentry *dentry = lookup_create2(pathname, base, 1);
> + int error = PTR_ERR(dentry);
> + if (!IS_ERR(dentry)) {
> + error = vfs_mkdir(base->d_inode, dentry, mode);
> + if (!error) {
> + lock_kernel();

lock_kernel()? Why is that necessary?

> + dentry->d_inode->i_uid = user;
> + dentry->d_inode->i_gid = group;
> + unlock_kernel();
> + }
> + dput(dentry);
> + }
> + mutex_unlock(&base->d_inode->i_mutex);
> + return error;
> +}
> +
> +/* mknod(). Called only while initialization. */
> +static int fs_mknod(const char *filename, struct dentry *base, int mode,
> + dev_t dev, uid_t user, gid_t group)
> +{
> + struct dentry *dentry;
> + int error;
> + switch (mode & S_IFMT) {
> + case S_IFCHR:
> + case S_IFBLK:
> + case S_IFIFO:
> + case S_IFSOCK:
> + case S_IFREG:
> + break;
> + default:
> + return -EPERM;
> + }
> + dentry = lookup_create2(filename, base, 0);
> + error = PTR_ERR(dentry);
> + if (!IS_ERR(dentry)) {
> + error = vfs_mknod(base->d_inode, dentry, mode, dev);
> + if (!error) {
> + lock_kernel();
> + dentry->d_inode->i_uid = user;
> + dentry->d_inode->i_gid = group;
> + unlock_kernel();
> + }
> + dput(dentry);
> + }
> + mutex_unlock(&base->d_inode->i_mutex);
> + return error;
> +}
> +
> +/* symlink(). Called only while initialization. */
> +static int fs_symlink(const char *pathname, struct dentry *base,
> + char *oldname, int mode, uid_t user, gid_t group)
> +{
> + struct dentry *dentry = lookup_create2(pathname, base, 0);
> + int error = PTR_ERR(dentry);
> + if (!IS_ERR(dentry)) {
> + error = vfs_symlink(base->d_inode, dentry, oldname, S_IALLUGO);
> + if (!error) {
> + lock_kernel();
> + dentry->d_inode->i_mode = mode;
> + dentry->d_inode->i_uid = user;
> + dentry->d_inode->i_gid = group;
> + unlock_kernel();
> + }
> + dput(dentry);
> + }
> + mutex_unlock(&base->d_inode->i_mutex);
> + return error;
> +}
> +
> +/*
> + * Format string.
> + * Leading and trailing whitespaces are removed.
> + * Multiple whitespaces are packed into single space.
> + */
> +static void NormalizeLine(unsigned char *buffer)
> +{
> + unsigned char *sp = buffer;
> + unsigned char *dp = buffer;
> + bool first = 1;
> + while (*sp && (*sp <= ' ' || *sp >= 127))
> + sp++;
> + while (*sp) {
> + if (!first)
> + *dp++ = ' ';
> + first = 0;
> + while (*sp > ' ' && *sp < 127)
> + *dp++ = *sp++;
> + while (*sp && (*sp <= ' ' || *sp >= 127))
> + sp++;
> + }
> + *dp = '\0';
> +}
> +
> +/* Convert text form of filename into binary form. */
> +static void UnEscape(char *filename)
> +{
> + char *cp = filename;
> + char c, d, e;
> + if (!cp)
> + return;
> + while ((c = *filename++) != '\0') {
> + if (c != '\\') {
> + *cp++ = c;
> + continue;
> + }
> + if ((c = *filename++) == '\\') {
> + *cp++ = c;
> + continue;
> + }
> + if (c < '0' || c > '3')
> + break;
> + d = *filename++;
> + if (d < '0' || d > '7')
> + break;
> + e = *filename++;
> + if (e < '0' || e > '7')
> + break;
> + *(unsigned char *) cp++ = (unsigned char)
> + (((unsigned char) (c - '0') << 6) +
> + ((unsigned char) (d - '0') << 3) +
> + (unsigned char) (e - '0'));
> + }
> + *cp = '\0';
> +}
> +
> +struct dev_entry {
> + struct list_head list;
> + /* Binary form of pathname under mount point. Never NULL. */
> + char *name;
> + /*
> + * Mode and permissions. setuid/setgid/sticky bits are not supported.
> + */
> + mode_t mode;
> + uid_t uid;
> + gid_t gid;
> + dev_t kdev;
> + /*
> + * Binary form of initial contents for the symlink.
> + * NULL if not symlink.
> + */
> + char *symlink_data;
> + /* File access control flags. */
> + unsigned int flags;
> + /* Text form of pathname under mount point. Never NULL. */
> + const char *printable_name;
> + /*
> + * Text form of initial contents for the symlink.
> + * NULL if not symlink.
> + */
> + const char *printable_symlink_data;
> +};
> +
> +struct syaoran_sb_info {
> + struct list_head list;
> + bool initialize_done; /* False if initialization is in progress. */
> + bool is_permissive_mode; /* True if permissive mode. */
> +};
> +
> +static inline char *strdup(const char *data)
> +{
> + return kstrdup(data, GFP_KERNEL);
> +}
> +
> +static int RegisterNodeInfo(char *buffer, struct super_block *sb)
> +{
> + enum {
> + ARG_FILENAME = 0,
> + ARG_PERMISSION = 1,
> + ARG_UID = 2,
> + ARG_GID = 3,
> + ARG_FLAGS = 4,
> + ARG_DEV_TYPE = 5,
> + ARG_SYMLINK_DATA = 6,
> + ARG_DEV_MAJOR = 6,
> + ARG_DEV_MINOR = 7,
> + MAX_ARG = 8
> + };
> + char *args[MAX_ARG];
> + int i;
> + int error = -EINVAL;
> + unsigned int perm, uid, gid, flags, major = 0, minor = 0;
> + struct syaoran_sb_info *info = (struct syaoran_sb_info *) sb->s_fs_info;
> + struct dev_entry *entry;
> + memset(args, 0, sizeof(args));
> + args[0] = buffer;
> + for (i = 1; i < MAX_ARG; i++) {
> + args[i] = strchr(args[i - 1] + 1, ' ');
> + if (!args[i])
> + break;
> + *args[i]++ = '\0';
> + }
> + /*
> + printk("<%s> <%s> <%s> <%s> <%s> <%s> <%s> <%s>\n",
> + args[0], args[1], args[2], args[3], args[4], args[5],
> + args[6], args[7]);
> + */
> + if (!args[ARG_FILENAME] || !args[ARG_PERMISSION] || !args[ARG_UID] ||
> + !args[ARG_GID] || !args[ARG_DEV_TYPE] || !args[ARG_FLAGS])
> + goto out;
> + if (sscanf(args[ARG_PERMISSION], "%o", &perm) != 1 || !(perm <= 0777)
> + || sscanf(args[ARG_UID], "%u", &uid) != 1
> + || sscanf(args[ARG_GID], "%u", &gid) != 1
> + || sscanf(args[ARG_FLAGS], "%u", &flags) != 1
> + || *(args[ARG_DEV_TYPE] + 1))
> + goto out;
> + switch (*args[ARG_DEV_TYPE]) {
> + case 'c':
> + perm |= S_IFCHR;
> + if (!args[ARG_DEV_MAJOR]
> + || sscanf(args[ARG_DEV_MAJOR], "%u", &major) != 1
> + || !args[ARG_DEV_MINOR]
> + || sscanf(args[ARG_DEV_MINOR], "%u", &minor) != 1)
> + goto out;
> + break;
> + case 'b':
> + perm |= S_IFBLK;
> + if (!args[ARG_DEV_MAJOR]
> + || sscanf(args[ARG_DEV_MAJOR], "%u", &major) != 1
> + || !args[ARG_DEV_MINOR]
> + || sscanf(args[ARG_DEV_MINOR], "%u", &minor) != 1)
> + goto out;
> + break;
> + case 'l':
> + perm |= S_IFLNK;
> + if (!args[ARG_SYMLINK_DATA])
> + goto out;
> + break;
> + case 'd':
> + perm |= S_IFDIR;
> + break;
> + case 's':
> + perm |= S_IFSOCK;
> + break;
> + case 'p':
> + perm |= S_IFIFO;
> + break;
> + case 'f':
> + perm |= S_IFREG;
> + break;
> + default:
> + goto out;
> + }
> + error = -ENOMEM;
> + entry = kzalloc(sizeof(*entry), GFP_KERNEL);
> + if (!entry)
> + goto out;
> + if (S_ISLNK(perm)) {
> + entry->printable_symlink_data = strdup(args[ARG_SYMLINK_DATA]);
> + if (!entry->printable_symlink_data)
> + goto out_freemem;
> + }
> + entry->printable_name = strdup(args[ARG_FILENAME]);
> + if (!entry->printable_name)
> + goto out_freemem;
> + if (S_ISLNK(perm)) {
> + entry->symlink_data = strdup(entry->printable_symlink_data);
> + if (!entry->symlink_data)
> + goto out_freemem;
> + UnEscape(entry->symlink_data);
> + }
> + entry->name = strdup(entry->printable_name);
> + if (!entry->name)
> + goto out_freemem;
> + UnEscape(entry->name);
> + /*
> + * Drop trailing '/', for GetLocalAbsolutePath() doesn't append
> + * trailing '/'.
> + */
> + i = strlen(entry->name);
> + if (i && entry->name[i - 1] == '/')
> + entry->name[i - 1] = '\0';
> + entry->mode = perm;
> + entry->uid = uid;
> + entry->gid = gid;
> + entry->kdev = S_ISCHR(perm) || S_ISBLK(perm) ? MKDEV(major, minor) : 0;
> + entry->flags = flags;
> + list_add_tail(&entry->list, &info->list);
> + /* printk("Entry added.\n"); */
> + error = 0;
> +out:
> + return error;
> +out_freemem:
> + kfree(entry->printable_symlink_data);
> + kfree(entry->printable_name);
> + kfree(entry->symlink_data);
> + kfree(entry);
> + goto out;
> +}
> +
> +static void syaoran_put_super(struct super_block *sb)
> +{
> + struct syaoran_sb_info *info;
> + struct dev_entry *entry;
> + struct dev_entry *tmp;
> + if (!sb)
> + return;
> + info = (struct syaoran_sb_info *) sb->s_fs_info;
> + if (!info)
> + return;
> + list_for_each_entry_safe(entry, tmp, &info->list, list) {
> + kfree(entry->name);
> + kfree(entry->symlink_data);
> + kfree(entry->printable_name);
> + kfree(entry->printable_symlink_data);
> + list_del(&entry->list);
> + /* printk("Entry removed.\n"); */
> + kfree(entry);
> + }
> + kfree(info);
> + sb->s_fs_info = NULL;
> + printk(KERN_DEBUG "%s: Unused memory freed.\n", __FUNCTION__);
> +}
> +
> +static int ReadConfigFile(struct file *file, struct super_block *sb)
> +{
> + char *buffer;
> + int error = -ENOMEM;
> + if (!file)
> + return -EINVAL;
> + buffer = kzalloc(PAGE_SIZE, GFP_KERNEL);
> + if (buffer) {
> + int len;
> + char *cp;
> + unsigned long offset = 0;
> + while ((len = kernel_read(file, offset, buffer, PAGE_SIZE)) > 0
> + && (cp = memchr(buffer, '\n', len)) != NULL) {
> + *cp = '\0';
> + offset += cp - buffer + 1;
> + NormalizeLine(buffer);
> + if (RegisterNodeInfo(buffer, sb) == -ENOMEM)
> + goto out;
> + }
> + error = 0;
> + }
> +out:
> + kfree(buffer);
> + return error;
> +}
> +
> +static void MakeNode(struct dev_entry *entry, struct dentry *root)
> +{
> + struct dentry *base = dget(root);
> + char *filename = entry->name;
> + char *name = filename;
> + unsigned int c;
> + const mode_t perm = entry->mode;
> + const uid_t uid = entry->uid;
> + const gid_t gid = entry->gid;
> + goto start;
> + while ((c = *(unsigned char *) filename) != '\0') {
> + if (c == '/') {
> + struct dentry *new_base;
> + const int len = filename - name;
> + *filename = '\0';
> + mutex_lock(&base->d_inode->i_mutex);
> + new_base = lookup_one_len(name, base, len);
> + mutex_unlock(&base->d_inode->i_mutex);
> + dput(base);
> + *filename = '/';
> + filename++;
> + if (IS_ERR(new_base))
> + return;
> + if (!new_base->d_inode ||
> + !S_ISDIR(new_base->d_inode->i_mode)) {
> + dput(new_base);
> + return;
> + }
> + base = new_base;
> +start:
> + name = filename;
> + } else {
> + filename++;
> + }
> + }
> + filename = (char *) name;
> + if (S_ISLNK(perm)) {
> + fs_symlink(filename, base, entry->symlink_data, perm, uid, gid);
> + } else if (S_ISDIR(perm)) {
> + fs_mkdir(filename, base, perm ^ S_IFDIR, uid, gid);
> + } else if (S_ISSOCK(perm) || S_ISFIFO(perm) || S_ISREG(perm)) {
> + fs_mknod(filename, base, perm, 0, uid, gid);
> + } else if (S_ISCHR(perm) || S_ISBLK(perm)) {
> + fs_mknod(filename, base, perm, entry->kdev, uid, gid);
> + }
> + dput(base);
> +}
> +
> +/* Create files according to the policy file. */
> +static void MakeInitialNodes(struct super_block *sb)
> +{
> + struct syaoran_sb_info *info;
> + struct dev_entry *entry;
> + if (!sb)
> + return;
> + info = (struct syaoran_sb_info *) sb->s_fs_info;
> + if (!info)
> + return;
> + if (info->is_permissive_mode) {
> + syaoran_create_tracelog(sb, ".syaoran");
> + syaoran_create_tracelog(sb, ".syaoran_all");
> + }
> + list_for_each_entry(entry, &info->list, list) {
> + if ((entry->flags & NO_CREATE_AT_MOUNT) == 0)
> + MakeNode(entry, sb->s_root);
> + }
> + info->initialize_done = 1;
> +}
> +
> +/* Read policy file. */
> +static int Syaoran_Initialize(struct super_block *sb, void *data)
> +{
> + int error = -EINVAL;
> + static bool first = 1;
> + if (first) {
> + first = 0;
> + printk(KERN_INFO "SYAORAN: 1.5.3-pre 2007/12/16\n");
> + }

This fn should probably be split into several smaller functions...
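
As one example, the "accept=" / "enforce=" prefix handling further down
could live in its own helper. A rough userspace sketch, untested against
the patch: the name syaoran_parse_option() is made up, and -1 stands in
for -EINVAL.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: split "accept=<path>" or "enforce=<path>" into a
 * mode flag and a pointer to the path part.
 * Returns 0 on success, -1 if data is NULL or has neither prefix.
 */
static int syaoran_parse_option(const char *data, bool *is_permissive,
				const char **path)
{
	static const char accept[] = "accept=";
	static const char enforce[] = "enforce=";

	if (!data)
		return -1;
	if (strncmp(data, accept, sizeof(accept) - 1) == 0) {
		*is_permissive = true;
		*path = data + sizeof(accept) - 1;
		return 0;
	}
	if (strncmp(data, enforce, sizeof(enforce) - 1) == 0) {
		*is_permissive = false;
		*path = data + sizeof(enforce) - 1;
		return 0;
	}
	return -1;
}
```

The caller would then only have to open the file and allocate the
sb info, which could be two more small functions.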

> + {
> + struct inode *inode = new_inode(sb);
> + if (!inode)
> + return -EINVAL;
> + /* Create /dev/ram0 to get the value of blkdev_open(). */

Since you're submitting your code for inclusion, this shouldn't be
necessary. If you need a pointer to blkdev_open and chrdev_open, then
get the actual values from fs/char_dev.c and fs/block_dev.c.

> + init_special_inode(inode, S_IFBLK | 0666, MKDEV(1, 0));
> + wrapped_def_blk_fops = *inode->i_fop;
> + iput(inode);
> + org_blkdev_open = wrapped_def_blk_fops.open;
> + wrapped_def_blk_fops.open = wrapped_blkdev_open;
> + }
> + {
> + struct inode *inode = new_inode(sb);
> + if (!inode)
> + return -EINVAL;
> + /* Create /dev/null to get the value of chrdev_open(). */
> + init_special_inode(inode, S_IFCHR | 0666, MKDEV(1, 3));
> + wrapped_def_chr_fops = *inode->i_fop;
> + iput(inode);
> + org_chrdev_open = wrapped_def_chr_fops.open;
> + wrapped_def_chr_fops.open = wrapped_chrdev_open;
> + }
> + if (data) {

Passing a filename as a mount option and having the kernel parse
that as a config file is probably not acceptable.

Any reason not to create /dev/config, /dev/enforce, and /dev/start
files, so early userspace can just mount this fs, cat policy >
/dev/config, echo 1 > /dev/enforce, echo 1 > /dev/start, and then your
fs starts enforcing?

> + struct file *f;
> + char *filename = (char *) data;
> + bool is_permissive_mode = 0;
> + if (strncmp(filename, "accept=", 7) == 0) {
> + filename += 7;
> + is_permissive_mode = 1;
> + } else if (strncmp(filename, "enforce=", 8) == 0) {
> + filename += 8;
> + is_permissive_mode = 0;
> + } else {
> + printk(KERN_INFO
> + "SYAORAN: Missing 'accept=' or 'enforce='.\n");
> + return -EINVAL;
> + }
> + f = filp_open(filename, O_RDONLY, 0600);
> + if (!IS_ERR(f)) {
> + struct syaoran_sb_info *p;
> + if (!S_ISREG(f->f_dentry->d_inode->i_mode))
> + goto out;
> + p = kzalloc(sizeof(*p), GFP_KERNEL);
> + if (!p)
> + goto out;
> + p->is_permissive_mode = is_permissive_mode;
> + sb->s_fs_info = p;
> + INIT_LIST_HEAD(&((struct syaoran_sb_info *)
> + sb->s_fs_info)->list);
> + printk(KERN_INFO "SYAORAN: Reading '%s'\n", filename);
> + error = ReadConfigFile(f, sb);
> +out:
> + if (error)
> + printk(KERN_INFO "SYAORAN: Can't read '%s'\n",
> + filename);
> + filp_close(f, NULL);
> + } else {
> + printk(KERN_INFO "SYAORAN: Can't open '%s'\n",
> + filename);
> + }
> + } else {
> + printk(KERN_INFO "SYAORAN: Missing config-file path.\n");
> + }
> + return error;
> +}
> +
> +/* Get absolute pathname from mount point. */
> +static int GetLocalAbsolutePath(struct dentry *dentry, char *buffer, int buflen)
> +{
> + char *start = buffer;
> + char *end = buffer + buflen;
> + int namelen;
> +
> + if (buflen < 256)
> + goto out;
> +
> + *--end = '\0';
> + buflen--;
> + for (;;) {
> + struct dentry *parent;
> + if (IS_ROOT(dentry))
> + break;
> + parent = dentry->d_parent;
> + namelen = dentry->d_name.len;
> + buflen -= namelen + 1;
> + if (buflen < 0)
> + goto out;
> + end -= namelen;
> + memcpy(end, dentry->d_name.name, namelen);
> + *--end = '/';
> + dentry = parent;
> + }
> + if (*end == '/') {
> + buflen++;
> + end++;
> + }
> + namelen = dentry->d_name.len;
> + buflen -= namelen;
> + if (buflen < 0)
> + goto out;
> + end -= namelen;
> + memcpy(end, dentry->d_name.name, namelen);
> + memmove(start, end, strlen(end) + 1);
> + return 0;
> +out:
> + return -ENOMEM;
> +}
> +
> +/* Get absolute pathname of the given dentry from mount point. */
> +static int local_realpath_from_dentry(struct dentry *dentry, char *newname,
> + int newname_len)
> +{
> + int error;
> + struct dentry *d_dentry;
> + if (!dentry || !newname || newname_len <= 0)
> + return -EINVAL;
> + d_dentry = dget(dentry);
> + /***** CRITICAL SECTION START *****/
> + spin_lock(&dcache_lock);
> + error = GetLocalAbsolutePath(d_dentry, newname, newname_len);
> + spin_unlock(&dcache_lock);
> + /***** CRITICAL SECTION END *****/
> + dput(d_dentry);
> + return error;
> +}
> +
> +static int CheckFlags(struct syaoran_sb_info *info, struct dentry *dentry,
> + int mode, int dev, unsigned int flags)
> +{

This really could be made much more readable...

> + int error = -EPERM;
> + /*
> + * I use static buffer, for local_realpath_from_dentry() needs
> + * dcache_lock.
> + */
> + static char filename[PAGE_SIZE];
> + static DEFINE_SPINLOCK(lock);
> + spin_lock(&lock);
> + memset(filename, 0, sizeof(filename));
> + if (local_realpath_from_dentry(dentry, filename, sizeof(filename) - 1)
> + == 0) {

Since the fn name doesn't help, one has to follow several
functions to make sure that == 0 in fact means there was
no error copying the name...

> + struct dev_entry *entry;
> + list_for_each_entry(entry, &info->list, list) {

walking through a list of info for devices which are allowed to exist,
attached to the sb? How about a helper fn?
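
Roughly what I have in mind, as a userspace sketch only. The struct is a
cut-down stand-in for your dev_entry with a plain next pointer instead of
list_head, the SY_* constants are local copies of the Linux S_IF* values,
and syaoran_find_entry() is a made-up name.

```c
#include <stddef.h>
#include <string.h>

/* Local copies of the Linux file type bits, to keep the sketch standalone. */
#define SY_IFMT  0170000
#define SY_IFCHR 0020000
#define SY_IFBLK 0060000
#define SY_IFDIR 0040000

/* Cut-down stand-in for the patch's struct dev_entry. */
struct dev_entry {
	struct dev_entry *next;
	const char *name;	/* pathname relative to the mount point */
	unsigned int mode;	/* type bits plus permissions */
	unsigned int kdev;	/* device number, used for chr/blk only */
	unsigned int flags;
};

/*
 * Hypothetical helper: return the first entry whose type, device number
 * (device files only) and name all match, or NULL if there is none.
 */
static struct dev_entry *syaoran_find_entry(struct dev_entry *head,
					    const char *name,
					    unsigned int mode,
					    unsigned int dev)
{
	struct dev_entry *entry;

	for (entry = head; entry; entry = entry->next) {
		const unsigned int type = mode & SY_IFMT;

		if (type != (entry->mode & SY_IFMT))
			continue;
		if ((type == SY_IFBLK || type == SY_IFCHR) &&
		    dev != entry->kdev)
			continue;
		if (strcmp(entry->name, name) == 0)
			return entry;
	}
	return NULL;
}
```

CheckFlags() would then shrink to a lookup followed by the
permissive/enforcing decision on entry->flags.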

> + if ((mode & S_IFMT) != (entry->mode & S_IFMT))
> + continue;
> + if ((S_ISBLK(mode) || S_ISCHR(mode)) &&
> + dev != entry->kdev)
> + continue;
> + if (strcmp(entry->name, filename + 1))
> + continue;
> + if (info->is_permissive_mode) {
> + entry->flags |= flags;
> + error = 0;
> + } else {
> + if ((entry->flags & flags) == flags)
> + error = 0;
> + }
> + break;
> + }
> + }
> + if (error && strlen(filename) < (sizeof(filename) / 4) - 16) {

Seems like this whole block should be a separate error/debug
helper function...
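
The escaping part in particular is easy to pull out. A userspace sketch,
assuming the same rules as your loop below (backslash doubled, anything
outside '!'..'~' written as three octal digits) but filling the buffer
front to back; syaoran_escape_name() is a made-up name.

```c
#include <stddef.h>

/*
 * Hypothetical helper: write a printable form of src into dst.
 * A backslash becomes "\\", bytes outside '!'..'~' become "\ooo",
 * everything else is copied unchanged.
 * Returns the length written (without the NUL), or -1 if dst is too small.
 */
static int syaoran_escape_name(const char *src, char *dst, size_t dstlen)
{
	size_t n = 0;

	for (; *src; src++) {
		const unsigned char c = (unsigned char)*src;
		const size_t need =
			(c == '\\') ? 2 : (c > ' ' && c < 127) ? 1 : 4;

		/* Always keep one byte in reserve for the trailing NUL. */
		if (dstlen - n <= need)
			return -1;
		if (c == '\\') {
			dst[n++] = '\\';
			dst[n++] = '\\';
		} else if (c > ' ' && c < 127) {
			dst[n++] = (char)c;
		} else {
			dst[n++] = '\\';
			dst[n++] = (char)((c >> 6) + '0');
			dst[n++] = (char)(((c >> 3) & 7) + '0');
			dst[n++] = (char)((c & 7) + '0');
		}
	}
	if (n >= dstlen)
		return -1;
	dst[n] = '\0';
	return (int)n;
}
```

With that in place the seven printk() cases differ only in the type
character and the optional major/minor, so they could share one format
path as well.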

> + const char *name;
> + const uid_t uid = current->fsuid;
> + const gid_t gid = current->fsgid;
> + const mode_t perm = mode & 0777;
> + flags &= ~DEVICE_USED;
> + {
> + char *end = filename + sizeof(filename) - 1;
> + const char *cp = strchr(filename, '\0') - 1;
> + while (cp > filename) {
> + const unsigned char c = *cp--;
> + if (c == '\\') {
> + *--end = '\\';
> + *--end = '\\';
> + } else if (c > ' ' && c < 127) {
> + *--end = c;
> + } else {
> + *--end = (c & 7) + '0';
> + *--end = ((c >> 3) & 7) + '0';
> + *--end = (c >> 6) + '0';
> + *--end = '\\';
> + }
> + }
> + name = end;
> + }
> + switch (mode & S_IFMT) {
> + case S_IFCHR:
> + printk(KERN_DEBUG
> + "SYAORAN-ERROR: %s %3o %3u %3u %2u %c %3u %3u\n",
> + name, perm, uid, gid, flags, 'c',
> + MAJOR(dev), MINOR(dev));
> + break;
> + case S_IFBLK:
> + printk(KERN_DEBUG
> + "SYAORAN-ERROR: %s %3o %3u %3u %2u %c %3u %3u\n",
> + name, perm, uid, gid, flags, 'b',
> + MAJOR(dev), MINOR(dev));
> + break;
> + case S_IFIFO:
> + printk(KERN_DEBUG
> + "SYAORAN-ERROR: %s %3o %3u %3u %2u %c\n",
> + name, perm, uid, gid, flags, 'p');
> + break;
> + case S_IFSOCK:
> + printk(KERN_DEBUG
> + "SYAORAN-ERROR: %s %3o %3u %3u %2u %c\n",
> + name, perm, uid, gid, flags, 's');
> + break;
> + case S_IFDIR:
> + printk(KERN_DEBUG
> + "SYAORAN-ERROR: %s %3o %3u %3u %2u %c\n",
> + name, perm, uid, gid, flags, 'd');
> + break;
> + case S_IFLNK:
> + printk(KERN_DEBUG
> + "SYAORAN-ERROR: %s %3o %3u %3u %2u %c %s\n",
> + name, perm, uid, gid, flags, 'l', "unknown");
> + break;
> + case S_IFREG:
> + printk(KERN_DEBUG
> + "SYAORAN-ERROR: %s %3o %3u %3u %2u %c\n",
> + name, perm, uid, gid, flags, 'f');
> + break;
> + }
> + }
> + spin_unlock(&lock);
> + return error;
> +}
> +
> +/* Check whether the given dentry is allowed to mknod. */
> +static int MayCreateNode(struct dentry *dentry, int mode, int dev)
> +{
> + struct syaoran_sb_info *info =
> + (struct syaoran_sb_info *) dentry->d_sb->s_fs_info;
> + if (!info) {
> + printk(KERN_DEBUG "%s: dentry->d_sb->s_fs_info == NULL\n",
> + __FUNCTION__);
> + return -EPERM;
> + }
> + if (!info->initialize_done)
> + return 0;
> + return CheckFlags(info, dentry, mode, dev, MAY_CREATE);
> +}
> +
> +/* Check whether the given dentry is allowed to chmod/chown/unlink. */
> +static int MayModifyNode(struct dentry *dentry, unsigned int flags)
> +{
> + struct syaoran_sb_info *info =
> + (struct syaoran_sb_info *) dentry->d_sb->s_fs_info;
> + if (!info) {
> + printk(KERN_DEBUG "%s: dentry->d_sb->s_fs_info == NULL\n",
> + __FUNCTION__);
> + return -EPERM;
> + }
> + if (flags == DEVICE_USED && !info->is_permissive_mode)
> + return 0;
> + if (!dentry->d_inode)
> + return -ENOENT;
> + return CheckFlags(info, dentry, dentry->d_inode->i_mode,
> + dentry->d_inode->i_rdev, flags);
> +}
> +
> +/*
> + * The following structure and codes are used for transferring data
> + * to interfaces files.

And should probably be in a separate file.

> + */
> +
> +struct syaoran_read_struct {
> + char *buf; /* Buffer for reading. */
> + int avail; /* Bytes available for reading. */
> + struct super_block *sb; /* The super_block of this partition. */
> + struct dev_entry *entry; /* The entry currently reading from. */
> + _Bool read_all; /* Dump all entries? */
> + struct list_head *pos; /* Current position. */
> +};
> +
> +static void ReadTable(struct syaoran_read_struct *head, char *buf, int count)
> +{
> + struct super_block *sb = head->sb;
> + struct syaoran_sb_info *info =
> + (struct syaoran_sb_info *) sb->s_fs_info;
> + struct list_head *pos;
> + const _Bool read_all = head->read_all;
> + if (!info)
> + return;
> + if (!head->pos)
> + return;
> + list_for_each_cookie(pos, head->pos, &info->list) {
> + struct dev_entry *entry =
> + list_entry(pos, struct dev_entry, list);
> + const unsigned int flags =
> + read_all ? entry->flags : entry->flags & ~DEVICE_USED;
> + const char *name = entry->printable_name;
> + const uid_t uid = entry->uid;
> + const gid_t gid = entry->gid;
> + const mode_t perm = entry->mode & 0777;
> + int len = 0;
> + switch (entry->mode & S_IFMT) {
> + case S_IFCHR:
> + if (!head->read_all && !(entry->flags & DEVICE_USED))
> + break;
> + len = snprintf(buf, count,
> + "%-20s %3o %3u %3u %2u %c %3u %3u\n",
> + name, perm, uid, gid, flags, 'c',
> + MAJOR(entry->kdev), MINOR(entry->kdev));
> + break;
> + case S_IFBLK:
> + if (!head->read_all && !(entry->flags & DEVICE_USED))
> + break;
> + len = snprintf(buf, count,
> + "%-20s %3o %3u %3u %2u %c %3u %3u\n",
> + name, perm, uid, gid, flags, 'b',
> + MAJOR(entry->kdev), MINOR(entry->kdev));
> + break;
> + case S_IFIFO:
> + len = snprintf(buf, count,
> + "%-20s %3o %3u %3u %2u %c\n",
> + name, perm, uid, gid, flags, 'p');
> + break;
> + case S_IFSOCK:
> + len = snprintf(buf, count,
> + "%-20s %3o %3u %3u %2u %c\n",
> + name, perm, uid, gid, flags, 's');
> + break;
> + case S_IFDIR:
> + len = snprintf(buf, count,
> + "%-20s %3o %3u %3u %2u %c\n",
> + name, perm, uid, gid, flags, 'd');
> + break;
> + case S_IFLNK:
> + len = snprintf(buf, count,
> + "%-20s %3o %3u %3u %2u %c %s\n",
> + name, perm, uid, gid, flags, 'l',
> + entry->printable_symlink_data);
> + break;
> + case S_IFREG:
> + len = snprintf(buf, count,
> + "%-20s %3o %3u %3u %2u %c\n",
> + name, perm, uid, gid, flags, 'f');
> + break;
> + }
> + if (len < 0 || count <= len)
> + break;
> + count -= len;
> + buf += len;
> + head->avail += len;
> + }
> +}
> +
> +static int syaoran_trace_open(struct inode *inode, struct file *file)
> +{
> + struct syaoran_read_struct *head =
> + kzalloc(sizeof(*head), GFP_KERNEL);
> + if (!head)
> + return -ENOMEM;
> + head->sb = inode->i_sb;
> + head->read_all =
> + (strcmp(file->f_dentry->d_name.name, ".syaoran_all") == 0);
> + head->pos = &((struct syaoran_sb_info *) head->sb->s_fs_info)->list;
> + head->buf = kzalloc(PAGE_SIZE * 2, GFP_KERNEL);
> + if (!head->buf) {
> + kfree(head);
> + return -ENOMEM;
> + }
> + file->private_data = head;
> + return 0;
> +}
> +
> +static int syaoran_trace_release(struct inode *inode, struct file *file)
> +{
> + struct syaoran_read_struct *head = file->private_data;
> + kfree(head->buf);
> + kfree(head);
> + file->private_data = NULL;
> + return 0;
> +}
> +
> +static ssize_t syaoran_trace_read(struct file *file, char __user *buf,
> + size_t count, loff_t *ppos)
> +{
> + struct syaoran_read_struct *head =
> + (struct syaoran_read_struct *) file->private_data;
> + int len = head->avail;
> + char *cp = head->buf;
> + if (!access_ok(VERIFY_WRITE, buf, count))
> + return -EFAULT;
> + ReadTable(head, cp + len, PAGE_SIZE * 2 - len);
> + len = head->avail;
> + if (len > count)
> + len = count;
> + if (len > 0) {
> + if (copy_to_user(buf, cp, len))
> + return -EFAULT;
> + head->avail -= len;
> + memmove(cp, cp + len, head->avail);
> + }
> + return len;
> +}
> +
> +static struct file_operations syaoran_trace_operations = {
> + .open = syaoran_trace_open,
> + .release = syaoran_trace_release,
> + .read = syaoran_trace_read,
> +};
> +
> +/* Create interface files for reading status. */
> +static int syaoran_create_tracelog(struct super_block *sb,
> + const char *filename)
> +{
> + struct dentry *base = dget(sb->s_root);
> + struct dentry *dentry = lookup_create2(filename, base, 0);
> + int error = PTR_ERR(dentry);
> + if (!IS_ERR(dentry)) {
> + struct inode *inode = new_inode(sb);
> + if (inode) {
> + struct timespec now = CURRENT_TIME;
> + inode->i_mode = S_IFREG | 0400;
> + inode->i_uid = 0;
> + inode->i_gid = 0;
> + inode->i_blocks = 0;
> + inode->i_mapping->a_ops = &syaoran_aops;
> + inode->i_mapping->backing_dev_info =
> + &syaoran_backing_dev_info;
> + inode->i_op = &syaoran_file_inode_operations;
> + inode->i_atime = now;
> + inode->i_mtime = now;
> + inode->i_ctime = now;
> + inode->i_fop = &syaoran_trace_operations;
> + d_instantiate(dentry, inode);
> + dget(dentry); /* Extra count - pin the dentry in core */
> + error = 0;
> + }
> + dput(dentry);
> + }
> + mutex_unlock(&base->d_inode->i_mutex);
> + dput(base);
> + return error;
> +}
> +
> +/***** SYAORAN end. *****/
> +#endif

2007-12-20 00:06:19

Subject: Re: [patch 1/2] [RFC] Simple tamper-proof device filesystem.

Serge E. Hallyn wrote:
> Quoting Pavel Emelyanov ([email protected]):
>> Oren Laadan wrote:
>>> Serge E. Hallyn wrote:
>>>> Quoting Oren Laadan ([email protected]):
>>>>> I hate to bring this up again, but what if the admin in the container
>>>>> mounts an external file system (eg. nfs, usb, loop mount from a file,
>>>>> or via fuse), and that file system already has a device that we would
>>>>> like to ban inside that container ?
>>>> Miklos' user mount patches enforced that if !capable(CAP_MKNOD),
>>>> then mnt->mnt_flags |= MNT_NODEV. So that's no problem.
>>> Yes, that works to disallow all device files from a mounted file system.
>>>
>>> But it's a black and white thing: either they are all banned or allowed;
>>> you can't have some devices allowed and others not, depending on type.
>>> A scenario where this may be useful is, for instance, if we want some
>>> apps in the container to execute within a pre-made chroot (sub)tree
>>> within that container.
>>>
>>>> But that's been pulled out of -mm! ? Crap.
>>>>
>>>>> Since anyway we will have to keep a white- (or black-) list of devices
>>>>> that are permitted in a container, and that list may even change
>>>>> per container -- why not enforce the access control at the VFS layer ?
>>>>> It's safer in the long run.
>>>> By that you mean more along the lines of Pavel's patch than my whitelist
>>>> LSM, or you actually mean Tetsuo's filesystem (i assume you don't mean that
>>>> by 'vfs layer' :), or something different entirely?
>>> :)
>>>
>>> By 'vfs' I mean at open() time, and not at mount(), or mknod() time.
>>> Either yours or Pavel's; I tend to prefer not to use LSM as it may
>>> collide with future security modules.
>> Oren, AFAIS you've seen my patches for device access controller, right?

If you mean this one:
http://openvz.org/pipermail/devel/2007-September/007647.html
then ack :)

>>
>> Maybe we can revisit the issue then and try to come to agreement on what
>> kind of model and implementation we all want?
>
> That would be great, Pavel. I do prefer your solution over my LSM, so
> if we can get an elegant block device control right in the vfs code that
> would be my preference.

I concur.

So it seems to me that we are all in favor of the model where open()
of a device will consult a black/white-list. Also, we are all in favor
of a non-LSM implementation, Pavel's code being a good example.
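
To make the model concrete, here is a minimal userspace sketch of the
check I mean. It is not Pavel's code: the rule layout, the "negative
minor matches any minor" convention, and all names are assumptions for
illustration only.

```c
#include <stdbool.h>
#include <stddef.h>

enum dev_type { DEV_CHAR, DEV_BLOCK };

/* One hypothetical whitelist rule; a negative minor matches any minor. */
struct dev_rule {
	enum dev_type type;
	unsigned int major;
	int minor;
};

/* Return true if (type, major, minor) is covered by some rule. */
static bool dev_open_allowed(const struct dev_rule *rules, size_t nr_rules,
			     enum dev_type type, unsigned int major,
			     unsigned int minor)
{
	size_t i;

	for (i = 0; i < nr_rules; i++) {
		const struct dev_rule *r = &rules[i];

		if (r->type != type || r->major != major)
			continue;
		if (r->minor < 0 || (unsigned int)r->minor == minor)
			return true;
	}
	return false;
}
```

The point is only that the decision depends on the device being opened,
not on where the device node came from, so a node on an nfs or loop
mount is covered as well.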

Oren.

> The only thing that makes me keep wanting to go back to an LSM is the
> fact that the code defining the whitelist seems out of place in the vfs.
> But I guess that's actually separated into a modular cgroup, with the
> actual enforcement built in at the vfs. So that's really the best
> solution.
>
> -serge

2007-12-20 07:43:24

by Pavel Emelyanov

Subject: Re: [patch 1/2] [RFC] Simple tamper-proof device filesystem.

Oren Laadan wrote:
>
> Serge E. Hallyn wrote:
>> Quoting Pavel Emelyanov ([email protected]):
>>> Oren Laadan wrote:
>>>> Serge E. Hallyn wrote:
>>>>> Quoting Oren Laadan ([email protected]):
>>>>>> I hate to bring this up again, but what if the admin in the container
>>>>>> mounts an external file system (eg. nfs, usb, loop mount from a file,
>>>>>> or via fuse), and that file system already has a device that we would
>>>>>> like to ban inside that container ?
>>>>> Miklos' user mount patches enforced that if !capable(CAP_MKNOD),
>>>>> then mnt->mnt_flags |= MNT_NODEV. So that's no problem.
>>>> Yes, that works to disallow all device files from a mounted file system.
>>>>
>>>> But it's a black and white thing: either they are all banned or allowed;
>>>> you can't have some devices allowed and others not, depending on type.
>>>> A scenario where this may be useful is, for instance, if we want some apps
>>>> in the container to execute within a pre-made chroot (sub)tree within that
>>>> container.
>>>>
>>>>> But that's been pulled out of -mm!? Crap.
>>>>>
>>>>>> Since we will anyway have to keep a white- (or black-) list of devices
>>>>>> that are permitted in a container, and that list may even change
>>>>>> per container -- why not enforce the access control at the VFS layer?
>>>>>> It's safer in the long run.
>>>>> By that you mean more along the lines of Pavel's patch than my whitelist
>>>>> LSM, or you actually mean Tetsuo's filesystem (i assume you don't mean that
>>>>> by 'vfs layer' :), or something different entirely?
>>>> :)
>>>>
>>>> By 'vfs' I mean at open() time, and not at mount(), or mknod() time.
>>>> Either yours or Pavel's; I tend to prefer not to use LSM as it may
>>>> collide with future security modules.
>>> Oren, AFAIS you've seen my patches for device access controller, right?
>
> If you mean this one:
> http://openvz.org/pipermail/devel/2007-September/007647.html
> then ack :)

Great! Thanks.

>>> Maybe we can revisit the issue then and try to come to agreement on what
>>> kind of model and implementation we all want?
>> That would be great, Pavel. I do prefer your solution over my LSM, so
>> if we can get an elegant block device control right in the vfs code that
>> would be my preference.
>
> I concur.
>
> So it seems to me that we are all in favor of the model where open()
> of a device will consult a black/white-list. Also, we are all in favor
> of a non-LSM implementation, Pavel's code being a good example.

Thank you, Oren and Serge! I will revisit this issue then, but
I am on vacation next week and, after that, we have the New
Year and Christmas holidays in Russia. So I will be able to go
on with it only after the 7th of January :( Hope this is OK for you.

Besides, Andrew said that he would pay little attention to new
features till the 2.6.24 release, so I'm afraid we won't have this
even in -mm in the coming months :(

Thanks,
Pavel

> Oren.
>
>> The only thing that makes me keep wanting to go back to an LSM is the
>> fact that the code defining the whitelist seems out of place in the vfs.
>> But I guess that's actually separated into a modular cgroup, with the
>> actual enforcement built in at the vfs. So that's really the best
>> solution.
>>
>> -serge
>

2007-12-20 14:09:57

Subject: Re: [patch 1/2] [RFC] Simple tamper-proof device filesystem.

Quoting Pavel Emelyanov ([email protected]):
> Oren Laadan wrote:
> >
> > Serge E. Hallyn wrote:
> >> Quoting Pavel Emelyanov ([email protected]):
> >>> Oren Laadan wrote:
> >>>> Serge E. Hallyn wrote:
> >>>>> Quoting Oren Laadan ([email protected]):
> >>>>>> I hate to bring this up again, but what if the admin in the container
> >>>>>> mounts an external file system (e.g. nfs, usb, loop mount from a file,
> >>>>>> or via fuse), and that file system already has a device that we would
> >>>>>> like to ban inside that container?
> >>>>> Miklos' user mount patches enforced that if !capable(CAP_MKNOD),
> >>>>> then mnt->mnt_flags |= MNT_NODEV. So that's no problem.
> >>>> Yes, that works to disallow all device files from a mounted file system.
> >>>>
> >>>> But it's a black and white thing: either they are all banned or allowed;
> >>>> you can't have some devices allowed and others not, depending on type.
> >>>> A scenario where this may be useful is, for instance, if we want some apps
> >>>> in the container to execute within a pre-made chroot (sub)tree within that
> >>>> container.
> >>>>
> >>>>> But that's been pulled out of -mm!? Crap.
> >>>>>
> >>>>>> Since we will anyway have to keep a white- (or black-) list of devices
> >>>>>> that are permitted in a container, and that list may even change
> >>>>>> per container -- why not enforce the access control at the VFS layer?
> >>>>>> It's safer in the long run.
> >>>>> By that you mean more along the lines of Pavel's patch than my whitelist
> >>>>> LSM, or you actually mean Tetsuo's filesystem (i assume you don't mean that
> >>>>> by 'vfs layer' :), or something different entirely?
> >>>> :)
> >>>>
> >>>> By 'vfs' I mean at open() time, and not at mount(), or mknod() time.
> >>>> Either yours or Pavel's; I tend to prefer not to use LSM as it may
> >>>> collide with future security modules.
> >>> Oren, AFAIS you've seen my patches for device access controller, right?
> >
> > If you mean this one:
> > http://openvz.org/pipermail/devel/2007-September/007647.html
> > then ack :)
>
> Great! Thanks.
>
> >>> Maybe we can revisit the issue then and try to come to agreement on what
> >>> kind of model and implementation we all want?
> >> That would be great, Pavel. I do prefer your solution over my LSM, so
> >> if we can get an elegant block device control right in the vfs code that
> >> would be my preference.
> >
> > I concur.
> >
> > So it seems to me that we are all in favor of the model where open()
> > of a device will consult a black/white-list. Also, we are all in favor
> > of a non-LSM implementation, Pavel's code being a good example.
>
> Thank you, Oren and Serge! I will revisit this issue then, but
> I am on vacation next week and, after that, we have the New
> Year and Christmas holidays in Russia. So I will be able to go
> on with it only after the 7th of January :( Hope this is OK for you.
>
> Besides, Andrew said that he would pay little attention to new
> features till the 2.6.24 release, so I'm afraid we won't have this
> even in -mm in the coming months :(
>
> Thanks,
> Pavel

Cool, let me know if there's any way I can help when you get started.

thanks,
-serge

2007-12-21 01:46:38

Subject: Re: [patch 1/2] [RFC] Simple tamper-proof device filesystem.

Pavel Emelyanov wrote:
> Oren Laadan wrote:
>> Serge E. Hallyn wrote:
>>> Quoting Pavel Emelyanov ([email protected]):
>>>> Oren Laadan wrote:
>>>>> Serge E. Hallyn wrote:
>>>>>> Quoting Oren Laadan ([email protected]):
>>>>>>> I hate to bring this up again, but what if the admin in the container
>>>>>>> mounts an external file system (e.g. nfs, usb, loop mount from a file,
>>>>>>> or via fuse), and that file system already has a device that we would
>>>>>>> like to ban inside that container?
>>>>>> Miklos' user mount patches enforced that if !capable(CAP_MKNOD),
>>>>>> then mnt->mnt_flags |= MNT_NODEV. So that's no problem.
>>>>> Yes, that works to disallow all device files from a mounted file system.
>>>>>
>>>>> But it's a black and white thing: either they are all banned or allowed;
>>>>> you can't have some devices allowed and others not, depending on type.
>>>>> A scenario where this may be useful is, for instance, if we want some apps
>>>>> in the container to execute within a pre-made chroot (sub)tree within that
>>>>> container.
>>>>>
>>>>>> But that's been pulled out of -mm!? Crap.
>>>>>>
>>>>>>> Since we will anyway have to keep a white- (or black-) list of devices
>>>>>>> that are permitted in a container, and that list may even change
>>>>>>> per container -- why not enforce the access control at the VFS layer?
>>>>>>> It's safer in the long run.
>>>>>> By that you mean more along the lines of Pavel's patch than my whitelist
>>>>>> LSM, or you actually mean Tetsuo's filesystem (i assume you don't mean that
>>>>>> by 'vfs layer' :), or something different entirely?
>>>>> :)
>>>>>
>>>>> By 'vfs' I mean at open() time, and not at mount(), or mknod() time.
>>>>> Either yours or Pavel's; I tend to prefer not to use LSM as it may
>>>>> collide with future security modules.
>>>> Oren, AFAIS you've seen my patches for device access controller, right?
>> If you mean this one:
>> http://openvz.org/pipermail/devel/2007-September/007647.html
>> then ack :)
>
> Great! Thanks.
>
>>>> Maybe we can revisit the issue then and try to come to agreement on what
>>>> kind of model and implementation we all want?
>>> That would be great, Pavel. I do prefer your solution over my LSM, so
>>> if we can get an elegant block device control right in the vfs code that
>>> would be my preference.
>> I concur.
>>
>> So it seems to me that we are all in favor of the model where open()
>> of a device will consult a black/white-list. Also, we are all in favor
>> of a non-LSM implementation, Pavel's code being a good example.
>
> Thank you, Oren and Serge! I will revisit this issue then, but
> I am on vacation next week and, after that, we have the New
> Year and Christmas holidays in Russia. So I will be able to go
> on with it only after the 7th of January :( Hope this is OK for you.
>
> Besides, Andrew said that he would pay little attention to new
> features till the 2.6.24 release, so I'm afraid we won't have this
> even in -mm in the coming months :(

Sounds great! (As for the delay, it wasn't the highest priority issue
to begin with, so no worries.)

Ah.. coincidentally they are celebrated here, too, at the same time :D
Merry Christmas and Happy New Year!

Oren.

>
> Thanks,
> Pavel
>
>> Oren.
>>
>>> The only thing that makes me keep wanting to go back to an LSM is the
>>> fact that the code defining the whitelist seems out of place in the vfs.
>>> But I guess that's actually separated into a modular cgroup, with the
>>> actual enforcement built in at the vfs. So that's really the best
>>> solution.
>>>
>>> -serge
>

2007-12-24 13:09:19