Hi Andrew,
There is a problem with active restarts in autofs (that is to
say restarting autofs when there are busy mounts).
Currently autofs uses "umount -l" to clear active mounts at
restart. While using lazy umount works for most cases, anything
that needs to walk back up the mount tree to construct a path,
such as getcwd(2) and the proc file system /proc/<pid>/cwd, no
longer works because the point from which the path is constructed
has been detached from the mount tree.
The actual problem with autofs is that it can't reconnect to
existing mounts. Immediately one things of just adding the
ability to remount autofs file systems would solve it, but
alas, that can't work. This is because autofs direct mounts
and the implementation of "on demand mount and expire" of
nested mount trees have the file system mounted on top of
the mount trigger dentry.
To resolve this a miscellaneous device node for routing ioctl
commands to these mount points has been implemented for the
autofs4 kernel module.
For those wishing to test this out an updated user space daemon
is needed. Checking out and building from the git repo or
applying all the current patches to the 5.0.3 tar distribution
will do the trick. This is all available at the usual location
on kernel.org.
Ian
Hi Andrew,
Patch to catch invalid dentry when calculating it's path.
Signed-off-by: Ian Kent <[email protected]>
Ian
---
diff -up linux-2.6.25-rc2-mm1/fs/autofs4/waitq.c.getpath-check-valid-dentry linux-2.6.25-rc2-mm1/fs/autofs4/waitq.c
--- linux-2.6.25-rc2-mm1/fs/autofs4/waitq.c.getpath-check-valid-dentry 2008-02-20 12:55:39.000000000 +0900
+++ linux-2.6.25-rc2-mm1/fs/autofs4/waitq.c 2008-02-20 12:54:46.000000000 +0900
@@ -171,7 +171,7 @@ static int autofs4_getpath(struct autofs
for (tmp = dentry ; tmp != root ; tmp = tmp->d_parent)
len += tmp->d_name.len + 1;
- if (--len > NAME_MAX) {
+ if (!len || --len > NAME_MAX) {
spin_unlock(&dcache_lock);
return 0;
}
Hi Andrew,
Patch to track the uid and gid of the last process to request
a mount for on an autofs dentry.
Signed-off-by: Ian Kent < [email protected]>
Ian
---
diff -up linux-2.6.25-rc2-mm1/fs/autofs4/inode.c.track-last-mount-ids linux-2.6.25-rc2-mm1/fs/autofs4/inode.c
--- linux-2.6.25-rc2-mm1/fs/autofs4/inode.c.track-last-mount-ids 2008-02-20 13:11:28.000000000 +0900
+++ linux-2.6.25-rc2-mm1/fs/autofs4/inode.c 2008-02-20 13:12:23.000000000 +0900
@@ -43,6 +43,8 @@ struct autofs_info *autofs4_init_ino(str
ino->flags = 0;
ino->mode = mode;
+ ino->uid = 0;
+ ino->gid = 0;
ino->inode = NULL;
ino->dentry = NULL;
ino->size = 0;
diff -up linux-2.6.25-rc2-mm1/fs/autofs4/autofs_i.h.track-last-mount-ids linux-2.6.25-rc2-mm1/fs/autofs4/autofs_i.h
--- linux-2.6.25-rc2-mm1/fs/autofs4/autofs_i.h.track-last-mount-ids 2008-02-20 13:14:03.000000000 +0900
+++ linux-2.6.25-rc2-mm1/fs/autofs4/autofs_i.h 2008-02-20 13:14:34.000000000 +0900
@@ -58,6 +58,9 @@ struct autofs_info {
unsigned long last_used;
atomic_t count;
+ uid_t uid;
+ gid_t gid;
+
mode_t mode;
size_t size;
diff -up linux-2.6.25-rc2-mm1/fs/autofs4/waitq.c.track-last-mount-ids linux-2.6.25-rc2-mm1/fs/autofs4/waitq.c
--- linux-2.6.25-rc2-mm1/fs/autofs4/waitq.c.track-last-mount-ids 2008-02-20 13:06:20.000000000 +0900
+++ linux-2.6.25-rc2-mm1/fs/autofs4/waitq.c 2008-02-20 13:10:23.000000000 +0900
@@ -363,6 +363,38 @@ int autofs4_wait(struct autofs_sb_info *
status = wq->status;
+ /*
+ * For direct and offset mounts we need to track the requestor
+ * uid and gid in the dentry info struct. This is so it can be
+ * supplied, on request, by the misc device ioctl interface.
+ * This is needed during daemon resatart when reconnecting
+ * to existing, active, autofs mounts. The uid and gid (and
+ * related string values) may be used for macro substitution
+ * in autofs mount maps.
+ */
+ if (!status) {
+ struct dentry *de = NULL;
+
+ /* direct mount or browsable map */
+ ino = autofs4_dentry_ino(dentry);
+ if (!ino) {
+ /* If not lookup actual dentry used */
+ de = d_lookup(dentry->d_parent, &dentry->d_name);
+ ino = autofs4_dentry_ino(de);
+ }
+
+ /* Set mount requestor */
+ if (ino) {
+ if (ino) {
+ ino->uid = wq->uid;
+ ino->gid = wq->gid;
+ }
+ }
+
+ if (de)
+ dput(de);
+ }
+
/* Are we the last process to need status? */
if (atomic_dec_and_test(&wq->wait_ctr))
kfree(wq);
Hi Andrew,
Patch to add miscellaneous device to autofs4 module for
ioctls.
Signed-off-by: Ian Kent < [email protected]>
Ian
---
diff -up linux-2.6.25-rc2-mm1/fs/autofs4/expire.c.device-node-ioctl linux-2.6.25-rc2-mm1/fs/autofs4/expire.c
--- linux-2.6.25-rc2-mm1/fs/autofs4/expire.c.device-node-ioctl 2008-01-25 07:58:37.000000000 +0900
+++ linux-2.6.25-rc2-mm1/fs/autofs4/expire.c 2008-02-22 11:51:41.000000000 +0900
@@ -244,10 +244,10 @@ cont:
}
/* Check if we can expire a direct mount (possibly a tree) */
-static struct dentry *autofs4_expire_direct(struct super_block *sb,
- struct vfsmount *mnt,
- struct autofs_sb_info *sbi,
- int how)
+struct dentry *autofs4_expire_direct(struct super_block *sb,
+ struct vfsmount *mnt,
+ struct autofs_sb_info *sbi,
+ int how)
{
unsigned long timeout;
struct dentry *root = dget(sb->s_root);
@@ -281,10 +281,10 @@ static struct dentry *autofs4_expire_dir
* - it is unused by any user process
* - it has been unused for exp_timeout time
*/
-static struct dentry *autofs4_expire_indirect(struct super_block *sb,
- struct vfsmount *mnt,
- struct autofs_sb_info *sbi,
- int how)
+struct dentry *autofs4_expire_indirect(struct super_block *sb,
+ struct vfsmount *mnt,
+ struct autofs_sb_info *sbi,
+ int how)
{
unsigned long timeout;
struct dentry *root = sb->s_root;
diff -up linux-2.6.25-rc2-mm1/fs/autofs4/init.c.device-node-ioctl linux-2.6.25-rc2-mm1/fs/autofs4/init.c
--- linux-2.6.25-rc2-mm1/fs/autofs4/init.c.device-node-ioctl 2008-01-25 07:58:37.000000000 +0900
+++ linux-2.6.25-rc2-mm1/fs/autofs4/init.c 2008-02-22 11:51:41.000000000 +0900
@@ -29,11 +29,20 @@ static struct file_system_type autofs_fs
static int __init init_autofs4_fs(void)
{
- return register_filesystem(&autofs_fs_type);
+ int err;
+
+ err = register_filesystem(&autofs_fs_type);
+ if (err)
+ return err;
+
+ err = autofs_dev_ioctl_init();
+
+ return err;
}
static void __exit exit_autofs4_fs(void)
{
+ autofs_dev_ioctl_exit();
unregister_filesystem(&autofs_fs_type);
}
diff -up linux-2.6.25-rc2-mm1/fs/autofs4/autofs_i.h.device-node-ioctl linux-2.6.25-rc2-mm1/fs/autofs4/autofs_i.h
--- linux-2.6.25-rc2-mm1/fs/autofs4/autofs_i.h.device-node-ioctl 2008-02-22 11:51:41.000000000 +0900
+++ linux-2.6.25-rc2-mm1/fs/autofs4/autofs_i.h 2008-02-22 11:51:41.000000000 +0900
@@ -14,6 +14,7 @@
/* Internal header file for autofs */
#include <linux/auto_fs4.h>
+#include <linux/auto_dev-ioctl.h>
#include <linux/mutex.h>
#include <linux/list.h>
@@ -40,6 +41,16 @@
#define DPRINTK(fmt,args...) do {} while(0)
#endif
+#define WARN(fmt,args...) \
+do { \
+ printk("KERN_WARNING pid %d: %s: " fmt "\n" , current->pid , __FUNCTION__ , ##args); \
+} while(0)
+
+#define ERROR(fmt,args...) \
+do { \
+ printk("KERN_ERR pid %d: %s: " fmt "\n" , current->pid , __FUNCTION__ , ##args); \
+} while(0)
+
/* Unified info structure. This is pointed to by both the dentry and
inode structures. Each file in the filesystem has an instance of this
structure. It holds a reference to the dentry, so dentries are never
@@ -172,6 +183,17 @@ int autofs4_expire_run(struct super_bloc
struct autofs_packet_expire __user *);
int autofs4_expire_multi(struct super_block *, struct vfsmount *,
struct autofs_sb_info *, int __user *);
+struct dentry *autofs4_expire_direct(struct super_block *sb,
+ struct vfsmount *mnt,
+ struct autofs_sb_info *sbi, int how);
+struct dentry *autofs4_expire_indirect(struct super_block *sb,
+ struct vfsmount *mnt,
+ struct autofs_sb_info *sbi, int how);
+
+/* Device node initialization */
+
+int autofs_dev_ioctl_init(void);
+void autofs_dev_ioctl_exit(void);
/* Operations structures */
diff -up linux-2.6.25-rc2-mm1/fs/autofs4/Makefile.device-node-ioctl linux-2.6.25-rc2-mm1/fs/autofs4/Makefile
--- linux-2.6.25-rc2-mm1/fs/autofs4/Makefile.device-node-ioctl 2008-01-25 07:58:37.000000000 +0900
+++ linux-2.6.25-rc2-mm1/fs/autofs4/Makefile 2008-02-22 11:51:41.000000000 +0900
@@ -4,4 +4,4 @@
obj-$(CONFIG_AUTOFS4_FS) += autofs4.o
-autofs4-objs := init.o inode.o root.o symlink.o waitq.o expire.o
+autofs4-objs := init.o inode.o root.o symlink.o waitq.o expire.o dev-ioctl.o
diff -up /dev/null linux-2.6.25-rc2-mm1/fs/autofs4/dev-ioctl.c
--- /dev/null 2008-02-22 18:55:57.149000956 +0900
+++ linux-2.6.25-rc2-mm1/fs/autofs4/dev-ioctl.c 2008-02-22 11:53:33.000000000 +0900
@@ -0,0 +1,788 @@
+/*
+ * linux/fs/autofs4/dev-ioctl.c
+ *
+ * Copyright 2008 Red Hat, Inc. All rights reserved.
+ * Copyright 2008 Ian Kent <[email protected]>
+ *
+ * This file is part of the Linux kernel and is made available under
+ * the terms of the GNU General Public License, version 2, or at your
+ * option, any later version, incorporated herein by reference.
+ */
+
+#include <linux/module.h>
+#include <linux/vmalloc.h>
+#include <linux/miscdevice.h>
+#include <linux/init.h>
+#include <linux/wait.h>
+#include <linux/namei.h>
+#include <linux/fcntl.h>
+#include <linux/file.h>
+#include <linux/sched.h>
+#include <linux/compat.h>
+#include <linux/syscalls.h>
+#include <linux/smp_lock.h>
+#include <linux/magic.h>
+#include <linux/dcache.h>
+#include <asm/uaccess.h>
+
+#include "autofs_i.h"
+
+/*
+ * This module implements an interface for routing autofs ioctl control
+ * commands via a miscellaneous device file.
+ *
+ * The alternate interface is needed because we need to be able open
+ * an ioctl file descriptor on an autofs mount that may be covered by
+ * another mount. This situation arises when starting automount(8)
+ * or other user space daemon which uses direct mounts or offset
+ * mounts (used for autofs lazy mount/umount of nested mount trees),
+ * which have been left busy at at service shutdown.
+ */
+
+#define AUTOFS_DEVICE_NAME "autofs"
+
+#define AUTOFS_DEV_IOCTL_IOC_FIRST AUTOFS_DEV_IOCTL_VERSION
+#define AUTOFS_DEV_IOCTL_IOC_COUNT AUTOFS_IOC_COUNT - 11
+
+#define AUTOFS_DEV_IOCTL_SIZE sizeof(struct autofs_dev_ioctl)
+
+typedef int (*ioctl_fn)(struct file *, struct autofs_sb_info *, struct autofs_dev_ioctl *);
+
+static int check_name(const char *name)
+{
+ if (!strchr(name, '/'))
+ return -EINVAL;
+ return 0;
+}
+
+/*
+ * Check a string doesn't overrun the chunk of
+ * memory we copied from user land.
+ */
+static int invalid_str(char *str, void *end)
+{
+ while ((void *) str <= end)
+ if (!*str++)
+ return 0;
+ return -EINVAL;
+}
+
+/*
+ * Check that the user compiled against correct version of autofs
+ * misc device code.
+ *
+ * As well as checking the version compatibility this always copies
+ * the kernel interface version out.
+ */
+static int check_dev_ioctl_version(int cmd, struct autofs_dev_ioctl *param)
+{
+ int err = 0;
+
+ if ((AUTOFS_DEV_IOCTL_VERSION_MAJOR != param->ver_major) ||
+ (AUTOFS_DEV_IOCTL_VERSION_MINOR < param->ver_minor)) {
+ WARN("ioctl control interface version mismatch: "
+ "kernel(%u.%u), user(%u.%u), cmd(%d)",
+ AUTOFS_DEV_IOCTL_VERSION_MAJOR,
+ AUTOFS_DEV_IOCTL_VERSION_MINOR,
+ param->ver_major, param->ver_minor, cmd);
+ err = -EINVAL;
+ }
+
+ /* Fill in the kernel version. */
+ param->ver_major = AUTOFS_DEV_IOCTL_VERSION_MAJOR;
+ param->ver_minor = AUTOFS_DEV_IOCTL_VERSION_MINOR;
+
+ return err;
+}
+
+/*
+ * Copy parameter control struct, including a possible path allocated
+ * at the end of the struct.
+ */
+static struct autofs_dev_ioctl *copy_dev_ioctl(struct autofs_dev_ioctl __user *in)
+{
+ struct autofs_dev_ioctl tmp, *ads;
+
+ if (copy_from_user(&tmp, in, sizeof(tmp)))
+ return ERR_PTR(-EFAULT);
+
+ if (tmp.size < sizeof(tmp))
+ return ERR_PTR(-EINVAL);
+
+ ads = kmalloc(tmp.size, GFP_KERNEL);
+ if (!ads)
+ return ERR_PTR(-ENOMEM);
+
+ if (copy_from_user(ads, in, tmp.size)) {
+ vfree(ads);
+ return ERR_PTR(-EFAULT);
+ }
+
+ return ads;
+}
+
+static inline void free_dev_ioctl(struct autofs_dev_ioctl *param)
+{
+ kfree(param);
+ return;
+}
+
+/*
+ * Check sanity of parameter control fields and if a path is present
+ * check that it has a "/" and is terminated.
+ */
+static int validate_dev_ioctl(int cmd, struct autofs_dev_ioctl *param)
+{
+ int err = -EINVAL;
+
+ if (check_dev_ioctl_version(cmd, param)) {
+ WARN("invalid device control module version "
+ "supplied for cmd(0x%08x)", cmd);
+ goto out;
+ }
+
+ if (param->size > sizeof(*param)) {
+ err = check_name(param->path);
+ if (err) {
+ WARN("invalid path supplied for cmd(0x%08x)", cmd);
+ goto out;
+ }
+
+ err = invalid_str(param->path,
+ (void *) ((size_t) param + param->size));
+ if (err) {
+ WARN("invalid path supplied for cmd(0x%08x)", cmd);
+ goto out;
+ }
+ }
+
+ err = 0;
+out:
+ return err;
+}
+
+/*
+ * Get the autofs super block info struct from the file opened on
+ * the autofs mount point.
+ */
+static struct autofs_sb_info *autofs_dev_ioctl_sbi(struct file *f)
+{
+ struct autofs_sb_info *sbi = NULL;
+ struct inode *inode;
+
+ if (f) {
+ inode = f->f_path.dentry->d_inode;
+ sbi = autofs4_sbi(inode->i_sb);
+ }
+
+ return sbi;
+}
+
+/* Return autofs module protocol version */
+static inline int autofs_dev_ioctl_protover(struct file *fp,
+ struct autofs_sb_info *sbi,
+ struct autofs_dev_ioctl *param)
+{
+ param->arg1 = sbi->version;
+ return 0;
+}
+
+/* Return autofs module protocol sub version */
+static inline int autofs_dev_ioctl_protosubver(struct file *fp,
+ struct autofs_sb_info *sbi,
+ struct autofs_dev_ioctl *param)
+{
+ param->arg1 = sbi->sub_version;
+ return 0;
+}
+
+/*
+ * Walk down the mount stack looking for an autofs mount that
+ * has the requested device number (aka. new_encode_dev(sb->s_dev).
+ */
+static int autofs_dev_ioctl_find_super(struct nameidata *nd, dev_t devno)
+{
+ struct dentry *dentry;
+ struct inode *inode;
+ struct super_block *sb;
+ dev_t s_dev;
+ unsigned int err;
+
+ err = -ENOENT;
+
+ /* Lookup the dentry name at the base of our mount point */
+ dentry = d_lookup(nd->path.dentry, &nd->last);
+ if (!dentry)
+ goto out;
+
+ dput(nd->path.dentry);
+ nd->path.dentry = dentry;
+
+ /* And follow the mount stack looking for our autofs mount */
+ while (1) {
+ inode = nd->path.dentry->d_inode;
+ if (!inode)
+ continue;
+
+ sb = inode->i_sb;
+ s_dev = new_encode_dev(sb->s_dev);
+ if (devno == s_dev) {
+ if (sb->s_magic == AUTOFS_SUPER_MAGIC) {
+ err = 0;
+ break;
+ }
+ }
+
+ if (!follow_down(&nd->path.mnt, &nd->path.dentry))
+ goto out;
+ }
+
+out:
+ return err;
+}
+
+/*
+ * Install the file opened for autofs mount point control functions
+ * and set close on exec.
+ */
+static void autofs_dev_ioctl_fd_install(unsigned int fd, struct file *file)
+{
+ struct files_struct *files = current->files;
+ struct fdtable *fdt;
+
+ spin_lock(&files->file_lock);
+ fdt = files_fdtable(files);
+ BUG_ON(fdt->fd[fd] != NULL);
+ rcu_assign_pointer(fdt->fd[fd], file);
+ FD_SET(fd, fdt->close_on_exec);
+ spin_unlock(&files->file_lock);
+}
+
+/*
+ * Open a file descriptor on the autofs mount point corresponding
+ * to the given path and device number (aka. new_encode_dev(sb->s_dev)).
+ */
+static int autofs_dev_ioctl_open_mountpoint(const char *path, dev_t devid)
+{
+ struct file *filp;
+ struct nameidata nd;
+ int err, fd;
+
+ fd = get_unused_fd();
+ if (fd >= 0) {
+ /* Get nameidata of the parent directory */
+ err = path_lookup(path, LOOKUP_PARENT, &nd);
+ if (err)
+ goto out;
+
+ /*
+ * Search down, within the parent, looking for an
+ * autofs super block that has the device number
+ * corresponding to the autofs fs we want to open.
+ */
+ err = autofs_dev_ioctl_find_super(&nd, devid);
+ if (err)
+ goto out_put;
+
+ filp = dentry_open(nd.path.dentry, nd.path.mnt, O_RDONLY);
+ if (IS_ERR(filp)) {
+ err = PTR_ERR(filp);
+ goto out_put;
+ }
+
+ autofs_dev_ioctl_fd_install(fd, filp);
+ }
+
+ return fd;
+
+out_put:
+ path_put(&nd.path);
+out:
+ put_unused_fd(fd);
+ return err;
+}
+
+/* Open a file descriptor on an autofs mount point */
+static int autofs_dev_ioctl_openmount(struct file *fp,
+ struct autofs_sb_info *sbi,
+ struct autofs_dev_ioctl *param)
+{
+ const char *path;
+ dev_t devid;
+ int err, fd;
+
+ /* param->path has already been checked */
+ if (!param->arg1)
+ return -EINVAL;
+
+ err = 0;
+
+ devid = param->arg1;
+ path = param->path;
+
+ fd = autofs_dev_ioctl_open_mountpoint(path, devid);
+ if (fd < 0) {
+ err = fd;
+ goto out;
+ }
+
+ param->ioctlfd = fd;
+out:
+ return err;
+}
+
+/* Close file descriptor allocated above (user can also use close(2)). */
+static inline int autofs_dev_ioctl_closemount(struct file *fp,
+ struct autofs_sb_info *sbi,
+ struct autofs_dev_ioctl *param)
+{
+ return sys_close(param->ioctlfd);
+}
+
+/*
+ * Send "ready" status for an existing wait (either a mount or an expire
+ * request).
+ */
+static inline int autofs_dev_ioctl_ready(struct file *fp,
+ struct autofs_sb_info *sbi,
+ struct autofs_dev_ioctl *param)
+{
+ autofs_wqt_t token;
+
+ token = (autofs_wqt_t) param->arg1;
+ return autofs4_wait_release(sbi, token, 0);
+}
+
+/*
+ * Send "fail" status for an existing wait (either a mount or an expire
+ * request).
+ */
+static inline int autofs_dev_ioctl_fail(struct file *fp,
+ struct autofs_sb_info *sbi,
+ struct autofs_dev_ioctl *param)
+{
+ autofs_wqt_t token;
+ int status;
+
+ token = (autofs_wqt_t) param->arg1;
+ status = param->arg2 ? param->arg2 : -ENOENT;
+ return autofs4_wait_release(sbi, token, status);
+}
+
+/*
+ * Set the pipe fd for kernel communication to the daemon.
+ *
+ * Normally this is set at mount using an option but if we
+ * are reconnecting to a busy mount then we need to use this
+ * to tell the autofs mount about the new kernel pipe fd. In
+ * order to protect mounts against incorrectly setting the
+ * pipefd we also require that the autofs mount be catatonic.
+ *
+ * This also sets the process group id used to identify the
+ * controlling process (eg. the owning automount(8) daemon).
+ */
+static int autofs_dev_ioctl_setpipefd(struct file *fp,
+ struct autofs_sb_info *sbi,
+ struct autofs_dev_ioctl *param)
+{
+ int pipefd;
+ int err = 0;
+
+ if (param->arg1 == -1)
+ return -EINVAL;
+
+ pipefd = param->arg1;
+
+ if (!sbi->catatonic)
+ return -EBUSY;
+ else {
+ struct file *pipe = fget(pipefd);
+ if (!pipe->f_op || !pipe->f_op->write) {
+ err = -EPIPE;
+ fput(pipe);
+ goto out;
+ }
+ sbi->oz_pgrp = task_pgrp_nr(current);
+ sbi->pipefd = pipefd;
+ sbi->pipe = pipe;
+ sbi->catatonic = 0;
+ }
+out:
+ return err;
+}
+
+/*
+ * Make the autofs mount point catatonic, no longer responsive to
+ * mount requests. Also closes the kernel pipe file descriptor.
+ */
+static inline int autofs_dev_ioctl_catatonic(struct file *fp,
+ struct autofs_sb_info *sbi,
+ struct autofs_dev_ioctl *param)
+{
+ if (!sbi->catatonic)
+ autofs4_catatonic_mode(sbi);
+ return 0;
+}
+
+/* Set the autofs mount timeout */
+static inline int autofs_dev_ioctl_timeout(struct file *fp,
+ struct autofs_sb_info *sbi,
+ struct autofs_dev_ioctl *param)
+{
+ unsigned long timeout;
+
+ timeout = param->arg1;
+ param->arg1 = sbi->exp_timeout / HZ;
+ sbi->exp_timeout = timeout * HZ;
+ return 0;
+}
+
+/*
+ * Return the uid and gid of the last request for the mount
+ *
+ * When reconstructing an autofs mount tree with active mounts
+ * we need to re-connect to mounts that may have used the original
+ * process uid and gid (or string variations of them) for mount
+ * lookups within the map entry.
+ */
+static inline int autofs_dev_ioctl_requestor(struct file *fp,
+ struct autofs_sb_info *sbi,
+ struct autofs_dev_ioctl *param)
+{
+ struct autofs_info *ino;
+ struct nameidata nd;
+ const char *path;
+ dev_t devid;
+ int err = -ENOENT;
+
+ if (param->size <= sizeof(*param)) {
+ err = -EINVAL;
+ goto out;
+ }
+
+ path = param->path;
+ devid = sbi->sb->s_dev;
+
+ param->arg1 = param->arg2 = -1;
+
+ /* Get nameidata of the parent directory */
+ err = path_lookup(path, LOOKUP_PARENT, &nd);
+ if (err)
+ goto out;
+
+ err = autofs_dev_ioctl_find_super(&nd, devid);
+ if (err)
+ goto out_release;
+
+ ino = autofs4_dentry_ino(nd.path.dentry);
+ if (ino) {
+ err = 0;
+ param->arg1 = ino->uid;
+ param->arg2 = ino->gid;
+ }
+
+out_release:
+ path_put(&nd.path);
+out:
+ return err;
+}
+
+/*
+ * Call repeatedly until it returns -EAGAIN, meaning there's nothing
+ * more that can be done.
+ */
+static int autofs_dev_ioctl_expire(struct file *fp,
+ struct autofs_sb_info *sbi,
+ struct autofs_dev_ioctl *param)
+{
+ struct dentry *dentry;
+ struct vfsmount *mnt;
+ int err = -EAGAIN;
+ int when;
+
+ when = param->arg1;
+ mnt = fp->f_path.mnt;
+
+ if (sbi->type & AUTOFS_TYPE_DIRECT)
+ dentry = autofs4_expire_direct(sbi->sb, mnt, sbi, when);
+ else
+ dentry = autofs4_expire_indirect(sbi->sb, mnt, sbi, when);
+
+ if (dentry) {
+ struct autofs_info *ino = autofs4_dentry_ino(dentry);
+
+ /*
+ * This is synchronous because it makes the daemon a
+ * little easier
+ */
+ ino->flags |= AUTOFS_INF_EXPIRING;
+ err = autofs4_wait(sbi, dentry, NFY_EXPIRE);
+ ino->flags &= ~AUTOFS_INF_EXPIRING;
+ dput(dentry);
+ }
+
+ return err;
+}
+
+/* Check if autofs mount point is in use */
+static inline int autofs_dev_ioctl_askumount(struct file *fp,
+ struct autofs_sb_info *sbi,
+ struct autofs_dev_ioctl *param)
+{
+ param->arg1 = 0;
+ if (may_umount(fp->f_path.mnt))
+ param->arg1 = 1;
+ return 0;
+}
+
+/*
+ * Check if the given path is a mountpoint.
+ *
+ * If we are supplied with the file descriptor of the autofs
+ * mount we're looking for a specific mount. In this case
+ * the path is considered a mountpoint if it is itself a
+ * mountpoint or contains a mount, such as a multi-mount
+ * without a root mount.
+ *
+ * If we aren't supplied with a file descriptor then we lookup
+ * the nameidata of the path and check if is a mounted autofs
+ * filesystem or a filesystem mounted within an autofs filesystem.
+ *
+ * In both cases we return an indication of whether the path
+ * is a mount point and the super magic of the covering mount.
+ */
+static inline int autofs_dev_ioctl_ismountpoint(struct file *fp,
+ struct autofs_sb_info *sbi,
+ struct autofs_dev_ioctl *param)
+{
+ struct nameidata nd;
+ const char *path;
+ int err = -ENOENT;
+
+ if (param->size <= sizeof(*param)) {
+ err = -EINVAL;
+ goto out;
+ }
+
+ path = param->path;
+
+ param->arg1 = param->arg2 = 0;
+
+ if (param->ioctlfd == -1) {
+ unsigned long magic;
+
+ err = path_lookup(path, LOOKUP_FOLLOW, &nd);
+ if (err)
+ goto out;
+
+ magic = nd.path.dentry->d_inode->i_sb->s_magic;
+
+ if (follow_up(&nd.path.mnt, &nd.path.dentry)) {
+ struct inode *inode = nd.path.dentry->d_inode;
+ if (magic == AUTOFS_SUPER_MAGIC ||
+ inode->i_sb->s_magic == AUTOFS_SUPER_MAGIC) {
+ param->arg1 = 1;
+ param->arg2 = magic;
+ }
+ }
+ } else {
+ dev_t devid = sbi->sb->s_dev;
+
+ err = path_lookup(path, LOOKUP_PARENT, &nd);
+ if (err)
+ goto out;
+
+ err = autofs_dev_ioctl_find_super(&nd, devid);
+ if (err)
+ goto out_release;
+
+ param->arg1 = have_submounts(nd.path.dentry);
+
+ if (d_mountpoint(nd.path.dentry)) {
+ if (follow_down(&nd.path.mnt, &nd.path.dentry)) {
+ struct inode *inode = nd.path.dentry->d_inode;
+ param->arg2 = inode->i_sb->s_magic;
+ }
+ }
+ }
+
+out_release:
+ path_put(&nd.path);
+out:
+ return err;
+}
+
+/*
+ * Our range of ioctl numbers isn't 0 based so we need to shift
+ * the array index by _IOC_NR(AUTOFS_CTL_IOC_FIRST) for the table
+ * lookup.
+ */
+#define cmd_idx(cmd) (cmd - _IOC_NR(AUTOFS_DEV_IOCTL_IOC_FIRST))
+
+static ioctl_fn lookup_dev_ioctl(unsigned int cmd)
+{
+ static struct {
+ int cmd;
+ ioctl_fn fn;
+ } _ioctls[] = {
+ {cmd_idx(AUTOFS_DEV_IOCTL_VERSION_CMD), NULL},
+ {cmd_idx(AUTOFS_DEV_IOCTL_PROTOVER_CMD), autofs_dev_ioctl_protover},
+ {cmd_idx(AUTOFS_DEV_IOCTL_PROTOSUBVER_CMD), autofs_dev_ioctl_protosubver},
+ {cmd_idx(AUTOFS_DEV_IOCTL_OPENMOUNT_CMD), autofs_dev_ioctl_openmount},
+ {cmd_idx(AUTOFS_DEV_IOCTL_CLOSEMOUNT_CMD), autofs_dev_ioctl_closemount},
+ {cmd_idx(AUTOFS_DEV_IOCTL_READY_CMD), autofs_dev_ioctl_ready},
+ {cmd_idx(AUTOFS_DEV_IOCTL_FAIL_CMD), autofs_dev_ioctl_fail},
+ {cmd_idx(AUTOFS_DEV_IOCTL_SETPIPEFD_CMD), autofs_dev_ioctl_setpipefd},
+ {cmd_idx(AUTOFS_DEV_IOCTL_CATATONIC_CMD), autofs_dev_ioctl_catatonic},
+ {cmd_idx(AUTOFS_DEV_IOCTL_TIMEOUT_CMD), autofs_dev_ioctl_timeout},
+ {cmd_idx(AUTOFS_DEV_IOCTL_REQUESTOR_CMD), autofs_dev_ioctl_requestor},
+ {cmd_idx(AUTOFS_DEV_IOCTL_EXPIRE_CMD), autofs_dev_ioctl_expire},
+ {cmd_idx(AUTOFS_DEV_IOCTL_ASKUMOUNT_CMD), autofs_dev_ioctl_askumount},
+ {cmd_idx(AUTOFS_DEV_IOCTL_ISMOUNTPOINT_CMD), autofs_dev_ioctl_ismountpoint}
+ };
+ int idx = cmd_idx(cmd);
+
+ return (idx >= ARRAY_SIZE(_ioctls)) ? NULL : _ioctls[idx].fn;
+}
+
+/* ioctl dispatcher */
+static int _autofs_dev_ioctl(unsigned int command, struct autofs_dev_ioctl __user *user)
+{
+ struct autofs_dev_ioctl *param;
+ struct file *fp;
+ struct autofs_sb_info *sbi;
+ unsigned int cmd;
+ ioctl_fn fn = NULL;
+ int err = 0;
+
+ /* only root can play with this */
+ if (!capable(CAP_SYS_ADMIN))
+ return -EPERM;
+
+ cmd = _IOC_NR(command);
+
+ if (_IOC_TYPE(command) != _IOC_TYPE(AUTOFS_DEV_IOCTL_IOC_FIRST) ||
+ cmd - _IOC_NR(AUTOFS_DEV_IOCTL_IOC_FIRST) >= AUTOFS_DEV_IOCTL_IOC_COUNT) {
+ return -ENOTTY;
+ }
+
+ fn = lookup_dev_ioctl(cmd);
+ if (!fn) {
+ WARN("unknown command 0x%08x", command);
+ return -ENOTTY;
+ }
+
+ /*
+ * Trying to avoid low memory issues when a device is
+ * suspended.
+ */
+ current->flags |= PF_MEMALLOC;
+
+ /* Copy the parameters into kernel space. */
+ param = copy_dev_ioctl(user);
+
+ current->flags &= ~PF_MEMALLOC;
+
+ if (IS_ERR(param))
+ return PTR_ERR(param);
+
+ err = validate_dev_ioctl(command, param);
+ if (err)
+ goto out;
+
+ /* The validate routine above always sets the version */
+ if (cmd == AUTOFS_DEV_IOCTL_VERSION_CMD)
+ goto done;
+
+ fp = NULL;
+ sbi = NULL;
+
+ /*
+ * For obvious reasons the openmount can't have a file
+ * descriptor yet. We don't take a reference to the
+ * file during close to allow for immediate release.
+ */
+ if (cmd != AUTOFS_DEV_IOCTL_OPENMOUNT_CMD &&
+ cmd != AUTOFS_DEV_IOCTL_CLOSEMOUNT_CMD) {
+ fp = fget(param->ioctlfd);
+ if (!fp) {
+ if (cmd == AUTOFS_DEV_IOCTL_ISMOUNTPOINT_CMD)
+ goto cont;
+ err = -EBADF;
+ goto out;
+ }
+
+ if (!fp->f_op) {
+ err = -ENOTTY;
+ fput(fp);
+ goto out;
+ }
+
+ sbi = autofs_dev_ioctl_sbi(fp);
+ if (!sbi) {
+ err = -EINVAL;
+ fput(fp);
+ goto out;
+ }
+ }
+cont:
+ err = fn(fp, sbi, param);
+
+ if (fp)
+ fput(fp);
+done:
+ if (!err && copy_to_user(user, param, AUTOFS_DEV_IOCTL_SIZE))
+ err = -EFAULT;
+out:
+ free_dev_ioctl(param);
+ return err;
+}
+
+static long autofs_dev_ioctl(struct file *file, uint command, ulong u)
+{
+ return (long) _autofs_dev_ioctl(command, (struct autofs_dev_ioctl __user *) u);
+}
+
+#ifdef CONFIG_COMPAT
+static long autofs_dev_ioctl_compat(struct file *file, uint command, ulong u)
+{
+ return (long) autofs_dev_ioctl(file, command, (ulong) compat_ptr(u));
+}
+#else
+#define autofs_dev_ioctl_compat NULL
+#endif
+
+static const struct file_operations _dev_ioctl_fops = {
+ .unlocked_ioctl = autofs_dev_ioctl,
+ .compat_ioctl = autofs_dev_ioctl_compat,
+ .owner = THIS_MODULE,
+};
+
+static struct miscdevice _autofs_dev_ioctl_misc = {
+ .minor = MISC_DYNAMIC_MINOR,
+ .name = AUTOFS_DEVICE_NAME,
+ .fops = &_dev_ioctl_fops
+};
+
+/* Register/deregister misc character device */
+int autofs_dev_ioctl_init(void)
+{
+ int r;
+
+ r = misc_register(&_autofs_dev_ioctl_misc);
+ if (r) {
+ ERROR("misc_register failed for control device");
+ return r;
+ }
+
+ return 0;
+}
+
+void autofs_dev_ioctl_exit(void)
+{
+ if (misc_deregister(&_autofs_dev_ioctl_misc) < 0)
+ ERROR("misc_deregister failed for control device");
+
+ return;
+}
+
diff -up linux-2.6.25-rc2-mm1/include/linux/auto_fs4.h.device-node-ioctl linux-2.6.25-rc2-mm1/include/linux/auto_fs4.h
--- linux-2.6.25-rc2-mm1/include/linux/auto_fs4.h.device-node-ioctl 2008-01-25 07:58:37.000000000 +0900
+++ linux-2.6.25-rc2-mm1/include/linux/auto_fs4.h 2008-02-22 11:51:41.000000000 +0900
@@ -23,7 +23,7 @@
#define AUTOFS_MIN_PROTO_VERSION 3
#define AUTOFS_MAX_PROTO_VERSION 5
-#define AUTOFS_PROTO_SUBVERSION 0
+#define AUTOFS_PROTO_SUBVERSION 1
/* Mask for expire behaviour */
#define AUTOFS_EXP_IMMEDIATE 1
diff -up /dev/null linux-2.6.25-rc2-mm1/include/linux/auto_dev-ioctl.h
--- /dev/null 2008-02-22 18:55:57.149000956 +0900
+++ linux-2.6.25-rc2-mm1/include/linux/auto_dev-ioctl.h 2008-02-22 11:51:41.000000000 +0900
@@ -0,0 +1,114 @@
+/*
+ * linux/include/linux/auto_dev-ioctl.h
+ *
+ * Copyright 2008 Red Hat, Inc. All rights reserved.
+ * Copyright 2008 Ian Kent <[email protected]>
+ *
+ * This file is part of the Linux kernel and is made available under
+ * the terms of the GNU General Public License, version 2, or at your
+ * option, any later version, incorporated herein by reference.
+ */
+
+#ifndef _LINUX_AUTO_DEV_IOCTL_H
+#define _LINUX_AUTO_DEV_IOCTL_H
+
+#include <linux/types.h>
+
+#define AUTOFS_DEV_IOCTL_VERSION_MAJOR 1
+#define AUTOFS_DEV_IOCTL_VERSION_MINOR 0
+
+#define AUTOFS_DEVID_LEN 16
+
+/*
+ * An ioctl interface for autofs mount point control.
+ */
+
+/*
+ * All the ioctls use this structure.
+ * When sending a path size must account for the total length
+ * of the chunk of memory otherwise is is the size of the
+ * structure.
+ */
+
+struct autofs_dev_ioctl {
+ __u32 ver_major;
+ __u32 ver_minor;
+ __u32 size; /* total size of data passed in
+ * including this struct */
+ __s32 ioctlfd; /* automount command fd */
+
+ __u32 arg1; /* Command parameters */
+ __u32 arg2;
+
+ char path[0];
+};
+
+static inline void init_autofs_dev_ioctl(struct autofs_dev_ioctl *in)
+{
+ in->ver_major = AUTOFS_DEV_IOCTL_VERSION_MAJOR;
+ in->ver_minor = AUTOFS_DEV_IOCTL_VERSION_MINOR;
+ in->size = sizeof(struct autofs_dev_ioctl);
+ in->ioctlfd = -1;
+ in->arg1 = 0;
+ in->arg2 = 0;
+ return;
+}
+
+/*
+ * If you change this make sure you make the corresponding change
+ * to autofs-dev-ioctl.c:lookup_ioctl()
+ */
+enum {
+ /* Get various version info */
+ AUTOFS_DEV_IOCTL_VERSION_CMD = 0x71,
+ AUTOFS_DEV_IOCTL_PROTOVER_CMD,
+ AUTOFS_DEV_IOCTL_PROTOSUBVER_CMD,
+
+ /* Open mount ioctl fd */
+ AUTOFS_DEV_IOCTL_OPENMOUNT_CMD,
+
+ /* Close mount ioctl fd */
+ AUTOFS_DEV_IOCTL_CLOSEMOUNT_CMD,
+
+ /* Mount/expire status returns */
+ AUTOFS_DEV_IOCTL_READY_CMD,
+ AUTOFS_DEV_IOCTL_FAIL_CMD,
+
+ /* Activate/deactivate autofs mount */
+ AUTOFS_DEV_IOCTL_SETPIPEFD_CMD,
+ AUTOFS_DEV_IOCTL_CATATONIC_CMD,
+
+ /* Expiry timeout */
+ AUTOFS_DEV_IOCTL_TIMEOUT_CMD,
+
+ /* Get mount last requesting uid and gid */
+ AUTOFS_DEV_IOCTL_REQUESTOR_CMD,
+
+ /* Check for eligible expire candidates */
+ AUTOFS_DEV_IOCTL_EXPIRE_CMD,
+
+ /* Request busy status */
+ AUTOFS_DEV_IOCTL_ASKUMOUNT_CMD,
+
+ /* Check if path is a mountpoint */
+ AUTOFS_DEV_IOCTL_ISMOUNTPOINT_CMD,
+};
+
+#define AUTOFS_IOCTL 0x93
+
+#define AUTOFS_DEV_IOCTL_VERSION _IOWR(AUTOFS_IOCTL, AUTOFS_DEV_IOCTL_VERSION_CMD, struct autofs_dev_ioctl)
+#define AUTOFS_DEV_IOCTL_PROTOVER _IOWR(AUTOFS_IOCTL, AUTOFS_DEV_IOCTL_PROTOVER_CMD, struct autofs_dev_ioctl)
+#define AUTOFS_DEV_IOCTL_PROTOSUBVER _IOWR(AUTOFS_IOCTL, AUTOFS_DEV_IOCTL_PROTOSUBVER_CMD, struct autofs_dev_ioctl)
+#define AUTOFS_DEV_IOCTL_OPENMOUNT _IOWR(AUTOFS_IOCTL, AUTOFS_DEV_IOCTL_OPENMOUNT_CMD, struct autofs_dev_ioctl)
+#define AUTOFS_DEV_IOCTL_CLOSEMOUNT _IOWR(AUTOFS_IOCTL, AUTOFS_DEV_IOCTL_CLOSEMOUNT_CMD, struct autofs_dev_ioctl)
+#define AUTOFS_DEV_IOCTL_READY _IOWR(AUTOFS_IOCTL, AUTOFS_DEV_IOCTL_READY_CMD, struct autofs_dev_ioctl)
+#define AUTOFS_DEV_IOCTL_FAIL _IOWR(AUTOFS_IOCTL, AUTOFS_DEV_IOCTL_FAIL_CMD, struct autofs_dev_ioctl)
+#define AUTOFS_DEV_IOCTL_SETPIPEFD _IOWR(AUTOFS_IOCTL, AUTOFS_DEV_IOCTL_SETPIPEFD_CMD, struct autofs_dev_ioctl)
+#define AUTOFS_DEV_IOCTL_CATATONIC _IOWR(AUTOFS_IOCTL, AUTOFS_DEV_IOCTL_CATATONIC_CMD, struct autofs_dev_ioctl)
+#define AUTOFS_DEV_IOCTL_TIMEOUT _IOWR(AUTOFS_IOCTL, AUTOFS_DEV_IOCTL_TIMEOUT_CMD, struct autofs_dev_ioctl)
+#define AUTOFS_DEV_IOCTL_REQUESTOR _IOWR(AUTOFS_IOCTL, AUTOFS_DEV_IOCTL_REQUESTOR_CMD, struct autofs_dev_ioctl)
+#define AUTOFS_DEV_IOCTL_EXPIRE _IOWR(AUTOFS_IOCTL, AUTOFS_DEV_IOCTL_EXPIRE_CMD, struct autofs_dev_ioctl)
+#define AUTOFS_DEV_IOCTL_ASKUMOUNT _IOWR(AUTOFS_IOCTL, AUTOFS_DEV_IOCTL_ASKUMOUNT_CMD, struct autofs_dev_ioctl)
+#define AUTOFS_DEV_IOCTL_ISMOUNTPOINT _IOWR(AUTOFS_IOCTL, AUTOFS_DEV_IOCTL_ISMOUNTPOINT_CMD, struct autofs_dev_ioctl)
+
+#endif /* _LINUX_AUTO_DEV_IOCTL_H */
Hi Andrew,
Patch to add a display mount option to show the device number
of the autofs mount super block.
Signed-off-by: Ian Kent < [email protected]>
Ian
---
diff -up linux-2.6.25-rc2-mm1/fs/autofs4/inode.c.add-mount-device-display-option linux-2.6.25-rc2-mm1/fs/autofs4/inode.c
--- linux-2.6.25-rc2-mm1/fs/autofs4/inode.c.add-mount-device-display-option 2008-02-20 13:01:06.000000000 +0900
+++ linux-2.6.25-rc2-mm1/fs/autofs4/inode.c 2008-02-20 13:03:45.000000000 +0900
@@ -190,6 +190,7 @@ static int autofs4_show_options(struct s
seq_printf(m, ",timeout=%lu", sbi->exp_timeout/HZ);
seq_printf(m, ",minproto=%d", sbi->min_proto);
seq_printf(m, ",maxproto=%d", sbi->max_proto);
+ seq_printf(m, ",dev=%d", autofs4_get_dev(sbi));
if (sbi->type & AUTOFS_TYPE_OFFSET)
seq_printf(m, ",offset");
@@ -332,7 +333,7 @@ int autofs4_fill_super(struct super_bloc
sbi->sb = s;
sbi->version = 0;
sbi->sub_version = 0;
- sbi->type = 0;
+ sbi->type = AUTOFS_TYPE_INDIRECT;
sbi->min_proto = 0;
sbi->max_proto = 0;
mutex_init(&sbi->wq_mutex);
On Tue, 26 Feb 2008, Ian Kent wrote:
> +
> + /* Set mount requestor */
> + if (ino) {
> + if (ino) {
> + ino->uid = wq->uid;
> + ino->gid = wq->gid;
> + }
> + }
> +
As has been spotted, this is obviously wrong.
And here is the correction.
Signed-off-by: Ian Kent <[email protected]>
Ian
---
diff -up linux-2.6.25-rc2-mm1/fs/autofs4/waitq.c.track-last-mount-ids-fix linux-2.6.25-rc2-mm1/fs/autofs4/waitq.c
--- linux-2.6.25-rc2-mm1/fs/autofs4/waitq.c.track-last-mount-ids-fix 2008-02-26 14:02:05.000000000 +0900
+++ linux-2.6.25-rc2-mm1/fs/autofs4/waitq.c 2008-02-26 14:02:20.000000000 +0900
@@ -385,10 +385,8 @@ int autofs4_wait(struct autofs_sb_info *
/* Set mount requestor */
if (ino) {
- if (ino) {
- ino->uid = wq->uid;
- ino->gid = wq->gid;
- }
+ ino->uid = wq->uid;
+ ino->gid = wq->gid;
}
if (de)
On Tue, 26 Feb 2008 12:21:58 +0900 (WST) Ian Kent <[email protected]> wrote:
>
> Hi Andrew,
>
> There is a problem with active restarts in autofs (that is to
> say restarting autofs when there are busy mounts).
>
> Currently autofs uses "umount -l" to clear active mounts at
> restart. While using lazy umount works for most cases, anything
> that needs to walk back up the mount tree to construct a path,
> such as getcwd(2) and the proc file system /proc/<pid>/cwd, no
> longer works because the point from which the path is constructed
> has been detached from the mount tree.
>
> The actual problem with autofs is that it can't reconnect to
> existing mounts. Immediately one things of just adding the
> ability to remount autofs file systems would solve it, but
> alas, that can't work. This is because autofs direct mounts
> and the implementation of "on demand mount and expire" of
> nested mount trees have the file system mounted on top of
> the mount trigger dentry.
>
> To resolve this a miscellaneous device node for routing ioctl
> commands to these mount points has been implemented for the
> autofs4 kernel module.
>
> For those wishing to test this out an updated user space daemon
> is needed. Checking out and building from the git repo or
> applying all the current patches to the 5.0.3 tar distribution
> will do the trick. This is all available at the usual location
> on kernel.org.
>
Could we please be a bit more specific than "the usual location"?
Should autofs userspace have an entry in Documentation/Changes?
On Tue, 26 Feb 2008 12:23:20 +0900 (WST) Ian Kent <[email protected]> wrote:
> Hi Andrew,
>
> Patch to track the uid and gid of the last process to request
> a mount for on an autofs dentry.
>
> Signed-off-by: Ian Kent < [email protected]>
>
> Ian
>
> ---
> diff -up linux-2.6.25-rc2-mm1/fs/autofs4/inode.c.track-last-mount-ids linux-2.6.25-rc2-mm1/fs/autofs4/inode.c
> --- linux-2.6.25-rc2-mm1/fs/autofs4/inode.c.track-last-mount-ids 2008-02-20 13:11:28.000000000 +0900
> +++ linux-2.6.25-rc2-mm1/fs/autofs4/inode.c 2008-02-20 13:12:23.000000000 +0900
> @@ -43,6 +43,8 @@ struct autofs_info *autofs4_init_ino(str
>
> ino->flags = 0;
> ino->mode = mode;
> + ino->uid = 0;
> + ino->gid = 0;
> ino->inode = NULL;
> ino->dentry = NULL;
> ino->size = 0;
> diff -up linux-2.6.25-rc2-mm1/fs/autofs4/autofs_i.h.track-last-mount-ids linux-2.6.25-rc2-mm1/fs/autofs4/autofs_i.h
> --- linux-2.6.25-rc2-mm1/fs/autofs4/autofs_i.h.track-last-mount-ids 2008-02-20 13:14:03.000000000 +0900
> +++ linux-2.6.25-rc2-mm1/fs/autofs4/autofs_i.h 2008-02-20 13:14:34.000000000 +0900
> @@ -58,6 +58,9 @@ struct autofs_info {
> unsigned long last_used;
> atomic_t count;
>
> + uid_t uid;
> + gid_t gid;
> +
> mode_t mode;
> size_t size;
>
> diff -up linux-2.6.25-rc2-mm1/fs/autofs4/waitq.c.track-last-mount-ids linux-2.6.25-rc2-mm1/fs/autofs4/waitq.c
> --- linux-2.6.25-rc2-mm1/fs/autofs4/waitq.c.track-last-mount-ids 2008-02-20 13:06:20.000000000 +0900
> +++ linux-2.6.25-rc2-mm1/fs/autofs4/waitq.c 2008-02-20 13:10:23.000000000 +0900
> @@ -363,6 +363,38 @@ int autofs4_wait(struct autofs_sb_info *
>
> status = wq->status;
>
> + /*
> + * For direct and offset mounts we need to track the requestor
> + * uid and gid in the dentry info struct. This is so it can be
> + * supplied, on request, by the misc device ioctl interface.
> + * This is needed during daemon resatart when reconnecting
> + * to existing, active, autofs mounts. The uid and gid (and
> + * related string values) may be used for macro substitution
> + * in autofs mount maps.
> + */
> + if (!status) {
> + struct dentry *de = NULL;
> +
> + /* direct mount or browsable map */
> + ino = autofs4_dentry_ino(dentry);
> + if (!ino) {
> + /* If not lookup actual dentry used */
> + de = d_lookup(dentry->d_parent, &dentry->d_name);
> + ino = autofs4_dentry_ino(de);
> + }
> +
> + /* Set mount requestor */
> + if (ino) {
> + if (ino) {
> + ino->uid = wq->uid;
> + ino->gid = wq->gid;
> + }
> + }
> +
> + if (de)
> + dput(de);
> + }
> +
But uids and gids are no longer system-wide-unique. Two different users
can have the same identifiers in different namespaces. What happens then?
On Tue, 26 Feb 2008 12:23:55 +0900 (WST) Ian Kent <[email protected]> wrote:
> Hi Andrew,
>
> Patch to add miscellaneous device to autofs4 module for
> ioctls.
Could you please document the new kernel interface which you're proposing?
In Docmentation/ or in the changelog?
We seem to be passing some string into a miscdevice ioctl and getting some
results back. Be aware that this won't be a terribly popular proposal, so
I'd suggest that you fully describe the problem which it's trying to solve,
and how it solves it, and why the various alternatives (sysfs, netlink,
mount options, etc) were judged unsuitable.
> ...
>
> --- linux-2.6.25-rc2-mm1/fs/autofs4/init.c.device-node-ioctl 2008-01-25 07:58:37.000000000 +0900
> +++ linux-2.6.25-rc2-mm1/fs/autofs4/init.c 2008-02-22 11:51:41.000000000 +0900
> @@ -29,11 +29,20 @@ static struct file_system_type autofs_fs
>
> static int __init init_autofs4_fs(void)
> {
> - return register_filesystem(&autofs_fs_type);
> + int err;
> +
> + err = register_filesystem(&autofs_fs_type);
> + if (err)
> + return err;
> +
> + err = autofs_dev_ioctl_init();
> +
> + return err;
> }
We should run unregister_filesystem() if autofs_dev_ioctl_init() fails.
> static void __exit exit_autofs4_fs(void)
> {
> + autofs_dev_ioctl_exit();
> unregister_filesystem(&autofs_fs_type);
> }
>
> diff -up linux-2.6.25-rc2-mm1/fs/autofs4/autofs_i.h.device-node-ioctl linux-2.6.25-rc2-mm1/fs/autofs4/autofs_i.h
> --- linux-2.6.25-rc2-mm1/fs/autofs4/autofs_i.h.device-node-ioctl 2008-02-22 11:51:41.000000000 +0900
> +++ linux-2.6.25-rc2-mm1/fs/autofs4/autofs_i.h 2008-02-22 11:51:41.000000000 +0900
> @@ -14,6 +14,7 @@
> /* Internal header file for autofs */
>
> #include <linux/auto_fs4.h>
> +#include <linux/auto_dev-ioctl.h>
> #include <linux/mutex.h>
> #include <linux/list.h>
>
> @@ -40,6 +41,16 @@
> #define DPRINTK(fmt,args...) do {} while(0)
> #endif
>
> +#define WARN(fmt,args...) \
> +do { \
> + printk("KERN_WARNING pid %d: %s: " fmt "\n" , current->pid , __FUNCTION__ , ##args); \
> +} while(0)
> +
> +#define ERROR(fmt,args...) \
> +do { \
> + printk("KERN_ERR pid %d: %s: " fmt "\n" , current->pid , __FUNCTION__ , ##args); \
> +} while(0)
Please always pass all diffs through scripts/checkpatch.pl.
> /* Unified info structure. This is pointed to by both the dentry and
> inode structures. Each file in the filesystem has an instance of this
> structure. It holds a reference to the dentry, so dentries are never
> @@ -172,6 +183,17 @@ int autofs4_expire_run(struct super_bloc
> struct autofs_packet_expire __user *);
> int autofs4_expire_multi(struct super_block *, struct vfsmount *,
> struct autofs_sb_info *, int __user *);
> +struct dentry *autofs4_expire_direct(struct super_block *sb,
> + struct vfsmount *mnt,
> + struct autofs_sb_info *sbi, int how);
> +struct dentry *autofs4_expire_indirect(struct super_block *sb,
> + struct vfsmount *mnt,
> + struct autofs_sb_info *sbi, int how);
> +
> +/* Device node initialization */
> +
> +int autofs_dev_ioctl_init(void);
> +void autofs_dev_ioctl_exit(void);
>
> ...
>
> @@ -0,0 +1,788 @@
> +/*
> + * linux/fs/autofs4/dev-ioctl.c
We prefer not to bother with the filename-in-the-file thing. You know what
file you're reading, and these things tend to not get updated across
renames.
> + * Copyright 2008 Red Hat, Inc. All rights reserved.
> + * Copyright 2008 Ian Kent <[email protected]>
> + *
> + * This file is part of the Linux kernel and is made available under
> + * the terms of the GNU General Public License, version 2, or at your
> + * option, any later version, incorporated herein by reference.
> + */
> +
> +#include <linux/module.h>
> +#include <linux/vmalloc.h>
> +#include <linux/miscdevice.h>
> +#include <linux/init.h>
> +#include <linux/wait.h>
> +#include <linux/namei.h>
> +#include <linux/fcntl.h>
> +#include <linux/file.h>
> +#include <linux/sched.h>
> +#include <linux/compat.h>
> +#include <linux/syscalls.h>
> +#include <linux/smp_lock.h>
> +#include <linux/magic.h>
> +#include <linux/dcache.h>
> +#include <asm/uaccess.h>
Please include <linux/foo.h> rather than <asm/foo.h> if the former exists.
> +#include "autofs_i.h"
> +
> +/*
> + * This module implements an interface for routing autofs ioctl control
> + * commands via a miscellaneous device file.
> + *
> + * The alternate interface is needed because we need to be able open
> + * an ioctl file descriptor on an autofs mount that may be covered by
> + * another mount. This situation arises when starting automount(8)
> + * or other user space daemon which uses direct mounts or offset
> + * mounts (used for autofs lazy mount/umount of nested mount trees),
> + * which have been left busy at at service shutdown.
> + */
> +
> +#define AUTOFS_DEVICE_NAME "autofs"
> +
> +#define AUTOFS_DEV_IOCTL_IOC_FIRST AUTOFS_DEV_IOCTL_VERSION
> +#define AUTOFS_DEV_IOCTL_IOC_COUNT AUTOFS_IOC_COUNT - 11
This needs parentheses.
Shouldn't these be in a header file, exported to userspace builds?
> +#define AUTOFS_DEV_IOCTL_SIZE sizeof(struct autofs_dev_ioctl)
> +
> +typedef int (*ioctl_fn)(struct file *, struct autofs_sb_info *, struct autofs_dev_ioctl *);
> +
> +static int check_name(const char *name)
> +{
> + if (!strchr(name, '/'))
> + return -EINVAL;
> + return 0;
> +}
> +
> +/*
> + * Check a string doesn't overrun the chunk of
> + * memory we copied from user land.
> + */
> +static int invalid_str(char *str, void *end)
> +{
> + while ((void *) str <= end)
> + if (!*str++)
> + return 0;
> + return -EINVAL;
> +}
>
> ...
>
> +/*
> + * Get the autofs super block info struct from the file opened on
> + * the autofs mount point.
> + */
> +static struct autofs_sb_info *autofs_dev_ioctl_sbi(struct file *f)
> +{
> + struct autofs_sb_info *sbi = NULL;
> + struct inode *inode;
> +
> + if (f) {
`f' cannot be NULL here.
> + inode = f->f_path.dentry->d_inode;
> + sbi = autofs4_sbi(inode->i_sb);
> + }
> +
> + return sbi;
> +}
> +
> +/* Return autofs module protocol version */
> +static inline int autofs_dev_ioctl_protover(struct file *fp,
> + struct autofs_sb_info *sbi,
> + struct autofs_dev_ioctl *param)
> +{
> + param->arg1 = sbi->version;
> + return 0;
> +}
> +
> +/* Return autofs module protocol sub version */
> +static inline int autofs_dev_ioctl_protosubver(struct file *fp,
> + struct autofs_sb_info *sbi,
> + struct autofs_dev_ioctl *param)
> +{
> + param->arg1 = sbi->sub_version;
> + return 0;
> +}
Don't bother inlining things - the compiler will do it.
> +/*
> + * Walk down the mount stack looking for an autofs mount that
> + * has the requested device number (aka. new_encode_dev(sb->s_dev).
> + */
> +static int autofs_dev_ioctl_find_super(struct nameidata *nd, dev_t devno)
> +{
> + struct dentry *dentry;
> + struct inode *inode;
> + struct super_block *sb;
> + dev_t s_dev;
> + unsigned int err;
> +
> + err = -ENOENT;
> +
> + /* Lookup the dentry name at the base of our mount point */
> + dentry = d_lookup(nd->path.dentry, &nd->last);
> + if (!dentry)
> + goto out;
> +
> + dput(nd->path.dentry);
> + nd->path.dentry = dentry;
> +
> + /* And follow the mount stack looking for our autofs mount */
> + while (1) {
> + inode = nd->path.dentry->d_inode;
> + if (!inode)
> + continue;
> +
> + sb = inode->i_sb;
> + s_dev = new_encode_dev(sb->s_dev);
> + if (devno == s_dev) {
> + if (sb->s_magic == AUTOFS_SUPER_MAGIC) {
> + err = 0;
> + break;
> + }
> + }
> +
> + if (!follow_down(&nd->path.mnt, &nd->path.dentry))
> + goto out;
> + }
> +
> +out:
> + return err;
> +}
hm. possibly-interested parties cc'ed.
> +/*
> + * Install the file opened for autofs mount point control functions
> + * and set close on exec.
> + */
> +static void autofs_dev_ioctl_fd_install(unsigned int fd, struct file *file)
> +{
> + struct files_struct *files = current->files;
> + struct fdtable *fdt;
> +
> + spin_lock(&files->file_lock);
> + fdt = files_fdtable(files);
> + BUG_ON(fdt->fd[fd] != NULL);
> + rcu_assign_pointer(fdt->fd[fd], file);
> + FD_SET(fd, fdt->close_on_exec);
> + spin_unlock(&files->file_lock);
> +}
That's fd_install() plus an add-on. It's not autofs-specific. Should be
in fs/open.c, methinks?
>
> ...
>
> +/* Close file descriptor allocated above (user can also use close(2)). */
> +static inline int autofs_dev_ioctl_closemount(struct file *fp,
> + struct autofs_sb_info *sbi,
> + struct autofs_dev_ioctl *param)
> +{
> + return sys_close(param->ioctlfd);
> +}
hm.
>
> ...
>
> +/*
> + * Set the pipe fd for kernel communication to the daemon.
> + *
> + * Normally this is set at mount using an option but if we
> + * are reconnecting to a busy mount then we need to use this
> + * to tell the autofs mount about the new kernel pipe fd. In
> + * order to protect mounts against incorrectly setting the
> + * pipefd we also require that the autofs mount be catatonic.
> + *
> + * This also sets the process group id used to identify the
> + * controlling process (eg. the owning automount(8) daemon).
> + */
> +static int autofs_dev_ioctl_setpipefd(struct file *fp,
> + struct autofs_sb_info *sbi,
> + struct autofs_dev_ioctl *param)
> +{
> + int pipefd;
> + int err = 0;
> +
> + if (param->arg1 == -1)
> + return -EINVAL;
> +
> + pipefd = param->arg1;
> +
> + if (!sbi->catatonic)
> + return -EBUSY;
> + else {
> + struct file *pipe = fget(pipefd);
> + if (!pipe->f_op || !pipe->f_op->write) {
> + err = -EPIPE;
> + fput(pipe);
> + goto out;
> + }
> + sbi->oz_pgrp = task_pgrp_nr(current);
> + sbi->pipefd = pipefd;
> + sbi->pipe = pipe;
> + sbi->catatonic = 0;
> + }
> +out:
> + return err;
> +}
We have a new filesystem type, a misc device with a mysterious ioctl,
hand-rolled mountpoint chasing, hand-rolled fd installation and now pipes
too.
This is a complex interface. We really need to see the overall problem
statement, design and interface description, please.
> +/*
> + * Make the autofs mount point catatonic, no longer responsive to
> + * mount requests. Also closes the kernel pipe file descriptor.
> + */
> +static inline int autofs_dev_ioctl_catatonic(struct file *fp,
> + struct autofs_sb_info *sbi,
> + struct autofs_dev_ioctl *param)
> +{
> + if (!sbi->catatonic)
> + autofs4_catatonic_mode(sbi);
> + return 0;
> +}
> +
> +/* Set the autofs mount timeout */
> +static inline int autofs_dev_ioctl_timeout(struct file *fp,
> + struct autofs_sb_info *sbi,
> + struct autofs_dev_ioctl *param)
> +{
> + unsigned long timeout;
> +
> + timeout = param->arg1;
> + param->arg1 = sbi->exp_timeout / HZ;
> + sbi->exp_timeout = timeout * HZ;
> + return 0;
> +}
uninline everything...
> +/*
> + * Return the uid and gid of the last request for the mount
> + *
> + * When reconstructing an autofs mount tree with active mounts
> + * we need to re-connect to mounts that may have used the original
> + * process uid and gid (or string variations of them) for mount
> + * lookups within the map entry.
> + */
> +static inline int autofs_dev_ioctl_requestor(struct file *fp,
> + struct autofs_sb_info *sbi,
> + struct autofs_dev_ioctl *param)
especially that - it's only ever called indirectly anwyay!
> + struct autofs_info *ino;
> + struct nameidata nd;
> + const char *path;
> + dev_t devid;
> + int err = -ENOENT;
> +
> + if (param->size <= sizeof(*param)) {
> + err = -EINVAL;
> + goto out;
> + }
> +
> + path = param->path;
> + devid = sbi->sb->s_dev;
> +
> + param->arg1 = param->arg2 = -1;
> +
> + /* Get nameidata of the parent directory */
> + err = path_lookup(path, LOOKUP_PARENT, &nd);
> + if (err)
> + goto out;
> +
> + err = autofs_dev_ioctl_find_super(&nd, devid);
> + if (err)
> + goto out_release;
> +
> + ino = autofs4_dentry_ino(nd.path.dentry);
> + if (ino) {
> + err = 0;
> + param->arg1 = ino->uid;
> + param->arg2 = ino->gid;
> + }
> +
> +out_release:
> + path_put(&nd.path);
> +out:
> + return err;
> +}
> +
> +/*
> + * Call repeatedly until it returns -EAGAIN, meaning there's nothing
> + * more that can be done.
> + */
> +static int autofs_dev_ioctl_expire(struct file *fp,
> + struct autofs_sb_info *sbi,
> + struct autofs_dev_ioctl *param)
> +{
> + struct dentry *dentry;
> + struct vfsmount *mnt;
> + int err = -EAGAIN;
> + int when;
> +
> + when = param->arg1;
> + mnt = fp->f_path.mnt;
> +
> + if (sbi->type & AUTOFS_TYPE_DIRECT)
> + dentry = autofs4_expire_direct(sbi->sb, mnt, sbi, when);
> + else
> + dentry = autofs4_expire_indirect(sbi->sb, mnt, sbi, when);
> +
> + if (dentry) {
> + struct autofs_info *ino = autofs4_dentry_ino(dentry);
> +
> + /*
> + * This is synchronous because it makes the daemon a
> + * little easier
> + */
> + ino->flags |= AUTOFS_INF_EXPIRING;
> + err = autofs4_wait(sbi, dentry, NFY_EXPIRE);
> + ino->flags &= ~AUTOFS_INF_EXPIRING;
Are there races around the modification of ino->flags here?
> + dput(dentry);
> + }
> +
> + return err;
> +}
> +
> +/* Check if autofs mount point is in use */
> +static inline int autofs_dev_ioctl_askumount(struct file *fp,
> + struct autofs_sb_info *sbi,
> + struct autofs_dev_ioctl *param)
> +{
> + param->arg1 = 0;
> + if (may_umount(fp->f_path.mnt))
> + param->arg1 = 1;
> + return 0;
> +}
> +
> +/*
> + * Check if the given path is a mountpoint.
> + *
> + * If we are supplied with the file descriptor of the autofs
> + * mount we're looking for a specific mount. In this case
> + * the path is considered a mountpoint if it is itself a
> + * mountpoint or contains a mount, such as a multi-mount
> + * without a root mount.
> + *
> + * If we aren't supplied with a file descriptor then we lookup
> + * the nameidata of the path and check if is a mounted autofs
> + * filesystem or a filesystem mounted within an autofs filesystem.
> + *
> + * In both cases we return an indication of whether the path
> + * is a mount point and the super magic of the covering mount.
> + */
> +static inline int autofs_dev_ioctl_ismountpoint(struct file *fp,
> + struct autofs_sb_info *sbi,
> + struct autofs_dev_ioctl *param)
uninline..
> + struct nameidata nd;
> + const char *path;
> + int err = -ENOENT;
> +
> + if (param->size <= sizeof(*param)) {
> + err = -EINVAL;
> + goto out;
> + }
> +
> + path = param->path;
> +
> + param->arg1 = param->arg2 = 0;
> +
> + if (param->ioctlfd == -1) {
> + unsigned long magic;
> +
> + err = path_lookup(path, LOOKUP_FOLLOW, &nd);
> + if (err)
> + goto out;
> +
> + magic = nd.path.dentry->d_inode->i_sb->s_magic;
> +
> + if (follow_up(&nd.path.mnt, &nd.path.dentry)) {
> + struct inode *inode = nd.path.dentry->d_inode;
> + if (magic == AUTOFS_SUPER_MAGIC ||
> + inode->i_sb->s_magic == AUTOFS_SUPER_MAGIC) {
> + param->arg1 = 1;
> + param->arg2 = magic;
> + }
> + }
> + } else {
> + dev_t devid = sbi->sb->s_dev;
> +
> + err = path_lookup(path, LOOKUP_PARENT, &nd);
> + if (err)
> + goto out;
> +
> + err = autofs_dev_ioctl_find_super(&nd, devid);
> + if (err)
> + goto out_release;
> +
> + param->arg1 = have_submounts(nd.path.dentry);
> +
> + if (d_mountpoint(nd.path.dentry)) {
> + if (follow_down(&nd.path.mnt, &nd.path.dentry)) {
> + struct inode *inode = nd.path.dentry->d_inode;
> + param->arg2 = inode->i_sb->s_magic;
> + }
> + }
> + }
> +
> +out_release:
> + path_put(&nd.path);
> +out:
> + return err;
> +}
Have you really carefully reviewed and tested what happens when non-autofs
fds are fed into all the ioctl modes?
I hope all these ioctl entrypoints are root-only. What determines that?
The miscdevice permissions?
> +/*
> + * Our range of ioctl numbers isn't 0 based so we need to shift
> + * the array index by _IOC_NR(AUTOFS_CTL_IOC_FIRST) for the table
> + * lookup.
> + */
> +#define cmd_idx(cmd) (cmd - _IOC_NR(AUTOFS_DEV_IOCTL_IOC_FIRST))
> +
> +static ioctl_fn lookup_dev_ioctl(unsigned int cmd)
> +{
> + static struct {
> + int cmd;
> + ioctl_fn fn;
> + } _ioctls[] = {
> + {cmd_idx(AUTOFS_DEV_IOCTL_VERSION_CMD), NULL},
> + {cmd_idx(AUTOFS_DEV_IOCTL_PROTOVER_CMD), autofs_dev_ioctl_protover},
> + {cmd_idx(AUTOFS_DEV_IOCTL_PROTOSUBVER_CMD), autofs_dev_ioctl_protosubver},
> + {cmd_idx(AUTOFS_DEV_IOCTL_OPENMOUNT_CMD), autofs_dev_ioctl_openmount},
> + {cmd_idx(AUTOFS_DEV_IOCTL_CLOSEMOUNT_CMD), autofs_dev_ioctl_closemount},
> + {cmd_idx(AUTOFS_DEV_IOCTL_READY_CMD), autofs_dev_ioctl_ready},
> + {cmd_idx(AUTOFS_DEV_IOCTL_FAIL_CMD), autofs_dev_ioctl_fail},
> + {cmd_idx(AUTOFS_DEV_IOCTL_SETPIPEFD_CMD), autofs_dev_ioctl_setpipefd},
> + {cmd_idx(AUTOFS_DEV_IOCTL_CATATONIC_CMD), autofs_dev_ioctl_catatonic},
> + {cmd_idx(AUTOFS_DEV_IOCTL_TIMEOUT_CMD), autofs_dev_ioctl_timeout},
> + {cmd_idx(AUTOFS_DEV_IOCTL_REQUESTOR_CMD), autofs_dev_ioctl_requestor},
> + {cmd_idx(AUTOFS_DEV_IOCTL_EXPIRE_CMD), autofs_dev_ioctl_expire},
> + {cmd_idx(AUTOFS_DEV_IOCTL_ASKUMOUNT_CMD), autofs_dev_ioctl_askumount},
> + {cmd_idx(AUTOFS_DEV_IOCTL_ISMOUNTPOINT_CMD), autofs_dev_ioctl_ismountpoint}
> + };
> + int idx = cmd_idx(cmd);
`idx' should have unsigned type.
> + return (idx >= ARRAY_SIZE(_ioctls)) ? NULL : _ioctls[idx].fn;
> +}
> +
> +/* ioctl dispatcher */
> +static int _autofs_dev_ioctl(unsigned int command, struct autofs_dev_ioctl __user *user)
> +{
> + struct autofs_dev_ioctl *param;
> + struct file *fp;
> + struct autofs_sb_info *sbi;
> + unsigned int cmd;
> + ioctl_fn fn = NULL;
> + int err = 0;
> +
> + /* only root can play with this */
> + if (!capable(CAP_SYS_ADMIN))
> + return -EPERM;
OK, I guess that answers my above question.
> + cmd = _IOC_NR(command);
> +
> + if (_IOC_TYPE(command) != _IOC_TYPE(AUTOFS_DEV_IOCTL_IOC_FIRST) ||
> + cmd - _IOC_NR(AUTOFS_DEV_IOCTL_IOC_FIRST) >= AUTOFS_DEV_IOCTL_IOC_COUNT) {
> + return -ENOTTY;
> + }
> +
> + fn = lookup_dev_ioctl(cmd);
> + if (!fn) {
> + WARN("unknown command 0x%08x", command);
> + return -ENOTTY;
> + }
> +
> + /*
> + * Trying to avoid low memory issues when a device is
> + * suspended.
> + */
> + current->flags |= PF_MEMALLOC;
whoa, what's this doing here?
> + /* Copy the parameters into kernel space. */
> + param = copy_dev_ioctl(user);
> +
> + current->flags &= ~PF_MEMALLOC;
> +
> + if (IS_ERR(param))
> + return PTR_ERR(param);
> +
> + err = validate_dev_ioctl(command, param);
> + if (err)
> + goto out;
> +
> + /* The validate routine above always sets the version */
> + if (cmd == AUTOFS_DEV_IOCTL_VERSION_CMD)
> + goto done;
> +
> + fp = NULL;
> + sbi = NULL;
> +
> + /*
> + * For obvious reasons the openmount can't have a file
> + * descriptor yet. We don't take a reference to the
> + * file during close to allow for immediate release.
> + */
> + if (cmd != AUTOFS_DEV_IOCTL_OPENMOUNT_CMD &&
> + cmd != AUTOFS_DEV_IOCTL_CLOSEMOUNT_CMD) {
> + fp = fget(param->ioctlfd);
> + if (!fp) {
> + if (cmd == AUTOFS_DEV_IOCTL_ISMOUNTPOINT_CMD)
> + goto cont;
> + err = -EBADF;
> + goto out;
> + }
> +
> + if (!fp->f_op) {
> + err = -ENOTTY;
> + fput(fp);
> + goto out;
> + }
> +
> + sbi = autofs_dev_ioctl_sbi(fp);
> + if (!sbi) {
> + err = -EINVAL;
> + fput(fp);
> + goto out;
> + }
> + }
> +cont:
> + err = fn(fp, sbi, param);
> +
> + if (fp)
> + fput(fp);
> +done:
> + if (!err && copy_to_user(user, param, AUTOFS_DEV_IOCTL_SIZE))
> + err = -EFAULT;
> +out:
> + free_dev_ioctl(param);
> + return err;
> +}
> +
>
> ...
>
> diff -up /dev/null linux-2.6.25-rc2-mm1/include/linux/auto_dev-ioctl.h
> --- /dev/null 2008-02-22 18:55:57.149000956 +0900
> +++ linux-2.6.25-rc2-mm1/include/linux/auto_dev-ioctl.h 2008-02-22 11:51:41.000000000 +0900
> @@ -0,0 +1,114 @@
> +/*
> + * linux/include/linux/auto_dev-ioctl.h
> + *
> + * Copyright 2008 Red Hat, Inc. All rights reserved.
> + * Copyright 2008 Ian Kent <[email protected]>
> + *
> + * This file is part of the Linux kernel and is made available under
> + * the terms of the GNU General Public License, version 2, or at your
> + * option, any later version, incorporated herein by reference.
> + */
> +
> +#ifndef _LINUX_AUTO_DEV_IOCTL_H
> +#define _LINUX_AUTO_DEV_IOCTL_H
> +
> +#include <linux/types.h>
> +
> +#define AUTOFS_DEV_IOCTL_VERSION_MAJOR 1
> +#define AUTOFS_DEV_IOCTL_VERSION_MINOR 0
> +
> +#define AUTOFS_DEVID_LEN 16
> +
> +/*
> + * An ioctl interface for autofs mount point control.
> + */
> +
> +/*
> + * All the ioctls use this structure.
> + * When sending a path size must account for the total length
> + * of the chunk of memory otherwise is is the size of the
> + * structure.
> + */
> +
> +struct autofs_dev_ioctl {
> + __u32 ver_major;
> + __u32 ver_minor;
> + __u32 size; /* total size of data passed in
> + * including this struct */
> + __s32 ioctlfd; /* automount command fd */
> +
> + __u32 arg1; /* Command parameters */
> + __u32 arg2;
> +
> + char path[0];
> +};
> +
> +static inline void init_autofs_dev_ioctl(struct autofs_dev_ioctl *in)
> +{
> + in->ver_major = AUTOFS_DEV_IOCTL_VERSION_MAJOR;
> + in->ver_minor = AUTOFS_DEV_IOCTL_VERSION_MINOR;
> + in->size = sizeof(struct autofs_dev_ioctl);
> + in->ioctlfd = -1;
> + in->arg1 = 0;
> + in->arg2 = 0;
> + return;
> +}
uninline..
> +/*
> + * If you change this make sure you make the corresponding change
> + * to autofs-dev-ioctl.c:lookup_ioctl()
Can we do this automatically via preprocessor tricks, or whatever?
> + */
> +enum {
> + /* Get various version info */
> + AUTOFS_DEV_IOCTL_VERSION_CMD = 0x71,
> + AUTOFS_DEV_IOCTL_PROTOVER_CMD,
> + AUTOFS_DEV_IOCTL_PROTOSUBVER_CMD,
> +
> + /* Open mount ioctl fd */
> + AUTOFS_DEV_IOCTL_OPENMOUNT_CMD,
> +
> + /* Close mount ioctl fd */
> + AUTOFS_DEV_IOCTL_CLOSEMOUNT_CMD,
> +
> + /* Mount/expire status returns */
> + AUTOFS_DEV_IOCTL_READY_CMD,
> + AUTOFS_DEV_IOCTL_FAIL_CMD,
> +
> + /* Activate/deactivate autofs mount */
> + AUTOFS_DEV_IOCTL_SETPIPEFD_CMD,
> + AUTOFS_DEV_IOCTL_CATATONIC_CMD,
> +
> + /* Expiry timeout */
> + AUTOFS_DEV_IOCTL_TIMEOUT_CMD,
> +
> + /* Get mount last requesting uid and gid */
> + AUTOFS_DEV_IOCTL_REQUESTOR_CMD,
> +
> + /* Check for eligible expire candidates */
> + AUTOFS_DEV_IOCTL_EXPIRE_CMD,
> +
> + /* Request busy status */
> + AUTOFS_DEV_IOCTL_ASKUMOUNT_CMD,
> +
> + /* Check if path is a mountpoint */
> + AUTOFS_DEV_IOCTL_ISMOUNTPOINT_CMD,
> +};
> +
On Tue, 26 Feb 2008 13:29:07 +0900 (WST) Ian Kent <[email protected]> wrote:
> --- linux-2.6.25-rc2-mm1/fs/autofs4/inode.c.add-mount-device-display-option 2008-02-20 13:01:06.000000000 +0900
> +++ linux-2.6.25-rc2-mm1/fs/autofs4/inode.c 2008-02-20 13:03:45.000000000 +0900
> @@ -190,6 +190,7 @@ static int autofs4_show_options(struct s
> seq_printf(m, ",timeout=%lu", sbi->exp_timeout/HZ);
> seq_printf(m, ",minproto=%d", sbi->min_proto);
> seq_printf(m, ",maxproto=%d", sbi->max_proto);
> + seq_printf(m, ",dev=%d", autofs4_get_dev(sbi));
%u would be more appropriate here.
On Wed, 2008-02-27 at 20:40 -0800, Andrew Morton wrote:
> On Tue, 26 Feb 2008 12:21:58 +0900 (WST) Ian Kent <[email protected]> wrote:
>
> >
> > Hi Andrew,
> >
> > There is a problem with active restarts in autofs (that is to
> > say restarting autofs when there are busy mounts).
> >
> > Currently autofs uses "umount -l" to clear active mounts at
> > restart. While using lazy umount works for most cases, anything
> > that needs to walk back up the mount tree to construct a path,
> > such as getcwd(2) and the proc file system /proc/<pid>/cwd, no
> > longer works because the point from which the path is constructed
> > has been detached from the mount tree.
> >
> > The actual problem with autofs is that it can't reconnect to
> > existing mounts. Immediately one things of just adding the
> > ability to remount autofs file systems would solve it, but
> > alas, that can't work. This is because autofs direct mounts
> > and the implementation of "on demand mount and expire" of
> > nested mount trees have the file system mounted on top of
> > the mount trigger dentry.
> >
> > To resolve this a miscellaneous device node for routing ioctl
> > commands to these mount points has been implemented for the
> > autofs4 kernel module.
> >
> > For those wishing to test this out an updated user space daemon
> > is needed. Checking out and building from the git repo or
> > applying all the current patches to the 5.0.3 tar distribution
> > will do the trick. This is all available at the usual location
> > on kernel.org.
> >
>
> Could we please be a bit more specific than "the usual location"?
Yes, I should have been more specific.
I'll fix that.
>
> Should autofs userspace have an entry in Documentation/Changes?
Sound like a sensible thing to do.
I'll include a patch for that when I re-post the patch set.
Ian
On Wed, 2008-02-27 at 21:17 -0800, Andrew Morton wrote:
> On Tue, 26 Feb 2008 12:23:55 +0900 (WST) Ian Kent <[email protected]> wrote:
>
> > Hi Andrew,
> >
> > Patch to add miscellaneous device to autofs4 module for
> > ioctls.
>
> Could you please document the new kernel interface which you're proposing?
> In Docmentation/ or in the changelog?
Also a good suggestion.
I'll include that too.
>
> We seem to be passing some string into a miscdevice ioctl and getting some
> results back. Be aware that this won't be a terribly popular proposal, so
> I'd suggest that you fully describe the problem which it's trying to solve,
> and how it solves it, and why the various alternatives (sysfs, netlink,
> mount options, etc) were judged unsuitable.
Yes, as I said above.
I don't expect that people that aren't close to the development of
autofs will "get" the problem description in the leading post but I will
try and expand on it as best I can.
As for the possible alternatives, it sounds like I have some more work
to do on that. Mount options can't be used as I described in the lead in
post and, as far as my understanding of sysfs goes, I don't think it's
appropriate. But, I'm not aware of what the netlink interface may be
able to do for me so I will need to check on that.
>
>
> > ...
> >
> > --- linux-2.6.25-rc2-mm1/fs/autofs4/init.c.device-node-ioctl 2008-01-25 07:58:37.000000000 +0900
> > +++ linux-2.6.25-rc2-mm1/fs/autofs4/init.c 2008-02-22 11:51:41.000000000 +0900
> > @@ -29,11 +29,20 @@ static struct file_system_type autofs_fs
> >
> > static int __init init_autofs4_fs(void)
> > {
> > - return register_filesystem(&autofs_fs_type);
> > + int err;
> > +
> > + err = register_filesystem(&autofs_fs_type);
> > + if (err)
> > + return err;
> > +
> > + err = autofs_dev_ioctl_init();
> > +
> > + return err;
> > }
>
> We should run unregister_filesystem() if autofs_dev_ioctl_init() fails.
>
> > static void __exit exit_autofs4_fs(void)
> > {
> > + autofs_dev_ioctl_exit();
> > unregister_filesystem(&autofs_fs_type);
> > }
> >
> > diff -up linux-2.6.25-rc2-mm1/fs/autofs4/autofs_i.h.device-node-ioctl linux-2.6.25-rc2-mm1/fs/autofs4/autofs_i.h
> > --- linux-2.6.25-rc2-mm1/fs/autofs4/autofs_i.h.device-node-ioctl 2008-02-22 11:51:41.000000000 +0900
> > +++ linux-2.6.25-rc2-mm1/fs/autofs4/autofs_i.h 2008-02-22 11:51:41.000000000 +0900
> > @@ -14,6 +14,7 @@
> > /* Internal header file for autofs */
> >
> > #include <linux/auto_fs4.h>
> > +#include <linux/auto_dev-ioctl.h>
> > #include <linux/mutex.h>
> > #include <linux/list.h>
> >
> > @@ -40,6 +41,16 @@
> > #define DPRINTK(fmt,args...) do {} while(0)
> > #endif
> >
> > +#define WARN(fmt,args...) \
> > +do { \
> > + printk("KERN_WARNING pid %d: %s: " fmt "\n" , current->pid , __FUNCTION__ , ##args); \
> > +} while(0)
> > +
> > +#define ERROR(fmt,args...) \
> > +do { \
> > + printk("KERN_ERR pid %d: %s: " fmt "\n" , current->pid , __FUNCTION__ , ##args); \
> > +} while(0)
>
> Please always pass all diffs through scripts/checkpatch.pl.
>
> > /* Unified info structure. This is pointed to by both the dentry and
> > inode structures. Each file in the filesystem has an instance of this
> > structure. It holds a reference to the dentry, so dentries are never
> > @@ -172,6 +183,17 @@ int autofs4_expire_run(struct super_bloc
> > struct autofs_packet_expire __user *);
> > int autofs4_expire_multi(struct super_block *, struct vfsmount *,
> > struct autofs_sb_info *, int __user *);
> > +struct dentry *autofs4_expire_direct(struct super_block *sb,
> > + struct vfsmount *mnt,
> > + struct autofs_sb_info *sbi, int how);
> > +struct dentry *autofs4_expire_indirect(struct super_block *sb,
> > + struct vfsmount *mnt,
> > + struct autofs_sb_info *sbi, int how);
> > +
> > +/* Device node initialization */
> > +
> > +int autofs_dev_ioctl_init(void);
> > +void autofs_dev_ioctl_exit(void);
> >
> > ...
> >
> > @@ -0,0 +1,788 @@
> > +/*
> > + * linux/fs/autofs4/dev-ioctl.c
>
> We prefer not to bother with the filename-in-the-file thing. You know what
> file you're reading, and these things tend to not get updated across
> renames.
>
> > + * Copyright 2008 Red Hat, Inc. All rights reserved.
> > + * Copyright 2008 Ian Kent <[email protected]>
> > + *
> > + * This file is part of the Linux kernel and is made available under
> > + * the terms of the GNU General Public License, version 2, or at your
> > + * option, any later version, incorporated herein by reference.
> > + */
> > +
> > +#include <linux/module.h>
> > +#include <linux/vmalloc.h>
> > +#include <linux/miscdevice.h>
> > +#include <linux/init.h>
> > +#include <linux/wait.h>
> > +#include <linux/namei.h>
> > +#include <linux/fcntl.h>
> > +#include <linux/file.h>
> > +#include <linux/sched.h>
> > +#include <linux/compat.h>
> > +#include <linux/syscalls.h>
> > +#include <linux/smp_lock.h>
> > +#include <linux/magic.h>
> > +#include <linux/dcache.h>
> > +#include <asm/uaccess.h>
>
> Please include <linux/foo.h> rather than <asm/foo.h> if the former exists.
>
> > +#include "autofs_i.h"
> > +
> > +/*
> > + * This module implements an interface for routing autofs ioctl control
> > + * commands via a miscellaneous device file.
> > + *
> > + * The alternate interface is needed because we need to be able open
> > + * an ioctl file descriptor on an autofs mount that may be covered by
> > + * another mount. This situation arises when starting automount(8)
> > + * or other user space daemon which uses direct mounts or offset
> > + * mounts (used for autofs lazy mount/umount of nested mount trees),
> > + * which have been left busy at at service shutdown.
> > + */
> > +
> > +#define AUTOFS_DEVICE_NAME "autofs"
> > +
> > +#define AUTOFS_DEV_IOCTL_IOC_FIRST AUTOFS_DEV_IOCTL_VERSION
> > +#define AUTOFS_DEV_IOCTL_IOC_COUNT AUTOFS_IOC_COUNT - 11
>
> This needs parentheses.
>
> Shouldn't these be in a header file, exported to userspace builds?
>
> > +#define AUTOFS_DEV_IOCTL_SIZE sizeof(struct autofs_dev_ioctl)
> > +
> > +typedef int (*ioctl_fn)(struct file *, struct autofs_sb_info *, struct autofs_dev_ioctl *);
> > +
> > +static int check_name(const char *name)
> > +{
> > + if (!strchr(name, '/'))
> > + return -EINVAL;
> > + return 0;
> > +}
> > +
> > +/*
> > + * Check a string doesn't overrun the chunk of
> > + * memory we copied from user land.
> > + */
> > +static int invalid_str(char *str, void *end)
> > +{
> > + while ((void *) str <= end)
> > + if (!*str++)
> > + return 0;
> > + return -EINVAL;
> > +}
> >
> > ...
> >
> > +/*
> > + * Get the autofs super block info struct from the file opened on
> > + * the autofs mount point.
> > + */
> > +static struct autofs_sb_info *autofs_dev_ioctl_sbi(struct file *f)
> > +{
> > + struct autofs_sb_info *sbi = NULL;
> > + struct inode *inode;
> > +
> > + if (f) {
>
> `f' cannot be NULL here.
>
> > + inode = f->f_path.dentry->d_inode;
> > + sbi = autofs4_sbi(inode->i_sb);
> > + }
> > +
> > + return sbi;
> > +}
> > +
> > +/* Return autofs module protocol version */
> > +static inline int autofs_dev_ioctl_protover(struct file *fp,
> > + struct autofs_sb_info *sbi,
> > + struct autofs_dev_ioctl *param)
> > +{
> > + param->arg1 = sbi->version;
> > + return 0;
> > +}
> > +
> > +/* Return autofs module protocol sub version */
> > +static inline int autofs_dev_ioctl_protosubver(struct file *fp,
> > + struct autofs_sb_info *sbi,
> > + struct autofs_dev_ioctl *param)
> > +{
> > + param->arg1 = sbi->sub_version;
> > + return 0;
> > +}
>
> Don't bother inlining things - the compiler will do it.
>
> > +/*
> > + * Walk down the mount stack looking for an autofs mount that
> > + * has the requested device number (aka. new_encode_dev(sb->s_dev).
> > + */
> > +static int autofs_dev_ioctl_find_super(struct nameidata *nd, dev_t devno)
> > +{
> > + struct dentry *dentry;
> > + struct inode *inode;
> > + struct super_block *sb;
> > + dev_t s_dev;
> > + unsigned int err;
> > +
> > + err = -ENOENT;
> > +
> > + /* Lookup the dentry name at the base of our mount point */
> > + dentry = d_lookup(nd->path.dentry, &nd->last);
> > + if (!dentry)
> > + goto out;
> > +
> > + dput(nd->path.dentry);
> > + nd->path.dentry = dentry;
> > +
> > + /* And follow the mount stack looking for our autofs mount */
> > + while (1) {
> > + inode = nd->path.dentry->d_inode;
> > + if (!inode)
> > + continue;
> > +
> > + sb = inode->i_sb;
> > + s_dev = new_encode_dev(sb->s_dev);
> > + if (devno == s_dev) {
> > + if (sb->s_magic == AUTOFS_SUPER_MAGIC) {
> > + err = 0;
> > + break;
> > + }
> > + }
> > +
> > + if (!follow_down(&nd->path.mnt, &nd->path.dentry))
> > + goto out;
> > + }
> > +
> > +out:
> > + return err;
> > +}
>
> hm. possibly-interested parties cc'ed.
>
> > +/*
> > + * Install the file opened for autofs mount point control functions
> > + * and set close on exec.
> > + */
> > +static void autofs_dev_ioctl_fd_install(unsigned int fd, struct file *file)
> > +{
> > + struct files_struct *files = current->files;
> > + struct fdtable *fdt;
> > +
> > + spin_lock(&files->file_lock);
> > + fdt = files_fdtable(files);
> > + BUG_ON(fdt->fd[fd] != NULL);
> > + rcu_assign_pointer(fdt->fd[fd], file);
> > + FD_SET(fd, fdt->close_on_exec);
> > + spin_unlock(&files->file_lock);
> > +}
>
> That's fd_install() plus an add-on. It's not autofs-specific. Should be
> in fs/open.c, methinks?
>
> >
> > ...
> >
> > +/* Close file descriptor allocated above (user can also use close(2)). */
> > +static inline int autofs_dev_ioctl_closemount(struct file *fp,
> > + struct autofs_sb_info *sbi,
> > + struct autofs_dev_ioctl *param)
> > +{
> > + return sys_close(param->ioctlfd);
> > +}
>
> hm.
>
> >
> > ...
> >
> > +/*
> > + * Set the pipe fd for kernel communication to the daemon.
> > + *
> > + * Normally this is set at mount using an option but if we
> > + * are reconnecting to a busy mount then we need to use this
> > + * to tell the autofs mount about the new kernel pipe fd. In
> > + * order to protect mounts against incorrectly setting the
> > + * pipefd we also require that the autofs mount be catatonic.
> > + *
> > + * This also sets the process group id used to identify the
> > + * controlling process (eg. the owning automount(8) daemon).
> > + */
> > +static int autofs_dev_ioctl_setpipefd(struct file *fp,
> > + struct autofs_sb_info *sbi,
> > + struct autofs_dev_ioctl *param)
> > +{
> > + int pipefd;
> > + int err = 0;
> > +
> > + if (param->arg1 == -1)
> > + return -EINVAL;
> > +
> > + pipefd = param->arg1;
> > +
> > + if (!sbi->catatonic)
> > + return -EBUSY;
> > + else {
> > + struct file *pipe = fget(pipefd);
> > + if (!pipe->f_op || !pipe->f_op->write) {
> > + err = -EPIPE;
> > + fput(pipe);
> > + goto out;
> > + }
> > + sbi->oz_pgrp = task_pgrp_nr(current);
> > + sbi->pipefd = pipefd;
> > + sbi->pipe = pipe;
> > + sbi->catatonic = 0;
> > + }
> > +out:
> > + return err;
> > +}
>
> We have a new filesystem type, a misc device with a mysterious ioctl,
> hand-rolled mountpoint chasing, hand-rolled fd installation and now pipes
> too.
>
> This is a complex interface. We really need to see the overall problem
> statement, design and interface description, please.
>
> > +/*
> > + * Make the autofs mount point catatonic, no longer responsive to
> > + * mount requests. Also closes the kernel pipe file descriptor.
> > + */
> > +static inline int autofs_dev_ioctl_catatonic(struct file *fp,
> > + struct autofs_sb_info *sbi,
> > + struct autofs_dev_ioctl *param)
> > +{
> > + if (!sbi->catatonic)
> > + autofs4_catatonic_mode(sbi);
> > + return 0;
> > +}
> > +
> > +/* Set the autofs mount timeout */
> > +static inline int autofs_dev_ioctl_timeout(struct file *fp,
> > + struct autofs_sb_info *sbi,
> > + struct autofs_dev_ioctl *param)
> > +{
> > + unsigned long timeout;
> > +
> > + timeout = param->arg1;
> > + param->arg1 = sbi->exp_timeout / HZ;
> > + sbi->exp_timeout = timeout * HZ;
> > + return 0;
> > +}
>
> uninline everything...
>
> > +/*
> > + * Return the uid and gid of the last request for the mount
> > + *
> > + * When reconstructing an autofs mount tree with active mounts
> > + * we need to re-connect to mounts that may have used the original
> > + * process uid and gid (or string variations of them) for mount
> > + * lookups within the map entry.
> > + */
> > +static inline int autofs_dev_ioctl_requestor(struct file *fp,
> > + struct autofs_sb_info *sbi,
> > + struct autofs_dev_ioctl *param)
>
> especially that - it's only ever called indirectly anwyay!
>
> > + struct autofs_info *ino;
> > + struct nameidata nd;
> > + const char *path;
> > + dev_t devid;
> > + int err = -ENOENT;
> > +
> > + if (param->size <= sizeof(*param)) {
> > + err = -EINVAL;
> > + goto out;
> > + }
> > +
> > + path = param->path;
> > + devid = sbi->sb->s_dev;
> > +
> > + param->arg1 = param->arg2 = -1;
> > +
> > + /* Get nameidata of the parent directory */
> > + err = path_lookup(path, LOOKUP_PARENT, &nd);
> > + if (err)
> > + goto out;
> > +
> > + err = autofs_dev_ioctl_find_super(&nd, devid);
> > + if (err)
> > + goto out_release;
> > +
> > + ino = autofs4_dentry_ino(nd.path.dentry);
> > + if (ino) {
> > + err = 0;
> > + param->arg1 = ino->uid;
> > + param->arg2 = ino->gid;
> > + }
> > +
> > +out_release:
> > + path_put(&nd.path);
> > +out:
> > + return err;
> > +}
> > +
> > +/*
> > + * Call repeatedly until it returns -EAGAIN, meaning there's nothing
> > + * more that can be done.
> > + */
> > +static int autofs_dev_ioctl_expire(struct file *fp,
> > + struct autofs_sb_info *sbi,
> > + struct autofs_dev_ioctl *param)
> > +{
> > + struct dentry *dentry;
> > + struct vfsmount *mnt;
> > + int err = -EAGAIN;
> > + int when;
> > +
> > + when = param->arg1;
> > + mnt = fp->f_path.mnt;
> > +
> > + if (sbi->type & AUTOFS_TYPE_DIRECT)
> > + dentry = autofs4_expire_direct(sbi->sb, mnt, sbi, when);
> > + else
> > + dentry = autofs4_expire_indirect(sbi->sb, mnt, sbi, when);
> > +
> > + if (dentry) {
> > + struct autofs_info *ino = autofs4_dentry_ino(dentry);
> > +
> > + /*
> > + * This is synchronous because it makes the daemon a
> > + * little easier
> > + */
> > + ino->flags |= AUTOFS_INF_EXPIRING;
> > + err = autofs4_wait(sbi, dentry, NFY_EXPIRE);
> > + ino->flags &= ~AUTOFS_INF_EXPIRING;
>
> Are there races around the modification of ino->flags here?
>
> > + dput(dentry);
> > + }
> > +
> > + return err;
> > +}
> > +
> > +/* Check if autofs mount point is in use */
> > +static inline int autofs_dev_ioctl_askumount(struct file *fp,
> > + struct autofs_sb_info *sbi,
> > + struct autofs_dev_ioctl *param)
> > +{
> > + param->arg1 = 0;
> > + if (may_umount(fp->f_path.mnt))
> > + param->arg1 = 1;
> > + return 0;
> > +}
> > +
> > +/*
> > + * Check if the given path is a mountpoint.
> > + *
> > + * If we are supplied with the file descriptor of the autofs
> > + * mount we're looking for a specific mount. In this case
> > + * the path is considered a mountpoint if it is itself a
> > + * mountpoint or contains a mount, such as a multi-mount
> > + * without a root mount.
> > + *
> > + * If we aren't supplied with a file descriptor then we lookup
> > + * the nameidata of the path and check if is a mounted autofs
> > + * filesystem or a filesystem mounted within an autofs filesystem.
> > + *
> > + * In both cases we return an indication of whether the path
> > + * is a mount point and the super magic of the covering mount.
> > + */
> > +static inline int autofs_dev_ioctl_ismountpoint(struct file *fp,
> > + struct autofs_sb_info *sbi,
> > + struct autofs_dev_ioctl *param)
>
> uninline..
>
> > + struct nameidata nd;
> > + const char *path;
> > + int err = -ENOENT;
> > +
> > + if (param->size <= sizeof(*param)) {
> > + err = -EINVAL;
> > + goto out;
> > + }
> > +
> > + path = param->path;
> > +
> > + param->arg1 = param->arg2 = 0;
> > +
> > + if (param->ioctlfd == -1) {
> > + unsigned long magic;
> > +
> > + err = path_lookup(path, LOOKUP_FOLLOW, &nd);
> > + if (err)
> > + goto out;
> > +
> > + magic = nd.path.dentry->d_inode->i_sb->s_magic;
> > +
> > + if (follow_up(&nd.path.mnt, &nd.path.dentry)) {
> > + struct inode *inode = nd.path.dentry->d_inode;
> > + if (magic == AUTOFS_SUPER_MAGIC ||
> > + inode->i_sb->s_magic == AUTOFS_SUPER_MAGIC) {
> > + param->arg1 = 1;
> > + param->arg2 = magic;
> > + }
> > + }
> > + } else {
> > + dev_t devid = sbi->sb->s_dev;
> > +
> > + err = path_lookup(path, LOOKUP_PARENT, &nd);
> > + if (err)
> > + goto out;
> > +
> > + err = autofs_dev_ioctl_find_super(&nd, devid);
> > + if (err)
> > + goto out_release;
> > +
> > + param->arg1 = have_submounts(nd.path.dentry);
> > +
> > + if (d_mountpoint(nd.path.dentry)) {
> > + if (follow_down(&nd.path.mnt, &nd.path.dentry)) {
> > + struct inode *inode = nd.path.dentry->d_inode;
> > + param->arg2 = inode->i_sb->s_magic;
> > + }
> > + }
> > + }
> > +
> > +out_release:
> > + path_put(&nd.path);
> > +out:
> > + return err;
> > +}
>
> Have you really carefully reviewed and tested what happens when non-autofs
> fds are fed into all the ioctl modes?
>
> I hope all these ioctl entrypoints are root-only. What determines that?
> The miscdevice permissions?
>
> > +/*
> > + * Our range of ioctl numbers isn't 0 based so we need to shift
> > + * the array index by _IOC_NR(AUTOFS_CTL_IOC_FIRST) for the table
> > + * lookup.
> > + */
> > +#define cmd_idx(cmd) (cmd - _IOC_NR(AUTOFS_DEV_IOCTL_IOC_FIRST))
> > +
> > +static ioctl_fn lookup_dev_ioctl(unsigned int cmd)
> > +{
> > + static struct {
> > + int cmd;
> > + ioctl_fn fn;
> > + } _ioctls[] = {
> > + {cmd_idx(AUTOFS_DEV_IOCTL_VERSION_CMD), NULL},
> > + {cmd_idx(AUTOFS_DEV_IOCTL_PROTOVER_CMD), autofs_dev_ioctl_protover},
> > + {cmd_idx(AUTOFS_DEV_IOCTL_PROTOSUBVER_CMD), autofs_dev_ioctl_protosubver},
> > + {cmd_idx(AUTOFS_DEV_IOCTL_OPENMOUNT_CMD), autofs_dev_ioctl_openmount},
> > + {cmd_idx(AUTOFS_DEV_IOCTL_CLOSEMOUNT_CMD), autofs_dev_ioctl_closemount},
> > + {cmd_idx(AUTOFS_DEV_IOCTL_READY_CMD), autofs_dev_ioctl_ready},
> > + {cmd_idx(AUTOFS_DEV_IOCTL_FAIL_CMD), autofs_dev_ioctl_fail},
> > + {cmd_idx(AUTOFS_DEV_IOCTL_SETPIPEFD_CMD), autofs_dev_ioctl_setpipefd},
> > + {cmd_idx(AUTOFS_DEV_IOCTL_CATATONIC_CMD), autofs_dev_ioctl_catatonic},
> > + {cmd_idx(AUTOFS_DEV_IOCTL_TIMEOUT_CMD), autofs_dev_ioctl_timeout},
> > + {cmd_idx(AUTOFS_DEV_IOCTL_REQUESTOR_CMD), autofs_dev_ioctl_requestor},
> > + {cmd_idx(AUTOFS_DEV_IOCTL_EXPIRE_CMD), autofs_dev_ioctl_expire},
> > + {cmd_idx(AUTOFS_DEV_IOCTL_ASKUMOUNT_CMD), autofs_dev_ioctl_askumount},
> > + {cmd_idx(AUTOFS_DEV_IOCTL_ISMOUNTPOINT_CMD), autofs_dev_ioctl_ismountpoint}
> > + };
> > + int idx = cmd_idx(cmd);
>
> `idx' should have unsigned type.
>
> > + return (idx >= ARRAY_SIZE(_ioctls)) ? NULL : _ioctls[idx].fn;
> > +}
> > +
> > +/* ioctl dispatcher */
> > +static int _autofs_dev_ioctl(unsigned int command, struct autofs_dev_ioctl __user *user)
> > +{
> > + struct autofs_dev_ioctl *param;
> > + struct file *fp;
> > + struct autofs_sb_info *sbi;
> > + unsigned int cmd;
> > + ioctl_fn fn = NULL;
> > + int err = 0;
> > +
> > + /* only root can play with this */
> > + if (!capable(CAP_SYS_ADMIN))
> > + return -EPERM;
>
> OK, I guess that answers my above question.
>
> > + cmd = _IOC_NR(command);
> > +
> > + if (_IOC_TYPE(command) != _IOC_TYPE(AUTOFS_DEV_IOCTL_IOC_FIRST) ||
> > + cmd - _IOC_NR(AUTOFS_DEV_IOCTL_IOC_FIRST) >= AUTOFS_DEV_IOCTL_IOC_COUNT) {
> > + return -ENOTTY;
> > + }
> > +
> > + fn = lookup_dev_ioctl(cmd);
> > + if (!fn) {
> > + WARN("unknown command 0x%08x", command);
> > + return -ENOTTY;
> > + }
> > +
> > + /*
> > + * Trying to avoid low memory issues when a device is
> > + * suspended.
> > + */
> > + current->flags |= PF_MEMALLOC;
>
> whoa, what's this doing here?
>
> > + /* Copy the parameters into kernel space. */
> > + param = copy_dev_ioctl(user);
> > +
> > + current->flags &= ~PF_MEMALLOC;
> > +
> > + if (IS_ERR(param))
> > + return PTR_ERR(param);
> > +
> > + err = validate_dev_ioctl(command, param);
> > + if (err)
> > + goto out;
> > +
> > + /* The validate routine above always sets the version */
> > + if (cmd == AUTOFS_DEV_IOCTL_VERSION_CMD)
> > + goto done;
> > +
> > + fp = NULL;
> > + sbi = NULL;
> > +
> > + /*
> > + * For obvious reasons the openmount can't have a file
> > + * descriptor yet. We don't take a reference to the
> > + * file during close to allow for immediate release.
> > + */
> > + if (cmd != AUTOFS_DEV_IOCTL_OPENMOUNT_CMD &&
> > + cmd != AUTOFS_DEV_IOCTL_CLOSEMOUNT_CMD) {
> > + fp = fget(param->ioctlfd);
> > + if (!fp) {
> > + if (cmd == AUTOFS_DEV_IOCTL_ISMOUNTPOINT_CMD)
> > + goto cont;
> > + err = -EBADF;
> > + goto out;
> > + }
> > +
> > + if (!fp->f_op) {
> > + err = -ENOTTY;
> > + fput(fp);
> > + goto out;
> > + }
> > +
> > + sbi = autofs_dev_ioctl_sbi(fp);
> > + if (!sbi) {
> > + err = -EINVAL;
> > + fput(fp);
> > + goto out;
> > + }
> > + }
> > +cont:
> > + err = fn(fp, sbi, param);
> > +
> > + if (fp)
> > + fput(fp);
> > +done:
> > + if (!err && copy_to_user(user, param, AUTOFS_DEV_IOCTL_SIZE))
> > + err = -EFAULT;
> > +out:
> > + free_dev_ioctl(param);
> > + return err;
> > +}
> > +
> >
> > ...
> >
> > diff -up /dev/null linux-2.6.25-rc2-mm1/include/linux/auto_dev-ioctl.h
> > --- /dev/null 2008-02-22 18:55:57.149000956 +0900
> > +++ linux-2.6.25-rc2-mm1/include/linux/auto_dev-ioctl.h 2008-02-22 11:51:41.000000000 +0900
> > @@ -0,0 +1,114 @@
> > +/*
> > + * linux/include/linux/auto_dev-ioctl.h
> > + *
> > + * Copyright 2008 Red Hat, Inc. All rights reserved.
> > + * Copyright 2008 Ian Kent <[email protected]>
> > + *
> > + * This file is part of the Linux kernel and is made available under
> > + * the terms of the GNU General Public License, version 2, or at your
> > + * option, any later version, incorporated herein by reference.
> > + */
> > +
> > +#ifndef _LINUX_AUTO_DEV_IOCTL_H
> > +#define _LINUX_AUTO_DEV_IOCTL_H
> > +
> > +#include <linux/types.h>
> > +
> > +#define AUTOFS_DEV_IOCTL_VERSION_MAJOR 1
> > +#define AUTOFS_DEV_IOCTL_VERSION_MINOR 0
> > +
> > +#define AUTOFS_DEVID_LEN 16
> > +
> > +/*
> > + * An ioctl interface for autofs mount point control.
> > + */
> > +
> > +/*
> > + * All the ioctls use this structure.
> > + * When sending a path size must account for the total length
> > + * of the chunk of memory otherwise is is the size of the
> > + * structure.
> > + */
> > +
> > +struct autofs_dev_ioctl {
> > + __u32 ver_major;
> > + __u32 ver_minor;
> > + __u32 size; /* total size of data passed in
> > + * including this struct */
> > + __s32 ioctlfd; /* automount command fd */
> > +
> > + __u32 arg1; /* Command parameters */
> > + __u32 arg2;
> > +
> > + char path[0];
> > +};
> > +
> > +static inline void init_autofs_dev_ioctl(struct autofs_dev_ioctl *in)
> > +{
> > + in->ver_major = AUTOFS_DEV_IOCTL_VERSION_MAJOR;
> > + in->ver_minor = AUTOFS_DEV_IOCTL_VERSION_MINOR;
> > + in->size = sizeof(struct autofs_dev_ioctl);
> > + in->ioctlfd = -1;
> > + in->arg1 = 0;
> > + in->arg2 = 0;
> > + return;
> > +}
>
> uninline..
>
> > +/*
> > + * If you change this make sure you make the corresponding change
> > + * to autofs-dev-ioctl.c:lookup_ioctl()
>
> Can we do this automatically via preprocessor tricks, or whatever?
>
> > + */
> > +enum {
> > + /* Get various version info */
> > + AUTOFS_DEV_IOCTL_VERSION_CMD = 0x71,
> > + AUTOFS_DEV_IOCTL_PROTOVER_CMD,
> > + AUTOFS_DEV_IOCTL_PROTOSUBVER_CMD,
> > +
> > + /* Open mount ioctl fd */
> > + AUTOFS_DEV_IOCTL_OPENMOUNT_CMD,
> > +
> > + /* Close mount ioctl fd */
> > + AUTOFS_DEV_IOCTL_CLOSEMOUNT_CMD,
> > +
> > + /* Mount/expire status returns */
> > + AUTOFS_DEV_IOCTL_READY_CMD,
> > + AUTOFS_DEV_IOCTL_FAIL_CMD,
> > +
> > + /* Activate/deactivate autofs mount */
> > + AUTOFS_DEV_IOCTL_SETPIPEFD_CMD,
> > + AUTOFS_DEV_IOCTL_CATATONIC_CMD,
> > +
> > + /* Expiry timeout */
> > + AUTOFS_DEV_IOCTL_TIMEOUT_CMD,
> > +
> > + /* Get mount last requesting uid and gid */
> > + AUTOFS_DEV_IOCTL_REQUESTOR_CMD,
> > +
> > + /* Check for eligible expire candidates */
> > + AUTOFS_DEV_IOCTL_EXPIRE_CMD,
> > +
> > + /* Request busy status */
> > + AUTOFS_DEV_IOCTL_ASKUMOUNT_CMD,
> > +
> > + /* Check if path is a mountpoint */
> > + AUTOFS_DEV_IOCTL_ISMOUNTPOINT_CMD,
> > +};
> > +
>
On Wed, 2008-02-27 at 20:45 -0800, Andrew Morton wrote:
> On Tue, 26 Feb 2008 12:23:20 +0900 (WST) Ian Kent <[email protected]> wrote:
>
> > Hi Andrew,
> >
> > Patch to track the uid and gid of the last process to request
> > a mount for on an autofs dentry.
> >
> > Signed-off-by: Ian Kent < [email protected]>
> >
> > Ian
> >
> > ---
> > diff -up linux-2.6.25-rc2-mm1/fs/autofs4/inode.c.track-last-mount-ids linux-2.6.25-rc2-mm1/fs/autofs4/inode.c
> > --- linux-2.6.25-rc2-mm1/fs/autofs4/inode.c.track-last-mount-ids 2008-02-20 13:11:28.000000000 +0900
> > +++ linux-2.6.25-rc2-mm1/fs/autofs4/inode.c 2008-02-20 13:12:23.000000000 +0900
> > @@ -43,6 +43,8 @@ struct autofs_info *autofs4_init_ino(str
> >
> > ino->flags = 0;
> > ino->mode = mode;
> > + ino->uid = 0;
> > + ino->gid = 0;
> > ino->inode = NULL;
> > ino->dentry = NULL;
> > ino->size = 0;
> > diff -up linux-2.6.25-rc2-mm1/fs/autofs4/autofs_i.h.track-last-mount-ids linux-2.6.25-rc2-mm1/fs/autofs4/autofs_i.h
> > --- linux-2.6.25-rc2-mm1/fs/autofs4/autofs_i.h.track-last-mount-ids 2008-02-20 13:14:03.000000000 +0900
> > +++ linux-2.6.25-rc2-mm1/fs/autofs4/autofs_i.h 2008-02-20 13:14:34.000000000 +0900
> > @@ -58,6 +58,9 @@ struct autofs_info {
> > unsigned long last_used;
> > atomic_t count;
> >
> > + uid_t uid;
> > + gid_t gid;
> > +
> > mode_t mode;
> > size_t size;
> >
> > diff -up linux-2.6.25-rc2-mm1/fs/autofs4/waitq.c.track-last-mount-ids linux-2.6.25-rc2-mm1/fs/autofs4/waitq.c
> > --- linux-2.6.25-rc2-mm1/fs/autofs4/waitq.c.track-last-mount-ids 2008-02-20 13:06:20.000000000 +0900
> > +++ linux-2.6.25-rc2-mm1/fs/autofs4/waitq.c 2008-02-20 13:10:23.000000000 +0900
> > @@ -363,6 +363,38 @@ int autofs4_wait(struct autofs_sb_info *
> >
> > status = wq->status;
> >
> > + /*
> > + * For direct and offset mounts we need to track the requestor
> > + * uid and gid in the dentry info struct. This is so it can be
> > + * supplied, on request, by the misc device ioctl interface.
> > + * This is needed during daemon resatart when reconnecting
> > + * to existing, active, autofs mounts. The uid and gid (and
> > + * related string values) may be used for macro substitution
> > + * in autofs mount maps.
> > + */
> > + if (!status) {
> > + struct dentry *de = NULL;
> > +
> > + /* direct mount or browsable map */
> > + ino = autofs4_dentry_ino(dentry);
> > + if (!ino) {
> > + /* If not lookup actual dentry used */
> > + de = d_lookup(dentry->d_parent, &dentry->d_name);
> > + ino = autofs4_dentry_ino(de);
> > + }
> > +
> > + /* Set mount requestor */
> > + if (ino) {
> > + if (ino) {
> > + ino->uid = wq->uid;
> > + ino->gid = wq->gid;
> > + }
> > + }
> > +
> > + if (de)
> > + dput(de);
> > + }
> > +
>
> But uids and gids are no longer system-wide-unique. Two different users
> can have the same identifiers in different namespaces. What happens then?
That's a tricky question.
Presumably, the process requesting the mount has the user space daemon
running in the namespace within which the uid and gid are to be looked
up, by the daemon.
Am I missing something?
Ian
On Thu, 28 Feb 2008 15:22:27 +0900 Ian Kent <[email protected]> wrote:
>
> > > +++ linux-2.6.25-rc2-mm1/fs/autofs4/waitq.c 2008-02-20 13:10:23.000000000 +0900
> > > @@ -363,6 +363,38 @@ int autofs4_wait(struct autofs_sb_info *
> > >
> > > status = wq->status;
> > >
> > > + /*
> > > + * For direct and offset mounts we need to track the requestor
> > > + * uid and gid in the dentry info struct. This is so it can be
> > > + * supplied, on request, by the misc device ioctl interface.
> > > + * This is needed during daemon resatart when reconnecting
> > > + * to existing, active, autofs mounts. The uid and gid (and
> > > + * related string values) may be used for macro substitution
> > > + * in autofs mount maps.
> > > + */
> > > + if (!status) {
> > > + struct dentry *de = NULL;
> > > +
> > > + /* direct mount or browsable map */
> > > + ino = autofs4_dentry_ino(dentry);
> > > + if (!ino) {
> > > + /* If not lookup actual dentry used */
> > > + de = d_lookup(dentry->d_parent, &dentry->d_name);
> > > + ino = autofs4_dentry_ino(de);
> > > + }
> > > +
> > > + /* Set mount requestor */
> > > + if (ino) {
> > > + if (ino) {
> > > + ino->uid = wq->uid;
> > > + ino->gid = wq->gid;
> > > + }
> > > + }
> > > +
> > > + if (de)
> > > + dput(de);
> > > + }
> > > +
> >
> > But uids and gids are no longer system-wide-unique. Two different users
> > can have the same identifiers in different namespaces. What happens then?
>
> That's a tricky question.
>
> Presumably, the process requesting the mount has the user space daemon
> running in the namespace within which the uid and gid are to be looked
> up, by the daemon.
>
> Am I missing something?
>
err, you assume more knowledge at this end about what you're trying to do
than actually exists :)
You seem to imply that if a machine is running 100 user namespaces then it
needs to run 100 mount daemons. Doesn't seem good.
What problem are you actually trying to solve here?
On Wed, 2008-02-27 at 22:37 -0800, Andrew Morton wrote:
> On Thu, 28 Feb 2008 15:22:27 +0900 Ian Kent <[email protected]> wrote:
>
> >
> > > > +++ linux-2.6.25-rc2-mm1/fs/autofs4/waitq.c 2008-02-20 13:10:23.000000000 +0900
> > > > @@ -363,6 +363,38 @@ int autofs4_wait(struct autofs_sb_info *
> > > >
> > > > status = wq->status;
> > > >
> > > > + /*
> > > > + * For direct and offset mounts we need to track the requestor
> > > > + * uid and gid in the dentry info struct. This is so it can be
> > > > + * supplied, on request, by the misc device ioctl interface.
> > > > + * This is needed during daemon resatart when reconnecting
> > > > + * to existing, active, autofs mounts. The uid and gid (and
> > > > + * related string values) may be used for macro substitution
> > > > + * in autofs mount maps.
> > > > + */
> > > > + if (!status) {
> > > > + struct dentry *de = NULL;
> > > > +
> > > > + /* direct mount or browsable map */
> > > > + ino = autofs4_dentry_ino(dentry);
> > > > + if (!ino) {
> > > > + /* If not lookup actual dentry used */
> > > > + de = d_lookup(dentry->d_parent, &dentry->d_name);
> > > > + ino = autofs4_dentry_ino(de);
> > > > + }
> > > > +
> > > > + /* Set mount requestor */
> > > > + if (ino) {
> > > > + if (ino) {
> > > > + ino->uid = wq->uid;
> > > > + ino->gid = wq->gid;
> > > > + }
> > > > + }
> > > > +
> > > > + if (de)
> > > > + dput(de);
> > > > + }
> > > > +
> > >
> > > But uids and gids are no longer system-wide-unique. Two different users
> > > can have the same identifiers in different namespaces. What happens then?
> >
> > That's a tricky question.
> >
> > Presumably, the process requesting the mount has the user space daemon
> > running in the namespace within which the uid and gid are to be looked
> > up, by the daemon.
> >
> > Am I missing something?
> >
>
> err, you assume more knowledge at this end about what you're trying to do
> than actually exists :)
>
> You seem to imply that if a machine is running 100 user namespaces then it
> needs to run 100 mount daemons. Doesn't seem good.
More likely my lack of understanding of how namespaces are meant to
work.
>
> What problem are you actually trying to solve here?
The basic problem arises only when we want to restart the user space
daemon and there are active autofs managed mounts in place at exit (ie.
autofs mounts that have busy user mounts). They are left mounted and
processes using them continue to function. But then, when we startup
autofs we need to reconnect to these autofs mounts, some of which can
covered the by mounted file systems, and hence the need for another way
to open an ioctl descriptor to them.
It may have been overkill to re-implement all the current ioctls (and
add a couple of other much needed ones) but I though it sensible for
completeness, and we get to identify any possible problems the current
ioctls might have had due to the use of the BKL (by the VFS when calling
the ioctls).
So, why do we need the uid and gid? When someone walks over an autofs
dentry that is meant to cause a mount we send a request packet to the
daemon via a pipe which includes the process uid and gid, and as part of
the lookup we set macros for several mount map substitution variables,
derived from the uid and gid of the process requesting the mount and
they can be used within autofs maps.
This is all fine as long as we don't need to re-connect to these mounts
when starting up, since we don't get kernel requests for the mounts, we
need to obtain that information from the active mount itself.
Ian
On Thu, 28 Feb 2008 16:08:20 +0900 Ian Kent <[email protected]> wrote:
>
> >
> > What problem are you actually trying to solve here?
>
> The basic problem arises only when we want to restart the user space
> daemon and there are active autofs managed mounts in place at exit (ie.
> autofs mounts that have busy user mounts). They are left mounted and
> processes using them continue to function. But then, when we startup
> autofs we need to reconnect to these autofs mounts, some of which can
> covered the by mounted file systems, and hence the need for another way
> to open an ioctl descriptor to them.
So we want to store persistant state in the kernel across userspace process
invokations. That's normally done with a thing called a "file" ;) Could we
stick all the necessary state into files in a pseudo-fs and have the daemon
go and open and read them all when it restarts?
> It may have been overkill to re-implement all the current ioctls (and
> add a couple of other much needed ones)
I don't understand that bit. But then, I don't have a clue how autofs
works.
> but I though it sensible for
> completeness, and we get to identify any possible problems the current
> ioctls might have had due to the use of the BKL (by the VFS when calling
> the ioctls).
It isn't a good idea to wait for races to reveal themselves. It will take
years, especially with a system which has as low a call frequency as autofs
mounting. And once a bug _does_ reveal itself, by then we'll have tens of
millions of machines out there running that bug.
> So, why do we need the uid and gid? When someone walks over an autofs
> dentry that is meant to cause a mount we send a request packet to the
> daemon via a pipe
connector or genetlink would be more fashionable transports.
> which includes the process uid and gid, and as part of
> the lookup we set macros for several mount map substitution variables,
> derived from the uid and gid of the process requesting the mount and
> they can be used within autofs maps.
yeah, could be a problem. Hopefully the namespace people can advise.
Perhaps we need a concept of an exportable-to-userspace namespace-id+uid,
namespace-id+gid, namespace-id+pid, etc for this sort of thing. It has
come up before. Recently, but I forget what the context was.
> This is all fine as long as we don't need to re-connect to these mounts
> when starting up, since we don't get kernel requests for the mounts, we
> need to obtain that information from the active mount itself.
Ian Kent wrote:
> On Wed, 2008-02-27 at 22:37 -0800, Andrew Morton wrote:
>> On Thu, 28 Feb 2008 15:22:27 +0900 Ian Kent <[email protected]> wrote:
>>
>>>>> +++ linux-2.6.25-rc2-mm1/fs/autofs4/waitq.c 2008-02-20 13:10:23.000000000 +0900
>>>>> @@ -363,6 +363,38 @@ int autofs4_wait(struct autofs_sb_info *
>>>>>
>>>>> status = wq->status;
>>>>>
>>>>> + /*
>>>>> + * For direct and offset mounts we need to track the requestor
>>>>> + * uid and gid in the dentry info struct. This is so it can be
>>>>> + * supplied, on request, by the misc device ioctl interface.
>>>>> + * This is needed during daemon resatart when reconnecting
>>>>> + * to existing, active, autofs mounts. The uid and gid (and
>>>>> + * related string values) may be used for macro substitution
>>>>> + * in autofs mount maps.
>>>>> + */
>>>>> + if (!status) {
>>>>> + struct dentry *de = NULL;
>>>>> +
>>>>> + /* direct mount or browsable map */
>>>>> + ino = autofs4_dentry_ino(dentry);
>>>>> + if (!ino) {
>>>>> + /* If not lookup actual dentry used */
>>>>> + de = d_lookup(dentry->d_parent, &dentry->d_name);
>>>>> + ino = autofs4_dentry_ino(de);
>>>>> + }
>>>>> +
>>>>> + /* Set mount requestor */
>>>>> + if (ino) {
>>>>> + if (ino) {
>>>>> + ino->uid = wq->uid;
>>>>> + ino->gid = wq->gid;
>>>>> + }
>>>>> + }
>>>>> +
>>>>> + if (de)
>>>>> + dput(de);
>>>>> + }
>>>>> +
>>>> But uids and gids are no longer system-wide-unique. Two different users
>>>> can have the same identifiers in different namespaces. What happens then?
>>> That's a tricky question.
>>>
>>> Presumably, the process requesting the mount has the user space daemon
>>> running in the namespace within which the uid and gid are to be looked
>>> up, by the daemon.
>>>
>>> Am I missing something?
>>>
>> err, you assume more knowledge at this end about what you're trying to do
>> than actually exists :)
>>
>> You seem to imply that if a machine is running 100 user namespaces then it
>> needs to run 100 mount daemons. Doesn't seem good.
>
> More likely my lack of understanding of how namespaces are meant to
> work.
>
>> What problem are you actually trying to solve here?
>
> The basic problem arises only when we want to restart the user space
> daemon and there are active autofs managed mounts in place at exit (ie.
> autofs mounts that have busy user mounts). They are left mounted and
> processes using them continue to function. But then, when we startup
> autofs we need to reconnect to these autofs mounts, some of which can
> covered the by mounted file systems, and hence the need for another way
> to open an ioctl descriptor to them.
>
> It may have been overkill to re-implement all the current ioctls (and
> add a couple of other much needed ones) but I though it sensible for
> completeness, and we get to identify any possible problems the current
> ioctls might have had due to the use of the BKL (by the VFS when calling
> the ioctls).
>
> So, why do we need the uid and gid? When someone walks over an autofs
> dentry that is meant to cause a mount we send a request packet to the
> daemon via a pipe which includes the process uid and gid, and as part of
> the lookup we set macros for several mount map substitution variables,
> derived from the uid and gid of the process requesting the mount and
> they can be used within autofs maps.
Why do we need the uid then? Is just pid not enough to uniquely
identify a task?
Assuming we can get by with a pid only, this problem can be solved
by sending a pid_nr() of a task, i.e. the pid by which this task is
seen from an initial namespace. This pid is unique across the system
even when pid namespaces are created.
But this ... trick is only valid if the daemon, that receives the
pid doesn't try to communicate with this task (e.g. send him a signal),
but just uses this as a key to lookup in some hash. This is not about
security - even having someone's global pid task can do nothing useful
with it - this is about the consistency - when sending a signal to a
task, giving its _global_ pid to sys_kill() the signal may arrive to a
wrong task if the sender lives in a sub-namespace.
> This is all fine as long as we don't need to re-connect to these mounts
> when starting up, since we don't get kernel requests for the mounts, we
> need to obtain that information from the active mount itself.
>
> Ian
>
>
>
>
On Thu, 28 Feb 2008 10:51:08 +0300 Pavel Emelyanov <[email protected]> wrote:
> > So, why do we need the uid and gid? When someone walks over an autofs
> > dentry that is meant to cause a mount we send a request packet to the
> > daemon via a pipe which includes the process uid and gid, and as part of
> > the lookup we set macros for several mount map substitution variables,
> > derived from the uid and gid of the process requesting the mount and
> > they can be used within autofs maps.
>
> Why do we need the uid then? Is just pid not enough to uniquely
> identify a task?
The problem is that the userspace daemon is restarted. ie: it exits
and is re-run. It then needs to pick up various state from its previous
run.
On Wed, 2008-02-27 at 23:23 -0800, Andrew Morton wrote:
> On Thu, 28 Feb 2008 16:08:20 +0900 Ian Kent <[email protected]> wrote:
>
> >
> > >
> > > What problem are you actually trying to solve here?
> >
> > The basic problem arises only when we want to restart the user space
> > daemon and there are active autofs managed mounts in place at exit (ie.
> > autofs mounts that have busy user mounts). They are left mounted and
> > processes using them continue to function. But then, when we startup
> > autofs we need to reconnect to these autofs mounts, some of which can
> > covered the by mounted file systems, and hence the need for another way
> > to open an ioctl descriptor to them.
>
> So we want to store persistant state in the kernel across userspace process
> invokations. That's normally done with a thing called a "file" ;) Could we
> stick all the necessary state into files in a pseudo-fs and have the daemon
> go and open and read them all when it restarts?
Nearly sent of a reply without thinking about this and it sounds like a
good idea initially. But, if we have a large number of autofs file
systems mounted (thousands?), duplicating the information already
present in the autofs file system seems untidy and unnecessary. I
thought of exposing this information in in sysfs, but that would make
the autofs module part of sysfs have many files, and there is the
problem that the same path could have more than one autofs file system
stacked on top of it, so isn't unique.
The other obvious place is in the mount options but that is one of the
reasons that only the device number is exposed their. If we have 5000 -
10000 mounts then scanning /proc/mounts becomes a big problem from a CPU
usage perspective (and is already a big problem for me, which is partly
addressed by the new bits in the implementation here). If I could think
of another way to expose the device number as well I would use it, even
if it was an additional ioctl, to avoid scanning /proc/mounts.
>
> > It may have been overkill to re-implement all the current ioctls (and
> > add a couple of other much needed ones)
>
> I don't understand that bit. But then, I don't have a clue how autofs
> works.
>
> > but I though it sensible for
> > completeness, and we get to identify any possible problems the current
> > ioctls might have had due to the use of the BKL (by the VFS when calling
> > the ioctls).
>
> It isn't a good idea to wait for races to reveal themselves. It will take
> years, especially with a system which has as low a call frequency as autofs
> mounting. And once a bug _does_ reveal itself, by then we'll have tens of
> millions of machines out there running that bug.
Not sure I agree about the low call frequency.
It's quite normal for smaller sites to have hundreds of entries in maps
and equally common for them to do stupid things like run scripts that
scan the file systems, mounting every mount.
It's not quite the same order of magnitude, but I regularly use the
autofs connectathon suite for testing. It only ends up mounting several
hundred mounts but I can get mounting and expiring happening at the same
time so it's not an unreasonable way to test.
>
> > So, why do we need the uid and gid? When someone walks over an autofs
> > dentry that is meant to cause a mount we send a request packet to the
> > daemon via a pipe
>
> connector or genetlink would be more fashionable transports.
Right, I'll see what I can find on those topics.
My concern is that this will require considerable work in the daemon
which would be fine for version 5.1 but I need to resolve this problem
for the existing 5.0 implementation.
>
> > which includes the process uid and gid, and as part of
> > the lookup we set macros for several mount map substitution variables,
> > derived from the uid and gid of the process requesting the mount and
> > they can be used within autofs maps.
>
> yeah, could be a problem. Hopefully the namespace people can advise.
> Perhaps we need a concept of an exportable-to-userspace namespace-id+uid,
> namespace-id+gid, namespace-id+pid, etc for this sort of thing. It has
> come up before. Recently, but I forget what the context was.
I'm all ears to any feedback from others on this, please.
>
> > This is all fine as long as we don't need to re-connect to these mounts
> > when starting up, since we don't get kernel requests for the mounts, we
> > need to obtain that information from the active mount itself.
>
On Wed, 2008-02-27 at 23:59 -0800, Andrew Morton wrote:
> On Thu, 28 Feb 2008 10:51:08 +0300 Pavel Emelyanov <[email protected]> wrote:
>
> > > So, why do we need the uid and gid? When someone walks over an autofs
> > > dentry that is meant to cause a mount we send a request packet to the
> > > daemon via a pipe which includes the process uid and gid, and as part of
> > > the lookup we set macros for several mount map substitution variables,
> > > derived from the uid and gid of the process requesting the mount and
> > > they can be used within autofs maps.
> >
> > Why do we need the uid then? Is just pid not enough to uniquely
> > identify a task?
>
> The problem is that the userspace daemon is restarted. ie: it exits
> and is re-run. It then needs to pick up various state from its previous
> run.
Dumb old me, I really only need the uid.
The gid can come from the get user info functions of glibc.
DOH!
On Thu, Feb 28, 2008 at 05:06:11PM +0900, Ian Kent wrote:
>
> Dumb old me, I really only need the uid.
> The gid can come from the get user info functions of glibc.
In case the process was executed from a setgid executable, you might
have a different gid from what the user has. In fact, I don't know why
you may need more than the pid, since with the pid you can get to the
task's effective uid/gid and maybe other such information you need.
Cheers,
F?bio Oliv?
--
ex sed lex awk yacc, e pluribus unix, amem
Ian Kent <[email protected]> writes:
> On Wed, 2008-02-27 at 23:23 -0800, Andrew Morton wrote:
>> On Thu, 28 Feb 2008 16:08:20 +0900 Ian Kent <[email protected]> wrote:
>> > which includes the process uid and gid, and as part of
>> > the lookup we set macros for several mount map substitution variables,
>> > derived from the uid and gid of the process requesting the mount and
>> > they can be used within autofs maps.
>>
>> yeah, could be a problem. Hopefully the namespace people can advise.
>> Perhaps we need a concept of an exportable-to-userspace namespace-id+uid,
>> namespace-id+gid, namespace-id+pid, etc for this sort of thing. It has
>> come up before. Recently, but I forget what the context was.
>
> I'm all ears to any feedback from others on this, please.
I think there is some confusion surrounding what the UID and GID are
used for in this context. I'll try to explain it as best I can.
When the automount daemon parses a map entry, it will do some amount of
variable substitution. So, let's say you're running on an i386 box, and
you want to mount a library directory from a server. You might have a
map entry that looks like this:
lib server:/export/$ARCH/lib
In this case, the automount daemon will replace $ARCH with i386, and
will try the following mount command:
mount -t nfs server:/export/i386/lib /automountdir/lib
There are cases where it would be helpful to use the requesting
process's UID in such a variable substitution. Consider the case of a
CIFS share, where the automount daemon runs as user root, but we want to
mount the share using the credentials of the requesting user. In this
case, the UID and GID can be helpful in formatting the mount options for
mounting the share.
So, the UID and GID are used only for map substitutions. Now, having
said all of that, I'll have to look more closely at why we even need to
keep track of it, given that it only needs to be used when performing
the lookup, and at that time we have information on the requesting UID
and GID.
Cheers,
Jeff
Quoting Jeff Moyer ([email protected]):
> Ian Kent <[email protected]> writes:
>
> > On Wed, 2008-02-27 at 23:23 -0800, Andrew Morton wrote:
> >> On Thu, 28 Feb 2008 16:08:20 +0900 Ian Kent <[email protected]> wrote:
> >> > which includes the process uid and gid, and as part of
> >> > the lookup we set macros for several mount map substitution variables,
> >> > derived from the uid and gid of the process requesting the mount and
> >> > they can be used within autofs maps.
> >>
> >> yeah, could be a problem. Hopefully the namespace people can advise.
> >> Perhaps we need a concept of an exportable-to-userspace namespace-id+uid,
> >> namespace-id+gid, namespace-id+pid, etc for this sort of thing. It has
> >> come up before. Recently, but I forget what the context was.
> >
> > I'm all ears to any feedback from others on this, please.
>
> I think there is some confusion surrounding what the UID and GID are
> used for in this context. I'll try to explain it as best I can.
>
> When the automount daemon parses a map entry, it will do some amount of
> variable substitution. So, let's say you're running on an i386 box, and
> you want to mount a library directory from a server. You might have a
> map entry that looks like this:
>
> lib server:/export/$ARCH/lib
>
> In this case, the automount daemon will replace $ARCH with i386, and
> will try the following mount command:
>
> mount -t nfs server:/export/i386/lib /automountdir/lib
>
> There are cases where it would be helpful to use the requesting
> process's UID in such a variable substitution. Consider the case of a
> CIFS share, where the automount daemon runs as user root, but we want to
> mount the share using the credentials of the requesting user. In this
> case, the UID and GID can be helpful in formatting the mount options for
> mounting the share.
>
> So, the UID and GID are used only for map substitutions. Now, having
> said all of that, I'll have to look more closely at why we even need to
> keep track of it, given that it only needs to be used when performing
> the lookup, and at that time we have information on the requesting UID
> and GID.
Thanks Jeff. If that's the case then user namespaces don't affect this
at all.
(Still trying to follow the rest of the thread bc i definately feel like
I'm missing something. I swear I understood autofs 10+ years ago :)
thanks,
-serge
Pavel Emelyanov <[email protected]> writes:
> Why do we need the uid then? Is just pid not enough to uniquely
> identify a task?
>
> Assuming we can get by with a pid only, this problem can be solved
> by sending a pid_nr() of a task, i.e. the pid by which this task is
> seen from an initial namespace. This pid is unique across the system
> even when pid namespaces are created.
Pavel it is never correct to use a global pid when talking to user space.
In fact the concept is just a bit dubious. We must always translate
the pid into the pid namespace of the task we are talking to, or at
least into the pid namespace of the process that opened the file
handle, (essentially the same, but does not have races in the corner
cases).
Even in the kernel using global ids is dubious. When dealing with
user space it is just wrong.
Speaking of. I think we still need work on autofs in this regard.
I know last I looked we had some outstanding issues there.
Eric
On Thu, 2008-02-28 at 13:51 -0600, Serge E. Hallyn wrote:
> Quoting Jeff Moyer ([email protected]):
> > Ian Kent <[email protected]> writes:
> >
> > > On Wed, 2008-02-27 at 23:23 -0800, Andrew Morton wrote:
> > >> On Thu, 28 Feb 2008 16:08:20 +0900 Ian Kent <[email protected]> wrote:
> > >> > which includes the process uid and gid, and as part of
> > >> > the lookup we set macros for several mount map substitution variables,
> > >> > derived from the uid and gid of the process requesting the mount and
> > >> > they can be used within autofs maps.
> > >>
> > >> yeah, could be a problem. Hopefully the namespace people can advise.
> > >> Perhaps we need a concept of an exportable-to-userspace namespace-id+uid,
> > >> namespace-id+gid, namespace-id+pid, etc for this sort of thing. It has
> > >> come up before. Recently, but I forget what the context was.
> > >
> > > I'm all ears to any feedback from others on this, please.
> >
> > I think there is some confusion surrounding what the UID and GID are
> > used for in this context. I'll try to explain it as best I can.
> >
> > When the automount daemon parses a map entry, it will do some amount of
> > variable substitution. So, let's say you're running on an i386 box, and
> > you want to mount a library directory from a server. You might have a
> > map entry that looks like this:
> >
> > lib server:/export/$ARCH/lib
> >
> > In this case, the automount daemon will replace $ARCH with i386, and
> > will try the following mount command:
> >
> > mount -t nfs server:/export/i386/lib /automountdir/lib
> >
> > There are cases where it would be helpful to use the requesting
> > process's UID in such a variable substitution. Consider the case of a
> > CIFS share, where the automount daemon runs as user root, but we want to
> > mount the share using the credentials of the requesting user. In this
> > case, the UID and GID can be helpful in formatting the mount options for
> > mounting the share.
> >
> > So, the UID and GID are used only for map substitutions. Now, having
> > said all of that, I'll have to look more closely at why we even need to
> > keep track of it, given that it only needs to be used when performing
> > the lookup, and at that time we have information on the requesting UID
> > and GID.
>
> Thanks Jeff. If that's the case then user namespaces don't affect this
> at all.
Yep, that's precisely the way this is used, by autofs anyway.
I guess the problem we face is that since this is a public interface
other applications could use this in a different way and we can't
control that. I think I need more information so I can document the
defined usage in my revised patch set.
In version 5 I set $UID, $GID, $USER, $GROUP and $HOME in addition to
the standard autofs macros, $ARCH, $CPU, $HOST, $OSNAME, $OSREL and
$OSVERS, and then expand the map entry.
The question that Jeff is asking himself is, why do we need this
information when we re-connect at startup, since the mounts are already
present.
The answer isn't easy to explain and will be lengthy, sorry, but let me
try anyway.
There are two types on maps, direct (in the module source you will see a
third type called an offset, which is just a direct mount in disguise)
and indirect.
For example, here is master map with direct and indirect map entries:
/- /etc/auto.direct
/test /etc/auto.indirect
/etc/auto.direct:
/automount/dparse/g6 budgie:/autofs/export1
/automount/dparse/g1 shark:/autofs/export1
and so on.
/etc/auto.indirect:
g1 shark:/autofs/export1
g6 budgie:/autofs/export1
and so on.
For the above indirect map an autofs file system is mounted on /test and
mounts are triggered by the inode lookup operation. So we see a mount of
shark:/autofs/export1 on /test/g1, for example.
The way that direct mounts are handled is by makeing an autofs mount on
each full path, such as /automount/dparse/g1, and using it as a mount
trigger. So when we walk on the path we mount shark:/autofs/export1 on
top of this mount point, for example. Since these are always a
directories we can use the follow_link inode operation to trigger the
mount.
But, each entry in direct and indirect maps can have offsets (often
called multi-mount map entries).
For example,
a direct mount map entry could also be:
/automount/dparse/g1 \
/ shark:/autofs/export5/testing/test \
/s1 shark:/autofs/export/testing/test/s1 \
/s2 shark:/autofs/export5/testing/test/s2 \
/s1/ss1 shark:/autofs/export2 \
/s2/ss2 shark:/autofs/export2
and a similar indirect mount:
g1 \
/ shark:/autofs/export5/testing/test \
/s1 shark:/autofs/export/testing/test/s1 \
/s2 shark:/autofs/export5/testing/test/s2 \
/s1/ss1 shark:/autofs/export1 \
/s2/ss2 shark:/autofs/export2
One of the issues with version 4 of autofs was that, when mounting an
entry with a large number of offsets, possibly with nesting, we needed
to mount and umount all of them as a single unit. Not really a problem,
except for people with a large number of offsets in map entries. This
mechanism is used for the well known "hosts" map and we have seen cases
(in 2.4) where the available number of mounts are exhausted or where the
number of privileged ports available is exhausted.
In version 5 we mount only as we go down the tree of offsets and
similarly for expiring them which resolves the above problem. There is
somewhat more detail to the implementation but it isn't needed for the
sake of the explanation. The one important detail is that these offsets
are implemented using the same mechanism as the direct mounts above and
so can also be covered by another mount.
To be able to do this I need to maintain context. In the daemon I use a
list to represent these offsets and use it to manage the mounting and
expiration of the mount tree, something which can only be discovered
from the original map entry. So, to rebuild this context at startup for
existing mounts I need to do the lookup to get the map entry as part of
the process of re-connecting to the mounts. Hence the need to get the
uid and gid of the original requester.
Few, that was hard work, just for those last couple of sentences.
Please, lets not go into the issue that the maps can change during a
restart, that's an issue for another time and isn't a kernel issue
anyway.
>
> (Still trying to follow the rest of the thread bc i definately feel like
> I'm missing something. I swear I understood autofs 10+ years ago :)
Me too, but now I've change it so much even I'm confused most of the
time, ;)
Hopefully the above explanation is useful in some small way.
Ian
Quoting Ian Kent ([email protected]):
>
> On Thu, 2008-02-28 at 13:51 -0600, Serge E. Hallyn wrote:
> > Quoting Jeff Moyer ([email protected]):
> > > Ian Kent <[email protected]> writes:
> > >
> > > > On Wed, 2008-02-27 at 23:23 -0800, Andrew Morton wrote:
> > > >> On Thu, 28 Feb 2008 16:08:20 +0900 Ian Kent <[email protected]> wrote:
> > > >> > which includes the process uid and gid, and as part of
> > > >> > the lookup we set macros for several mount map substitution variables,
> > > >> > derived from the uid and gid of the process requesting the mount and
> > > >> > they can be used within autofs maps.
> > > >>
> > > >> yeah, could be a problem. Hopefully the namespace people can advise.
> > > >> Perhaps we need a concept of an exportable-to-userspace namespace-id+uid,
> > > >> namespace-id+gid, namespace-id+pid, etc for this sort of thing. It has
> > > >> come up before. Recently, but I forget what the context was.
> > > >
> > > > I'm all ears to any feedback from others on this, please.
> > >
> > > I think there is some confusion surrounding what the UID and GID are
> > > used for in this context. I'll try to explain it as best I can.
> > >
> > > When the automount daemon parses a map entry, it will do some amount of
> > > variable substitution. So, let's say you're running on an i386 box, and
> > > you want to mount a library directory from a server. You might have a
> > > map entry that looks like this:
> > >
> > > lib server:/export/$ARCH/lib
> > >
> > > In this case, the automount daemon will replace $ARCH with i386, and
> > > will try the following mount command:
> > >
> > > mount -t nfs server:/export/i386/lib /automountdir/lib
> > >
> > > There are cases where it would be helpful to use the requesting
> > > process's UID in such a variable substitution. Consider the case of a
> > > CIFS share, where the automount daemon runs as user root, but we want to
> > > mount the share using the credentials of the requesting user. In this
> > > case, the UID and GID can be helpful in formatting the mount options for
> > > mounting the share.
> > >
> > > So, the UID and GID are used only for map substitutions. Now, having
> > > said all of that, I'll have to look more closely at why we even need to
> > > keep track of it, given that it only needs to be used when performing
> > > the lookup, and at that time we have information on the requesting UID
> > > and GID.
> >
> > Thanks Jeff. If that's the case then user namespaces don't affect this
> > at all.
>
> Yep, that's precisely the way this is used, by autofs anyway.
>
> I guess the problem we face is that since this is a public interface
> other applications could use this in a different way and we can't
> control that. I think I need more information so I can document the
> defined usage in my revised patch set.
>
> In version 5 I set $UID, $GID, $USER, $GROUP and $HOME in addition to
> the standard autofs macros, $ARCH, $CPU, $HOST, $OSNAME, $OSREL and
> $OSVERS, and then expand the map entry.
>
> The question that Jeff is asking himself is, why do we need this
> information when we re-connect at startup, since the mounts are already
> present.
>
> The answer isn't easy to explain and will be lengthy, sorry, but let me
> try anyway.
>
> There are two types on maps, direct (in the module source you will see a
> third type called an offset, which is just a direct mount in disguise)
> and indirect.
>
> For example, here is master map with direct and indirect map entries:
>
> /- /etc/auto.direct
> /test /etc/auto.indirect
>
> /etc/auto.direct:
>
> /automount/dparse/g6 budgie:/autofs/export1
> /automount/dparse/g1 shark:/autofs/export1
> and so on.
>
> /etc/auto.indirect:
>
> g1 shark:/autofs/export1
> g6 budgie:/autofs/export1
> and so on.
>
> For the above indirect map an autofs file system is mounted on /test and
> mounts are triggered by the inode lookup operation. So we see a mount of
> shark:/autofs/export1 on /test/g1, for example.
>
> The way that direct mounts are handled is by makeing an autofs mount on
> each full path, such as /automount/dparse/g1, and using it as a mount
> trigger. So when we walk on the path we mount shark:/autofs/export1 on
> top of this mount point, for example. Since these are always a
> directories we can use the follow_link inode operation to trigger the
> mount.
>
> But, each entry in direct and indirect maps can have offsets (often
> called multi-mount map entries).
>
> For example,
>
> a direct mount map entry could also be:
>
> /automount/dparse/g1 \
> / shark:/autofs/export5/testing/test \
> /s1 shark:/autofs/export/testing/test/s1 \
> /s2 shark:/autofs/export5/testing/test/s2 \
> /s1/ss1 shark:/autofs/export2 \
> /s2/ss2 shark:/autofs/export2
>
> and a similar indirect mount:
>
> g1 \
> / shark:/autofs/export5/testing/test \
> /s1 shark:/autofs/export/testing/test/s1 \
> /s2 shark:/autofs/export5/testing/test/s2 \
> /s1/ss1 shark:/autofs/export1 \
> /s2/ss2 shark:/autofs/export2
>
> One of the issues with version 4 of autofs was that, when mounting an
> entry with a large number of offsets, possibly with nesting, we needed
> to mount and umount all of them as a single unit. Not really a problem,
> except for people with a large number of offsets in map entries. This
> mechanism is used for the well known "hosts" map and we have seen cases
> (in 2.4) where the available number of mounts are exhausted or where the
> number of privileged ports available is exhausted.
>
> In version 5 we mount only as we go down the tree of offsets and
> similarly for expiring them which resolves the above problem. There is
> somewhat more detail to the implementation but it isn't needed for the
> sake of the explanation. The one important detail is that these offsets
> are implemented using the same mechanism as the direct mounts above and
> so can also be covered by another mount.
>
> To be able to do this I need to maintain context. In the daemon I use a
> list to represent these offsets and use it to manage the mounting and
> expiration of the mount tree, something which can only be discovered
> from the original map entry. So, to rebuild this context at startup for
> existing mounts I need to do the lookup to get the map entry as part of
> the process of re-connecting to the mounts. Hence the need to get the
> uid and gid of the original requester.
The way the user namespaces work right now is similar to say the IPC
namespace - a task belongs to one user, that user belongs to precisely
one user namespace.
Even in my additional userns patches, I was changing uid to store the
(uid, userns) so a struct user still belonged to just one user
namespace.
In contrast, with pid namespaces a task is associated with a 'struct
pid' which links it to multiple process ids, one in each pid namespace
to which it belongs.
Perhaps we should be treating user namespaces like pid namespaces?
For autofs this would mean that when autofs wants a uid for some task,
it would be given the uid in the user namespace which autofs 'knows'.
It would also help me fix the siginfo problems I haven't solved yet -
rather than having to worry about user namespace lifetimes with siginfos
(which last a little while but have no clearly defined lifespan) we
could send the uid in an init user namespace or the uid in the target
uid namespace, or just a lightweight user struct proxy akin to 'struct
pid'.
And it also obviates the need for any sort of delegation.
So if I'm user 500 in what I think is the initial user namespace, I can
create a container with a new user namespace, the init task of which is
both uid 0 in the child userns, and uid 500 in the higher level,
automatically giving the container access to any files I own.
Eric, when you get a chance (I know you're overloaded atm) I'd love to
hear your thoughts on this...
> Few, that was hard work, just for those last couple of sentences.
>
> Please, lets not go into the issue that the maps can change during a
> restart, that's an issue for another time and isn't a kernel issue
> anyway.
>
> >
> > (Still trying to follow the rest of the thread bc i definately feel like
> > I'm missing something. I swear I understood autofs 10+ years ago :)
>
> Me too, but now I've change it so much even I'm confused most of the
> time, ;)
>
> Hopefully the above explanation is useful in some small way.
>
> Ian
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> The way the user namespaces work right now is similar to say the IPC
> namespace - a task belongs to one user, that user belongs to precisely
> one user namespace.
>
> Even in my additional userns patches, I was changing uid to store the
> (uid, userns) so a struct user still belonged to just one user
> namespace.
>
> In contrast, with pid namespaces a task is associated with a 'struct
> pid' which links it to multiple process ids, one in each pid namespace
> to which it belongs.
>
> Perhaps we should be treating user namespaces like pid namespaces?
I'm afraid, that I'm just starting a new thread of discussion in a
wrong place, but I can't refrain from asking "what for?"
> So if I'm user 500 in what I think is the initial user namespace, I can
> create a container with a new user namespace, the init task of which is
> both uid 0 in the child userns, and uid 500 in the higher level,
> automatically giving the container access to any files I own.
So do you mean that I can become a root, by calling clone()?
Thanks,
Pavel
On Wed, 2008-02-27 at 21:17 -0800, Andrew Morton wrote:
> On Tue, 26 Feb 2008 12:23:55 +0900 (WST) Ian Kent <[email protected]> wrote:
>
> > Hi Andrew,
> >
> > Patch to add miscellaneous device to autofs4 module for
> > ioctls.
>
> Could you please document the new kernel interface which you're proposing?
> In Docmentation/ or in the changelog?
>
> We seem to be passing some string into a miscdevice ioctl and getting some
> results back. Be aware that this won't be a terribly popular proposal, so
> I'd suggest that you fully describe the problem which it's trying to solve,
> and how it solves it, and why the various alternatives (sysfs, netlink,
> mount options, etc) were judged unsuitable.
It appears I could do this with the generic netlink subsystem.
I'll have a go at it.
>
>
> >
> > ...
> >
> > +/*
> > + * Get the autofs super block info struct from the file opened on
> > + * the autofs mount point.
> > + */
> > +static struct autofs_sb_info *autofs_dev_ioctl_sbi(struct file *f)
> > +{
> > + struct autofs_sb_info *sbi = NULL;
> > + struct inode *inode;
> > +
> > + if (f) {
>
> `f' cannot be NULL here.
And, will still be in the netlink implementation and will still return
an appropriate error.
>
> > + inode = f->f_path.dentry->d_inode;
> > + sbi = autofs4_sbi(inode->i_sb);
> > + }
> > +
> > + return sbi;
> > +}
> > +
> > +/* Return autofs module protocol version */
> > +static inline int autofs_dev_ioctl_protover(struct file *fp,
> > + struct autofs_sb_info *sbi,
> > + struct autofs_dev_ioctl *param)
> > +{
> > + param->arg1 = sbi->version;
> > + return 0;
> > +}
> > +
> > +/* Return autofs module protocol sub version */
> > +static inline int autofs_dev_ioctl_protosubver(struct file *fp,
> > + struct autofs_sb_info *sbi,
> > + struct autofs_dev_ioctl *param)
> > +{
> > + param->arg1 = sbi->sub_version;
> > + return 0;
> > +}
>
> Don't bother inlining things - the compiler will do it.
>
> > +/*
> > + * Walk down the mount stack looking for an autofs mount that
> > + * has the requested device number (aka. new_encode_dev(sb->s_dev).
> > + */
> > +static int autofs_dev_ioctl_find_super(struct nameidata *nd, dev_t devno)
> > +{
> > + struct dentry *dentry;
> > + struct inode *inode;
> > + struct super_block *sb;
> > + dev_t s_dev;
> > + unsigned int err;
> > +
> > + err = -ENOENT;
> > +
> > + /* Lookup the dentry name at the base of our mount point */
> > + dentry = d_lookup(nd->path.dentry, &nd->last);
> > + if (!dentry)
> > + goto out;
> > +
> > + dput(nd->path.dentry);
> > + nd->path.dentry = dentry;
> > +
> > + /* And follow the mount stack looking for our autofs mount */
> > + while (1) {
> > + inode = nd->path.dentry->d_inode;
> > + if (!inode)
> > + continue;
> > +
> > + sb = inode->i_sb;
> > + s_dev = new_encode_dev(sb->s_dev);
> > + if (devno == s_dev) {
> > + if (sb->s_magic == AUTOFS_SUPER_MAGIC) {
> > + err = 0;
> > + break;
> > + }
> > + }
> > +
> > + if (!follow_down(&nd->path.mnt, &nd->path.dentry))
> > + goto out;
> > + }
> > +
> > +out:
> > + return err;
> > +}
>
> hm. possibly-interested parties cc'ed.
Agian, will still be in the netlink implementation.
Speak up if it's not right.
>
> > +/*
> > + * Install the file opened for autofs mount point control functions
> > + * and set close on exec.
> > + */
> > +static void autofs_dev_ioctl_fd_install(unsigned int fd, struct file *file)
> > +{
> > + struct files_struct *files = current->files;
> > + struct fdtable *fdt;
> > +
> > + spin_lock(&files->file_lock);
> > + fdt = files_fdtable(files);
> > + BUG_ON(fdt->fd[fd] != NULL);
> > + rcu_assign_pointer(fdt->fd[fd], file);
> > + FD_SET(fd, fdt->close_on_exec);
> > + spin_unlock(&files->file_lock);
> > +}
>
> That's fd_install() plus an add-on. It's not autofs-specific. Should be
> in fs/open.c, methinks?
OK, I'll do that if someone speaks up.
>
> >
> > ...
> >
> > +/* Close file descriptor allocated above (user can also use close(2)). */
> > +static inline int autofs_dev_ioctl_closemount(struct file *fp,
> > + struct autofs_sb_info *sbi,
> > + struct autofs_dev_ioctl *param)
> > +{
> > + return sys_close(param->ioctlfd);
> > +}
>
> hm.
Also, still in the netlink implementation, with a comment a bit more
informative than the one above.
>
> >
> > ...
> >
> > +/*
> > + * Set the pipe fd for kernel communication to the daemon.
> > + *
> > + * Normally this is set at mount using an option but if we
> > + * are reconnecting to a busy mount then we need to use this
> > + * to tell the autofs mount about the new kernel pipe fd. In
> > + * order to protect mounts against incorrectly setting the
> > + * pipefd we also require that the autofs mount be catatonic.
> > + *
> > + * This also sets the process group id used to identify the
> > + * controlling process (eg. the owning automount(8) daemon).
> > + */
> > +static int autofs_dev_ioctl_setpipefd(struct file *fp,
> > + struct autofs_sb_info *sbi,
> > + struct autofs_dev_ioctl *param)
> > +{
> > + int pipefd;
> > + int err = 0;
> > +
> > + if (param->arg1 == -1)
> > + return -EINVAL;
> > +
> > + pipefd = param->arg1;
> > +
> > + if (!sbi->catatonic)
> > + return -EBUSY;
> > + else {
> > + struct file *pipe = fget(pipefd);
> > + if (!pipe->f_op || !pipe->f_op->write) {
> > + err = -EPIPE;
> > + fput(pipe);
> > + goto out;
> > + }
> > + sbi->oz_pgrp = task_pgrp_nr(current);
> > + sbi->pipefd = pipefd;
> > + sbi->pipe = pipe;
> > + sbi->catatonic = 0;
> > + }
> > +out:
> > + return err;
> > +}
>
> We have a new filesystem type, a misc device with a mysterious ioctl,
> hand-rolled mountpoint chasing, hand-rolled fd installation and now pipes
> too.
That's not going to change.
There's nothing new here at all.
This is merely an re-implementation of the existing autofs ioctl
interface with two additional calls, nothing more.
>
> This is a complex interface. We really need to see the overall
> problem
> statement, design and interface description, please.
I'll add a document describing this, as previously agreed.
snip ...
> > +
> > +/*
> > + * Call repeatedly until it returns -EAGAIN, meaning there's nothing
> > + * more that can be done.
> > + */
> > +static int autofs_dev_ioctl_expire(struct file *fp,
> > + struct autofs_sb_info *sbi,
> > + struct autofs_dev_ioctl *param)
> > +{
> > + struct dentry *dentry;
> > + struct vfsmount *mnt;
> > + int err = -EAGAIN;
> > + int when;
> > +
> > + when = param->arg1;
> > + mnt = fp->f_path.mnt;
> > +
> > + if (sbi->type & AUTOFS_TYPE_DIRECT)
> > + dentry = autofs4_expire_direct(sbi->sb, mnt, sbi, when);
> > + else
> > + dentry = autofs4_expire_indirect(sbi->sb, mnt, sbi, when);
> > +
> > + if (dentry) {
> > + struct autofs_info *ino = autofs4_dentry_ino(dentry);
> > +
> > + /*
> > + * This is synchronous because it makes the daemon a
> > + * little easier
> > + */
> > + ino->flags |= AUTOFS_INF_EXPIRING;
> > + err = autofs4_wait(sbi, dentry, NFY_EXPIRE);
> > + ino->flags &= ~AUTOFS_INF_EXPIRING;
>
> Are there races around the modification of ino->flags here?
I haven't had any problems with this over time.
I've always thought that this was because the flag is only set during an
expire, of which there is only ever one going for a given mount, is
synchronous, and mount requests only read the flag to check status.
But I could be wrong since it may have been OK because the existing
autofs ioctl holds the BKL for its operations.
I'll think about it.
snip ....
Quoting Pavel Emelyanov ([email protected]):
> > The way the user namespaces work right now is similar to say the IPC
> > namespace - a task belongs to one user, that user belongs to precisely
> > one user namespace.
> >
> > Even in my additional userns patches, I was changing uid to store the
> > (uid, userns) so a struct user still belonged to just one user
> > namespace.
> >
> > In contrast, with pid namespaces a task is associated with a 'struct
> > pid' which links it to multiple process ids, one in each pid namespace
> > to which it belongs.
> >
> > Perhaps we should be treating user namespaces like pid namespaces?
>
> I'm afraid, that I'm just starting a new thread of discussion in a
> wrong place, but I can't refrain from asking "what for?"
For the reasons I listed there :)
> > So if I'm user 500 in what I think is the initial user namespace, I can
> > create a container with a new user namespace, the init task of which is
> > both uid 0 in the child userns, and uid 500 in the higher level,
> > automatically giving the container access to any files I own.
>
> So do you mean that I can become a root, by calling clone()?
You can become root in the new container. Your capabilities are
meaningful only to targets (users, files) which exist in the user
namespace in which you are root. It becomes more precise than the
CAP_NS_OVERRIDE approach in my last patchset.
> Thanks,
> Pavel
"Serge E. Hallyn" <[email protected]> writes:
> The way the user namespaces work right now is similar to say the IPC
> namespace - a task belongs to one user, that user belongs to precisely
> one user namespace.
>
> Even in my additional userns patches, I was changing uid to store the
> (uid, userns) so a struct user still belonged to just one user
> namespace.
>
> In contrast, with pid namespaces a task is associated with a 'struct
> pid' which links it to multiple process ids, one in each pid namespace
> to which it belongs.
>
> Perhaps we should be treating user namespaces like pid namespaces?
I definitely think we should have some support like that.
We already have code in nfsv4 and p9fs if I remember correctly to make
between the user namespace of the server (which is string based) and
the uids of the local machine.
We also have a similar issue with security keys.
I don't know if the strict hierarchical nature we have makes a lot of
sense. One of the things that we should account for is that
frequently user namespaces are kept in sync between multiple machines
by system administrators. So in the real world user namespaces are a
system administrator boundary.
> For autofs this would mean that when autofs wants a uid for some task,
> it would be given the uid in the user namespace which autofs
> 'knows'.
Yes. The user namespace of the process that opened the pipe I believe
is the right choice there.
> It would also help me fix the siginfo problems I haven't solved yet -
> rather than having to worry about user namespace lifetimes with siginfos
> (which last a little while but have no clearly defined lifespan) we
> could send the uid in an init user namespace or the uid in the target
> uid namespace, or just a lightweight user struct proxy akin to 'struct
> pid'.
Yes. For the pids we have been looking at sending the pid in the
target namespace and sending the uid in the target namespace should be
no more difficult.
> And it also obviates the need for any sort of delegation.
>
> So if I'm user 500 in what I think is the initial user namespace, I can
> create a container with a new user namespace, the init task of which is
> both uid 0 in the child userns, and uid 500 in the higher level,
> automatically giving the container access to any files I own.
Right.
Long term we want to look at making this an unprivileged operation.
Allowing a user to run less privileged processes.
My impression has always been that going from comparing (userns, uid)
to a more sophisticated mapping approach was a compatible extension.
However if it looks like we need user namespace mapping support up
front to get the semantics clean I have no problem with that.
> Eric, when you get a chance (I know you're overloaded atm) I'd love to
> hear your thoughts on this...
I think you are on the right track. In a lot of ways the user
namespace is the trickiest, as this is where we change the security
rules. If it is only at the level of who is who.
Since we already have user namespace mapping infrastructure in the
kernel and ways to call back to user space to ask what the mapping
should be, I feel performing mapping in the user namespace
and generalizing the existing capability is a good idea.
We still want to tell users if you can get away with it synchronize
your user namespaces across file systems, and kernels, and machines.
If they can't having good general tools in the kernel that you only
need to learn once and not once per instance sounds good.
Eric
"Serge E. Hallyn" <[email protected]> writes:
>
> The way the user namespaces work right now is similar to say the IPC
> namespace - a task belongs to one user, that user belongs to precisely
> one user namespace.
>
> Even in my additional userns patches, I was changing uid to store the
> (uid, userns) so a struct user still belonged to just one user
> namespace.
>
> In contrast, with pid namespaces a task is associated with a 'struct
> pid' which links it to multiple process ids, one in each pid namespace
> to which it belongs.
>
> Perhaps we should be treating user namespaces like pid namespaces?
>
> For autofs this would mean that when autofs wants a uid for some task,
> it would be given the uid in the user namespace which autofs 'knows'.
>
> It would also help me fix the siginfo problems I haven't solved yet -
> rather than having to worry about user namespace lifetimes with siginfos
> (which last a little while but have no clearly defined lifespan) we
> could send the uid in an init user namespace or the uid in the target
> uid namespace, or just a lightweight user struct proxy akin to 'struct
> pid'.
>
> And it also obviates the need for any sort of delegation.
>
> So if I'm user 500 in what I think is the initial user namespace, I can
> create a container with a new user namespace, the init task of which is
> both uid 0 in the child userns, and uid 500 in the higher level,
> automatically giving the container access to any files I own.
>
> Eric, when you get a chance (I know you're overloaded atm) I'd love to
> hear your thoughts on this...
Succinctly.
I think the concept of mapping uids between user namespaces is
fundamental to properly describing and thinking about the semantics of
user namespaces correct.
We don't have to start out anything except handling the case when
no mapping exists, but asking the question how does this uid map
between from one namespace to another is fundamental.
Eric
Quoting Eric W. Biederman ([email protected]):
> "Serge E. Hallyn" <[email protected]> writes:
> >
> > The way the user namespaces work right now is similar to say the IPC
> > namespace - a task belongs to one user, that user belongs to precisely
> > one user namespace.
> >
> > Even in my additional userns patches, I was changing uid to store the
> > (uid, userns) so a struct user still belonged to just one user
> > namespace.
> >
> > In contrast, with pid namespaces a task is associated with a 'struct
> > pid' which links it to multiple process ids, one in each pid namespace
> > to which it belongs.
> >
> > Perhaps we should be treating user namespaces like pid namespaces?
> >
> > For autofs this would mean that when autofs wants a uid for some task,
> > it would be given the uid in the user namespace which autofs 'knows'.
> >
> > It would also help me fix the siginfo problems I haven't solved yet -
> > rather than having to worry about user namespace lifetimes with siginfos
> > (which last a little while but have no clearly defined lifespan) we
> > could send the uid in an init user namespace or the uid in the target
> > uid namespace, or just a lightweight user struct proxy akin to 'struct
> > pid'.
> >
> > And it also obviates the need for any sort of delegation.
> >
> > So if I'm user 500 in what I think is the initial user namespace, I can
> > create a container with a new user namespace, the init task of which is
> > both uid 0 in the child userns, and uid 500 in the higher level,
> > automatically giving the container access to any files I own.
> >
> > Eric, when you get a chance (I know you're overloaded atm) I'd love to
> > hear your thoughts on this...
>
> Succinctly.
>
> I think the concept of mapping uids between user namespaces is
> fundamental to properly describing and thinking about the semantics of
> user namespaces correct.
Earlier I had thought this could just be done using a special keyring,
but atm I'm thinking that would be far uglier than just having a
struct pid-like credential proxy in the kernel to pass around in place
of uids.
> We don't have to start out anything except handling the case when
> no mapping exists, but asking the question how does this uid map
> between from one namespace to another is fundamental.
True.
But in any case I'm happy letting other things like netns and related
sys be completed before prototyping this.
thanks,
-serge
"Serge E. Hallyn" <[email protected]> writes:
> Earlier I had thought this could just be done using a special keyring,
> but atm I'm thinking that would be far uglier than just having a
> struct pid-like credential proxy in the kernel to pass around in place
> of uids.
I have not looked at many of the implementation possibilities so unfortunately
I don't know what makes for a good implementation.
What I do know is that uids are serialized in filesystems, and their
mapping between namespaces is defined by system administrators.
Both of those properties are different from struct pid. Which means
a generalized struct user in the kernel can at best hold a cache of the
mappings.
My preliminary investigations suggested that for the kernel filesystem
boundary generating a struct user or a struct group just to use for a
permission check and then to throw it away was wasteful.
However for inkernel entities a struct user sounds practical.
All of which is to say that we can learn lessons from the
implementation of struct pid but that we also have different
requirements so we can only use those lessons in a limited fashion.
Eric
On Thu, 28 Feb 2008, Ian Kent wrote:
> >
> > We seem to be passing some string into a misc-device ioctl and getting some
> > results back. Be aware that this won't be a terribly popular proposal, so
> > I'd suggest that you fully describe the problem which it's trying to solve,
> > and how it solves it, and why the various alternatives (sysfs, netlink,
> > mount options, etc) were judged unsuitable.
>
> Yes, as I said above.
>
> I don't expect that people that aren't close to the development of
> autofs will "get" the problem description in the leading post but I will
> try and expand on it as best I can.
>
> As for the possible alternatives, it sounds like I have some more work
> to do on that. Mount options can't be used as I described in the lead in
> post and, as far as my understanding of sysfs goes, I don't think it's
> appropriate. But, I'm not aware of what the netlink interface may be
> able to do for me so I will need to check on that.
>
I've attempted an implementation of this using the Generic Netlink
interface and I've struck a difficulty. I would like to use current
recommended development policy but I'm not sure how far I should go with
this in order to avoid using an ioctl implementation so I'm after some
advice and suggestions.
So, onto the description.
Sorry about the length of the post.
Autofs user space uses a number of ioctls for mount control.
I have a problem that can only be resolved by adding an additional control
function that doesn't need to open the autofs mount point path directly
to issue ioctl commands. So I decided to re-implement the the control
interface, hopefully, in a cleaner way and solve a couple of other
problems at the same time (by adding two additional control functions
giving a total of three new functions) and improving one of the existing
functions.
My initial proposal used a miscellaneous device to route ioctl commands to
autofs mounts and the question of why current recommended alternatives
were not suitable was asked. The only alternative that may be suitable is
the Netlink interface.
I won't go into the details of the new functions now but focus on the
difficulty I have found implementing one of the existing functions using
the Netlink interface.
There are some restrictions on the scope of the change.
The scope is only the ioctl interface. I don't want change too many things
at once, in particular things that are currently working OK. And I'd like
to retain the existing semantic behavior of the interface.
The function that is a problem is the sending of expire requests. In the
current implementation this function is synchronous. An ioctl is used to
ask the kernel module (autofs4) to check for mounts that can be expired
and, if a candidate is found the module sends a request to the user space
daemon asking it to try and umount the select mount after which the daemon
sends a success or fail status back to the module which marks the
completion of the original ioctl expire request.
The Generic Netlink interface won't allow this because only one concurrent
send request can be active for "all" Generic Netlink Families in use,
since the socket receive function is bracketed by a single mutex. So I would
need to use a workqueue to queue the request but that has it's own set of
problems.
A workqueue can have only one active request going at a time but I may
have a number of concurrent expires happening at any one time and the
request could block for a significant amount of time so I would have to
use multiple queues. Also there isn't a one-to-one correspondence
between autofs super blocks and the stream of expire requests so the
number of needed workqueues isn't known. Hence, a workqueue would need to
be created, used and destroyed for every umount request or some other
elaborate mechanism developed to re-use workqueues. Ideas?
The next issue is that in order to keep track of multiple in flight
requests a separate Netlink socket would need to be opened for every
expire request in order to ensure that the Netlink completion reply makes
it back to the original requesting thread (Is that actually correct?). Not
really such a big problem but it defeats another aim of the
re-implementation, which is to reduce the selinux user space exposure to
file descriptors that are open but don't yet have close-on-exec flag set
when a mount or umount is spawned by the automount daemon. This can
obviously be resolved by adding a mutex around the fork/exec code but
isn't a popular idea due to added performance overhead.
That's about it for a Generic Netlink solution.
It looks like it may be possible to avoid the need to use workqueues by
adding a new netlink protocol. I didn't notice any mutual exclusion
issues in the code but I may not have looked far enough (comments
please?). This approach would still have the issue of needing a new socket
opened for each expire for the tracking multiple requests and a
considerable increase in complexity in the autofs4 module for netlink
communication.
I'm concerned about the effort I would need to devote to make this work
and the increase in complexity that may be needed, especially if the
implementation is ultimately not satisfactory for inclusion in the kernel.
Please help!
Ian
On Thu, 13 Mar 2008, Ian Kent wrote:
Thomas, could you comment on the Netlink related questions I have posed
blow please.
> On Thu, 28 Feb 2008, Ian Kent wrote:
>
> > >
> > > We seem to be passing some string into a misc-device ioctl and getting some
> > > results back. Be aware that this won't be a terribly popular proposal, so
> > > I'd suggest that you fully describe the problem which it's trying to solve,
> > > and how it solves it, and why the various alternatives (sysfs, netlink,
> > > mount options, etc) were judged unsuitable.
> >
> > Yes, as I said above.
> >
> > I don't expect that people that aren't close to the development of
> > autofs will "get" the problem description in the leading post but I will
> > try and expand on it as best I can.
> >
> > As for the possible alternatives, it sounds like I have some more work
> > to do on that. Mount options can't be used as I described in the lead in
> > post and, as far as my understanding of sysfs goes, I don't think it's
> > appropriate. But, I'm not aware of what the netlink interface may be
> > able to do for me so I will need to check on that.
> >
>
> I've attempted an implementation of this using the Generic Netlink
> interface and I've struck a difficulty. I would like to use current
> recommended development policy but I'm not sure how far I should go with
> this in order to avoid using an ioctl implementation so I'm after some
> advice and suggestions.
>
> So, onto the description.
> Sorry about the length of the post.
>
> Autofs user space uses a number of ioctls for mount control.
>
> I have a problem that can only be resolved by adding an additional control
> function that doesn't need to open the autofs mount point path directly
> to issue ioctl commands. So I decided to re-implement the the control
> interface, hopefully, in a cleaner way and solve a couple of other
> problems at the same time (by adding two additional control functions
> giving a total of three new functions) and improving one of the existing
> functions.
>
> My initial proposal used a miscellaneous device to route ioctl commands to
> autofs mounts and the question of why current recommended alternatives
> were not suitable was asked. The only alternative that may be suitable is
> the Netlink interface.
>
> I won't go into the details of the new functions now but focus on the
> difficulty I have found implementing one of the existing functions using
> the Netlink interface.
>
> There are some restrictions on the scope of the change.
> The scope is only the ioctl interface. I don't want change too many things
> at once, in particular things that are currently working OK. And I'd like
> to retain the existing semantic behavior of the interface.
>
> The function that is a problem is the sending of expire requests. In the
> current implementation this function is synchronous. An ioctl is used to
> ask the kernel module (autofs4) to check for mounts that can be expired
> and, if a candidate is found the module sends a request to the user space
> daemon asking it to try and umount the select mount after which the daemon
> sends a success or fail status back to the module which marks the
> completion of the original ioctl expire request.
>
> The Generic Netlink interface won't allow this because only one concurrent
> send request can be active for "all" Generic Netlink Families in use,
> since the socket receive function is bracketed by a single mutex. So I would
> need to use a workqueue to queue the request but that has it's own set of
> problems.
>
> A workqueue can have only one active request going at a time but I may
> have a number of concurrent expires happening at any one time and the
> request could block for a significant amount of time so I would have to
> use multiple queues. Also there isn't a one-to-one correspondence
> between autofs super blocks and the stream of expire requests so the
> number of needed workqueues isn't known. Hence, a workqueue would need to
> be created, used and destroyed for every umount request or some other
> elaborate mechanism developed to re-use workqueues. Ideas?
>
> The next issue is that in order to keep track of multiple in flight
> requests a separate Netlink socket would need to be opened for every
> expire request in order to ensure that the Netlink completion reply makes
> it back to the original requesting thread (Is that actually correct?). Not
> really such a big problem but it defeats another aim of the
> re-implementation, which is to reduce the selinux user space exposure to
> file descriptors that are open but don't yet have close-on-exec flag set
> when a mount or umount is spawned by the automount daemon. This can
> obviously be resolved by adding a mutex around the fork/exec code but
> isn't a popular idea due to added performance overhead.
>
> That's about it for a Generic Netlink solution.
>
> It looks like it may be possible to avoid the need to use workqueues by
> adding a new netlink protocol. I didn't notice any mutual exclusion
> issues in the code but I may not have looked far enough (comments
> please?). This approach would still have the issue of needing a new socket
> opened for each expire for the tracking multiple requests and a
> considerable increase in complexity in the autofs4 module for netlink
> communication.
>
> I'm concerned about the effort I would need to devote to make this work
> and the increase in complexity that may be needed, especially if the
> implementation is ultimately not satisfactory for inclusion in the kernel.
>
> Please help!
> Ian
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>
* Ian Kent <[email protected]> 2008-03-13 16:00
> The function that is a problem is the sending of expire requests. In the
> current implementation this function is synchronous. An ioctl is used to
> ask the kernel module (autofs4) to check for mounts that can be expired
> and, if a candidate is found the module sends a request to the user space
> daemon asking it to try and umount the select mount after which the daemon
> sends a success or fail status back to the module which marks the
> completion of the original ioctl expire request.
>
> The Generic Netlink interface won't allow this because only one concurrent
> send request can be active for "all" Generic Netlink Families in use,
> since the socket receive function is bracketed by a single mutex. So I would
> need to use a workqueue to queue the request but that has it's own set of
> problems.
>
> The next issue is that in order to keep track of multiple in flight
> requests a separate Netlink socket would need to be opened for every
> expire request in order to ensure that the Netlink completion reply makes
> it back to the original requesting thread (Is that actually correct?). Not
> really such a big problem but it defeats another aim of the
> re-implementation, which is to reduce the selinux user space exposure to
> file descriptors that are open but don't yet have close-on-exec flag set
> when a mount or umount is spawned by the automount daemon. This can
> obviously be resolved by adding a mutex around the fork/exec code but
> isn't a popular idea due to added performance overhead.
Netlink is a messaging protocol, synchronization is up to the user.
I suggest that you send a netlink notification to a multicast group for
every expire candiate when an expire request is received. Unmount
daemons simply subscribe to the group and wait for work to do. Put the
request onto a list including the netlink pid and sequence number so you
can address the orignal source of the request later on. Exit the netlink
receive function and wait for the userspace daemon to get back to you.
The userspace daemon notifies you of successful or unsuccesful unmount
attempts by sending notifications. Update your list entry accordingly
and once the request is fullfilled send a notification to the original
source of the request by using the stored pid and sequence number.
The userspace application requesting the expire can simply block on the
receival of this notification in order to make the whole operation
synchronous.
Sounds acceptable?
On Fri, 2008-03-14 at 13:45 +0100, Thomas Graf wrote:
> * Ian Kent <[email protected]> 2008-03-13 16:00
> > The function that is a problem is the sending of expire requests. In the
> > current implementation this function is synchronous. An ioctl is used to
> > ask the kernel module (autofs4) to check for mounts that can be expired
> > and, if a candidate is found the module sends a request to the user space
> > daemon asking it to try and umount the select mount after which the daemon
> > sends a success or fail status back to the module which marks the
> > completion of the original ioctl expire request.
> >
> > The Generic Netlink interface won't allow this because only one concurrent
> > send request can be active for "all" Generic Netlink Families in use,
> > since the socket receive function is bracketed by a single mutex. So I would
> > need to use a workqueue to queue the request but that has it's own set of
> > problems.
> >
> > The next issue is that in order to keep track of multiple in flight
> > requests a separate Netlink socket would need to be opened for every
> > expire request in order to ensure that the Netlink completion reply makes
> > it back to the original requesting thread (Is that actually correct?). Not
> > really such a big problem but it defeats another aim of the
> > re-implementation, which is to reduce the selinux user space exposure to
> > file descriptors that are open but don't yet have close-on-exec flag set
> > when a mount or umount is spawned by the automount daemon. This can
> > obviously be resolved by adding a mutex around the fork/exec code but
> > isn't a popular idea due to added performance overhead.
>
> Netlink is a messaging protocol, synchronization is up to the user.
Yes, I realize that, but what I'm curious about are the options that I
have within the messaging system to control delivery of message replies,
other than using separate sockets. Can this be achieved by using the pid
set in the originating message?
>
> I suggest that you send a netlink notification to a multicast group for
> every expire candiate when an expire request is received. Unmount
> daemons simply subscribe to the group and wait for work to do. Put the
> request onto a list including the netlink pid and sequence number so you
> can address the orignal source of the request later on. Exit the netlink
> receive function and wait for the userspace daemon to get back to you.
I'll have to think about what you've said here to relate it to the
situation I have. I don't actually have umount daemons, at the moment I
request an expire and the daemon creates a thread to do the umount and
sends a status message to the kernel module. But that may not matter,
see below.
>
> The userspace daemon notifies you of successful or unsuccesful unmount
> attempts by sending notifications. Update your list entry accordingly
> and once the request is fullfilled send a notification to the original
> source of the request by using the stored pid and sequence number.
>
> The userspace application requesting the expire can simply block on the
> receival of this notification in order to make the whole operation
> synchronous.
Actually, I've progressed on this since posting.
I've implemented the first steps toward using a workqueue to perform the
expire and, to my surprise, my code worked for a simple test case.
Basically, a thread in the daemon issues the expire, the kernel module
queues the work and replies. The expire workqueue task does the expire
check and if no candidates are found it sends an expire complete
notification, or it sends a umount request to the daemon and waits for
the status result, then returns that result as the expire compete
notification. Seems to work quite well.
I expect this is possibly the method you're suggesting above anyway.
Unfortunately, having now stepped up intensity of the testing, I'm
getting a hard hang on my system. I've setup to reduce the message
functions used to only two simple notification messages to the kernel
module to ensure it isn't the expire implementation causing the problem.
It's hard to see where I could have messed these two functions as they
are essentially re-entrant but there is fairly heavy mount activity of
about 10-15 mounts a second. Such is life!
Any ideas on what might be causing this?
Ian