2008-11-20 14:13:15

by Tejun Heo

[permalink] [raw]
Subject: [PATCHSET] FUSE: extend FUSE to support more operations, take #2


Hello, all.

This patchset is the second take of extend-FUSE patchset. Changes
from the last take[L] are...

* add-include-protectors and implement-nonseekable-open already merged
and thus dropped from this patchset.

* pass-nonblock-flag-to-client dropped in favor of fi.flags &
O_NONBLOCK test.

* FUSE_MINOR moved to miscdevice.h

* ioctl support updated to include restricted mode where only well
formed ioctls are supported and no retry is necesary or allowed.
All FUSE servers are forced to use restricted mode ioctl.

* Per file unique kernel handle, fuse_file->kh, is added and used to
match poll notification to its file. This lifts the requirement
that the FUSE server should supply unique fh to make use of poll.

* Other small fixes and updated to the current tree.

This patchset contains the following five patches.

0001-fuse-move-FUSE_MINOR-to-miscdevice.h
0002-FUSE-implement-ioctl-support
0003-FUSE-add-file-kernel-handle
0004-FUSE-implement-unsolicited-notification
0005-FUSE-implement-poll-support

and is on top of

master (ee2f6cc7f9ea2542ad46070ed62ba7aa04d08871)
+ [1] poll-allow-f_op_poll-to-sleep-take-2
+ [2] add-cdev_release-and-convert-cdev_alloc-to-use-it

This patchset is also available in the following git tree.

http://git.kernel.org/?p=linux/kernel/git/tj/misc.git;a=shortlog;h=extend-fuse
git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc.git extend-fuse

Combined diffstat follows.

fs/fuse/dev.c | 49 +++++
fs/fuse/dir.c | 2
fs/fuse/file.c | 416 ++++++++++++++++++++++++++++++++++++++++++++-
fs/fuse/fuse_i.h | 28 ++-
fs/fuse/inode.c | 2
include/linux/fuse.h | 64 ++++++
include/linux/miscdevice.h | 42 ++--
7 files changed, 573 insertions(+), 30 deletions(-)

--
tejun

[L] http://thread.gmane.org/gmane.linux.kernel/727161
[1] http://lkml.org/lkml/2008/11/20/161
[2] http://article.gmane.org/gmane.linux.kernel/727133


2008-11-20 14:12:19

by Tejun Heo

[permalink] [raw]
Subject: [PATCH 1/5] fuse: move FUSE_MINOR to miscdevice.h

Move FUSE_MINOR to miscdevice.h. While at it, de-uglify the file.

Signed-off-by: Tejun Heo <[email protected]>
---
include/linux/fuse.h | 3 ---
include/linux/miscdevice.h | 42 +++++++++++++++++++++---------------------
2 files changed, 21 insertions(+), 24 deletions(-)

diff --git a/include/linux/fuse.h b/include/linux/fuse.h
index 350fe97..7caa473 100644
--- a/include/linux/fuse.h
+++ b/include/linux/fuse.h
@@ -40,9 +40,6 @@
/** The major number of the fuse character device */
#define FUSE_MAJOR MISC_MAJOR

-/** The minor number of the fuse character device */
-#define FUSE_MINOR 229
-
/* Make sure all structures are padded to 64bit boundary, so 32bit
userspace works under 64bit kernels */

diff --git a/include/linux/miscdevice.h b/include/linux/miscdevice.h
index 26433ec..0be9245 100644
--- a/include/linux/miscdevice.h
+++ b/include/linux/miscdevice.h
@@ -3,33 +3,33 @@
#include <linux/module.h>
#include <linux/major.h>

-#define PSMOUSE_MINOR 1
-#define MS_BUSMOUSE_MINOR 2
-#define ATIXL_BUSMOUSE_MINOR 3
-/*#define AMIGAMOUSE_MINOR 4 FIXME OBSOLETE */
-#define ATARIMOUSE_MINOR 5
-#define SUN_MOUSE_MINOR 6
-#define APOLLO_MOUSE_MINOR 7
-#define PC110PAD_MINOR 9
-/*#define ADB_MOUSE_MINOR 10 FIXME OBSOLETE */
+#define PSMOUSE_MINOR 1
+#define MS_BUSMOUSE_MINOR 2
+#define ATIXL_BUSMOUSE_MINOR 3
+/*#define AMIGAMOUSE_MINOR 4 FIXME OBSOLETE */
+#define ATARIMOUSE_MINOR 5
+#define SUN_MOUSE_MINOR 6
+#define APOLLO_MOUSE_MINOR 7
+#define PC110PAD_MINOR 9
+/*#define ADB_MOUSE_MINOR 10 FIXME OBSOLETE */
#define WATCHDOG_MINOR 130 /* Watchdog timer */
#define TEMP_MINOR 131 /* Temperature Sensor */
-#define RTC_MINOR 135
+#define RTC_MINOR 135
#define EFI_RTC_MINOR 136 /* EFI Time services */
-#define SUN_OPENPROM_MINOR 139
+#define SUN_OPENPROM_MINOR 139
#define DMAPI_MINOR 140 /* DMAPI */
-#define NVRAM_MINOR 144
-#define SGI_MMTIMER 153
+#define NVRAM_MINOR 144
+#define SGI_MMTIMER 153
#define STORE_QUEUE_MINOR 155
-#define I2O_MINOR 166
+#define I2O_MINOR 166
#define MICROCODE_MINOR 184
-#define MWAVE_MINOR 219 /* ACP/Mwave Modem */
-#define MPT_MINOR 220
-#define MISC_DYNAMIC_MINOR 255
-
-#define TUN_MINOR 200
-#define HPET_MINOR 228
-#define KVM_MINOR 232
+#define TUN_MINOR 200
+#define MWAVE_MINOR 219 /* ACP/Mwave Modem */
+#define MPT_MINOR 220
+#define HPET_MINOR 228
+#define FUSE_MINOR 229
+#define KVM_MINOR 232
+#define MISC_DYNAMIC_MINOR 255

struct device;

--
1.5.6

2008-11-20 14:12:35

by Tejun Heo

[permalink] [raw]
Subject: [PATCH 4/5] FUSE: implement unsolicited notification

Clients always used to write only in response to read requests. To
implement poll efficiently, clients should be able to issue
unsolicited notifications. This patch implements basic notification
support.

Zero fuse_out_header.unique is now accepted and considered unsolicited
notification and the error field contains notification code. This
patch doesn't implement any actual notification.

Signed-off-by: Tejun Heo <[email protected]>
---
fs/fuse/dev.c | 34 ++++++++++++++++++++++++++++++++--
include/linux/fuse.h | 4 ++++
2 files changed, 36 insertions(+), 2 deletions(-)

diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index b723614..1408b18 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -813,6 +813,21 @@ static ssize_t fuse_dev_read(struct kiocb *iocb, const struct iovec *iov,
return err;
}

+static int fuse_handle_notify(struct fuse_conn *fc, enum fuse_notify_code code,
+ unsigned int size, struct fuse_copy_state *cs)
+{
+ int err;
+
+ switch (code) {
+ default:
+ err = -EINVAL;
+ break;
+ }
+
+ fuse_copy_finish(cs);
+ return err;
+}
+
/* Look up request on processing list by unique ID */
static struct fuse_req *request_find(struct fuse_conn *fc, u64 unique)
{
@@ -876,9 +891,24 @@ static ssize_t fuse_dev_write(struct kiocb *iocb, const struct iovec *iov,
err = fuse_copy_one(&cs, &oh, sizeof(oh));
if (err)
goto err_finish;
+
+ if (oh.len != nbytes)
+ goto err_finish;
+
+ /*
+ * Zero oh.unique indicates unsolicited notification message
+ * and error contains notification code.
+ */
+ if (!oh.unique) {
+ err = fuse_handle_notify(fc, oh.error, nbytes - sizeof(oh),
+ &cs);
+ if (err)
+ return err;
+ return nbytes;
+ }
+
err = -EINVAL;
- if (!oh.unique || oh.error <= -1000 || oh.error > 0 ||
- oh.len != nbytes)
+ if (oh.error <= -1000 || oh.error > 0)
goto err_finish;

spin_lock(&fc->lock);
diff --git a/include/linux/fuse.h b/include/linux/fuse.h
index 608e300..abde994 100644
--- a/include/linux/fuse.h
+++ b/include/linux/fuse.h
@@ -203,6 +203,10 @@ enum fuse_opcode {
FUSE_IOCTL = 39,
};

+enum fuse_notify_code {
+ FUSE_NOTIFY_CODE_MAX,
+};
+
/* The read buffer is required to be at least 8k, but may be much larger */
#define FUSE_MIN_READ_BUFFER 8192

--
1.5.6

2008-11-20 14:12:49

by Tejun Heo

[permalink] [raw]
Subject: [PATCH 3/5] FUSE: add file kernel handle

The file handle, fuse_file->fh, is opaque value supplied by userland
FUSE server and uniqueness is not guaranteed. Add file kernel handle,
fuse_file->kh, which is allocated by the kernel on file allocation and
guaranteed to be unique.

This will be used by poll to match notification to the respective file
but can be used for other purposes where unique file handle is
necessary.

Signed-off-by: Tejun Heo <[email protected]>
---
fs/fuse/dir.c | 2 +-
fs/fuse/file.c | 8 ++++++--
fs/fuse/fuse_i.h | 8 +++++++-
fs/fuse/inode.c | 1 +
4 files changed, 15 insertions(+), 4 deletions(-)

diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
index fd03330..7c6821e 100644
--- a/fs/fuse/dir.c
+++ b/fs/fuse/dir.c
@@ -408,7 +408,7 @@ static int fuse_create_open(struct inode *dir, struct dentry *entry, int mode,
goto out_put_forget_req;

err = -ENOMEM;
- ff = fuse_file_alloc();
+ ff = fuse_file_alloc(fc);
if (!ff)
goto out_put_request;

diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 71fde8b..1f662f1 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -46,7 +46,7 @@ static int fuse_send_open(struct inode *inode, struct file *file, int isdir,
return err;
}

-struct fuse_file *fuse_file_alloc(void)
+struct fuse_file *fuse_file_alloc(struct fuse_conn *fc)
{
struct fuse_file *ff;
ff = kmalloc(sizeof(struct fuse_file), GFP_KERNEL);
@@ -58,6 +58,9 @@ struct fuse_file *fuse_file_alloc(void)
} else {
INIT_LIST_HEAD(&ff->write_entry);
atomic_set(&ff->count, 0);
+ spin_lock(&fc->lock);
+ ff->kh = ++fc->khctr;
+ spin_unlock(&fc->lock);
}
}
return ff;
@@ -109,6 +112,7 @@ void fuse_finish_open(struct inode *inode, struct file *file,

int fuse_open_common(struct inode *inode, struct file *file, int isdir)
{
+ struct fuse_conn *fc = get_fuse_conn(inode);
struct fuse_open_out outarg;
struct fuse_file *ff;
int err;
@@ -121,7 +125,7 @@ int fuse_open_common(struct inode *inode, struct file *file, int isdir)
if (err)
return err;

- ff = fuse_file_alloc();
+ ff = fuse_file_alloc(fc);
if (!ff)
return -ENOMEM;

diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index 35accfd..fdfdfb0 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -100,6 +100,9 @@ struct fuse_file {
/** Request reserved for flush and release */
struct fuse_req *reserved_req;

+ /** Kernel file handle guaranteed to be unique */
+ u64 kh;
+
/** File handle used by userspace */
u64 fh;

@@ -322,6 +325,9 @@ struct fuse_conn {
/** The list of requests under I/O */
struct list_head io;

+ /** The next unique kernel file handle */
+ u64 khctr;
+
/** Number of requests currently in the background */
unsigned num_background;

@@ -499,7 +505,7 @@ void fuse_read_fill(struct fuse_req *req, struct file *file,
*/
int fuse_open_common(struct inode *inode, struct file *file, int isdir);

-struct fuse_file *fuse_file_alloc(void);
+struct fuse_file *fuse_file_alloc(struct fuse_conn *fc);
void fuse_file_free(struct fuse_file *ff);
void fuse_finish_open(struct inode *inode, struct file *file,
struct fuse_file *ff, struct fuse_open_out *outarg);
diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index 2e99f34..9f27441 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -485,6 +485,7 @@ static struct fuse_conn *new_conn(struct super_block *sb)
fc->bdi.unplug_io_fn = default_unplug_io_fn;
/* fuse does it's own writeback accounting */
fc->bdi.capabilities = BDI_CAP_NO_ACCT_WB;
+ fc->khctr = 0;
fc->dev = sb->s_dev;
err = bdi_init(&fc->bdi);
if (err)
--
1.5.6

2008-11-20 14:13:35

by Tejun Heo

[permalink] [raw]
Subject: [PATCH 2/5] FUSE: implement ioctl support

Generic ioctl support is tricky to implement because only the ioctl
implementation itself knows which memory regions need to be read
and/or written. To support this, fuse client can request retry of
ioctl specifying memory regions to read and write. Deep copying
(nested pointers) can be implemented by retrying multiple times
resolving one depth of dereference at a time.

For security and cleanliness considerations, ioctl implementation has
restricted mode where the kernel determines data transfer directions
and sizes using the _IOC_*() macros on the ioctl command. In this
mode, retry is not allowed.

For all FUSE servers, restricted mode is enforced. Unrestricted ioctl
will be used by CUSE.

Plese read the comment on top of fs/fuse/file.c::fuse_file_do_ioctl()
for more information.

Signed-off-by: Tejun Heo <[email protected]>
---
fs/fuse/file.c | 273 ++++++++++++++++++++++++++++++++++++++++++++++++++
include/linux/fuse.h | 32 ++++++
2 files changed, 305 insertions(+), 0 deletions(-)

diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 34930a9..71fde8b 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -1470,6 +1470,275 @@ static loff_t fuse_file_llseek(struct file *file, loff_t offset, int origin)
return retval;
}

+static int fuse_ioctl_copy_user(struct page **pages, struct iovec *iov,
+ unsigned int nr_segs, size_t bytes, bool to_user)
+{
+ struct iov_iter ii;
+ int page_idx = 0;
+
+ if (!bytes)
+ return 0;
+
+ iov_iter_init(&ii, iov, nr_segs, bytes, 0);
+
+ while (iov_iter_count(&ii)) {
+ struct page *page = pages[page_idx++];
+ size_t todo = min_t(size_t, PAGE_SIZE, iov_iter_count(&ii));
+ void *kaddr, *map;
+
+ kaddr = map = kmap(page);
+
+ while (todo) {
+ char __user *uaddr = ii.iov->iov_base + ii.iov_offset;
+ size_t iov_len = ii.iov->iov_len - ii.iov_offset;
+ size_t copy = min(todo, iov_len);
+ size_t left;
+
+ if (!to_user)
+ left = copy_from_user(kaddr, uaddr, copy);
+ else
+ left = copy_to_user(uaddr, kaddr, copy);
+
+ if (unlikely(left))
+ return -EFAULT;
+
+ iov_iter_advance(&ii, copy);
+ todo -= copy;
+ kaddr += copy;
+ }
+
+ kunmap(map);
+ }
+
+ return 0;
+}
+
+/*
+ * For ioctls, there is no generic way to determine how much memory
+ * needs to be read and/or written. Furthermore, ioctls are allowed
+ * to dereference the passed pointer, so the parameter requires deep
+ * copying but FUSE has no idea whatsoever about what to copy in or
+ * out.
+ *
+ * This is solved by allowing FUSE server to retry ioctl with
+ * necessary in/out iovecs. Let's assume the ioctl implementation
+ * needs to read in the following structure.
+ *
+ * struct a {
+ * char *buf;
+ * size_t buflen;
+ * }
+ *
+ * On the first callout to FUSE server, inarg->in_size and
+ * inarg->out_size will be NULL; then, the server completes the ioctl
+ * with FUSE_IOCTL_RETRY set in out->flags, out->in_iovs set to 1 and
+ * the actual iov array to
+ *
+ * { { .iov_base = inarg.arg, .iov_len = sizeof(struct a) } }
+ *
+ * which tells FUSE to copy in the requested area and retry the ioctl.
+ * On the second round, the server has access to the structure and
+ * from that it can tell what to look for next, so on the invocation,
+ * it sets FUSE_IOCTL_RETRY, out->in_iovs to 2 and iov array to
+ *
+ * { { .iov_base = inarg.arg, .iov_len = sizeof(struct a) },
+ * { .iov_base = a.buf, .iov_len = a.buflen } }
+ *
+ * FUSE will copy both struct a and the pointed buffer from the
+ * process doing the ioctl and retry ioctl with both struct a and the
+ * buffer.
+ *
+ * This time, FUSE server has everything it needs and completes ioctl
+ * without FUSE_IOCTL_RETRY which finishes the ioctl call.
+ *
+ * Copying data out works the same way.
+ *
+ * Note that if FUSE_IOCTL_UNRESTRICTED is clear, the kernel
+ * automatically initializes in and out iovs by decoding @cmd with
+ * _IOC_* macros and the server is not allowed to request RETRY. This
+ * limits ioctl data transfers to well-formed ioctls and is the forced
+ * behavior for all FUSE servers.
+ */
+static long fuse_file_do_ioctl(struct file *file, unsigned int cmd,
+ unsigned long arg, unsigned int flags)
+{
+ struct inode *inode = file->f_dentry->d_inode;
+ struct fuse_file *ff = file->private_data;
+ struct fuse_conn *fc = get_fuse_conn(inode);
+ struct fuse_ioctl_in inarg = { .fh = ff->fh, .cmd = cmd, .arg = arg,
+ .flags = flags };
+ struct fuse_ioctl_out outarg;
+ struct fuse_req *req = NULL;
+ struct page **pages = NULL;
+ struct page *iov_page = NULL;
+ struct iovec *in_iov = NULL, *out_iov = NULL;
+ unsigned int in_iovs = 0, out_iovs = 0, num_pages = 0, max_pages;
+ size_t in_size, out_size, transferred;
+ int err;
+
+ /* assume all the iovs returned by client always fits in a page */
+ BUILD_BUG_ON(sizeof(struct iovec) * FUSE_IOCTL_MAX_IOV > PAGE_SIZE);
+
+ err = -EIO;
+ if (is_bad_inode(inode))
+ goto out;
+
+ err = -ENOMEM;
+ pages = kzalloc(sizeof(pages[0]) * FUSE_MAX_PAGES_PER_REQ, GFP_KERNEL);
+ iov_page = alloc_page(GFP_KERNEL);
+ if (!pages || !iov_page)
+ goto out;
+
+ /*
+ * If restricted, initialize IO parameters as encoded in @cmd.
+ * RETRY from server is not allowed.
+ */
+ if (!(flags & FUSE_IOCTL_UNRESTRICTED)) {
+ struct iovec *iov = page_address(iov_page);
+
+ iov->iov_base = (void *)arg;
+ iov->iov_len = _IOC_SIZE(cmd);
+
+ if (_IOC_DIR(cmd) & _IOC_READ) {
+ in_iov = iov;
+ in_iovs = 1;
+ }
+
+ if (_IOC_DIR(cmd) & _IOC_WRITE) {
+ out_iov = iov;
+ out_iovs = 1;
+ }
+ }
+
+ retry:
+ inarg.in_size = in_size = iov_length(in_iov, in_iovs);
+ inarg.out_size = out_size = iov_length(out_iov, out_iovs);
+
+ /*
+ * Out data can be used either for actual out data or iovs,
+ * make sure there always is at least one page.
+ */
+ out_size = max_t(size_t, out_size, PAGE_SIZE);
+ max_pages = DIV_ROUND_UP(max(in_size, out_size), PAGE_SIZE);
+
+ /* make sure there are enough buffer pages and init request with them */
+ err = -ENOMEM;
+ if (max_pages > FUSE_MAX_PAGES_PER_REQ)
+ goto out;
+ while (num_pages < max_pages) {
+ pages[num_pages] = alloc_page(GFP_KERNEL | __GFP_HIGHMEM);
+ if (!pages[num_pages])
+ goto out;
+ num_pages++;
+ }
+
+ req = fuse_get_req(fc);
+ if (IS_ERR(req)) {
+ req = NULL;
+ err = PTR_ERR(req);
+ goto out;
+ }
+ memcpy(req->pages, pages, sizeof(req->pages[0]) * num_pages);
+ req->num_pages = num_pages;
+
+ /* okay, let's send it to the client */
+ req->in.h.opcode = FUSE_IOCTL;
+ req->in.h.nodeid = get_node_id(inode);
+ req->in.numargs = 1;
+ req->in.args[0].size = sizeof(inarg);
+ req->in.args[0].value = &inarg;
+ if (in_size) {
+ req->in.numargs++;
+ req->in.args[1].size = in_size;
+ req->in.argpages = 1;
+
+ err = fuse_ioctl_copy_user(pages, in_iov, in_iovs, in_size,
+ false);
+ if (err)
+ goto out;
+ }
+
+ req->out.numargs = 2;
+ req->out.args[0].size = sizeof(outarg);
+ req->out.args[0].value = &outarg;
+ req->out.args[1].size = out_size;
+ req->out.argpages = 1;
+ req->out.argvar = 1;
+
+ request_send(fc, req);
+ err = req->out.h.error;
+ transferred = req->out.args[1].size;
+ fuse_put_request(fc, req);
+ req = NULL;
+ if (err)
+ goto out;
+
+ /* did it ask for retry? */
+ if (outarg.flags & FUSE_IOCTL_RETRY) {
+ char *vaddr;
+
+ /* no retry if in restricted mode */
+ err = -EBADE;
+ if (!(flags & FUSE_IOCTL_UNRESTRICTED))
+ goto out;
+
+ in_iovs = outarg.in_iovs;
+ out_iovs = outarg.out_iovs;
+
+ /*
+ * Make sure things are in boundary, separate checks
+ * are to protect against overflow.
+ */
+ err = -ENOMEM;
+ if (in_iovs > FUSE_IOCTL_MAX_IOV ||
+ out_iovs > FUSE_IOCTL_MAX_IOV ||
+ in_iovs + out_iovs > FUSE_IOCTL_MAX_IOV)
+ goto out;
+
+ err = -EBADE;
+ if ((in_iovs + out_iovs) * sizeof(struct iovec) != transferred)
+ goto out;
+
+ /* okay, copy in iovs and retry */
+ vaddr = kmap_atomic(pages[0], KM_USER0);
+ memcpy(page_address(iov_page), vaddr, transferred);
+ kunmap_atomic(vaddr, KM_USER0);
+
+ in_iov = page_address(iov_page);
+ out_iov = in_iov + in_iovs;
+
+ goto retry;
+ }
+
+ err = -EBADE;
+ if (transferred > inarg.out_size)
+ goto out;
+
+ err = fuse_ioctl_copy_user(pages, out_iov, out_iovs, transferred, true);
+ out:
+ if (req)
+ fuse_put_request(fc, req);
+ if (iov_page)
+ __free_page(iov_page);
+ while (num_pages)
+ __free_page(pages[--num_pages]);
+ kfree(pages);
+
+ return err ? err : outarg.result;
+}
+
+static long fuse_file_ioctl(struct file *file, unsigned int cmd,
+ unsigned long arg)
+{
+ return fuse_file_do_ioctl(file, cmd, arg, 0);
+}
+
+static long fuse_file_compat_ioctl(struct file *file, unsigned int cmd,
+ unsigned long arg)
+{
+ return fuse_file_do_ioctl(file, cmd, arg, FUSE_IOCTL_COMPAT);
+}
+
static const struct file_operations fuse_file_operations = {
.llseek = fuse_file_llseek,
.read = do_sync_read,
@@ -1484,6 +1753,8 @@ static const struct file_operations fuse_file_operations = {
.lock = fuse_file_lock,
.flock = fuse_file_flock,
.splice_read = generic_file_splice_read,
+ .unlocked_ioctl = fuse_file_ioctl,
+ .compat_ioctl = fuse_file_compat_ioctl,
};

static const struct file_operations fuse_direct_io_file_operations = {
@@ -1496,6 +1767,8 @@ static const struct file_operations fuse_direct_io_file_operations = {
.fsync = fuse_fsync,
.lock = fuse_file_lock,
.flock = fuse_file_flock,
+ .unlocked_ioctl = fuse_file_ioctl,
+ .compat_ioctl = fuse_file_compat_ioctl,
/* no mmap and splice_read */
};

diff --git a/include/linux/fuse.h b/include/linux/fuse.h
index 7caa473..608e300 100644
--- a/include/linux/fuse.h
+++ b/include/linux/fuse.h
@@ -148,6 +148,21 @@ struct fuse_file_lock {
*/
#define FUSE_READ_LOCKOWNER (1 << 1)

+/**
+ * Ioctl flags
+ *
+ * FUSE_IOCTL_COMPAT: 32bit compat ioctl on 64bit machine
+ * FUSE_IOCTL_UNRESTRICTED: not restricted to well-formed ioctls, retry allowed
+ * FUSE_IOCTL_RETRY: retry with new iovecs
+ *
+ * FUSE_IOCTL_MAX_IOV: maximum of in_iovecs + out_iovecs
+ */
+#define FUSE_IOCTL_COMPAT (1 << 0)
+#define FUSE_IOCTL_UNRESTRICTED (1 << 1)
+#define FUSE_IOCTL_RETRY (1 << 2)
+
+#define FUSE_IOCTL_MAX_IOV 256
+
enum fuse_opcode {
FUSE_LOOKUP = 1,
FUSE_FORGET = 2, /* no reply */
@@ -185,6 +200,7 @@ enum fuse_opcode {
FUSE_INTERRUPT = 36,
FUSE_BMAP = 37,
FUSE_DESTROY = 38,
+ FUSE_IOCTL = 39,
};

/* The read buffer is required to be at least 8k, but may be much larger */
@@ -385,6 +401,22 @@ struct fuse_bmap_out {
__u64 block;
};

+struct fuse_ioctl_in {
+ __u64 fh;
+ __u32 flags;
+ __u32 cmd;
+ __u64 arg;
+ __u32 in_size;
+ __u32 out_size;
+};
+
+struct fuse_ioctl_out {
+ __s32 result;
+ __u32 flags;
+ __u32 in_iovs;
+ __u32 out_iovs;
+};
+
struct fuse_in_header {
__u32 len;
__u32 opcode;
--
1.5.6

2008-11-20 14:13:52

by Tejun Heo

[permalink] [raw]
Subject: [PATCH 5/5] FUSE: implement poll support

Implement poll support. Polled files are indexed using fh in a RB
tree rooted at fuse_conn->polled_files. All pollable files should
have unique fh as that's how notifications are matched to files. If
duplicate fhs are detected, FUSE spits out warning message. Poll will
malfunction but otherwise it will work fine.

Client should send FUSE_NOTIFY_POLL notification once after processing
FUSE_POLL which has FUSE_POLL_SCHEDULE_NOTIFY set. Sending
notification unconditionally after the latest poll or everytime file
content might have changed is inefficient but won't cause malfunction.

fuse_file_poll() can sleep and requires patches from the following
thread which allows f_op->poll() to sleep.

http://thread.gmane.org/gmane.linux.kernel/726176

Signed-off-by: Tejun Heo <[email protected]>
---
fs/fuse/dev.c | 15 ++++++
fs/fuse/file.c | 135 ++++++++++++++++++++++++++++++++++++++++++++++++++
fs/fuse/fuse_i.h | 20 +++++++
fs/fuse/inode.c | 1 +
include/linux/fuse.h | 25 +++++++++
5 files changed, 196 insertions(+), 0 deletions(-)

diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index 1408b18..b71cc89 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -819,11 +819,26 @@ static int fuse_handle_notify(struct fuse_conn *fc, enum fuse_notify_code code,
int err;

switch (code) {
+ case FUSE_NOTIFY_POLL: {
+ struct fuse_notify_poll_wakeup_out outarg;
+
+ err = -EINVAL;
+ if (size != sizeof(outarg))
+ goto out;
+
+ err = fuse_copy_one(cs, &outarg, sizeof(outarg));
+ if (err)
+ goto out;
+
+ err = fuse_notify_poll_wakeup(fc, &outarg);
+ break;
+ }
default:
err = -EINVAL;
break;
}

+out:
fuse_copy_finish(cs);
return err;
}
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 1f662f1..3ea9304 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -62,6 +62,8 @@ struct fuse_file *fuse_file_alloc(struct fuse_conn *fc)
ff->kh = ++fc->khctr;
spin_unlock(&fc->lock);
}
+ RB_CLEAR_NODE(&ff->polled_node);
+ init_waitqueue_head(&ff->poll_wait);
}
return ff;
}
@@ -171,7 +173,11 @@ int fuse_release_common(struct inode *inode, struct file *file, int isdir)

spin_lock(&fc->lock);
list_del(&ff->write_entry);
+ if (!RB_EMPTY_NODE(&ff->polled_node))
+ rb_erase(&ff->polled_node, &fc->polled_files);
spin_unlock(&fc->lock);
+
+ wake_up_interruptible_sync(&ff->poll_wait);
/*
* Normally this will send the RELEASE request,
* however if some asynchronous READ or WRITE requests
@@ -1743,6 +1749,133 @@ static long fuse_file_compat_ioctl(struct file *file, unsigned int cmd,
return fuse_file_do_ioctl(file, cmd, arg, FUSE_IOCTL_COMPAT);
}

+/*
+ * All files which have been polled are linked to RB tree
+ * fuse_conn->polled_files which is indexed by kh. Walk the tree and
+ * find the matching one.
+ */
+static struct rb_node **fuse_find_polled_node(struct fuse_conn *fc, u64 kh,
+ struct rb_node **parent_out)
+{
+ struct rb_node **link = &fc->polled_files.rb_node;
+ struct rb_node *last = NULL;
+
+ while (*link) {
+ struct fuse_file *ff = rb_entry(*link, struct fuse_file,
+ polled_node);
+ last = *link;
+
+ if (kh < ff->kh)
+ link = &(*link)->rb_left;
+ else if (kh > ff->kh)
+ link = &(*link)->rb_right;
+ else
+ return link;
+ }
+
+ if (parent_out)
+ *parent_out = last;
+ return link;
+}
+
+/*
+ * The file is about to be polled. Make sure it's on the polled_files
+ * RB tree. Note that files once added to the polled_files tree are
+ * not removed before the file is released. This is because a file
+ * polled once is likely to be polled again.
+ */
+static void fuse_register_polled_file(struct fuse_conn *fc,
+ struct fuse_file *ff)
+{
+ if (!RB_EMPTY_NODE(&ff->polled_node))
+ return;
+
+ spin_lock(&fc->lock);
+
+ if (RB_EMPTY_NODE(&ff->polled_node)) {
+ struct rb_node **link, *parent;
+
+ link = fuse_find_polled_node(fc, ff->kh, &parent);
+ BUG_ON(*link);
+ rb_link_node(&ff->polled_node, parent, link);
+ rb_insert_color(&ff->polled_node, &fc->polled_files);
+ }
+
+ spin_unlock(&fc->lock);
+}
+
+static unsigned fuse_file_poll(struct file *file, poll_table *wait)
+{
+ struct inode *inode = file->f_dentry->d_inode;
+ struct fuse_file *ff = file->private_data;
+ struct fuse_conn *fc = get_fuse_conn(inode);
+ struct fuse_poll_in inarg = { .fh = ff->fh, .kh = ff->kh };
+ struct fuse_poll_out outarg;
+ struct fuse_req *req;
+ int err;
+
+ if (fc->no_poll)
+ return DEFAULT_POLLMASK;
+
+ poll_wait(file, &ff->poll_wait, wait);
+
+ /*
+ * Ask for notification iff there's someone waiting for it.
+ * The client may ignore the flag and always notify.
+ */
+ if (waitqueue_active(&ff->poll_wait)) {
+ inarg.flags |= FUSE_POLL_SCHEDULE_NOTIFY;
+ fuse_register_polled_file(fc, ff);
+ }
+
+ req = fuse_get_req(fc);
+ if (IS_ERR(req))
+ return PTR_ERR(req);
+
+ req->in.h.opcode = FUSE_POLL;
+ req->in.h.nodeid = get_node_id(inode);
+ req->in.numargs = 1;
+ req->in.args[0].size = sizeof(inarg);
+ req->in.args[0].value = &inarg;
+ req->out.numargs = 1;
+ req->out.args[0].size = sizeof(outarg);
+ req->out.args[0].value = &outarg;
+ request_send(fc, req);
+ err = req->out.h.error;
+ fuse_put_request(fc, req);
+
+ if (!err)
+ return outarg.revents;
+ if (err == -ENOSYS) {
+ fc->no_poll = 1;
+ return DEFAULT_POLLMASK;
+ }
+ return POLLERR;
+}
+
+/*
+ * This is called from fuse_handle_notify() on FUSE_NOTIFY_POLL and
+ * wakes up the poll waiters.
+ */
+int fuse_notify_poll_wakeup(struct fuse_conn *fc,
+ struct fuse_notify_poll_wakeup_out *outarg)
+{
+ u64 kh = outarg->kh;
+ struct rb_node **link;
+
+ spin_lock(&fc->lock);
+
+ link = fuse_find_polled_node(fc, kh, NULL);
+ if (*link) {
+ struct fuse_file *ff = rb_entry(*link, struct fuse_file,
+ polled_node);
+ wake_up_interruptible_sync(&ff->poll_wait);
+ }
+
+ spin_unlock(&fc->lock);
+ return 0;
+}
+
static const struct file_operations fuse_file_operations = {
.llseek = fuse_file_llseek,
.read = do_sync_read,
@@ -1759,6 +1892,7 @@ static const struct file_operations fuse_file_operations = {
.splice_read = generic_file_splice_read,
.unlocked_ioctl = fuse_file_ioctl,
.compat_ioctl = fuse_file_compat_ioctl,
+ .poll = fuse_file_poll,
};

static const struct file_operations fuse_direct_io_file_operations = {
@@ -1773,6 +1907,7 @@ static const struct file_operations fuse_direct_io_file_operations = {
.flock = fuse_file_flock,
.unlocked_ioctl = fuse_file_ioctl,
.compat_ioctl = fuse_file_compat_ioctl,
+ .poll = fuse_file_poll,
/* no mmap and splice_read */
};

diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index fdfdfb0..66f2400 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -19,6 +19,8 @@
#include <linux/backing-dev.h>
#include <linux/mutex.h>
#include <linux/rwsem.h>
+#include <linux/rbtree.h>
+#include <linux/poll.h>

/** Max number of pages that can be used in a single read request */
#define FUSE_MAX_PAGES_PER_REQ 32
@@ -111,6 +113,12 @@ struct fuse_file {

/** Entry on inode's write_files list */
struct list_head write_entry;
+
+ /** RB node to be linked on fuse_conn->polled_files */
+ struct rb_node polled_node;
+
+ /** Wait queue head for poll */
+ wait_queue_head_t poll_wait;
};

/** One input argument of a request */
@@ -328,6 +336,9 @@ struct fuse_conn {
/** The next unique kernel file handle */
u64 khctr;

+ /** rbtree of fuse_files waiting for poll events indexed by ph */
+ struct rb_root polled_files;
+
/** Number of requests currently in the background */
unsigned num_background;

@@ -416,6 +427,9 @@ struct fuse_conn {
/** Is bmap not implemented by fs? */
unsigned no_bmap : 1;

+ /** Is poll not implemented by fs? */
+ unsigned no_poll : 1;
+
/** Do multi-page cached writes */
unsigned big_writes : 1;

@@ -525,6 +539,12 @@ int fuse_fsync_common(struct file *file, struct dentry *de, int datasync,
int isdir);

/**
+ * Notify poll wakeup
+ */
+int fuse_notify_poll_wakeup(struct fuse_conn *fc,
+ struct fuse_notify_poll_wakeup_out *outarg);
+
+/**
* Initialize file operations on a regular file
*/
void fuse_init_file_inode(struct inode *inode);
diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index 9f27441..77fcedb 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -486,6 +486,7 @@ static struct fuse_conn *new_conn(struct super_block *sb)
/* fuse does it's own writeback accounting */
fc->bdi.capabilities = BDI_CAP_NO_ACCT_WB;
fc->khctr = 0;
+ fc->polled_files = RB_ROOT;
fc->dev = sb->s_dev;
err = bdi_init(&fc->bdi);
if (err)
diff --git a/include/linux/fuse.h b/include/linux/fuse.h
index abde994..5650cf0 100644
--- a/include/linux/fuse.h
+++ b/include/linux/fuse.h
@@ -163,6 +163,13 @@ struct fuse_file_lock {

#define FUSE_IOCTL_MAX_IOV 256

+/**
+ * Poll flags
+ *
+ * FUSE_POLL_SCHEDULE_NOTIFY: request poll notify
+ */
+#define FUSE_POLL_SCHEDULE_NOTIFY (1 << 0)
+
enum fuse_opcode {
FUSE_LOOKUP = 1,
FUSE_FORGET = 2, /* no reply */
@@ -201,9 +208,11 @@ enum fuse_opcode {
FUSE_BMAP = 37,
FUSE_DESTROY = 38,
FUSE_IOCTL = 39,
+ FUSE_POLL = 40,
};

enum fuse_notify_code {
+ FUSE_NOTIFY_POLL = 1,
FUSE_NOTIFY_CODE_MAX,
};

@@ -421,6 +430,22 @@ struct fuse_ioctl_out {
__u32 out_iovs;
};

+struct fuse_poll_in {
+ __u64 fh;
+ __u64 kh;
+ __u32 flags;
+ __u32 padding;
+};
+
+struct fuse_poll_out {
+ __u32 revents;
+ __u32 padding;
+};
+
+struct fuse_notify_poll_wakeup_out {
+ __u64 kh;
+};
+
struct fuse_in_header {
__u32 len;
__u32 opcode;
--
1.5.6

2008-11-20 20:23:22

by Miklos Szeredi

[permalink] [raw]
Subject: Re: [PATCHSET] FUSE: extend FUSE to support more operations, take #2

Hi Tejun,

On Thu, 20 Nov 2008, Tejun Heo wrote:
> This patchset is the second take of extend-FUSE patchset. Changes
> from the last take[L] are...

I managed to review and apply this and the libfuse patches up to the
ioctl support. After swithing _IOC_READ with _IOC_WRITE in the kernel
the example program actually worked :)

I removed ->unrestricted_ioctl() and associated code because it really
doesn't make any sense: the high level lib won't be used for CUSE
stuff, otherwise unrestrited ioctls are not allowed (and the interface
is rather horrible anyway).

The poll patches look OK at a first glance too, I'll review them
tomorrow.

Thanks,
Miklos

2008-11-21 10:08:18

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCHSET] FUSE: extend FUSE to support more operations, take #2

Hello, Miklos.

Miklos Szeredi wrote:
> Hi Tejun,
>
> On Thu, 20 Nov 2008, Tejun Heo wrote:
>> This patchset is the second take of extend-FUSE patchset. Changes
>> from the last take[L] are...
>
> I managed to review and apply this and the libfuse patches up to the
> ioctl support. After swithing _IOC_READ with _IOC_WRITE in the kernel
> the example program actually worked :)

Did I get that wrong? Looking at it again.... oops. It's reverse
alright. Thanks for catching it.

> I removed ->unrestricted_ioctl() and associated code because it really
> doesn't make any sense: the high level lib won't be used for CUSE
> stuff, otherwise unrestrited ioctls are not allowed (and the interface
> is rather horrible anyway).

Well, CUSE highlevel interface piggy backs on FUSE so it requires
unrestricted_ioctl() there for it and ossp does use it.

> The poll patches look OK at a first glance too, I'll review them
> tomorrow.

Alright, thanks.

--
tejun

2008-11-21 11:05:26

by Miklos Szeredi

[permalink] [raw]
Subject: Re: [PATCHSET] FUSE: extend FUSE to support more operations, take #2

On Fri, 21 Nov 2008, Tejun Heo wrote:
> > I removed ->unrestricted_ioctl() and associated code because it really
> > doesn't make any sense: the high level lib won't be used for CUSE
> > stuff, otherwise unrestrited ioctls are not allowed (and the interface
> > is rather horrible anyway).
>
> Well, CUSE highlevel interface piggy backs on FUSE so it requires
> unrestricted_ioctl() there for it and ossp does use it.

I thought it uses the lowlevel interface. Why doesn't it do that?

For CUSE there's really no point in going through high level
interface, since there's just one file involved, so the path name
generation (the main feature of the highlevel lib) doesn't make any
sense.

Thanks,
Miklos

2008-11-21 13:06:52

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCHSET] FUSE: extend FUSE to support more operations, take #2

Miklos Szeredi wrote:
> On Fri, 21 Nov 2008, Tejun Heo wrote:
>>> I removed ->unrestricted_ioctl() and associated code because it really
>>> doesn't make any sense: the high level lib won't be used for CUSE
>>> stuff, otherwise unrestrited ioctls are not allowed (and the interface
>>> is rather horrible anyway).
>> Well, CUSE highlevel interface piggy backs on FUSE so it requires
>> unrestricted_ioctl() there for it and ossp does use it.
>
> I thought it uses the lowlevel interface. Why doesn't it do that?

Well, because it's simpler that way and people would be more used to it?
It's just easier when you implement a method which returns something
and looks similar to the respective file operation.

> For CUSE there's really no point in going through high level
> interface, since there's just one file involved, so the path name
> generation (the main feature of the highlevel lib) doesn't make any
> sense.

Well, the choice was mostly for convenience as there also are a few
places where high level interface wraps things better a bit. Converting
wouldn't be difficult. Do you think it's important? I think keeping
things as parallel to FUSE as possible is more important.

Thanks.

--
tejun

2008-11-21 13:30:09

by Miklos Szeredi

[permalink] [raw]
Subject: Re: [PATCHSET] FUSE: extend FUSE to support more operations, take #2

On Fri, 21 Nov 2008, Tejun Heo wrote:
> Miklos Szeredi wrote:
> > On Fri, 21 Nov 2008, Tejun Heo wrote:
> >>> I removed ->unrestricted_ioctl() and associated code because it really
> >>> doesn't make any sense: the high level lib won't be used for CUSE
> >>> stuff, otherwise unrestrited ioctls are not allowed (and the interface
> >>> is rather horrible anyway).
> >> Well, CUSE highlevel interface piggy backs on FUSE so it requires
> >> unrestricted_ioctl() there for it and ossp does use it.
> >
> > I thought it uses the lowlevel interface. Why doesn't it do that?
>
> Well, because it's simpler that way and people would be more used to it?
> It's just easier when you implement a method which returns something
> and looks similar to the respective file operation.

Ah, that. Yeah, it's more intuitive, but that comes at a price. I'm
not sure that for CUSE it's worth it. As I said the biggest feature
is having paths, the others are not that important (like allocating a
buffer for read, that's really not too complex to do in each CUSE
driver).

> > For CUSE there's really no point in going through high level
> > interface, since there's just one file involved, so the path name
> > generation (the main feature of the highlevel lib) doesn't make any
> > sense.
>
> Well, the choice was mostly for convenience as there also are a few
> places where high level interface wraps things better a bit. Converting
> wouldn't be difficult. Do you think it's important? I think keeping
> things as parallel to FUSE as possible is more important.

I wouldn't care very much, if it weren't for that horrid
unrestricted_ioctl(). Not your fault, the interface is just not well
suited to that.

Miklos

2008-11-21 14:37:56

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCHSET] FUSE: extend FUSE to support more operations, take #2

Miklos Szeredi wrote:
> On Fri, 21 Nov 2008, Tejun Heo wrote:
>> Miklos Szeredi wrote:
>>> On Fri, 21 Nov 2008, Tejun Heo wrote:
>>>>> I removed ->unrestricted_ioctl() and associated code because it really
>>>>> doesn't make any sense: the high level lib won't be used for CUSE
>>>>> stuff, otherwise unrestrited ioctls are not allowed (and the interface
>>>>> is rather horrible anyway).
>>>> Well, CUSE highlevel interface piggy backs on FUSE so it requires
>>>> unrestricted_ioctl() there for it and ossp does use it.
>>> I thought it uses the lowlevel interface. Why doesn't it do that?
>> Well, because it's simpler that way and people would be more used to it?
>> It's just easier when you implement a method which returns something
>> and looks similar to the respective file operation.
>
> Ah, that. Yeah, it's more intuitive, but that comes at a price. I'm
> not sure that for CUSE it's worth it. As I said the biggest feature
> is having paths, the others are not that important (like allocating a
> buffer for read, that's really not too complex to do in each CUSE
> driver).
>
>>> For CUSE there's really no point in going through high level
>>> interface, since there's just one file involved, so the path name
>>> generation (the main feature of the highlevel lib) doesn't make any
>>> sense.
>> Well, the choice was mostly for convenience as there also are a few
>> places where high level interface wraps things better a bit. Converting
>> wouldn't be difficult. Do you think it's important? I think keeping
>> things as parallel to FUSE as possible is more important.
>
> I wouldn't care very much, if it weren't for that horrid
> unrestricted_ioctl(). Not your fault, the interface is just not well
> suited to that.

If you want drop highlevel CUSE interface, that's fine with me. After
all, the complexity difference isn't that big for CUSE anyway.

Thanks.

--
tejun

2008-11-22 07:42:51

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCHSET] FUSE: extend FUSE to support more operations, take #2

Tejun Heo wrote:
> Hello, Miklos.
>
> Miklos Szeredi wrote:
>> Hi Tejun,
>>
>> On Thu, 20 Nov 2008, Tejun Heo wrote:
>>> This patchset is the second take of extend-FUSE patchset. Changes
>>> from the last take[L] are...
>> I managed to review and apply this and the libfuse patches up to the
>> ioctl support. After swithing _IOC_READ with _IOC_WRITE in the kernel
>> the example program actually worked :)
>
> Did I get that wrong? Looking at it again.... oops. It's reverse
> alright. Thanks for catching it.

Now I know why it happened. I caught the bug and fixed during testing
but the fix ended up in the following patch of implement-CUSE patchset.

[PATCH 2/5] FUSE: export symbols to be used by CUSE

So, with it fixed in the right place, the above patch will produce one
reject. I'll post a regenerated one.

Thanks.

--
tejun