2009-04-14 02:06:34

by Tejun Heo

Subject: [PATCHSET] FUSE: implement direct mmap, take#2

Hello, Miklos, Nick, Andrew.

This is the second take of the fuse-implement-direct-mmap patchset.
It implements direct mmap support for FUSE (and CUSE). Each direct
mmap area is backed by an anonymous mapping (shmem_file), and the FUSE
server can decide how the areas are shared.

An mmap request is handled in two steps. MMAP first queries the server
whether it wants to share the mapping with an existing one or create a
new one, and with which flags. MMAP_COMMIT then notifies the server of
the result of the mmap and, on success, of the fd the server can use to
access the mmap region.
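
For illustration, a rough sketch of the server side of this handshake
could look like the following. The wire structures and FUSE_MMAP_*
flags are the ones added by this patchset; the dispatch glue that
delivers the requests and the bookkeeping are made up.

#include <stdint.h>
#include <linux/fuse.h>

static int shared_fd = -1;	/* fd of an already committed mapping, if any */

/* FUSE_MMAP: ask the kernel to reuse shared_fd or to create a new mapping */
static void handle_mmap(const struct fuse_mmap_in *in, struct fuse_mmap_out *out)
{
	out->fd = shared_fd;			/* -1 requests a new shmem file */
	out->flags = FUSE_MMAP_DONT_EXPAND;	/* keep mremap() from growing it */
}

/* FUSE_MMAP_COMMIT: learn the outcome and, on success, the fd to use */
static void handle_mmap_commit(const struct fuse_mmap_commit_in *in)
{
	if (in->fd >= 0)
		shared_fd = in->fd;		/* reuse for later MMAP requests */
	/* a negative in->fd means the mmap attempt failed */
}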

Changes from the last take[L] are

* 0003-FUSE-don-t-let-fuse_req-end-put-the-base-referen.patch was
merged and thus dropped from this series.

* Updated to the current master + implement-cuse, take#3.

This patchset contains the following patches.

0001-mmap-don-t-assume-f_op-mmap-doesn-t-change-vma.patch
0002-fdtable-export-alloc_fd.patch
0003-FUSE-make-request_wait_answer-wait-for-end-co.patch
0004-FUSE-implement-fuse_req-prep.patch
0005-FUSE-implement-direct-mmap.patch

0001-0002 update mm and fdtable for the following FUSE changes.
0003-0004 update fuse_req->end() handling and add ->prep(). 0005
implements direct mmap.

Nick, Andrew, can you guys please review and ack 0001-0002?

This patchset is on top of

linus#master(80a04d3f2f94fb68b5df05e3ac6697130bc3467a)
+ [1] implement-cuse patchset, take#3

and is available in the following git tree.

git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc.git fuse-mmap

diffstat follows.

fs/file.c | 1
fs/fuse/cuse.c | 1
fs/fuse/dev.c | 70 ++++++--
fs/fuse/file.c | 424 +++++++++++++++++++++++++++++++++++++++++++++++++--
fs/fuse/fuse_i.h | 19 ++
include/linux/fuse.h | 47 +++++
mm/mmap.c | 1
7 files changed, 533 insertions(+), 30 deletions(-)

Thanks.

--
tejun

[L] http://thread.gmane.org/gmane.comp.file-systems.fuse.devel/7212
[1] http://thread.gmane.org/gmane.comp.file-systems.fuse.devel/7705


2009-04-14 02:05:57

by Tejun Heo

Subject: [PATCH 1/5] mmap: don't assume f_op->mmap() doesn't change vma->vm_file

mmap_region() assumes that vma->vm_file isn't changed by f_op->mmap()
and continues to use the cached file after f_op->mmap() returns. Don't
assume that. This will be used by FUSE to redirect mmap to
shmem_file.
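
A condensed, illustrative sketch of such an ->mmap() (modeled on the
FUSE patch later in this series, with error handling trimmed; the
function name is made up):

static int redirect_mmap(struct file *file, struct vm_area_struct *vma)
{
	char dname[] = "dev/fuse";
	size_t len = vma->vm_end - vma->vm_start;
	struct file *mfile;

	mfile = shmem_file_setup(dname, len, vma->vm_flags);
	if (IS_ERR(mfile))
		return PTR_ERR(mfile);

	fput(vma->vm_file);	/* drop the reference mmap_region() took */
	vma->vm_file = mfile;	/* redirect the vma to the shmem file */

	return mfile->f_op->mmap(mfile, vma);
}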

Signed-off-by: Tejun Heo <[email protected]>
Cc: Nick Piggin <[email protected]>
---
mm/mmap.c | 1 +
1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/mm/mmap.c b/mm/mmap.c
index 4a38411..46a7ae5 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1194,6 +1194,7 @@ munmap_back:
vma->vm_file = file;
get_file(file);
error = file->f_op->mmap(file, vma);
+ file = vma->vm_file;
if (error)
goto unmap_and_free_vma;
if (vm_flags & VM_EXECUTABLE)
--
1.6.0.2

2009-04-14 02:06:18

by Tejun Heo

Subject: [PATCH 4/5] FUSE: implement fuse_req->prep()

Implement ->prep(), the counterpart of ->end(). It's called right
before the request is passed to the userland server, in the kernel
context of the server. ->prep() can fail the request without
disrupting the whole channel.

This will be used by the direct mmap implementation.
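
As an illustration only (the real user is fuse_mmap_commit_prep(),
added later in this series), a ->prep() callback could allocate a
resource in the server's kernel context and fail just this one request
if that's impossible:

static int example_prep(struct fuse_conn *fc, struct fuse_req *req)
{
	int fd = get_unused_fd_flags(O_CLOEXEC);

	if (fd < 0)
		return fd;	/* non-zero return fails only this request */

	put_unused_fd(fd);	/* nothing to install in this toy example */
	return 0;
}

It would be hooked up with req->prep = example_prep before the request
is queued.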

Signed-off-by: Tejun Heo <[email protected]>
---
fs/fuse/dev.c | 29 ++++++++++++++++++++++++++---
fs/fuse/fuse_i.h | 6 ++++++
2 files changed, 32 insertions(+), 3 deletions(-)

diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index 2e1c43d..bb7c4bc 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -756,6 +756,7 @@ static ssize_t fuse_dev_read(struct kiocb *iocb, const struct iovec *iov,
unsigned long nr_segs, loff_t pos)
{
int err;
+ bool restart;
struct fuse_req *req;
struct fuse_in *in;
struct fuse_copy_state cs;
@@ -802,12 +803,32 @@ static ssize_t fuse_dev_read(struct kiocb *iocb, const struct iovec *iov,
goto restart;
}
spin_unlock(&fc->lock);
+
+ restart = false;
fuse_copy_init(&cs, fc, 1, req, iov, nr_segs);
+
+ /*
+ * Execute prep if available. Failure from prep doesn't
+ * indicate faulty channel. On failure, fail the current
+ * request and proceed to the next one.
+ */
+ if (req->prep) {
+ err = req->prep(fc, req);
+ if (err) {
+ restart = true;
+ goto finish;
+ }
+ }
+
err = fuse_copy_one(&cs, &in->h, sizeof(in->h));
- if (!err)
- err = fuse_copy_args(&cs, in->numargs, in->argpages,
- (struct fuse_arg *) in->args, 0);
+ if (err)
+ goto finish;
+
+ err = fuse_copy_args(&cs, in->numargs, in->argpages,
+ (struct fuse_arg *) in->args, 0);
+ finish:
fuse_copy_finish(&cs);
+
spin_lock(&fc->lock);
req->locked = 0;
if (req->aborted) {
@@ -817,6 +838,8 @@ static ssize_t fuse_dev_read(struct kiocb *iocb, const struct iovec *iov,
if (err) {
req->out.h.error = -EIO;
request_end(fc, req);
+ if (restart)
+ goto restart;
return err;
}
if (!req->isreply)
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index 4da979c..ca5b8e9 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -291,6 +291,12 @@ struct fuse_req {
/** Link on fi->writepages */
struct list_head writepages_entry;

+ /** Request preparation callback. Called from the kernel
+ context of the FUSE server before passing the request to
+ the FUSE server. Non-zero return from this function will
+ fail the request. */
+ int (*prep)(struct fuse_conn *, struct fuse_req *);
+
/** Request completion callback. This function is called from
the kernel context of the FUSE server if the request isn't
being aborted. If the request is being aborted, it's
--
1.6.0.2

2009-04-14 02:06:48

by Tejun Heo

Subject: [PATCH 5/5] FUSE: implement direct mmap

This patch implements direct mmap. It allows the FUSE server to honor
each mmap request with an anonymous mapping. The FUSE server can make
multiple mmap requests share a single anonymous mapping or use separate
mappings as it sees fit.

An mmap request is handled in two steps. MMAP first queries the server
whether it wants to share the mapping with an existing one or create a
new one, and with which flags. MMAP_COMMIT then notifies the server of
the result of the mmap and, on success, of the fd the server can use to
access the mmap region.

Internally, a shmem_file is used to back the mmap areas, and
vma->vm_file is switched from the FUSE file to the shmem_file.

For details, please read the comment on top of
fuse_file_direct_mmap().
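
As an illustration of the server side, once MMAP_COMMIT delivers an fd
the server can size and fill the backing shmem file, and release it
again when MUNMAP arrives. The handler names and the dispatch are
hypothetical; only the wire structs come from this patch. A sketch with
most error handling omitted:

#include <string.h>
#include <unistd.h>
#include <sys/mman.h>
#include <linux/fuse.h>

static void *region;
static size_t region_len;

static void on_mmap_commit(const struct fuse_mmap_commit_in *in)
{
	if (in->fd < 0)			/* the mmap attempt failed */
		return;
	region_len = in->len;
	if (ftruncate(in->fd, in->offset + in->len) < 0)
		return;
	region = mmap(NULL, region_len, PROT_READ | PROT_WRITE,
		      MAP_SHARED, in->fd, in->offset);
	if (region == MAP_FAILED)
		return;
	memset(region, 0, region_len);	/* fill in the initial content */
}

static void on_munmap(const struct fuse_munmap_in *in)
{
	munmap(region, region_len);
	close(in->fd);			/* drop the server-side resources */
}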

Signed-off-by: Tejun Heo <[email protected]>
---
fs/fuse/cuse.c | 1 +
fs/fuse/file.c | 424 ++++++++++++++++++++++++++++++++++++++++++++++++--
fs/fuse/fuse_i.h | 8 +
include/linux/fuse.h | 47 ++++++
4 files changed, 470 insertions(+), 10 deletions(-)

diff --git a/fs/fuse/cuse.c b/fs/fuse/cuse.c
index 2238016..301c068 100644
--- a/fs/fuse/cuse.c
+++ b/fs/fuse/cuse.c
@@ -180,6 +180,7 @@ static const struct file_operations cuse_frontend_fops = {
.unlocked_ioctl = cuse_file_ioctl,
.compat_ioctl = cuse_file_compat_ioctl,
.poll = fuse_file_poll,
+ .mmap = fuse_file_direct_mmap,
};


diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 7492577..fb5f83f 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -13,6 +13,9 @@
#include <linux/kernel.h>
#include <linux/sched.h>
#include <linux/module.h>
+#include <linux/file.h>
+#include <linux/syscalls.h>
+#include <linux/mman.h>

static const struct file_operations fuse_file_operations;
static const struct file_operations fuse_direct_io_file_operations;
@@ -1311,15 +1314,6 @@ static int fuse_file_mmap(struct file *file, struct vm_area_struct *vma)
return 0;
}

-static int fuse_direct_mmap(struct file *file, struct vm_area_struct *vma)
-{
- /* Can't provide the coherency needed for MAP_SHARED */
- if (vma->vm_flags & VM_MAYSHARE)
- return -ENODEV;
-
- return generic_file_mmap(file, vma);
-}
-
static int convert_fuse_file_lock(const struct fuse_file_lock *ffl,
struct file_lock *fl)
{
@@ -1935,6 +1929,416 @@ int fuse_notify_poll_wakeup(struct fuse_conn *fc,
return 0;
}

+struct fuse_mmap {
+ struct fuse_conn *fc; /* associated fuse_conn */
+ struct file *file; /* associated file */
+ struct kref kref; /* reference count */
+ u64 mmap_unique; /* mmap req which created this */
+ int mmap_fd; /* server side fd for shmem file */
+ struct file *mmap_file; /* shmem file backing this mmap */
+ unsigned long start;
+ unsigned long len;
+
+ /* our copy of vm_ops w/ open and close overridden */
+ struct vm_operations_struct vm_ops;
+};
+
+/*
+ * Create fuse_mmap structure which represents a single mmapped
+ * region. If @mfile is specified the created fuse_mmap would be
+ * associated with it; otherwise, a new shmem_file is created.
+ */
+static struct fuse_mmap *create_fuse_mmap(struct fuse_conn *fc,
+ struct file *file, struct file *mfile,
+ u64 mmap_unique, int mmap_fd,
+ struct vm_area_struct *vma)
+{
+ char dname[] = "dev/fuse";
+ loff_t off = (loff_t)vma->vm_pgoff << PAGE_SHIFT;
+ size_t len = vma->vm_end - vma->vm_start;
+ struct fuse_mmap *fmmap;
+ int err;
+
+ err = -ENOMEM;
+ fmmap = kzalloc(sizeof(*fmmap), GFP_KERNEL);
+ if (!fmmap)
+ goto fail;
+ kref_init(&fmmap->kref);
+
+ if (mfile) {
+ /*
+ * dentry name with a slash in it can't be created
+ * from userland, so testing dname ensures that the fd
+ * is the one we've created. Note that @mfile is
+ * already grabbed by fuse_mmap_end().
+ */
+ err = -EINVAL;
+ if (strcmp(mfile->f_dentry->d_name.name, dname))
+ goto fail;
+ } else {
+ /*
+ * Create a new shmem_file. As fuse direct mmaps can
+ * be shared, offset can't be zapped to zero. Use off
+ * + len as the default size. Server has a chance to
+ * adjust this and other stuff while processing the
+ * COMMIT request before the client sees this mmap
+ * area.
+ */
+ mfile = shmem_file_setup(dname, off + len, vma->vm_flags);
+ if (IS_ERR(mfile)) {
+ err = PTR_ERR(mfile);
+ goto fail;
+ }
+ }
+ fmmap->mmap_file = mfile;
+
+ fmmap->fc = fuse_conn_get(fc);
+ get_file(file);
+ fmmap->file = file;
+ fmmap->mmap_unique = mmap_unique;
+ fmmap->mmap_fd = mmap_fd;
+ fmmap->start = vma->vm_start;
+ fmmap->len = len;
+
+ return fmmap;
+
+ fail:
+ kfree(fmmap);
+ return ERR_PTR(err);
+}
+
+static void destroy_fuse_mmap(struct fuse_mmap *fmmap)
+{
+ /* mmap_file reference is managed by VM */
+ fuse_conn_put(fmmap->fc);
+ fput(fmmap->file);
+ kfree(fmmap);
+}
+
+static void fuse_vm_release(struct kref *kref)
+{
+ struct fuse_mmap *fmmap = container_of(kref, struct fuse_mmap, kref);
+ struct fuse_conn *fc = fmmap->fc;
+ struct fuse_file *ff = fmmap->file->private_data;
+ struct fuse_req *req;
+ struct fuse_munmap_in *inarg;
+
+ /* failing this might lead to resource leak in server, don't fail */
+ req = fuse_get_req_nofail(fc, fmmap->file);
+ inarg = &req->misc.munmap.in;
+
+ inarg->fh = ff->fh;
+ inarg->mmap_unique = fmmap->mmap_unique;
+ inarg->fd = fmmap->mmap_fd;
+ inarg->addr = fmmap->start;
+ inarg->len = fmmap->len;
+
+ req->in.h.opcode = FUSE_MUNMAP;
+ req->in.h.nodeid = get_node_id(fmmap->file->f_dentry->d_inode);
+ req->in.numargs = 1;
+ req->in.args[0].size = sizeof(*inarg);
+ req->in.args[0].value = inarg;
+
+ fuse_request_send_noreply(fc, req);
+
+ destroy_fuse_mmap(fmmap);
+}
+
+static void fuse_vm_open(struct vm_area_struct *vma)
+{
+ struct fuse_mmap *fmmap = vma->vm_private_data;
+
+ kref_get(&fmmap->kref);
+}
+
+static void fuse_vm_close(struct vm_area_struct *vma)
+{
+ struct fuse_mmap *fmmap = vma->vm_private_data;
+
+ kref_put(&fmmap->kref, fuse_vm_release);
+}
+
+static void fuse_mmap_end(struct fuse_conn *fc, struct fuse_req *req)
+{
+ struct fuse_mmap_out *mmap_out = req->out.args[0].value;
+ int fd = mmap_out->fd;
+ struct file *file;
+
+ /*
+ * If aborted, we're in a different context and the server is
+ * gonna die soon anyway. Don't bother.
+ */
+ if (unlikely(req->aborted))
+ return;
+
+ if (!req->out.h.error && fd >= 0) {
+ /*
+ * fget() failure should be handled differently as the
+ * userland is expecting MMAP_COMMIT. Set ERR_PTR
+ * value in misc.mmap.file instead of setting
+ * out.h.error.
+ */
+ file = fget(fd);
+ if (!file)
+ file = ERR_PTR(-EBADF);
+ req->misc.mmap.file = file;
+ }
+}
+
+static int fuse_mmap_commit_prep(struct fuse_conn *fc, struct fuse_req *req)
+{
+ struct fuse_mmap_commit_in *commit_in = (void *)req->in.args[0].value;
+ struct file *mfile = req->misc.mmap.file;
+ int fd;
+
+ if (!mfile)
+ return 0;
+
+ /* new mmap.file has been created, assign a fd to it */
+ fd = commit_in->fd = get_unused_fd_flags(O_CLOEXEC);
+ if (fd < 0)
+ return 0;
+
+ get_file(mfile);
+ fd_install(fd, mfile);
+ return 0;
+}
+
+static void fuse_mmap_commit_end(struct fuse_conn *fc, struct fuse_req *req)
+{
+ struct fuse_mmap_commit_in *commit_in = (void *)req->in.args[0].value;
+
+ /*
+ * If aborted, we're in a different context and the server is
+ * gonna die soon anyway. Don't bother.
+ */
+ if (unlikely(req->aborted))
+ return;
+
+ /*
+ * If a new fd was assigned to mmap.file but the request
+ * failed, close the fd.
+ */
+ if (req->misc.mmap.file && commit_in->fd >= 0 && req->out.h.error)
+ sys_close(commit_in->fd);
+}
+
+/*
+ * Direct mmap is implemented using two requests - FUSE_MMAP and
+ * FUSE_MMAP_COMMIT. This is to allow the userland server to choose
+ * whether to share an existing mmap or create a new one.
+ *
+ * Each separate mmap area is backed by a shmem_file (an anonymous
+ * mapping). If the server specifies fd to an existing shmem_file
+ * created by previous FUSE_MMAP_COMMIT, the shmem_file for that
+ * mapping is reused. If not, a new shmem_file is created and a new
+ * fd is opened and notified to the server via FUSE_MMAP_COMMIT.
+ *
+ * Because the server might allocate resources on FUSE_MMAP, FUSE
+ * guarantees that FUSE_MMAP_COMMIT will be sent whether the mmap
+ * attempt succeeds or not. On failure, commit_in.fd will contain
+ * negative error code; otherwise, it will contain the fd for the
+ * shmem_file. The server is then free to truncate the fd to desired
+ * size and fill in the content. The client will see the area only
+ * after COMMIT is successfully replied. If the server fails the
+ * COMMIT request and a new fd has been allocated for it, the fd will be
+ * automatically closed by the kernel.
+ *
+ * FUSE guarantees that MUNMAP request will be sent when the area gets
+ * unmapped.
+ *
+ * The server can associate the three related requests - MMAP,
+ * MMAP_COMMIT and MUNMAP using ->unique of the MMAP request. The
+ * latter two requests carry ->mmap_unique field which contains
+ * ->unique of the MMAP request.
+ */
+int fuse_file_direct_mmap(struct file *file, struct vm_area_struct *vma)
+{
+ struct fuse_file *ff = file->private_data;
+ struct fuse_conn *fc = ff->fc;
+ struct fuse_mmap *fmmap = NULL;
+ struct fuse_req *req;
+ struct fuse_mmap_in mmap_in;
+ struct fuse_mmap_out mmap_out;
+ struct fuse_mmap_commit_in commit_in;
+ struct file *mfile;
+ u64 mmap_unique;
+ int err;
+
+ /*
+ * First, execute FUSE_MMAP which will query the server
+ * whether this mmap request is valid and which fd it wants to
+ * use to mmap this request.
+ */
+ req = fuse_get_req(fc);
+ if (IS_ERR(req)) {
+ err = PTR_ERR(req);
+ goto err;
+ }
+
+ memset(&mmap_in, 0, sizeof(mmap_in));
+ mmap_in.fh = ff->fh;
+ mmap_in.addr = vma->vm_start;
+ mmap_in.len = vma->vm_end - vma->vm_start;
+ mmap_in.prot = ((vma->vm_flags & VM_READ) ? PROT_READ : 0) |
+ ((vma->vm_flags & VM_WRITE) ? PROT_WRITE : 0) |
+ ((vma->vm_flags & VM_EXEC) ? PROT_EXEC : 0);
+ mmap_in.flags = ((vma->vm_flags & VM_GROWSDOWN) ? MAP_GROWSDOWN : 0) |
+ ((vma->vm_flags & VM_DENYWRITE) ? MAP_DENYWRITE : 0) |
+ ((vma->vm_flags & VM_EXECUTABLE) ? MAP_EXECUTABLE : 0) |
+ ((vma->vm_flags & VM_LOCKED) ? MAP_LOCKED : 0);
+ mmap_in.offset = (loff_t)vma->vm_pgoff << PAGE_SHIFT;
+
+ req->in.h.opcode = FUSE_MMAP;
+ req->in.h.nodeid = fuse_file_nodeid(ff);
+ req->in.numargs = 1;
+ req->in.args[0].size = sizeof(mmap_in);
+ req->in.args[0].value = &mmap_in;
+ req->out.numargs = 1;
+ req->out.args[0].size = sizeof(mmap_out);
+ req->out.args[0].value = &mmap_out;
+
+ req->end = fuse_mmap_end;
+
+ fuse_request_send(fc, req);
+
+ /* mmap.file is set if server requested to reuse existing mapping */
+ mfile = req->misc.mmap.file;
+ mmap_unique = req->in.h.unique;
+ err = req->out.h.error;
+
+ fuse_put_request(fc, req);
+
+ /* ERR_PTR value in mfile means fget failure, send failure COMMIT */
+ if (IS_ERR(mfile)) {
+ err = PTR_ERR(mfile);
+ goto commit;
+ }
+ /* userland indicated failure, we can just fail */
+ if (err)
+ goto err;
+
+ /*
+ * Second, create mmap as the server requested.
+ */
+ fmmap = create_fuse_mmap(fc, file, mfile, mmap_unique, mmap_out.fd,
+ vma);
+ if (IS_ERR(fmmap)) {
+ err = PTR_ERR(fmmap);
+ if (mfile)
+ fput(mfile);
+ fmmap = NULL;
+ goto commit;
+ }
+
+ /*
+ * fmmap points to shm_file to mmap, give it to vma. From
+ * this point on, the mfile reference is managed by the vma.
+ */
+ mfile = fmmap->mmap_file;
+ fput(vma->vm_file);
+ vma->vm_file = mfile;
+
+ /* add flags server requested and mmap the shm_file */
+ if (mmap_out.flags & FUSE_MMAP_DONT_COPY)
+ vma->vm_flags |= VM_DONTCOPY;
+ if (mmap_out.flags & FUSE_MMAP_DONT_EXPAND)
+ vma->vm_flags |= VM_DONTEXPAND;
+
+ err = mfile->f_op->mmap(mfile, vma);
+ if (err)
+ goto commit;
+
+ /*
+ * Override vm_ops->open and ->close. This is a bit hacky but
+ * vma's can't easily be nested and FUSE needs to notify the
+ * server when to release resources for mmaps. Both shmem and
+ * tiny_shmem implementations are okay with this trick but if
+ * there's a cleaner way to do this, please update it.
+ */
+ err = -EINVAL;
+ if (vma->vm_ops->open || vma->vm_ops->close || vma->vm_private_data) {
+ printk(KERN_ERR "FUSE: can't do direct mmap. shmem mmap has "
+ "open, close or vm_private_data\n");
+ goto commit;
+ }
+
+ fmmap->vm_ops = *vma->vm_ops;
+ vma->vm_ops = &fmmap->vm_ops;
+ vma->vm_ops->open = fuse_vm_open;
+ vma->vm_ops->close = fuse_vm_close;
+ vma->vm_private_data = fmmap;
+ err = 0;
+
+ commit:
+ /*
+ * Third, either mmap succeeded or failed after MMAP request
+ * succeeded. Notify userland what happened.
+ */
+
+ /* missing commit can cause resource leak on server side, don't fail */
+ req = fuse_get_req_nofail(fc, file);
+
+ memset(&commit_in, 0, sizeof(commit_in));
+ commit_in.fh = ff->fh;
+ commit_in.mmap_unique = mmap_unique;
+ commit_in.addr = mmap_in.addr;
+ commit_in.len = mmap_in.len;
+ commit_in.prot = mmap_in.prot;
+ commit_in.flags = mmap_in.flags;
+ commit_in.offset = mmap_in.offset;
+
+ if (!err) {
+ commit_in.fd = fmmap->mmap_fd;
+ /*
+ * If fmmap->mmap_fd < 0, new fd needs to be created
+ * when the server reads MMAP_COMMIT. Pass the file
+ * pointer. A fd will be assigned to it by the
+ * fuse_mmap_commit_prep callback.
+ */
+ if (fmmap->mmap_fd < 0)
+ req->misc.mmap.file = mfile;
+ } else
+ commit_in.fd = err;
+
+ req->in.h.opcode = FUSE_MMAP_COMMIT;
+ req->in.h.nodeid = fuse_file_nodeid(ff);
+ req->in.numargs = 1;
+ req->in.args[0].size = sizeof(commit_in);
+ req->in.args[0].value = &commit_in;
+
+ req->prep = fuse_mmap_commit_prep;
+ req->end = fuse_mmap_commit_end;
+
+ fuse_request_send(fc, req);
+ if (!err) /* notified failure to userland */
+ err = req->out.h.error;
+ if (!err && commit_in.fd < 0) /* failed to allocate fd */
+ err = commit_in.fd;
+ fuse_put_request(fc, req);
+
+ if (!err) {
+ fmmap->mmap_fd = commit_in.fd;
+ return 0;
+ }
+
+ /* fall through */
+ err:
+ if (fmmap)
+ destroy_fuse_mmap(fmmap);
+
+ if (err == -ENOSYS) {
+ /* Can't provide the coherency needed for MAP_SHARED */
+ if (vma->vm_flags & VM_MAYSHARE)
+ return -ENODEV;
+
+ return generic_file_mmap(file, vma);
+ }
+
+ return err;
+}
+EXPORT_SYMBOL_GPL(fuse_file_direct_mmap);
+
static const struct file_operations fuse_file_operations = {
.llseek = fuse_file_llseek,
.read = do_sync_read,
@@ -1958,7 +2362,7 @@ static const struct file_operations fuse_direct_io_file_operations = {
.llseek = fuse_file_llseek,
.read = fuse_direct_read,
.write = fuse_direct_write,
- .mmap = fuse_direct_mmap,
+ .mmap = fuse_file_direct_mmap,
.open = fuse_open,
.flush = fuse_flush,
.release = fuse_release,
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index ca5b8e9..6baa307 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -271,6 +271,13 @@ struct fuse_req {
struct fuse_write_out out;
} write;
struct fuse_lk_in lk_in;
+ struct {
+ /** to move filp for mmap between client and server */
+ struct file *file;
+ } mmap;
+ struct {
+ struct fuse_munmap_in in;
+ } munmap;
} misc;

/** page vector */
@@ -596,6 +603,7 @@ int fuse_flush(struct file *file, fl_owner_t id);
* Send FSYNCDIR or FSYNC request
*/
int fuse_fsync(struct file *file, struct dentry *de, int datasync);
+int fuse_file_direct_mmap(struct file *file, struct vm_area_struct *vma);

/**
* Send IOCTL request
diff --git a/include/linux/fuse.h b/include/linux/fuse.h
index cc51548..3bb82f6 100644
--- a/include/linux/fuse.h
+++ b/include/linux/fuse.h
@@ -171,6 +171,15 @@ struct fuse_file_lock {
*/
#define FUSE_POLL_SCHEDULE_NOTIFY (1 << 0)

+/**
+ * Mmap flags
+ *
+ * FUSE_MMAP_DONT_COPY: don't copy the region on fork
+ * FUSE_MMAP_DONT_EXPAND: can't be expanded with mremap()
+ */
+#define FUSE_MMAP_DONT_COPY (1 << 0)
+#define FUSE_MMAP_DONT_EXPAND (1 << 1)
+
enum fuse_opcode {
FUSE_LOOKUP = 1,
FUSE_FORGET = 2, /* no reply */
@@ -210,6 +219,9 @@ enum fuse_opcode {
FUSE_DESTROY = 38,
FUSE_IOCTL = 39,
FUSE_POLL = 40,
+ FUSE_MMAP = 41,
+ FUSE_MMAP_COMMIT = 42,
+ FUSE_MUNMAP = 43,

CUSE_BASE = 4096,
};
@@ -449,6 +461,41 @@ struct fuse_notify_poll_wakeup_out {
__u64 kh;
};

+struct fuse_mmap_in {
+ __u64 fh;
+ __u64 addr;
+ __u64 len;
+ __s32 prot;
+ __s32 flags;
+ __u64 offset;
+};
+
+struct fuse_mmap_out {
+ __s32 fd;
+ __u32 flags;
+};
+
+struct fuse_mmap_commit_in {
+ __u64 fh;
+ __u64 mmap_unique;
+ __u64 addr;
+ __u64 len;
+ __s32 prot;
+ __s32 flags;
+ __s32 fd;
+ __u32 padding;
+ __u64 offset;
+};
+
+struct fuse_munmap_in {
+ __u64 fh;
+ __u64 mmap_unique;
+ __u64 addr;
+ __u64 len;
+ __s32 fd;
+ __u32 padding;
+};
+
struct fuse_in_header {
__u32 len;
__u32 opcode;
--
1.6.0.2

2009-04-14 02:07:51

by Tejun Heo

Subject: [PATCH 3/5] FUSE: make request_wait_answer() wait for ->end() completion

Previously, a request was marked FINISHED before ->end() was executed,
so request_wait_answer() could return before ->end() had finished.
This patch makes request_wait_answer() wait for ->end() to complete
before returning.

Note that no current ->end() user waits for request completion, so this
change doesn't cause any behavior difference.

While at it, beef up the comment above the ->end() hook and clarify
when and where it's called.
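
To see why the ordering matters, here is a condensed sketch of the
pattern the direct mmap patch later in this series relies on
(fuse_mmap_end() and req->misc.mmap.file are introduced there; the
wrapper function itself is illustrative):

static struct file *query_server(struct fuse_conn *fc, struct fuse_req *req)
{
	req->end = fuse_mmap_end;	/* fills req->misc.mmap.file */
	fuse_request_send(fc, req);	/* blocks in request_wait_answer() */

	/*
	 * Safe only if request_wait_answer() returns after ->end() has
	 * finished; otherwise misc.mmap.file may not be set yet.
	 */
	return req->misc.mmap.file;
}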

Signed-off-by: Tejun Heo <[email protected]>
---
fs/fuse/dev.c | 41 +++++++++++++++++++++++++----------------
fs/fuse/fuse_i.h | 5 ++++-
2 files changed, 29 insertions(+), 17 deletions(-)

diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index 2a17249..2e1c43d 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -278,7 +278,6 @@ __releases(&fc->lock)
req->end = NULL;
list_del(&req->list);
list_del(&req->intr_entry);
- req->state = FUSE_REQ_FINISHED;
if (req->background) {
if (fc->num_background == FUSE_MAX_BACKGROUND) {
fc->blocked = 0;
@@ -293,10 +292,21 @@ __releases(&fc->lock)
fc->active_background--;
flush_bg_queue(fc);
}
+
spin_unlock(&fc->lock);
- wake_up(&req->waitq);
- if (end)
+
+ if (end) {
end(fc, req);
+ smp_wmb();
+ }
+
+ /*
+ * We own this request and wake_up() has enough memory
+ * barrier, no need to grab spin lock to set state.
+ */
+ req->state = FUSE_REQ_FINISHED;
+
+ wake_up(&req->waitq);
fuse_put_request(fc, req);
}

@@ -372,17 +382,16 @@ __acquires(&fc->lock)
return;

aborted:
- BUG_ON(req->state != FUSE_REQ_FINISHED);
- if (req->locked) {
- /* This is uninterruptible sleep, because data is
- being copied to/from the buffers of req. During
- locked state, there mustn't be any filesystem
- operation (e.g. page fault), since that could lead
- to deadlock */
- spin_unlock(&fc->lock);
- wait_event(req->waitq, !req->locked);
- spin_lock(&fc->lock);
- }
+ spin_unlock(&fc->lock);
+ wait_event(req->waitq, req->state == FUSE_REQ_FINISHED);
+ /*
+ * This is uninterruptible sleep, because data is being copied
+ * to/from the buffers of req. During locked state, there
+ * mustn't be any filesystem operation (e.g. page fault),
+ * since that could lead to deadlock
+ */
+ wait_event(req->waitq, !req->locked);
+ spin_lock(&fc->lock);
}

void fuse_request_send(struct fuse_conn *fc, struct fuse_req *req)
@@ -1060,9 +1069,7 @@ __acquires(&fc->lock)

req->aborted = 1;
req->out.h.error = -ECONNABORTED;
- req->state = FUSE_REQ_FINISHED;
list_del_init(&req->list);
- wake_up(&req->waitq);
if (end) {
req->end = NULL;
__fuse_get_request(req);
@@ -1072,6 +1079,8 @@ __acquires(&fc->lock)
fuse_put_request(fc, req);
spin_lock(&fc->lock);
}
+ req->state = FUSE_REQ_FINISHED;
+ wake_up(&req->waitq);
}
}

diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index cdab92d..4da979c 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -291,7 +291,10 @@ struct fuse_req {
/** Link on fi->writepages */
struct list_head writepages_entry;

- /** Request completion callback */
+ /** Request completion callback. This function is called from
+ the kernel context of the FUSE server if the request isn't
+ being aborted. If the request is being aborted, it's
+ called from the kernel context of the aborting process. */
void (*end)(struct fuse_conn *, struct fuse_req *);

/** Request is stolen from fuse_file->reserved_req */
--
1.6.0.2

2009-04-14 02:07:35

by Tejun Heo

Subject: [PATCH 2/5] fdtable: export alloc_fd()

Export alloc_fd(). Will be used by FUSE.

Signed-off-by: Tejun Heo <[email protected]>
---
fs/file.c | 1 +
1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/fs/file.c b/fs/file.c
index f313314..806b3ad 100644
--- a/fs/file.c
+++ b/fs/file.c
@@ -487,6 +487,7 @@ out:
spin_unlock(&files->file_lock);
return error;
}
+EXPORT_SYMBOL_GPL(alloc_fd);

int get_unused_fd(void)
{
--
1.6.0.2

2009-04-14 20:01:34

by Hugh Dickins

Subject: Re: [PATCH 1/5] mmap: don't assume f_op->mmap() doesn't change vma->vm_file

On Tue, 14 Apr 2009, Tejun Heo wrote:

> mmap_region() assumes that vma->vm_file isn't changed by f_op->mmap()
> and continues to use the cached file after f_op->mmap() returns.

It does use "file" again in the unmap_and_free_vma error path
(isn't that reasonable? if the ->mmap failed, it shouldn't have
mucked with vma; and even if it has, then we'd better not change
the current behaviour of which to fput), but I don't see where else.

Further down, covering both vma->vm_file previously set and previously
unset cases, there is a "file = vma->vm_file;" before file is used.
So I think this patch is not necessary - if it is necessary, it's
already a bug, because we already switch from /dev/zero to a
shmem file there.

Hugh

> Don't assume that. This will be used by FUSE to redirect mmap to
> shmem_file.
>
> Signed-off-by: Tejun Heo <[email protected]>
> Cc: Nick Piggin <[email protected]>
> ---
> mm/mmap.c | 1 +
> 1 files changed, 1 insertions(+), 0 deletions(-)
>
> diff --git a/mm/mmap.c b/mm/mmap.c
> index 4a38411..46a7ae5 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -1194,6 +1194,7 @@ munmap_back:
> vma->vm_file = file;
> get_file(file);
> error = file->f_op->mmap(file, vma);
> + file = vma->vm_file;
> if (error)
> goto unmap_and_free_vma;
> if (vm_flags & VM_EXECUTABLE)
> --
> 1.6.0.2

2009-04-15 02:25:28

by Tejun Heo

Subject: Re: [PATCH 1/5] mmap: don't assume f_op->mmap() doesn't change vma->vm_file

Hello,

Hugh Dickins wrote:
> On Tue, 14 Apr 2009, Tejun Heo wrote:
>
>> mmap_region() assumes that vma->vm_file isn't changed by f_op->mmap()
>> and continues to use the cached file after f_op->mmap() returns.
>
> It does use "file" again in the unmap_and_free_vma error path
> (isn't that reasonable? if the ->mmap failed, it shouldn't have
> mucked with vma; and even if it has, then we'd better not change
> the current behaviour of which to fput), but I don't see where else.
>
> Further down, covering both vma->vm_file previously set and previously
> unset cases, there is a "file = vma->vm_file;" before file is used.
> So I think this patch is not necessary - if it is necessary, it's
> already a bug, because already we switch from /dev/zero to a
> shmem file there.

Right, ->mmap() shouldn't modify @vma if it has failed. I was trying
to clarify that @vma->vm_file may change, as the code was a bit
confusing about whether substituting @vma->vm_file is allowed.
Hmmmm... how about adding a BUG_ON() or WARN_ON() in the failure path
to make sure @vma->vm_file hasn't changed?

Anyways, I'm dropping this patch and updating the FUSE mmap
implementation accordingly. Thanks for the comment.

--
tejun

2009-04-15 04:15:38

by Tejun Heo

Subject: Re: [PATCH 1/5] mmap: don't assume f_op->mmap() doesn't change vma->vm_file

This patch is dropped as per Hugh Dickins' comment.

Thanks.

--
tejun

2009-04-15 04:16:40

by Tejun Heo

Subject: [PATCH UPDATED] FUSE: implement direct mmap

This patch implements direct mmap. It allows the FUSE server to honor
each mmap request with an anonymous mapping. The FUSE server can make
multiple mmap requests share a single anonymous mapping or use separate
mappings as it sees fit.

An mmap request is handled in two steps. MMAP first queries the server
whether it wants to share the mapping with an existing one or create a
new one, and with which flags. MMAP_COMMIT then notifies the server of
the result of the mmap and, on success, of the fd the server can use to
access the mmap region.

Internally, a shmem_file is used to back the mmap areas, and
vma->vm_file is switched from the FUSE file to the shmem_file.

For details, please read the comment on top of
fuse_file_direct_mmap().

Signed-off-by: Tejun Heo <[email protected]>
---
Updated such that the vma is not modified on failure return. The git
tree is updated accordingly.

fs/fuse/cuse.c | 1 +
fs/fuse/file.c | 428 ++++++++++++++++++++++++++++++++++++++++++++++++--
fs/fuse/fuse_i.h | 8 +
include/linux/fuse.h | 47 ++++++
4 files changed, 474 insertions(+), 10 deletions(-)

diff --git a/fs/fuse/cuse.c b/fs/fuse/cuse.c
index 2238016..301c068 100644
--- a/fs/fuse/cuse.c
+++ b/fs/fuse/cuse.c
@@ -180,6 +180,7 @@ static const struct file_operations cuse_frontend_fops = {
.unlocked_ioctl = cuse_file_ioctl,
.compat_ioctl = cuse_file_compat_ioctl,
.poll = fuse_file_poll,
+ .mmap = fuse_file_direct_mmap,
};


diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 7492577..c91a50b 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -13,6 +13,9 @@
#include <linux/kernel.h>
#include <linux/sched.h>
#include <linux/module.h>
+#include <linux/file.h>
+#include <linux/syscalls.h>
+#include <linux/mman.h>

static const struct file_operations fuse_file_operations;
static const struct file_operations fuse_direct_io_file_operations;
@@ -1311,15 +1314,6 @@ static int fuse_file_mmap(struct file *file, struct vm_area_struct *vma)
return 0;
}

-static int fuse_direct_mmap(struct file *file, struct vm_area_struct *vma)
-{
- /* Can't provide the coherency needed for MAP_SHARED */
- if (vma->vm_flags & VM_MAYSHARE)
- return -ENODEV;
-
- return generic_file_mmap(file, vma);
-}
-
static int convert_fuse_file_lock(const struct fuse_file_lock *ffl,
struct file_lock *fl)
{
@@ -1935,6 +1929,420 @@ int fuse_notify_poll_wakeup(struct fuse_conn *fc,
return 0;
}

+struct fuse_mmap {
+ struct fuse_conn *fc; /* associated fuse_conn */
+ struct file *file; /* associated file */
+ struct kref kref; /* reference count */
+ u64 mmap_unique; /* mmap req which created this */
+ int mmap_fd; /* server side fd for shmem file */
+ struct file *mmap_file; /* shmem file backing this mmap */
+ unsigned long start;
+ unsigned long len;
+
+ /* our copy of vm_ops w/ open and close overridden */
+ struct vm_operations_struct vm_ops;
+};
+
+/*
+ * Create fuse_mmap structure which represents a single mmapped
+ * region. If @mfile is specified the created fuse_mmap would be
+ * associated with it; otherwise, a new shmem_file is created.
+ */
+static struct fuse_mmap *create_fuse_mmap(struct fuse_conn *fc,
+ struct file *file, struct file *mfile,
+ u64 mmap_unique, int mmap_fd,
+ struct vm_area_struct *vma)
+{
+ char dname[] = "dev/fuse";
+ loff_t off = (loff_t)vma->vm_pgoff << PAGE_SHIFT;
+ size_t len = vma->vm_end - vma->vm_start;
+ struct fuse_mmap *fmmap;
+ int err;
+
+ err = -ENOMEM;
+ fmmap = kzalloc(sizeof(*fmmap), GFP_KERNEL);
+ if (!fmmap)
+ goto fail;
+ kref_init(&fmmap->kref);
+
+ if (mfile) {
+ /*
+ * dentry name with a slash in it can't be created
+ * from userland, so testing dname ensures that the fd
+ * is the one we've created. Note that @mfile is
+ * already grabbed by fuse_mmap_end().
+ */
+ err = -EINVAL;
+ if (strcmp(mfile->f_dentry->d_name.name, dname))
+ goto fail;
+ } else {
+ /*
+ * Create a new shmem_file. As fuse direct mmaps can
+ * be shared, offset can't be zapped to zero. Use off
+ * + len as the default size. Server has a chance to
+ * adjust this and other stuff while processing the
+ * COMMIT request before the client sees this mmap
+ * area.
+ */
+ mfile = shmem_file_setup(dname, off + len, vma->vm_flags);
+ if (IS_ERR(mfile)) {
+ err = PTR_ERR(mfile);
+ goto fail;
+ }
+ }
+ fmmap->mmap_file = mfile;
+
+ fmmap->fc = fuse_conn_get(fc);
+ get_file(file);
+ fmmap->file = file;
+ fmmap->mmap_unique = mmap_unique;
+ fmmap->mmap_fd = mmap_fd;
+ fmmap->start = vma->vm_start;
+ fmmap->len = len;
+
+ return fmmap;
+
+ fail:
+ kfree(fmmap);
+ return ERR_PTR(err);
+}
+
+static void destroy_fuse_mmap(struct fuse_mmap *fmmap)
+{
+ /* mmap_file reference is managed by VM */
+ fuse_conn_put(fmmap->fc);
+ fput(fmmap->file);
+ kfree(fmmap);
+}
+
+static void fuse_vm_release(struct kref *kref)
+{
+ struct fuse_mmap *fmmap = container_of(kref, struct fuse_mmap, kref);
+ struct fuse_conn *fc = fmmap->fc;
+ struct fuse_file *ff = fmmap->file->private_data;
+ struct fuse_req *req;
+ struct fuse_munmap_in *inarg;
+
+ /* failing this might lead to resource leak in server, don't fail */
+ req = fuse_get_req_nofail(fc, fmmap->file);
+ inarg = &req->misc.munmap.in;
+
+ inarg->fh = ff->fh;
+ inarg->mmap_unique = fmmap->mmap_unique;
+ inarg->fd = fmmap->mmap_fd;
+ inarg->addr = fmmap->start;
+ inarg->len = fmmap->len;
+
+ req->in.h.opcode = FUSE_MUNMAP;
+ req->in.h.nodeid = get_node_id(fmmap->file->f_dentry->d_inode);
+ req->in.numargs = 1;
+ req->in.args[0].size = sizeof(*inarg);
+ req->in.args[0].value = inarg;
+
+ fuse_request_send_noreply(fc, req);
+
+ destroy_fuse_mmap(fmmap);
+}
+
+static void fuse_vm_open(struct vm_area_struct *vma)
+{
+ struct fuse_mmap *fmmap = vma->vm_private_data;
+
+ kref_get(&fmmap->kref);
+}
+
+static void fuse_vm_close(struct vm_area_struct *vma)
+{
+ struct fuse_mmap *fmmap = vma->vm_private_data;
+
+ kref_put(&fmmap->kref, fuse_vm_release);
+}
+
+static void fuse_mmap_end(struct fuse_conn *fc, struct fuse_req *req)
+{
+ struct fuse_mmap_out *mmap_out = req->out.args[0].value;
+ int fd = mmap_out->fd;
+ struct file *file;
+
+ /*
+ * If aborted, we're in a different context and the server is
+ * gonna die soon anyway. Don't bother.
+ */
+ if (unlikely(req->aborted))
+ return;
+
+ if (!req->out.h.error && fd >= 0) {
+ /*
+ * fget() failure should be handled differently as the
+ * userland is expecting MMAP_COMMIT. Set ERR_PTR
+ * value in misc.mmap.file instead of setting
+ * out.h.error.
+ */
+ file = fget(fd);
+ if (!file)
+ file = ERR_PTR(-EBADF);
+ req->misc.mmap.file = file;
+ }
+}
+
+static int fuse_mmap_commit_prep(struct fuse_conn *fc, struct fuse_req *req)
+{
+ struct fuse_mmap_commit_in *commit_in = (void *)req->in.args[0].value;
+ struct file *mfile = req->misc.mmap.file;
+ int fd;
+
+ if (!mfile)
+ return 0;
+
+ /* new mmap.file has been created, assign a fd to it */
+ fd = commit_in->fd = get_unused_fd_flags(O_CLOEXEC);
+ if (fd < 0)
+ return 0;
+
+ get_file(mfile);
+ fd_install(fd, mfile);
+ return 0;
+}
+
+static void fuse_mmap_commit_end(struct fuse_conn *fc, struct fuse_req *req)
+{
+ struct fuse_mmap_commit_in *commit_in = (void *)req->in.args[0].value;
+
+ /*
+ * If aborted, we're in a different context and the server is
+ * gonna die soon anyway. Don't bother.
+ */
+ if (unlikely(req->aborted))
+ return;
+
+ /*
+ * If a new fd was assigned to mmap.file but the request
+ * failed, close the fd.
+ */
+ if (req->misc.mmap.file && commit_in->fd >= 0 && req->out.h.error)
+ sys_close(commit_in->fd);
+}
+
+/*
+ * Direct mmap is implemented using two requests - FUSE_MMAP and
+ * FUSE_MMAP_COMMIT. This is to allow the userland server to choose
+ * whether to share an existing mmap or create a new one.
+ *
+ * Each separate mmap area is backed by a shmem_file (an anonymous
+ * mapping). If the server specifies fd to an existing shmem_file
+ * created by previous FUSE_MMAP_COMMIT, the shmem_file for that
+ * mapping is reused. If not, a new shmem_file is created and a new
+ * fd is opened and notified to the server via FUSE_MMAP_COMMIT.
+ *
+ * Because the server might allocate resources on FUSE_MMAP, FUSE
+ * guarantees that FUSE_MMAP_COMMIT will be sent whether the mmap
+ * attempt succeeds or not. On failure, commit_in.fd will contain
+ * negative error code; otherwise, it will contain the fd for the
+ * shmem_file. The server is then free to truncate the fd to desired
+ * size and fill in the content. The client will see the area only
+ * after COMMIT is successfully replied. If the server fails the
+ * COMMIT request and a new fd has been allocated for it, the fd will be
+ * automatically closed by the kernel.
+ *
+ * FUSE guarantees that MUNMAP request will be sent when the area gets
+ * unmapped.
+ *
+ * The server can associate the three related requests - MMAP,
+ * MMAP_COMMIT and MUNMAP using ->unique of the MMAP request. The
+ * latter two requests carry ->mmap_unique field which contains
+ * ->unique of the MMAP request.
+ */
+int fuse_file_direct_mmap(struct file *file, struct vm_area_struct *vma)
+{
+ struct fuse_file *ff = file->private_data;
+ struct fuse_conn *fc = ff->fc;
+ struct vm_operations_struct *orig_vm_ops = vma->vm_ops;
+ struct file *orig_vm_file = vma->vm_file;
+ unsigned long orig_vm_flags = vma->vm_flags;
+ struct fuse_mmap *fmmap = NULL;
+ struct file *mfile = NULL;
+ struct fuse_req *req;
+ struct fuse_mmap_in mmap_in;
+ struct fuse_mmap_out mmap_out;
+ struct fuse_mmap_commit_in commit_in;
+ u64 mmap_unique;
+ int err;
+
+ /*
+ * First, execute FUSE_MMAP which will query the server
+ * whether this mmap request is valid and which fd it wants to
+ * use to mmap this request.
+ */
+ req = fuse_get_req(fc);
+ if (IS_ERR(req)) {
+ err = PTR_ERR(req);
+ goto err;
+ }
+
+ memset(&mmap_in, 0, sizeof(mmap_in));
+ mmap_in.fh = ff->fh;
+ mmap_in.addr = vma->vm_start;
+ mmap_in.len = vma->vm_end - vma->vm_start;
+ mmap_in.prot = ((vma->vm_flags & VM_READ) ? PROT_READ : 0) |
+ ((vma->vm_flags & VM_WRITE) ? PROT_WRITE : 0) |
+ ((vma->vm_flags & VM_EXEC) ? PROT_EXEC : 0);
+ mmap_in.flags = ((vma->vm_flags & VM_GROWSDOWN) ? MAP_GROWSDOWN : 0) |
+ ((vma->vm_flags & VM_DENYWRITE) ? MAP_DENYWRITE : 0) |
+ ((vma->vm_flags & VM_EXECUTABLE) ? MAP_EXECUTABLE : 0) |
+ ((vma->vm_flags & VM_LOCKED) ? MAP_LOCKED : 0);
+ mmap_in.offset = (loff_t)vma->vm_pgoff << PAGE_SHIFT;
+
+ req->in.h.opcode = FUSE_MMAP;
+ req->in.h.nodeid = fuse_file_nodeid(ff);
+ req->in.numargs = 1;
+ req->in.args[0].size = sizeof(mmap_in);
+ req->in.args[0].value = &mmap_in;
+ req->out.numargs = 1;
+ req->out.args[0].size = sizeof(mmap_out);
+ req->out.args[0].value = &mmap_out;
+
+ req->end = fuse_mmap_end;
+
+ fuse_request_send(fc, req);
+
+ /* mmap.file is set if server requested to reuse existing mapping */
+ mfile = req->misc.mmap.file;
+ mmap_unique = req->in.h.unique;
+ err = req->out.h.error;
+
+ fuse_put_request(fc, req);
+
+ /* ERR_PTR value in mfile means fget failure, send failure COMMIT */
+ if (IS_ERR(mfile)) {
+ err = PTR_ERR(mfile);
+ goto commit;
+ }
+ /* userland indicated failure, we can just fail */
+ if (err)
+ goto err;
+
+ /*
+ * Second, create mmap as the server requested.
+ */
+ fmmap = create_fuse_mmap(fc, file, mfile, mmap_unique, mmap_out.fd,
+ vma);
+ if (IS_ERR(fmmap)) {
+ err = PTR_ERR(fmmap);
+ goto commit;
+ }
+
+ /* fmmap points to shm_file to mmap, give it to vma */
+ mfile = fmmap->mmap_file;
+ vma->vm_file = mfile;
+
+ /* add flags server requested and mmap the shm_file */
+ if (mmap_out.flags & FUSE_MMAP_DONT_COPY)
+ vma->vm_flags |= VM_DONTCOPY;
+ if (mmap_out.flags & FUSE_MMAP_DONT_EXPAND)
+ vma->vm_flags |= VM_DONTEXPAND;
+
+ err = mfile->f_op->mmap(mfile, vma);
+ if (err)
+ goto commit;
+
+ /*
+ * Override vm_ops->open and ->close. This is a bit hacky but
+ * vma's can't easily be nested and FUSE needs to notify the
+ * server when to release resources for mmaps. Both shmem and
+ * tiny_shmem implementations are okay with this trick but if
+ * there's a cleaner way to do this, please update it.
+ */
+ err = -EINVAL;
+ if (vma->vm_ops->open || vma->vm_ops->close || vma->vm_private_data) {
+ printk(KERN_ERR "FUSE: can't do direct mmap. shmem mmap has "
+ "open, close or vm_private_data\n");
+ goto commit;
+ }
+
+ fmmap->vm_ops = *vma->vm_ops;
+ vma->vm_ops = &fmmap->vm_ops;
+ vma->vm_ops->open = fuse_vm_open;
+ vma->vm_ops->close = fuse_vm_close;
+ vma->vm_private_data = fmmap;
+ err = 0;
+
+ commit:
+ /*
+ * Third, either mmap succeeded or failed after MMAP request
+ * succeeded. Notify userland what happened.
+ */
+
+ /* missing commit can cause resource leak on server side, don't fail */
+ req = fuse_get_req_nofail(fc, file);
+
+ memset(&commit_in, 0, sizeof(commit_in));
+ commit_in.fh = ff->fh;
+ commit_in.mmap_unique = mmap_unique;
+ commit_in.addr = mmap_in.addr;
+ commit_in.len = mmap_in.len;
+ commit_in.prot = mmap_in.prot;
+ commit_in.flags = mmap_in.flags;
+ commit_in.offset = mmap_in.offset;
+
+ if (!err) {
+ commit_in.fd = fmmap->mmap_fd;
+ /*
+ * If fmmap->mmap_fd < 0, new fd needs to be created
+ * when the server reads MMAP_COMMIT. Pass the file
+ * pointer. A fd will be assigned to it by the
+ * fuse_mmap_commit_prep callback.
+ */
+ if (fmmap->mmap_fd < 0)
+ req->misc.mmap.file = mfile;
+ } else
+ commit_in.fd = err;
+
+ req->in.h.opcode = FUSE_MMAP_COMMIT;
+ req->in.h.nodeid = fuse_file_nodeid(ff);
+ req->in.numargs = 1;
+ req->in.args[0].size = sizeof(commit_in);
+ req->in.args[0].value = &commit_in;
+
+ req->prep = fuse_mmap_commit_prep;
+ req->end = fuse_mmap_commit_end;
+
+ fuse_request_send(fc, req);
+ if (!err) /* notified failure to userland */
+ err = req->out.h.error;
+ if (!err && commit_in.fd < 0) /* failed to allocate fd */
+ err = commit_in.fd;
+ fuse_put_request(fc, req);
+
+ if (!err) {
+ fput(orig_vm_file);
+ fmmap->mmap_fd = commit_in.fd;
+ return 0;
+ }
+
+ /* fall through */
+ err:
+ if (fmmap && !IS_ERR(fmmap))
+ destroy_fuse_mmap(fmmap);
+ if (mfile && !IS_ERR(mfile))
+ fput(mfile);
+
+ /* restore original vm_ops, file and flags */
+ vma->vm_ops = orig_vm_ops;
+ vma->vm_file = orig_vm_file;
+ vma->vm_flags = orig_vm_flags;
+
+ if (err == -ENOSYS) {
+ /* Can't provide the coherency needed for MAP_SHARED */
+ if (vma->vm_flags & VM_MAYSHARE)
+ return -ENODEV;
+
+ return generic_file_mmap(file, vma);
+ }
+
+ return err;
+}
+EXPORT_SYMBOL_GPL(fuse_file_direct_mmap);
+
static const struct file_operations fuse_file_operations = {
.llseek = fuse_file_llseek,
.read = do_sync_read,
@@ -1958,7 +2366,7 @@ static const struct file_operations fuse_direct_io_file_operations = {
.llseek = fuse_file_llseek,
.read = fuse_direct_read,
.write = fuse_direct_write,
- .mmap = fuse_direct_mmap,
+ .mmap = fuse_file_direct_mmap,
.open = fuse_open,
.flush = fuse_flush,
.release = fuse_release,
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index ca5b8e9..6baa307 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -271,6 +271,13 @@ struct fuse_req {
struct fuse_write_out out;
} write;
struct fuse_lk_in lk_in;
+ struct {
+ /** to move filp for mmap between client and server */
+ struct file *file;
+ } mmap;
+ struct {
+ struct fuse_munmap_in in;
+ } munmap;
} misc;

/** page vector */
@@ -596,6 +603,7 @@ int fuse_flush(struct file *file, fl_owner_t id);
* Send FSYNCDIR or FSYNC request
*/
int fuse_fsync(struct file *file, struct dentry *de, int datasync);
+int fuse_file_direct_mmap(struct file *file, struct vm_area_struct *vma);

/**
* Send IOCTL request
diff --git a/include/linux/fuse.h b/include/linux/fuse.h
index cc51548..3bb82f6 100644
--- a/include/linux/fuse.h
+++ b/include/linux/fuse.h
@@ -171,6 +171,15 @@ struct fuse_file_lock {
*/
#define FUSE_POLL_SCHEDULE_NOTIFY (1 << 0)

+/**
+ * Mmap flags
+ *
+ * FUSE_MMAP_DONT_COPY: don't copy the region on fork
+ * FUSE_MMAP_DONT_EXPAND: can't be expanded with mremap()
+ */
+#define FUSE_MMAP_DONT_COPY (1 << 0)
+#define FUSE_MMAP_DONT_EXPAND (1 << 1)
+
enum fuse_opcode {
FUSE_LOOKUP = 1,
FUSE_FORGET = 2, /* no reply */
@@ -210,6 +219,9 @@ enum fuse_opcode {
FUSE_DESTROY = 38,
FUSE_IOCTL = 39,
FUSE_POLL = 40,
+ FUSE_MMAP = 41,
+ FUSE_MMAP_COMMIT = 42,
+ FUSE_MUNMAP = 43,

CUSE_BASE = 4096,
};
@@ -449,6 +461,41 @@ struct fuse_notify_poll_wakeup_out {
__u64 kh;
};

+struct fuse_mmap_in {
+ __u64 fh;
+ __u64 addr;
+ __u64 len;
+ __s32 prot;
+ __s32 flags;
+ __u64 offset;
+};
+
+struct fuse_mmap_out {
+ __s32 fd;
+ __u32 flags;
+};
+
+struct fuse_mmap_commit_in {
+ __u64 fh;
+ __u64 mmap_unique;
+ __u64 addr;
+ __u64 len;
+ __s32 prot;
+ __s32 flags;
+ __s32 fd;
+ __u32 padding;
+ __u64 offset;
+};
+
+struct fuse_munmap_in {
+ __u64 fh;
+ __u64 mmap_unique;
+ __u64 addr;
+ __u64 len;
+ __s32 fd;
+ __u32 padding;
+};
+
struct fuse_in_header {
__u32 len;
__u32 opcode;
--
1.6.0.2